Extend your A2 work substantially, e.g., by adding more gestures, collecting an extensive dataset that you could share publicly, or developing interesting expressive outputs.
Use OpenFace to evaluate the facial expressions of presidential candidates during debates or in commercials. Since OpenFace solves much of the computer vision for you, you need to think about what your contributions could be. Here are some ideas. Select just one of these, take on a couple, or come up with your own.
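OpenFace's FeatureExtraction tool writes per-frame CSV files that include action unit (AU) intensity columns. As a starting point, here is a minimal sketch of charting a candidate's smile intensity over a clip; the file name "debate.csv" is a placeholder, and you should verify the exact column names against your own OpenFace output.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the per-frame CSV produced by OpenFace's FeatureExtraction tool.
# "debate.csv" is a placeholder for your own output file.
df = pd.read_csv("debate.csv")
df.columns = df.columns.str.strip()  # OpenFace headers may carry leading spaces

# AU12_r (lip corner puller) is commonly used as a proxy for smiling.
plt.plot(df["timestamp"], df["AU12_r"])
plt.xlabel("time (s)")
plt.ylabel("AU12 intensity (lip corner puller)")
plt.title("Smile intensity over the clip")
plt.show()
```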
The Multi-View Car Dataset contains car images acquired at a car show. They were taken as the cars rotated on a platform and cover the full 360-degree range with a sample every 3 to 4 degrees. There are around 2,000 images in the dataset belonging to 20 very different car models. Using the first 10 sequences for training and the rest for testing, design an algorithm that estimates the rotation angle of a car given an image from the test set.
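One hedged baseline, not part of the assignment itself: extract HOG features and regress the angle with a k-nearest-neighbor model. Predicting (sin, cos) instead of the raw angle avoids the 0/360 wrap-around; all parameter values below are arbitrary starting points.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from sklearn.neighbors import KNeighborsRegressor

def hog_features(image):
    """Resize to a fixed shape and extract HOG descriptors."""
    gray = resize(rgb2gray(image), (128, 128))
    return hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def fit_angle_regressor(train_images, train_angles_deg):
    """k-NN regression on (sin, cos) targets respects the circular angle."""
    X = np.array([hog_features(im) for im in train_images])
    rad = np.deg2rad(train_angles_deg)
    y = np.column_stack([np.sin(rad), np.cos(rad)])
    return KNeighborsRegressor(n_neighbors=3).fit(X, y)

def predict_angle(model, image):
    s, c = model.predict([hog_features(image)])[0]
    return np.rad2deg(np.arctan2(s, c)) % 360
```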
Join our research team for specific ideas and access to data. One subproject is developing a model that can help correct exercises in home-based physical therapy when they are not performed correctly.
Our research group has been analyzing DALL-E 3 and other models for gender and racial biases and has proposed solutions (prompt engineering, creation of a new dataset). You could contribute by developing your own dataset and analysis tools.
The Camera Mouse is a computer-vision interface for people who cannot use their hands or voice to communicate, type, browse the web, or play games. Our old implementation is used all over the world: www.cameramouse.org. Our new implementation is being tested by users with severe motion disabilities this semester. You could help design user-specific interaction schemes. Here's the link to the new Camera Mouse: Github page. Our current implementation uses the ResNet SSD face detector from OpenCV to find the user's face in the video frame and uses the nose tip as the location for mapping the mouse pointer to screen coordinates. It tracks the subimage containing the nose tip using template matching with the normalized correlation coefficient as the match function. If the software believes the nose tip is lost, i.e., some other facial feature is being tracked by mistake, the mouse pointer is "re-initialized" to the nose tip, which is detected by the face detector. This re-initialization makes the mouse pointer jump to the screen point corresponding to the nose-tip location. The jump is often confusing and unpleasant for the user, as it interrupts normal interaction.
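The core tracking step described above, matching the stored nose-tip template against each new frame with the normalized correlation coefficient, maps directly onto OpenCV. A minimal sketch (variable names are illustrative, not taken from the Camera Mouse code):

```python
import cv2

def track_template(frame_gray, template):
    """Locate the template in the frame using the normalized correlation
    coefficient. Returns the top-left corner of the best match and its
    score; a low score could trigger re-initialization from the detector."""
    result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_loc, max_val
```

One possible interaction-design direction, offered as a suggestion only: instead of jumping, the pointer could be animated toward the re-detected nose tip over a few frames, making re-initialization less jarring.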
Specific project ideas:
An archeology professor at BU is interested in software that creates a 3D visualization of monuments. A local example is the Bunker Hill Monument. "Structure from motion" is a technique in computer vision that enables you to reconstruct an object in 3D from 2D video. Your task would be to apply existing code to a local monument (a two-view sketch of the underlying geometry follows this list):
Bundler
PMVS
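Before running the full Bundler/PMVS pipeline, it can help to understand the two-view geometry it builds on. A minimal, hedged OpenCV sketch; the intrinsic matrix K is an assumption you would obtain from calibration or EXIF metadata:

```python
import cv2
import numpy as np

def two_view_pose(img1, img2, K):
    """Estimate relative camera motion between two views of a monument."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Ratio-test matching of SIFT descriptors.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix with RANSAC, then decompose into rotation/translation.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Bundler performs this estimation jointly over many views (bundle adjustment), and PMVS then densifies the sparse reconstruction.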
There are interesting datasets to be found on the Kaggle website. A current competition asks users to predict plant traits. The dataset has 20 million images of plants, and 30,000 have been labeled for training.
Help our project team develop tools for interpreting languages that are not written in the Latin script.
Help art historians analyze paintings. You can process a painter's color scheme, recognize objects in paintings, analyze the location and orientation of faces, etc. Prolific painters may change their style throughout their lifetime. We will have access to a large collection of paintings, and we would collaborate with a BU art history professor on this exciting project. Here is an example image database: Titian paintings.
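For the color-scheme idea, one hedged starting point is to cluster a painting's pixels into a small palette; n_colors=6 is an arbitrary choice:

```python
import numpy as np
from skimage.io import imread
from sklearn.cluster import KMeans

def dominant_palette(path, n_colors=6):
    """Cluster a painting's pixels into a small set of dominant colors,
    returned together with the fraction of the canvas each color covers."""
    pixels = imread(path)[:, :, :3].reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_)
    order = np.argsort(counts)[::-1]  # sort by canvas coverage
    return km.cluster_centers_[order].astype(int), counts[order] / counts.sum()
```

Comparing such palettes across a painter's dated works is one simple way to quantify stylistic change over a lifetime.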
A BU economics professor has collected Ikea catalogs from all over the world, spanning more than two decades, to evaluate the pricing strategies of this successful company. Use the document image analysis software Tesseract on the Ikea dataset to obtain character recognition. Your task is to recognize objects in the photographs (e.g., with an ImageNet-trained classifier) and connect your results with the pricing text.
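For the OCR step, the pytesseract wrapper around Tesseract is one option; a minimal sketch (the page file name, language code, and price regex are illustrative placeholders, since currency formats vary by country):

```python
import re

import pytesseract
from PIL import Image

# OCR a scanned catalog page; Tesseract ships language packs (e.g., "deu"
# for German editions), so pick the pack matching the catalog's country.
page = Image.open("ikea_catalog_page.png")
text = pytesseract.image_to_string(page, lang="eng")

# Pull out price-like strings; this pattern is only illustrative.
prices = re.findall(r"\$?\d+[.,]\d{2}", text)
print(prices)
```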
Use OpenFace (see above) or other face detectors on two synchronized video streams and match up the interpretations across views. This requires camera calibration and spatial data association.
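A hedged sketch of the spatial data-association step: given the fundamental matrix F from calibration, a face detection in the first view should lie near its epipolar line in the second view, which gives a matching criterion. The helper below is hypothetical, not part of OpenFace:

```python
import cv2
import numpy as np

def associate_across_views(pts1, pts2, F):
    """Match each detection in view 1 to the detection in view 2 closest
    to its epipolar line. pts1, pts2: (N, 2) float32 detection centers;
    F: fundamental matrix from calibration (e.g., cv2.findFundamentalMat)."""
    lines = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
    matches = []
    for a, b, c in lines.reshape(-1, 3):
        # Point-to-line distance for every candidate in the second view.
        d = np.abs(a * pts2[:, 0] + b * pts2[:, 1] + c) / np.hypot(a, b)
        matches.append(int(np.argmin(d)))
    return matches
```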
Develop a computer vision system that helps a blind person analyze images encountered in everyday life (thermostat readings, cooking instructions on packages, or pregnancy tests).
In our research, we recorded hundreds of images of living cells in time-lapse phase-contrast microscopy video. You could help automate the interpretation of these vast amounts of data, a task that is time-consuming, costly, and prone to human error when done by hand. You could try to develop a segmentation method based on adaptive thresholding and active contours, or explore variations of the basic active contour algorithm that we discussed in class. We have made a library of cell images publicly available: the BU Biomedical Image Library (BU-BIL). The library contains "ground-truth" segmentations against which you can compare your results.
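A hedged sketch of the suggested two-stage approach: adaptive thresholding for a rough mask, then a snake (scikit-image's active_contour) to refine one cell's boundary. The initialization circle's center and radius are placeholders you would derive from the thresholded mask, and all parameter values are starting points to tune:

```python
import cv2
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def segment_cell(gray, center, radius):
    """Rough mask via adaptive thresholding, then snake refinement.
    gray: uint8 grayscale frame; center: (row, col) of a candidate cell."""
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 51, -5)
    # Initialize the snake as a circle around the candidate cell.
    s = np.linspace(0, 2 * np.pi, 200)
    init = np.column_stack([center[0] + radius * np.sin(s),
                            center[1] + radius * np.cos(s)])  # (row, col)
    snake = active_contour(gaussian(gray, 3), init,
                           alpha=0.015, beta=10, gamma=0.001)
    return mask, snake
```

The refined snake can then be rasterized to a label mask and scored against the BU-BIL ground-truth segmentations.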
The lungs deform during inspiration and expiration. Modeling the deformations is of clinical interest as it facilitates the diagnosis and treatment of lung disease. For example, lung cancer is often treated with radiotherapy. Research in medical image analysis has focused on methods to determine the position of a tumor that moves with respiration during radiation treatment and thus reduce the amount of undesirable radiation to surrounding healthy tissue. Here is a paper that describes rigid-body registration of lung surfaces: M. Betke, H. Hong, D. Thomas, C. Prince, J. P. Ko, "Landmark Detection in the Chest and Registration of Lung Surfaces with an Application to Nodule Registration." Medical Image Analysis, 7:3, pp. 265-281, September 2003. pdf. The goal of the project is to extend this work to deformable registration.
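As a reference point for the rigid-body baseline you would be extending, the optimal rotation and translation between two sets of corresponding landmarks has a closed-form SVD solution (the Kabsch/Procrustes method); a minimal numpy sketch, not the paper's implementation:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid alignment of corresponding 3D landmarks.

    src, dst: (N, 3) arrays of matched landmark coordinates.
    Returns rotation R and translation t with dst ~= src @ R.T + t.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Reflection guard: force a proper rotation (det(R) = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = dst.mean(0) - src.mean(0) @ R.T
    return R, t
```

A deformable extension would replace this single global transform with a spatially varying one, e.g., a spline-based displacement field fit to the same landmarks.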
Censusing natural populations of bats is important for understanding the ecological and economic impact of these animals on terrestrial ecosystems. It is challenging to census bats accurately, since they emerge in large numbers at night from their day-time roosting sites. We have used infrared thermal cameras to record Brazilian free-tailed bats in California, Massachusetts, New Mexico, and Texas and have developed an automated image analysis system that detects, tracks, and counts the emerging bats.
The traditional detection and tracking methods we developed could be updated with modern techniques. We also have hours of recordings that have not yet been analyzed by any method.
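As one possible starting point for the detection step, a hedged sketch using OpenCV's MOG2 background subtractor on the thermal video (the file name and parameter values are placeholders):

```python
import cv2
import numpy as np

# "bats.avi" is a placeholder for one of the infrared recordings.
cap = cv2.VideoCapture("bats.avi")
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Warm bats appear bright against the cooler sky in thermal imagery.
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    print("bat candidates in frame:", n - 1)  # label 0 is the background
cap.release()
```

Per-frame counts would then feed a tracker that associates detections over time before censusing.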
Develop a system that can detect the facial feature movements used in American Sign Language (ASL) communication. For example, an eyebrow raise is an important grammatical marker in ASL that indicates a question. BU's ASL data is available from three viewpoints. You could use SignStream's ASL video annotations, which include events such as "eyebrow raises" that were manually determined by linguists and can serve as your "ground truth."
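One hedged way to turn face landmarks into an eyebrow-raise signal is to track the eyebrow-to-eye distance normalized by face scale. The sketch below uses dlib's 68-point landmark model (the .dat file is dlib's publicly distributed pre-trained predictor); OpenFace landmarks would work the same way:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eyebrow_raise_score(gray):
    """Mean eyebrow-to-eye vertical distance, normalized by inter-ocular
    distance so the score is comparable across face sizes and distances."""
    faces = detector(gray)
    if not faces:
        return None
    pts = np.array([[p.x, p.y] for p in predictor(gray, faces[0]).parts()])
    brows = pts[17:27]   # both eyebrows (landmarks 17-26)
    eyes = pts[36:48]    # both eye contours (landmarks 36-47)
    interocular = np.linalg.norm(pts[36] - pts[45])
    return (eyes[:, 1].mean() - brows[:, 1].mean()) / interocular
```

Thresholding this score over time, and comparing detected events against the SignStream "eyebrow raise" annotations, gives a direct evaluation protocol.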
Develop a character recognition program to recognize the letters and numbers on license plates.
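A hedged sketch of the character-segmentation step that typically precedes recognition: binarize a cropped plate image and keep contour bounding boxes of plausible character height (the size filter is an arbitrary placeholder to tune):

```python
import cv2

def character_boxes(plate_gray):
    """Binarize a cropped plate image and return candidate character
    bounding boxes ordered left to right."""
    _, binary = cv2.threshold(plate_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img = plate_gray.shape[0]
    boxes = [cv2.boundingRect(c) for c in contours]
    chars = [b for b in boxes if 0.3 * h_img < b[3] < 0.9 * h_img]
    return sorted(chars, key=lambda b: b[0])  # order by x coordinate
```

Each box could then be classified with template matching or a small CNN trained on license-plate fonts.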
Margrit Betke, Professor, Computer Science Department. Email: betke@cs.bu.edu