Extend your A2 work substantially, e.g., by adding more gestures, collecting an extensive dataset that you could share publicly, or developing interesting expressive outputs.
Use OpenFace to evaluate the facial expressions of presidential candidates during debates or in commercials. Since OpenFace solves much of the computer vision for you, you need to think about what your contributions could be. Here are some ideas. Select just one of these, take on a couple, or come up with your own.
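OpenFace's FeatureExtraction tool writes per-frame CSV files that include action unit (AU) intensity columns. As a starting point, here is a minimal sketch of charting a candidate's smile intensity over a clip; the file name "debate.csv" is a placeholder, and you should verify the exact column names against your own OpenFace output.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the per-frame CSV produced by OpenFace's FeatureExtraction tool.
# "debate.csv" is a placeholder for your own output file.
df = pd.read_csv("debate.csv")
df.columns = df.columns.str.strip()  # OpenFace headers may carry leading spaces

# AU12_r (lip corner puller) is commonly used as a proxy for smiling.
plt.plot(df["timestamp"], df["AU12_r"])
plt.xlabel("time (s)")
plt.ylabel("AU12 intensity (lip corner puller)")
plt.title("Smile intensity over the clip")
plt.show()
```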
The Multi-View Car Dataset contains car images acquired at a car show. They were taken as the cars rotated on a platform and cover the full 360-degree range with a sample every 3 to 4 degrees. There are around 2,000 images in the dataset belonging to 20 very different car models. Using the first 10 sequences for training and the rest for testing, design an algorithm that estimates the rotation angle of a car given an image from the test set.
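One hedged baseline, not part of the assignment itself: extract HOG features and regress the angle with a k-nearest-neighbor model. Predicting (sin, cos) instead of the raw angle avoids the 0/360 wrap-around; all parameter values below are arbitrary starting points.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from skimage.transform import resize
from sklearn.neighbors import KNeighborsRegressor

def hog_features(image):
    """Resize to a fixed shape and extract HOG descriptors."""
    gray = resize(rgb2gray(image), (128, 128))
    return hog(gray, pixels_per_cell=(16, 16), cells_per_block=(2, 2))

def fit_angle_regressor(train_images, train_angles_deg):
    """k-NN regression on (sin, cos) targets respects the circular angle."""
    X = np.array([hog_features(im) for im in train_images])
    rad = np.deg2rad(train_angles_deg)
    y = np.column_stack([np.sin(rad), np.cos(rad)])
    return KNeighborsRegressor(n_neighbors=3).fit(X, y)

def predict_angle(model, image):
    s, c = model.predict([hog_features(image)])[0]
    return np.rad2deg(np.arctan2(s, c)) % 360
```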
Join our research team for specific ideas and access to data. One subproject is developing a model that can help correct exercises in home-based physical therapy when they are not performed correctly.
Our research group has been analyzing DALL-E 3 and other models for gender and racial biases and has proposed solutions (prompt engineering, creation of a new dataset). You could contribute by developing your own dataset and analysis tools.
The Camera Mouse is a computer-vision interface for people who cannot use their hands or voice to communicate, type, browse the web, or play games. Our old implementation is used all over the world: www.cameramouse.org. Our new implementation is being tested by users with severe motion disabilities this semester. You could help design user-specific interaction schemes. Here's the link to the new Camera Mouse: Github page. Our current implementation uses the ResNet SSD face detector from OpenCV to find the user's face in the video frame and uses the nose tip as the location for mapping the mouse pointer to screen coordinates. It tracks the subimage containing the nose tip using template matching with the normalized correlation coefficient as the match function. If the software believes the nose tip is lost, i.e., some other facial feature is being tracked by mistake, the mouse pointer is "re-initialized" to the nose tip, which is detected by the face detector. This re-initialization makes the mouse pointer jump to the screen point corresponding to the nose-tip location. The jump is often confusing and unpleasant for the user, as it interrupts normal interaction.
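The core tracking step described above, matching the stored nose-tip template against each new frame with the normalized correlation coefficient, maps directly onto OpenCV. A minimal sketch (variable names are illustrative, not taken from the Camera Mouse code):

```python
import cv2

def track_template(frame_gray, template):
    """Locate the template in the frame using the normalized correlation
    coefficient. Returns the top-left corner of the best match and its
    score; a low score could trigger re-initialization from the detector."""
    result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_loc, max_val
```

One possible interaction-design direction, offered as a suggestion only: instead of jumping, the pointer could be animated toward the re-detected nose tip over a few frames, making re-initialization less jarring.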
Specific project ideas:
An archeology professor at BU is interested in software that creates a 3D visualization of monuments. A local example is the Bunker Hill Monument. "Structure from motion" is a technique in computer vision that enables you to reconstruct an object in 3D from 2D video. Your task would be to apply existing code to a local monument (a two-view sketch of the underlying geometry follows this list):
Bundler
PMVS
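Before running the full Bundler/PMVS pipeline, it can help to understand the two-view geometry it builds on. A minimal, hedged OpenCV sketch; the intrinsic matrix K is an assumption you would obtain from calibration or EXIF metadata:

```python
import cv2
import numpy as np

def two_view_pose(img1, img2, K):
    """Estimate relative camera motion between two views of a monument."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Ratio-test matching of SIFT descriptors.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Essential matrix with RANSAC, then decompose into rotation/translation.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
    return R, t
```

Bundler performs this estimation jointly over many views (bundle adjustment), and PMVS then densifies the sparse reconstruction.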
There are interesting datasets to be found on the Kaggle website. A current competition asks users to predict plant traits. The dataset has 20 million images of plants, and 30,000 have been labeled for training.
Help our project team develop tools for interpreting languages that are not written in the Latin script.
Help art historians analyze paintings. You can process a painter's color scheme, recognize objects in paintings, analyze the location and orientation of faces, etc. Prolific painters may change their style throughout their lifetime. We will have access to a large collection of paintings, and we would collaborate with a BU art history professor on this exciting project. Here is an example image database: Titian paintings.
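For the color-scheme idea, one hedged starting point is to cluster a painting's pixels into a small palette; n_colors=6 is an arbitrary choice:

```python
import numpy as np
from skimage.io import imread
from sklearn.cluster import KMeans

def dominant_palette(path, n_colors=6):
    """Cluster a painting's pixels into a small set of dominant colors,
    returned together with the fraction of the canvas each color covers."""
    pixels = imread(path)[:, :, :3].reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    counts = np.bincount(km.labels_)
    order = np.argsort(counts)[::-1]  # sort by canvas coverage
    return km.cluster_centers_[order].astype(int), counts[order] / counts.sum()
```

Comparing such palettes across a painter's dated works is one simple way to quantify stylistic change over a lifetime.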
A BU economics professor has collected Ikea catalogs from all over the world, spanning more than two decades, to evaluate the pricing strategies of this successful company. Use the document image analysis software Tesseract on the Ikea dataset to obtain character recognition. Your task is to recognize objects in the photographs (e.g., with an ImageNet-trained classifier) and connect your results with the pricing text.
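For the OCR step, the pytesseract wrapper around Tesseract is one option; a minimal sketch (the page file name, language code, and price regex are illustrative placeholders, since currency formats vary by country):

```python
import re

import pytesseract
from PIL import Image

# OCR a scanned catalog page; Tesseract ships language packs (e.g., "deu"
# for German editions), so pick the pack matching the catalog's country.
page = Image.open("ikea_catalog_page.png")
text = pytesseract.image_to_string(page, lang="eng")

# Pull out price-like strings; this pattern is only illustrative.
prices = re.findall(r"\$?\d+[.,]\d{2}", text)
print(prices)
```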
Use OpenFace (see above) or other face detectors on two synchronized video streams and match up the interpretations across views. This requires camera calibration and spatial data association.
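A hedged sketch of the spatial data-association step: given the fundamental matrix F from calibration, a face detection in the first view should lie near its epipolar line in the second view, which gives a matching criterion. The helper below is hypothetical, not part of OpenFace:

```python
import cv2
import numpy as np

def associate_across_views(pts1, pts2, F):
    """Match each detection in view 1 to the detection in view 2 closest
    to its epipolar line. pts1, pts2: (N, 2) float32 detection centers;
    F: fundamental matrix from calibration (e.g., cv2.findFundamentalMat)."""
    lines = cv2.computeCorrespondEpilines(pts1.reshape(-1, 1, 2), 1, F)
    matches = []
    for a, b, c in lines.reshape(-1, 3):
        # Point-to-line distance for every candidate in the second view.
        d = np.abs(a * pts2[:, 0] + b * pts2[:, 1] + c) / np.hypot(a, b)
        matches.append(int(np.argmin(d)))
    return matches
```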
Develop a computer vision system that helps a blind person analyze images encountered in everyday life (thermostat readings, cooking instructions on packages, or pregnancy tests).
In our research, we recorded hundreds of images of living cells in time-lapse phase-contrast microscopy video. You could help automate the interpretation of these vast amounts of data, a task that is time-consuming, costly, and prone to human error when done by hand. You could try to develop a segmentation method based on adaptive thresholding and active contours, or explore variations of the basic active contour algorithm that we discussed in class. We have made a library of cell images publicly available: the BU Biomedical Image Library (BU-BIL). The library contains "ground-truth" segmentations against which you can compare your results.
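A hedged sketch of the suggested two-stage approach: adaptive thresholding for a rough mask, then a snake (scikit-image's active_contour) to refine one cell's boundary. The initialization circle's center and radius are placeholders you would derive from the thresholded mask, and all parameter values are starting points to tune:

```python
import cv2
import numpy as np
from skimage.filters import gaussian
from skimage.segmentation import active_contour

def segment_cell(gray, center, radius):
    """Rough mask via adaptive thresholding, then snake refinement.
    gray: uint8 grayscale frame; center: (row, col) of a candidate cell."""
    mask = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, 51, -5)
    # Initialize the snake as a circle around the candidate cell.
    s = np.linspace(0, 2 * np.pi, 200)
    init = np.column_stack([center[0] + radius * np.sin(s),
                            center[1] + radius * np.cos(s)])  # (row, col)
    snake = active_contour(gaussian(gray, 3), init,
                           alpha=0.015, beta=10, gamma=0.001)
    return mask, snake
```

The refined snake can then be rasterized to a label mask and scored against the BU-BIL ground-truth segmentations.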
The lungs deform during inspiration and expiration. Modeling the deformations is of clinical interest as it facilitates the diagnosis and treatment of lung disease. For example, lung cancer is often treated with radiotherapy. Research in medical image analysis has focused on methods to determine the position of a tumor that moves with respiration during radiation treatment and thus reduce the amount of undesirable radiation to surrounding healthy tissue. Here is a paper that describes rigid-body registration of lung surfaces: M. Betke, H. Hong, D. Thomas, C. Prince, J. P. Ko, "Landmark Detection in the Chest and Registration of Lung Surfaces with an Application to Nodule Registration." Medical Image Analysis, 7:3, pp. 265-281, September 2003. pdf. The goal of the project is to extend this work to deformable registration.
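As a reference point for the rigid-body baseline you would be extending, the optimal rotation and translation between two sets of corresponding landmarks has a closed-form SVD solution (the Kabsch/Procrustes method); a minimal numpy sketch, not the paper's implementation:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid alignment of corresponding 3D landmarks.

    src, dst: (N, 3) arrays of matched landmark coordinates.
    Returns rotation R and translation t with dst ~= src @ R.T + t.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    # Reflection guard: force a proper rotation (det(R) = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1, 1, d]) @ U.T
    t = dst.mean(0) - src.mean(0) @ R.T
    return R, t
```

A deformable extension would replace this single global transform with a spatially varying one, e.g., a spline-based displacement field fit to the same landmarks.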
Censusing natural populations of bats is important for understanding the ecological and economic impact of these animals on terrestrial ecosystems. It is challenging to census bats accurately, since they emerge in large numbers at night from their day-time roosting sites. We have used infrared thermal cameras to record Brazilian free-tailed bats in California, Massachusetts, New Mexico, and Texas and have developed an automated image analysis system that detects, tracks, and counts the emerging bats.
The traditional detection and tracking methods we developed could be updated with modern techniques. We also have hours of recordings that have not yet been analyzed by any method.
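As one possible starting point for the detection step, a hedged sketch using OpenCV's MOG2 background subtractor on the thermal video (the file name and parameter values are placeholders):

```python
import cv2
import numpy as np

# "bats.avi" is a placeholder for one of the infrared recordings.
cap = cv2.VideoCapture("bats.avi")
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Warm bats appear bright against the cooler sky in thermal imagery.
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)  # remove speckle
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    print("bat candidates in frame:", n - 1)  # label 0 is the background
cap.release()
```

Per-frame counts would then feed a tracker that associates detections over time before censusing.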
Develop a system that can detect the facial feature movements used in American Sign Language (ASL) communication. For example, an eyebrow raise is an important grammatical marker in ASL that indicates a question. BU's ASL data is available from three viewpoints. You could use SignStream's ASL video annotations, which include events such as "eyebrow raises" that were manually determined by linguists and can serve as your "ground truth."
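One hedged way to turn face landmarks into an eyebrow-raise signal is to track the eyebrow-to-eye distance normalized by face scale. The sketch below uses dlib's 68-point landmark model (the .dat file is dlib's publicly distributed pre-trained predictor); OpenFace landmarks would work the same way:

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def eyebrow_raise_score(gray):
    """Mean eyebrow-to-eye vertical distance, normalized by inter-ocular
    distance so the score is comparable across face sizes and distances."""
    faces = detector(gray)
    if not faces:
        return None
    pts = np.array([[p.x, p.y] for p in predictor(gray, faces[0]).parts()])
    brows = pts[17:27]   # both eyebrows (landmarks 17-26)
    eyes = pts[36:48]    # both eye contours (landmarks 36-47)
    interocular = np.linalg.norm(pts[36] - pts[45])
    return (eyes[:, 1].mean() - brows[:, 1].mean()) / interocular
```

Thresholding this score over time, and comparing detected events against the SignStream "eyebrow raise" annotations, gives a direct evaluation protocol.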
Develop a character recognition program to recognize the letters and numbers on license plates.
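A hedged sketch of the character-segmentation step that typically precedes recognition: binarize a cropped plate image and keep contour bounding boxes of plausible character height (the size filter is an arbitrary placeholder to tune):

```python
import cv2

def character_boxes(plate_gray):
    """Binarize a cropped plate image and return candidate character
    bounding boxes ordered left to right."""
    _, binary = cv2.threshold(plate_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h_img = plate_gray.shape[0]
    boxes = [cv2.boundingRect(c) for c in contours]
    chars = [b for b in boxes if 0.3 * h_img < b[3] < 0.9 * h_img]
    return sorted(chars, key=lambda b: b[0])  # order by x coordinate
```

Each box could then be classified with template matching or a small CNN trained on license-plate fonts.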
Margrit Betke, Professor, Computer Science Department. Email: betke@cs.bu.edu