CAS CS 585 Image and Video Computing - Spring 2024

Project Ideas

  1. Gesture Recognition or Gesture Interaction System

    Extend your A2 work substantially, e.g., by adding more gestures, collecting an extensive dataset that you could share publicly, developing interesting expressive outputs, etc.

  2. Face Expression Analysis of 2024 Presidential Election Candidates

    Use OpenFace to evaluate the facial expressions of presidential candidates during debates or in commercials. Since OpenFace solves a lot of the computer vision for you, you need to think about what your contributions could be. Here are some ideas. Select just one of these, decide to do a couple, or come up with your own ideas.

    1. OpenFace provides per-frame information. Focus on how to accumulate the information (per shot, per 30 seconds, per question answered) so that you can analyze large video datasets (see the aggregation sketch after this list).
    2. Detect situations where OpenFace fails. It would be important information for us to know the percentage of time (and which type of footage) when OpenFace fails to analyze the candidate expressions correctly during a debate.
    3. Can you extend OpenFace to do an analysis of what candidates look at? How often do candidates look into the camera versus towards the audience/other candidate/moderator?
    4. Can you add hand gesture detection to your output and combine it in a meaningful way?
    5. In commercials, analyze foreground/background color schemes. If it is a negative/positive commercial, what facial expressions does the candidate make?
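
    For the first idea above, a minimal aggregation sketch is shown below. It assumes OpenFace's FeatureExtraction tool has already produced a per-frame CSV with "timestamp", "confidence", and action-unit intensity columns such as "AU12_r" (exact column names can vary by OpenFace version); the 0.8 confidence cutoff and the 30-second window are illustrative choices, not recommendations.

      import pandas as pd

      df = pd.read_csv("debate_candidate.csv")          # hypothetical OpenFace output file
      df.columns = df.columns.str.strip()               # some OpenFace versions pad column names with spaces

      # Keep only frames where OpenFace reports a reasonably confident fit.
      df = df[df["confidence"] > 0.8]

      # Accumulate per-frame values into 30-second windows.
      df["window"] = (df["timestamp"] // 30).astype(int)
      au_cols = [c for c in df.columns if c.startswith("AU") and c.endswith("_r")]
      summary = df.groupby("window")[au_cols].mean()    # mean action-unit intensity per window
      print(summary.head())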

  3. Car Pose Estimation

    The Multi-View Car Dataset contains car images acquired at a car show. They were taken as the cars rotated on a platform and cover the full 360-degree range, with a sample every 3 to 4 degrees. The dataset contains around 2000 images of 20 very different car models. Using the first 10 sequences for training and the rest for testing, design an algorithm that estimates the rotation angle of a car given an image from the test set.
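
    As a starting point, here is a minimal nearest-neighbor baseline sketch. It assumes the grayscale images and their ground-truth rotation angles in degrees are already loaded into train_images/train_angles (first 10 sequences) and test_images/test_angles (remaining sequences); the dataset's actual file layout is not handled here, and HOG plus 1-nearest-neighbor is just one possible design choice.

      import numpy as np
      from skimage.feature import hog
      from skimage.transform import resize
      from sklearn.neighbors import NearestNeighbors

      def describe(img):
          # Fixed-size HOG descriptor of the car image.
          return hog(resize(img, (64, 128)), pixels_per_cell=(8, 8), cells_per_block=(2, 2))

      def angular_error(a, b):
          # Smallest difference between two angles, respecting 360-degree wrap-around.
          d = abs(a - b) % 360.0
          return min(d, 360.0 - d)

      X_train = np.array([describe(im) for im in train_images])
      X_test = np.array([describe(im) for im in test_images])

      # Predict the angle of the most similar training image.
      nn = NearestNeighbors(n_neighbors=1).fit(X_train)
      _, idx = nn.kneighbors(X_test)
      pred = np.array(train_angles)[idx[:, 0]]

      errors = [angular_error(p, t) for p, t in zip(pred, test_angles)]
      print("median angular error:", np.median(errors), "degrees")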

  4. Single or Multiview 3D Pose Recognition of Animals or People

    Join our research team for specific ideas and to access data. One subproject is developing a model that can help correct physical exercises in home-based physical therapy if they are not performed right.

  5. Mitigating Bias of Generative AI

    Our research group has been analyzing DALL-E 3 and other models for gender and racial biases and has proposed solutions (prompt engineering, creation of a new dataset). You could contribute by developing your own dataset and analysis tools.

  6. Camera Mouse -- Helping People with Motion Disabilities

    The Camera Mouse is a computer-vision interface for people who cannot use their hands or voice to communicate, type, browse the web, or play games. Our old implementation is used all over the world: www.cameramouse.org. Our new implementation is being tested by users with severe motion disabilities this semester. You could help design user-specific interaction schemes. Here's the link to the new Camera Mouse: Github page.

    Our current implementation uses the ResNet SSD face detector from OpenCV to find the user's face in the video frame and uses the nose tip as the location for mapping the mouse pointer to screen coordinates. It tracks the subimage that contains the nose tip using template matching with the normalized correlation coefficient as the match function. If the software believes the nose tip is lost, i.e., some other facial feature is tracked by mistake, the mouse pointer is "re-initialized" to the nose tip detected by the face detector. This re-initialization makes the mouse pointer jump to the point on the screen that matches the location of the nose tip. This jump is often confusing and unpleasant to the user, as it interrupts normal interaction.
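
    A minimal sketch of the tracking step described above, using OpenCV: "template" is assumed to be a small grayscale patch around the nose tip from an earlier frame, "frame_gray" the current grayscale frame, and the 0.5 loss threshold is illustrative, not the value used by Camera Mouse.

      import cv2

      def track_nose(frame_gray, template, lost_threshold=0.5):
          # Template matching with the normalized correlation coefficient.
          result = cv2.matchTemplate(frame_gray, template, cv2.TM_CCOEFF_NORMED)
          _, max_val, _, max_loc = cv2.minMaxLoc(result)
          if max_val < lost_threshold:
              return None                      # match too weak: trigger re-initialization
          x, y = max_loc
          h, w = template.shape
          return (x + w // 2, y + h // 2)      # center of best match = new nose-tip estimate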

    Specific project ideas:

  7. 3D Reconstruction of Monuments from Multi-view Imagery

    An archeology professor at BU is interested in software that creates a 3D visualization of monuments. A local example is the Bunker Hill Monument. "Structure from motion" is a technique in computer vision that enables you to reconstruct an object in 3D from 2D video. Your task would be to apply existing code to a local monument:
    Bundler
    PMVS
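
    To get a feel for the underlying geometry, here is a minimal two-view sketch with OpenCV. It assumes img1 and img2 are two grayscale frames of the monument and K is the known 3x3 camera intrinsics matrix; Bundler and PMVS handle many views and dense reconstruction, whereas this only recovers a sparse two-view point cloud.

      import cv2
      import numpy as np

      # Detect and match ORB features between the two frames.
      orb = cv2.ORB_create(4000)
      kp1, des1 = orb.detectAndCompute(img1, None)
      kp2, des2 = orb.detectAndCompute(img2, None)
      matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
      pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
      pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

      # Relative camera pose from the essential matrix, then triangulation.
      E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
      _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
      P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
      P2 = K @ np.hstack([R, t])
      points_4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
      points_3d = (points_4d[:3] / points_4d[3]).T     # homogeneous -> Euclidean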

  8. Participate in a Kaggle Competition: Plant Trait Prediction

    There are interesting datasets to be found on the Kaggle website. A current competition asks users to predict plant traits. The dataset has 20 million images of plants, and 30,000 have been labeled for training.

  9. Document Image Analysis for Arabic and Chinese

    Help our project team develop tools for interpreting languages that are not written in a Latin script.

  10. Computer Vision Analysis of Paintings

    Help art historians analyze paintings. You can process a painter's color scheme, recognize objects in paintings, analyze the location/orientation of faces, etc. Prolific painters may change their style throughout their lifetimes. We will have access to a large collection of paintings, and we would collaborate with a BU art history professor on this exciting project. Here is an example image database: Titian paintings.
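
    For the color-scheme part, a minimal sketch with OpenCV is shown below; "titian_example.jpg" is a hypothetical filename and five color clusters is an arbitrary choice.

      import cv2
      import numpy as np

      img = cv2.imread("titian_example.jpg")
      pixels = img.reshape(-1, 3).astype(np.float32)

      # Cluster pixel colors to summarize the painting's palette.
      criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
      _, labels, centers = cv2.kmeans(pixels, 5, None, criteria, 5, cv2.KMEANS_RANDOM_CENTERS)
      proportions = np.bincount(labels.ravel()) / labels.size

      print("dominant colors (BGR):", centers.astype(int).tolist())
      print("proportion of pixels in each cluster:", proportions)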

  11. Document Image Analysis: The International Ikea Catalog Collection

    A BU economics professor has collected Ikea catalogs from all over the world, spanning more than two decades, to evaluate the pricing strategies of this successful company. Use the document image analysis software Tesseract on the Ikea dataset; it will provide you with character recognition. Your task is to recognize objects in the photographs (e.g., with ImageNet) and connect your results with the pricing text.
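
    A minimal sketch of the two building blocks, assuming pytesseract stands in for Tesseract and a torchvision model pretrained on ImageNet (recent torchvision API) stands in for "ImageNet"; "catalog_page.png" and the crop coordinates are hypothetical, and linking recognized objects to nearby price text is the actual project task.

      import pytesseract
      import torch
      from PIL import Image
      from torchvision import models

      page = Image.open("catalog_page.png").convert("RGB")

      # Step 1: character recognition on the catalog page (product names, prices).
      text = pytesseract.image_to_string(page)

      # Step 2: ImageNet classification of a cropped product photograph.
      weights = models.ResNet50_Weights.DEFAULT
      model = models.resnet50(weights=weights).eval()
      crop = page.crop((100, 100, 400, 400))            # hypothetical photo region
      with torch.no_grad():
          probs = model(weights.transforms()(crop).unsqueeze(0)).softmax(dim=1)
      print(text[:200])
      print("top ImageNet class:", weights.meta["categories"][int(probs.argmax())])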

  12. Detection and Tracking of 3D Face Location and Orientation from Two Cameras

    Use OpenFace (see above) or other face detectors on two synchronized video streams and match up the interpretation. This requires camera calibration and spatial data association.
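
    A minimal sketch of the spatial part with OpenCV, assuming both cameras have already been calibrated so their 3x4 projection matrices P1 and P2 are known, and that a face detector has returned the same facial landmark (e.g., the nose tip) in both synchronized frames; the pixel coordinates below are hypothetical.

      import cv2
      import numpy as np

      pt1 = np.array([[320.0], [240.0]])      # nose tip in camera 1 (pixels), hypothetical
      pt2 = np.array([[300.0], [250.0]])      # nose tip in camera 2 (pixels), hypothetical

      # Triangulate the matched 2D detections into a single 3D point.
      point_4d = cv2.triangulatePoints(P1, P2, pt1, pt2)
      point_3d = (point_4d[:3] / point_4d[3]).ravel()
      print("3D face location in the calibration's world frame:", point_3d)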

  13. Blind Person Assistant

    Develop a computer vision system that helps a blind person analyze images to support their everyday life (thermostat readings, cooking instructions on packages, or pregnancy tests).

  14. Segmentation of Living Cells in Microscope Images

    In our research, we recorded hundreds of images of living cells in time-lapse phase-contrast microscopy video. You could help automate the interpretation of these vast amounts of data, which is too time-consuming, costly, and prone to human error when done by hand. You could try to develop a segmentation method based on adaptive thresholding and active contours. You could also try variations of the basic active contour algorithm that we discussed in class. We have made a library of cell images publicly available: BU Biomedical Image Library (BU-BIL). The library contains "ground-truth" segmentations against which you can compare your results.
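
    A minimal sketch of the suggested two-step pipeline: adaptive thresholding with OpenCV followed by a scikit-image active contour. It assumes a single grayscale phase-contrast frame "cell.png"; the threshold block size, the snake parameters, and the hand-placed initial circle are all illustrative.

      import cv2
      import numpy as np
      from skimage.filters import gaussian
      from skimage.segmentation import active_contour

      img = cv2.imread("cell.png", cv2.IMREAD_GRAYSCALE)

      # Step 1: adaptive thresholding for a rough foreground mask
      # (Gaussian-weighted neighborhood, block size 51, offset 5).
      mask = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 51, 5)

      # Step 2: refine one cell's boundary with an active contour (snake),
      # initialized as a circle around the cell (row, col coordinates).
      s = np.linspace(0, 2 * np.pi, 200)
      init = np.column_stack([120 + 40 * np.sin(s), 100 + 40 * np.cos(s)])
      snake = active_contour(gaussian(img, 3), init, alpha=0.015, beta=10, gamma=0.001)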

  15. Registration of Lungs in Computed Tomography Scans

    The lungs deform during inspiration and expiration. Modeling the deformations is of clinical interest as it facilitates the diagnosis and treatment of lung disease. For example, lung cancer is often treated with radiotherapy. Research in medical image analysis has focused on methods to determine the position of a tumor that moves with respiration during radiation treatment and thus reduce the amount of undesirable radiation to surrounding healthy tissue. Here is a paper that describes rigid-body registration of lung surfaces: M. Betke, H. Hong, D. Thomas, C. Prince, J. P. Ko, "Landmark Detection in the Chest and Registration of Lung Surfaces with an Application to Nodule Registration." Medical Image Analysis, 7:3, pp. 265-281, September 2003. pdf. The goal of the project is to extend this work to deformable registration.
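
    A minimal sketch of the rigid-body baseline that the project would extend, assuming corresponding landmark points from the two CT scans are given as Nx3 NumPy arrays src and dst; a deformable registration would replace the single rotation and translation below with a spatially varying transformation.

      import numpy as np

      def rigid_register(src, dst):
          # Least-squares rotation and translation between corresponding landmarks
          # (Kabsch algorithm).
          src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
          U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
          d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
          R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
          t = dst.mean(0) - R @ src.mean(0)
          return R, t

      R, t = rigid_register(src, dst)
      aligned = src @ R.T + t      # source landmarks mapped into the target scan's frame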

  16. Tracking Large Groups of Flying Bats

    Censusing natural populations of bats is important for understanding the ecological and economic impact of these animals on terrestrial ecosystems. It is challenging to census bats accurately, since they emerge in large numbers at night from their day-time roosting sites. We have used infrared thermal cameras to record Brazilian free-tailed bats in California, Massachusetts, New Mexico, and Texas and have developed an automated image analysis system that detects, tracks, and counts the emerging bats.

    The traditional detection and tracking methods we have developed could use updating with modern techniques. We also have hours of data that have not yet been analyzed by any method.

  17. American Sign Language Recognition

    Develop a system that can detect the facial feature motions used in American Sign Language (ASL) communication. For example, an eyebrow raise is an important grammatical tool in American Sign Language to indicate a question. BU's ASL data is available from three viewpoints. You could use SignStream's ASL video annotations, which include events such as "eyebrow raises" that were manually determined by linguists and can serve as your "ground truth."
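
    One possible starting point is a minimal sketch that thresholds OpenFace's brow-raiser action-unit intensities, assuming OpenFace has been run on one of the three views and its CSV contains "timestamp", "AU01_r", and "AU02_r" columns (present in recent OpenFace versions); the 1.5 threshold is illustrative, and the SignStream annotations would serve as the ground truth for evaluating the detected events.

      import pandas as pd

      df = pd.read_csv("asl_front_view.csv")             # hypothetical OpenFace output
      df.columns = df.columns.str.strip()

      # Frame-level detection: mean of inner and outer brow-raiser intensities.
      brow = df[["AU01_r", "AU02_r"]].mean(axis=1)
      df["eyebrow_raise"] = brow > 1.5

      # Group consecutive detected frames into events for comparison with SignStream.
      df["run_id"] = (df["eyebrow_raise"] != df["eyebrow_raise"].shift()).cumsum()
      events = df[df["eyebrow_raise"]].groupby("run_id")["timestamp"].agg(["min", "max"])
      print(events)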

  18. License Plate Recognition

    Develop a character recognition program to recognize the letters and numbers on license plates.


Margrit Betke, Professor
Computer Science Department
Email: betke@cs.bu.edu