The Multi-View Car Dataset contains car images acquired at a car show. They were taken as the cars rotated on a platform and cover the whole 360-degree range, with a sample every 3 to 4 degrees. There are around 2000 images in the dataset, belonging to 20 very different car models. Using the first 10 sequences for training and the rest for testing, design an algorithm that estimates the rotation angle of a car, given an image of it from the test set.
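As a baseline, assuming each image has been reduced to a feature vector (flattened pixels, gradient histograms, or similar — the choice of feature is up to you), a nearest-neighbor regressor with proper circular handling of angles might look like this sketch (all names are illustrative):

```python
import numpy as np

def predict_angle_1nn(train_feats, train_angles_deg, query_feat):
    """Nearest-neighbor angle regression: return the rotation angle of the
    closest training image in feature space (Euclidean distance)."""
    d = np.linalg.norm(train_feats - query_feat, axis=1)
    return train_angles_deg[np.argmin(d)]

def angular_error_deg(a, b):
    """Smallest absolute difference between two angles on the 360-degree circle,
    so that 350 vs. 10 degrees counts as a 20-degree error, not 340."""
    return abs((a - b + 180.0) % 360.0 - 180.0)
```

Evaluating with `angular_error_deg` rather than plain subtraction matters here, because the angle wraps around at 360 degrees.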
Use OpenFace to evaluate the facial expressions of presidential candidates during debates or in commercials. Since OpenFace solves much of the computer vision problem for you, you need to think about what your contributions could be. Here are some ideas. Select just one of them, do a couple, or come up with your own.
My research team is interested in learning how to improve educational software by visually monitoring a student. For example, if the student looks frustrated, the software is supposed to give hints. Our collaborators have collected video data of children. Does OpenFace work as well on children's faces as on adults'? Address one or several of the tasks listed above for the presidential debate analysis (1.-3.).
This is a new research collaboration with a neuroscientist at BU who records neural activity in the brain of mice and rats while they forage for food. The goal of the computer vision project would be to collect some videos in the neuro lab and then develop a 2D Kalman filter that tracks the animal head and body regions in the video while the animal is moving through the experimental space.
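A constant-velocity 2D Kalman filter is the standard starting point for this kind of tracking. The sketch below assumes the animal's head or body has already been detected as a noisy (x, y) position per frame; the noise parameters are placeholder values a real tracker would tune to the video:

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=1.0):
    """Constant-velocity 2D Kalman filter.
    State x = [px, py, vx, vy]; measurements are noisy (px, py) pairs.
    Returns the filtered positions, one per measurement."""
    F = np.eye(4); F[0, 2] = F[1, 3] = dt        # state transition
    H = np.zeros((2, 4)); H[0, 0] = H[1, 1] = 1  # we observe position only
    Q = q * np.eye(4)                            # process noise covariance
    R = r * np.eye(2)                            # measurement noise covariance
    x = np.array([measurements[0][0], measurements[0][1], 0.0, 0.0])
    P = np.eye(4)
    out = []
    for z in measurements:
        # predict
        x = F @ x
        P = F @ P @ F.T + Q
        # update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.asarray(z) - H @ x)
        P = (np.eye(4) - K @ H) @ P
        out.append(x[:2].copy())
    return np.array(out)
```

Extending this to track head and body jointly is mostly a matter of enlarging the state vector.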
The Camera Mouse is a computer-vision interface for people who cannot use their hands or voice to communicate, type, browse the web, or play games. Our current Windows implementation is used all over the world: www.cameramouse.org. I regularly receive requests from caregivers and users who need it for the iPad or Mac. One of my PhD students, Andrew Kurauchi, started implementing a platform-independent version. The code is almost ready to be tested on a Mac and iPad. The goal of the course project would be to finalize the implementation with Andrew's help, and then test it, so that it can be made available to users who depend on assistive technology.
An archeology professor at BU is interested in software that creates a 3D visualization of monuments. A local example is the Bunker Hill Monument. "Structure from motion" is a technique in computer vision that enables you to reconstruct an object in 3D from 2D video. Your task would be to apply existing code to a local monument:
Help art historians analyze paintings. You could analyze a painter's color scheme, recognize objects in paintings, analyze the location and orientation of faces, etc. Prolific painters may change their style throughout their lifetime. We will have access to a large collection of paintings, and we would collaborate with a BU art history professor in this exciting project. Here is an example image database: Titian paintings.
A BU economics professor has collected Ikea catalogs from all over the world and from more than two decades to evaluate the pricing strategies of this successful company. Use the document image analysis software Tesseract on the Ikea dataset, which will provide you with character recognition. Your task is to recognize objects in the photographs (e.g., with ImageNet) and connect your results with the pricing text.
Use OpenFace (see above) or other face detectors on two synchronized video streams and match up the interpretation. This requires camera calibration and spatial data association.
Develop a computer vision system that helps a blind person analyze images to support their everyday life (thermostat readings, cooking instructions on packages, or pregnancy tests).
In our research, we recorded hundreds of images of living cells in time-lapse phase-contrast microscopy video. You could help automate the interpretation of these vast amounts of data, which is too time-consuming, costly, and prone to human error when done by hand. You could try to develop a segmentation method based on adaptive thresholding and active contours. You could also try variations of the basic active contour algorithm that we discussed in class. We have made a library of cell images publicly available: the BU Biomedical Image Library (BU-BIL). The library contains "ground-truth" segmentations against which you can compare your results.
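One way to begin is an adaptive-mean threshold, sketched here in plain NumPy (the window size and offset are arbitrary starting values). This variant flags pixels that are brighter than their local neighborhood, which suits bright cells on an unevenly lit background; a real pipeline would follow it with an active contour refinement:

```python
import numpy as np

def adaptive_threshold(img, win=15, c=2.0):
    """Flag pixels that are at least c brighter than the mean of their
    win x win neighborhood (an adaptive-mean threshold). Local means are
    computed with an integral image, so the cost is independent of win."""
    img = img.astype(float)
    pad = win // 2
    padded = np.pad(img, pad, mode="edge")
    # integral image: ii[i, j] = sum of padded[:i, :j]
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    box = (ii[win:win + h, win:win + w] - ii[:h, win:win + w]
           - ii[win:win + h, :w] + ii[:h, :w])
    return img > box / (win * win) + c
```

Unlike a single global threshold, this adapts to smooth illumination gradients across the frame.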
For this project, you are asked to implement a system that would follow the movements of a living cell in time-lapse phase-contrast microscopy video. A first task would be to track the position of the centroid of the cell. Another task would be to analyze the changes of the shape of the cell, following the analysis discussed in class that uses axes of first inertia. An exciting extra step would be to register the outline of the cell from image to image, using deformable models or non-rigid registration techniques discussed in class. For data, look at the Cell Tracking Dataset.
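The centroid and the axis of least inertia both follow directly from image moments of the segmented cell mask; a sketch:

```python
import numpy as np

def shape_stats(mask):
    """Centroid and orientation (axis of least inertia) of a binary region,
    computed from first and second central image moments. The angle is in
    radians, measured in image coordinates (y axis pointing down)."""
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)
    return (cx, cy), theta
```

Tracking how the centroid and angle change frame to frame already gives a coarse description of cell motion and rotation.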
The lungs deform during inspiration and expiration. Modeling the deformations is of clinical interest as it facilitates the diagnosis and treatment of lung disease. For example, lung cancer is often treated with radiotherapy. Research in medical image analysis has focused on methods to determine the position of a tumor that moves with respiration during radiation treatment and thus reduce the amount of undesirable radiation to surrounding healthy tissue. Here is a paper that describes rigid-body registration of lung surfaces: M. Betke, H. Hong, D. Thomas, C. Prince, J. P. Ko, "Landmark Detection in the Chest and Registration of Lung Surfaces with an Application to Nodule Registration." Medical Image Analysis, 7:3, pp. 265-281, September 2003. pdf. The goal of the project is to extend this work to deformable registration.
Implement a method to recognize different hand shapes or gestures for human-computer interaction, or simply track hands for behavior understanding.
Please see our webpage on Thermal Video Analysis of Bats. Censusing natural populations of bats is important for understanding the ecological and economic impact of these animals on terrestrial ecosystems. It is challenging to census bats accurately, since they emerge in large numbers at night from their day-time roosting sites. We have used infrared thermal cameras to record Brazilian free-tailed bats in California, Massachusetts, New Mexico, and Texas and have developed an automated image analysis system that detects, tracks, and counts the emerging bats.
We need help in improving our detection and tracking methods. In particular, we would like to automatically evaluate the shape of flying bats. Using the material we covered in CS 585 on thresholding, segmentation, and active contour methods, this project would build a system that analyzes the shapes of flying bats.
Please see our webpage on Thermal Video Analysis of Bats. This project uses thermal videos of wind turbines. The goal is to detect the bats and birds that might be flying by. Bats and birds have been killed by wind turbines, and maybe there is a way to help wildlife avoid the deadly blades. Here is an example of a sequence of thermal videos that you could analyze for this project (the blade is moving upwards, the bat is not hit):
Develop a system that can detect facial features and/or their motion, for example, eyebrow raises, in video sequences of American Sign Language (ASL) communications. An eyebrow raise is an important grammatical tool in American Sign Language to indicate a question. You could build upon a system that was developed by one of my students in CS 585. The ASL data is available from three viewpoints:
You could use SignStream's ASL video annotations that include events such as "eyebrow raises," which were manually determined by linguists and can serve as your "ground truth." Check out a previous CS 585 course project: T. Castelli, M. Betke, C. Neidle, "Facial feature tracking and occlusion recovery in American Sign Language." In A. Fred and A. Lourenço, editors, Pattern Recognition in Information Systems: Proceedings of the 6th International Workshop on Pattern Recognition in Information Systems - PRIS 2006, pages 81-90, Paphos, Cyprus, May 2006. INSTICC Press. pdf. See also Technical Report BU-CS-2005-024.
Develop a people tracking program. Videotape people walking on campus (you can use our cameras). Can you automatically detect which moving "blobs" are people and not cars, bikes, dogs, etc.? Could you apply your method to improve airport security? You may reimplement: P. KaewTraKulPong and R. Bowden. "A real time adaptive visual surveillance system for tracking low-resolution colour targets in dynamically changing scenes," Image and Vision Computing, Volume 21, Issue 10, September 2003, Pages 913-929. pdf. There are some very interesting newer papers. Please ask me for references.
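The cited paper maintains an adaptive mixture of Gaussians per pixel; a single-Gaussian simplification is enough to sketch the idea of adaptive background subtraction (the learning rate and thresholds below are placeholder values):

```python
import numpy as np

class RunningBackground:
    """Per-pixel running Gaussian background model, a simplified stand-in
    for the mixture-of-Gaussians model in the cited paper. Pixels far from
    the background mean are flagged as foreground "blobs"."""
    def __init__(self, first_frame, lr=0.05, k=3.0, min_std=2.0):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, min_std ** 2)
        self.lr, self.k, self.min_var = lr, k, min_std ** 2

    def apply(self, frame):
        """Return a boolean foreground mask and adapt the model."""
        frame = frame.astype(float)
        d2 = (frame - self.mean) ** 2
        fg = d2 > (self.k ** 2) * self.var
        # update only where the pixel looks like background, so a person
        # standing still is not absorbed into the model too quickly
        bg = ~fg
        self.mean[bg] += self.lr * (frame[bg] - self.mean[bg])
        self.var[bg] = np.maximum(
            (1 - self.lr) * self.var[bg] + self.lr * d2[bg], self.min_var)
        return fg
```

Connected-component analysis on the foreground mask would then yield the moving blobs to classify as people, cars, or bikes.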
Images often must be resized to fit in PDA or cell phone displays. If the aspect ratio changes, image content may be deformed or important objects in the image may be cropped. For example, if image A must be resized into a 100-pixel height image, a simple downsampling algorithm would result in image B. A content-aware resizing algorithm would produce image C instead. If the original image must be resized into 200x150-pixel image, a simple downsampling algorithm would deform the image content and yield image D. A content-aware resizing algorithm would produce image E instead.
(Example images A through E not shown here.)
For ideas on how to approach the problem of designing a content-aware resizing algorithm, you may read the 2007 SIGGRAPH paper by Avidan and Shamir or the 2008 SIGGRAPH paper by Rubinstein et al. about seam carving.
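The core of seam carving is a dynamic program over an energy map; here is a minimal grayscale sketch (gradient magnitude is one common energy choice, not the only one):

```python
import numpy as np

def remove_vertical_seam(img):
    """Remove one minimal-energy vertical seam from a grayscale image
    (dynamic programming over a gradient-magnitude energy map),
    shrinking the width by one pixel."""
    gy, gx = np.gradient(img.astype(float))
    energy = np.abs(gx) + np.abs(gy)
    h, w = energy.shape
    # cost[y, x] = cheapest seam energy from the top row down to (y, x)
    cost = energy.copy()
    for y in range(1, h):
        left = np.r_[np.inf, cost[y - 1, :-1]]
        right = np.r_[cost[y - 1, 1:], np.inf]
        cost[y] += np.minimum(np.minimum(left, cost[y - 1]), right)
    # backtrack the cheapest seam from bottom to top
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for y in range(h - 2, -1, -1):
        x = seam[y + 1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam[y] = lo + int(np.argmin(cost[y, lo:hi]))
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return img[keep].reshape(h, w - 1)
```

Repeating this (and its transposed version for horizontal seams) shrinks an image to any target size while seams avoid high-energy content.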
The goal of this project is to develop a vision-based system that detects if a driver is falling asleep behind the wheel. You may take one of our video cameras and film a friend behind the wheel. Try to detect the blinking eyes of a driver using image differencing techniques. Can you detect the difference between "normal" blinking and closing the eyes for a longer period of time? Test this only when the car is parked! The challenge of this project is to detect eye closures under various lighting conditions. So park your car in various locations and capture images at different times of the day and in various weather situations. Please ask me for references.
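The differencing step itself is simple; the sketch below pairs it with a placeholder heuristic for telling a brief blink from a sustained closure (all thresholds are illustrative and would need tuning to real footage):

```python
import numpy as np

def blink_signal(frames, thresh=25):
    """For each consecutive frame pair, count the pixels whose intensity
    changed by more than `thresh` (plain image differencing). A blink shows
    up as a short spike in this signal."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))
    return (diffs > thresh).sum(axis=(1, 2))

def long_closure(signal, spike_level, quiet_frames):
    """Heuristic: a spike (eyes closing) followed by at least `quiet_frames`
    quiet frames (no re-opening motion) suggests the eyes stayed closed."""
    spikes = np.nonzero(signal >= spike_level)[0]
    if len(spikes) == 0:
        return False
    after = signal[spikes[0] + 1 : spikes[0] + 1 + quiet_frames]
    return len(after) >= quiet_frames and (after < spike_level).all()
```

In practice you would restrict the differencing to a detected eye region and normalize for the varying lighting conditions mentioned above.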
Driver Face Analysis for Intelligent Vehicles
The car industry is curious to find out as much as possible about driver behavior. What does a driver focus on while driving? How long does a driver look at the dashboard or the rearview mirror? The goal is to design smarter, safer cars. Drivers are videotaped in simulators and in the real world. The videos are then annotated by hand in a painstaking process. Automation of this process is needed, because it would allow a larger test population. In addition, automatic recognition of driver intentions or mistakes may trigger safety features in our future "intelligent cars." The goal of this project is to implement a system that automatically analyzes the driver's face. The system should detect and track facial features. This would be a good group project, since it can be combined with the "Head Tracker" and "Warning System for Tired Drivers" projects. Please ask me for references.
You would reimplement an optical flow algorithm, for example, Horn and Schunck's algorithm (see textbook or journal paper). To test the algorithm, you should set up some experiments. Move the camera or the objects and try to recover the direction of motion.
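Horn and Schunck's iterative update can be sketched compactly in NumPy. This is a simplified version (periodic boundaries via np.roll, central-difference gradients); their paper uses different boundary handling and derivative estimates:

```python
import numpy as np

def horn_schunck(im1, im2, alpha=1.0, n_iter=200):
    """Horn-Schunck optical flow: minimize the brightness-constancy error
    Ix*u + Iy*v + It plus an alpha-weighted smoothness term, solved by
    Jacobi-style iteration. Returns per-pixel flow fields (u, v)."""
    im1, im2 = im1.astype(float), im2.astype(float)
    Iy, Ix = np.gradient(0.5 * (im1 + im2))   # spatial gradients
    It = im2 - im1                            # temporal gradient

    def neighbor_avg(f):  # 4-neighbor average (periodic boundary for simplicity)
        return 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                       + np.roll(f, 1, 1) + np.roll(f, -1, 1))

    u = np.zeros_like(im1)
    v = np.zeros_like(im1)
    for _ in range(n_iter):
        ubar, vbar = neighbor_avg(u), neighbor_avg(v)
        t = (Ix * ubar + Iy * vbar + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = ubar - Ix * t
        v = vbar - Iy * t
    return u, v
```

A good first experiment is a synthetic one-pixel translation, where the recovered flow should be close to unit magnitude in the direction of motion.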
This would be a good programming project for anyone who is interested in computer vision and computer graphics and enjoys geometry. The goal is to take images from a hand-held video camera and stitch them seamlessly into a panoramic mosaic. Image mosaics are needed in virtual reality designs and for video conferencing. You could try to come up with your own algorithm or reimplement an existing algorithm, for example, H.-Y. Shum's and R. Szeliski's algorithms (ICCV'98, ICCV'99).
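Full mosaicking requires estimating homographies, but for roughly translational camera motion, phase correlation already recovers the shift between overlapping frames. This is a stand-in for the cited algorithms, not a reimplementation of them:

```python
import numpy as np

def phase_correlation_shift(im1, im2):
    """Estimate the integer translation between two overlapping images via
    phase correlation: the inverse FFT of the normalized cross-power
    spectrum peaks at the relative shift."""
    F1 = np.fft.fft2(im1)
    F2 = np.fft.fft2(im2)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12        # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map wrap-around peaks to signed shifts
    if dy > im1.shape[0] // 2: dy -= im1.shape[0]
    if dx > im1.shape[1] // 2: dx -= im1.shape[1]
    return int(dy), int(dx)
```

Chaining pairwise shifts and blending the overlaps produces a simple translational mosaic; handling camera rotation is where the referenced papers come in.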
Here's a simple way to embed a secret message in a digital image: Convert the message into a string of zeros and ones. Assume the encoded message is 50 bits long. Use the first 50 pixels in your image and, for each pixel, substitute the lowest bit of the pixel gray level with one bit of the message. Our eyes are not sensitive enough to notice that the image changed. Implement and test this algorithm. Then find another, more sophisticated method. See, for example, a paper by Farid. This project requires knowledge of cryptography.
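The basic least-significant-bit scheme described above takes only a few lines of NumPy (function names are illustrative):

```python
import numpy as np

def embed(img, bits):
    """Hide a bit sequence in the least-significant bits of the first
    len(bits) pixels of an 8-bit grayscale image. Returns a new image;
    no pixel changes by more than one gray level."""
    flat = img.flatten()  # copy, row-major pixel order
    flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=img.dtype)
    return flat.reshape(img.shape)

def extract(img, n_bits):
    """Read the hidden bits back out of the lowest bit plane."""
    return (img.flatten()[:n_bits] & 1).astype(int)
```

A round trip (embed, then extract) recovers the message exactly, while the maximum pixel change is one gray level.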
Develop a character recognition program to recognize the letters and numbers on license plates. You can treat the characters as binary images and use correlation, Euler number, and/or thinning techniques for recognition. You may want to simplify the problem by only trying to recognize certain fonts or just digits.
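Correlation against binary templates is a reasonable starting point. The sketch below assumes the plate has already been segmented into same-size binary glyph images; fonts, segmentation, and Euler-number or thinning features are left aside:

```python
import numpy as np

def classify_glyph(glyph, templates):
    """Classify a binary character image by normalized cross-correlation
    against a dictionary of same-size binary templates; return the label
    of the best-matching template."""
    g = glyph.astype(float).ravel()
    g = (g - g.mean()) / (g.std() + 1e-9)   # zero-mean, unit-variance
    best, best_score = None, -np.inf
    for label, t in templates.items():
        t = t.astype(float).ravel()
        t = (t - t.mean()) / (t.std() + 1e-9)
        score = (g * t).mean()              # correlation coefficient
        if score > best_score:
            best, best_score = label, score
    return best
```

Normalizing both images makes the score insensitive to overall brightness and ink density, which plain pixel counting is not.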
Use two images of an object to compute the 3D "structure" of the object. If you set up your own camera system, try to keep the camera geometry simple, so that the epipolar lines are parallel to the image rows. You can then search along the epipolar lines to find corresponding points in the two images. You may also use the stereo images provided by Scharstein. Illustrate the results of your algorithm on a few examples. Warning: We may use this project as the last homework programming assignment. If you choose this project, you will have to demonstrate that your work significantly improves upon your homework solution.
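When the epipolar lines coincide with image rows, a brute-force block matcher is the natural baseline. This sketch uses sum-of-absolute-differences (SAD) matching with a box filter; the window size and disparity range are placeholder values:

```python
import numpy as np

def disparity_map(left, right, max_disp=16, win=5):
    """Block matching along epipolar lines (image rows): for each pixel of
    the left image, find the horizontal shift into the right image that
    minimizes the SAD over a win x win patch."""
    left, right = left.astype(float), right.astype(float)
    h, w = left.shape
    best = np.full((h, w), np.inf)
    disp = np.zeros((h, w), dtype=int)
    k = np.ones(win) / win
    for d in range(max_disp + 1):
        shifted = np.roll(right, d, axis=1)      # candidate match at disparity d
        sad = np.abs(left - shifted)
        # separable box filter turns per-pixel differences into patch SAD
        sad = np.apply_along_axis(np.convolve, 0, sad, k, mode="same")
        sad = np.apply_along_axis(np.convolve, 1, sad, k, mode="same")
        better = sad < best
        best[better] = sad[better]
        disp[better] = d
    return disp
```

Depth then follows from disparity via the camera geometry (depth is inversely proportional to disparity for a rectified pair).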
|Margrit Betke, Professor Computer Science Department Email: email@example.com|