Boston University Computer Science Department


Student: Sarah Dubauskas

B.A. Computer Science, Political Science - May 2003

CREW Final Report

Advisor: Margrit Betke


Project Title: "Data Mining of Medical Image Databases"

Project Description and Background from CREW Proposal:

Data mining uses database queries to search for hidden patterns in data.  Little work has been done in searching medical image databases for hidden patterns [Brodley 1999].  A large number of computed tomography (CT) scans are produced regularly to follow the 8.2 million patients with a history of cancer in the US.  Lung cancer screening of smokers is still controversial.  If accepted, it would result in an explosion of the number of chest CT scans to be analyzed.

Preliminary computer-aided diagnosis (CAD) systems have been developed that attempt to copy the rules that radiologists use in evaluating chest CT scans and detecting pulmonary nodules.  However, a "gold standard" for these rules has not been established.  More sophisticated and advanced database and data mining systems may be able to optimally use the information and knowledge stored in CAD systems and potentially improve the diagnostic capabilities of radiologists.

We plan to design indexing and data mining algorithms for a database of chest CT scans.  Database searches will be based on spatial and temporal properties of nodules, such as location, shape, and volumetric changes in consecutive CT studies.  Queries such as "Where are the majority of stable nodules located?"  and "Find a patient with a nodule that has a similar growth pattern" would be run on the database.  These queries may reveal information about the differences between malignant and benign nodules.  Our long-term goal is to discover properties and characteristics that can be used to assist physicians in interpreting diagnostic imaging studies.
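
To make the two example queries concrete, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `Nodule` record, the lobe labels, the 25% stability tolerance, and the toy volumes are all hypothetical, and the proposal does not specify a schema or a similarity measure.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nodule:
    """Hypothetical record: one nodule followed over consecutive CT studies."""
    patient_id: str
    lobe: str                 # coarse location label (assumed)
    volumes_mm3: List[float]  # one volume per study, oldest first


def is_stable(nodule: Nodule, tolerance: float = 0.25) -> bool:
    """Stable here means no study deviates from the first by more than
    `tolerance` (a fraction). The 25% default is an arbitrary choice."""
    first = nodule.volumes_mm3[0]
    return all(abs(v - first) / first <= tolerance for v in nodule.volumes_mm3)


def stable_locations(nodules: List[Nodule]) -> Counter:
    """Query 1: count stable nodules per location."""
    return Counter(n.lobe for n in nodules if is_stable(n))


def growth_curve(nodule: Nodule) -> List[float]:
    """Volumes relative to the first study, so nodules of different size compare."""
    first = nodule.volumes_mm3[0]
    return [v / first for v in nodule.volumes_mm3]


def most_similar_growth(query: Nodule, nodules: List[Nodule]) -> Optional[Nodule]:
    """Query 2: nodule of another patient with the closest growth curve
    (sum of squared differences, same number of studies only)."""
    target = growth_curve(query)
    candidates = [n for n in nodules
                  if n.patient_id != query.patient_id
                  and len(n.volumes_mm3) == len(target)]

    def distance(n: Nodule) -> float:
        return sum((a - b) ** 2 for a, b in zip(target, growth_curve(n)))

    return min(candidates, key=distance, default=None)


# Invented toy records, not patient data.
NODULES = [
    Nodule("P1", "right upper", [100.0, 105.0, 98.0]),
    Nodule("P2", "right upper", [80.0, 82.0, 85.0]),
    Nodule("P3", "left lower", [50.0, 90.0, 160.0]),
    Nodule("P4", "left upper", [40.0, 70.0, 130.0]),
]

if __name__ == "__main__":
    print(stable_locations(NODULES).most_common(1))
    print(most_similar_growth(NODULES[2], NODULES))
```

A real system would replace the linear scan in `most_similar_growth` with an index, which is the part the proposal identifies as the research problem.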

First Semester Work:  Lung Segmentation


For most of the first semester, I was kept busy with lung segmentation and learned a lot along the way. I took the Medical Image Processing (MIP) program currently under development by members of the IVC group and used it to segment lungs out of numerous data files provided to us by New York University. While at first this seemed a simple task, I encountered numerous obstacles. The MIP program was constantly under revision, and I had to start over many times to get accurate data. When I finally got these problems solved, I learned that segmentation was not just a matter of sitting there and pressing buttons. Many times the program became "confused", mislabeling the trachea as part of a lung or missing a lung entirely. I then had to adjust the Hounsfield unit values the program was using until I got an accurate segmentation. Even when I recorded what had worked best on the previous frame, this proved to be a long process. I was surprised at how much the best values could vary between successive frames. Once an entire header file was accurately segmented, I converted the data into a text file of the lung contour.

The above screenshot shows a set of lungs before and after the segmentation process was run.
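
The manual tuning described above can be pictured with a plain intensity window on Hounsfield units (HU). This is a generic sketch, not the MIP program's method, which this report does not describe. The toy slice and the window limits are invented. The only fixed facts are that air is about -1000 HU and water is 0 HU by definition.

```python
def threshold_mask(slice_hu, low, high):
    """Return a binary mask with 1 where low <= HU <= high, else 0."""
    return [[1 if low <= value <= high else 0 for value in row]
            for row in slice_hu]


def count_pixels(mask):
    """Number of pixels labeled 1 in the mask."""
    return sum(sum(row) for row in mask)


# Invented 4x6 "slice": air outside the body (-1000), soft tissue (40),
# and a small block of lung-like values in the middle.
TOY_SLICE = [
    [-1000, 40,   40,   40, 40, -1000],
    [-1000, 40, -800, -820, 40, -1000],
    [-1000, 40, -790, -850, 40, -1000],
    [-1000, 40,   40,   40, 40, -1000],
]

# An assumed window that keeps lung-like pixels and rejects outside air.
lung_mask = threshold_mask(TOY_SLICE, low=-950, high=-500)

# Lowering the bound only slightly lets the surrounding air leak into the mask.
leaky_mask = threshold_mask(TOY_SLICE, low=-1000, high=-500)

if __name__ == "__main__":
    print(count_pixels(lung_mask), count_pixels(leaky_mask))
```

The jump in pixel count between the two windows shows why a small change in the limits can mislabel a whole region, and why values that work on one frame may fail on the next.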


M. Ankerst, H.-P. Kriegel, and T. Seidl, "A Multistep Approach for Shape Similarity Search in Image Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 6, pp. 996-1004, 1998.

M. Betke and J. P. Ko, "Detection of Pulmonary Nodules on CT and Volumetric Assessment over Time," in C. Taylor and A. Colchester, editors, Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 245-252, Cambridge, UK, September 1999, Springer-Verlag, Berlin.

C. Brodley, A. Kak, C. Shyu, and J. Dy, "Content-Based Retrieval from Medical Image Databases: A Synergy of Human Interaction, Machine Learning, and Computer Vision," Proceedings of the Sixteenth National Conference on Artificial Intelligence, Orlando, FL, July 18-22, 1999, pp. 760-767.

G. Kollios, D. Gunopulos, N. Koudas, and S. Berchtold, "An Efficient Approximation Scheme for Data Mining Tasks," Proceedings of the 17th IEEE International Conference on Data Engineering, Heidelberg, Germany, April 2-6, 2001.

E. G. M. Petrakis and C. Faloutsos, "Similarity Searching in Medical Image Databases," IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 3, pp. 435-447, May/June 1997.


Second Semester Work: Bat Tracking

Project Description and Background:

Millions of Brazilian free-tailed bats form colonies that feed on enormous quantities of insects each night throughout the summer in south-central Texas. These bats likely constitute one of North America's most important pest control services. The goal is to develop computer vision methods that compute a bat census and thus facilitate quantification of the natural pest control service that millions of bats provide in the summer.

    --excerpted from Steve Crampton's ICCV 2003 Submission

The above screenshot shows a typical frame of data as analyzed by Steve Crampton's tracking program: EcoTracker.


These are close-up shots of a bat in flight. As its wings open and close, the bat appears to change size, and therefore distance from the camera, which makes accurate tracking quite difficult. My first task was to manually track, or "ground truth", numerous bats as they moved from frame to frame. This proved more daunting than I anticipated, as the film sequences got quite busy at times and it was hard to follow bat trajectories even with the human eye. I had to restart the program many times, sometimes looking at the same bat over and over again before I was sure I had it right. I saved all of my tracking information into text files like the one shown below:

X Y Frame
314.25 80.5 295
301 85.5 296
287.75 87.5 297
272.75 90.25 298
258.5 92.5 299
243 95 300
228.25 99.75 301
212.25 104.75 302
194.5 108.25 303
177.75 112.25 304
159.5 113.5 305
140.5 114.25 306
126.5 116.25 307
100.75 118 308
85.75 122.5 309
65 127 310
44.25 134.5 311
23.5 146.75 312
3.5 156.75 313
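
A file in this layout is easy to read programmatically. The sketch below parses the whitespace-separated `X Y Frame` columns and measures how far the bat moved between consecutive frames. The function names are made up for illustration, and only the first four rows of the track above are embedded to keep the example short.

```python
import math

# First four rows of the ground-truth track shown above.
TRACK_TEXT = """\
X Y Frame
314.25 80.5 295
301 85.5 296
287.75 87.5 297
272.75 90.25 298
"""


def parse_track(text):
    """Return a list of (x, y, frame) tuples, skipping the header and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "X":
            continue
        rows.append((float(parts[0]), float(parts[1]), int(parts[2])))
    return rows


def step_lengths(track):
    """Pixel distance travelled between each pair of consecutive rows."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0, _), (x1, y1, _) in zip(track, track[1:])]


track = parse_track(TRACK_TEXT)
steps = step_lengths(track)

if __name__ == "__main__":
    for (_, _, frame), step in zip(track[1:], steps):
        print(f"frame {frame}: moved {step:.2f} px")
```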

For the data sequences that I did not manually track, I marked the key frames that I thought future students taking on the ground-truthing project should look at.

Right now, I am still working on how to analyze these text files. I have purchased the O'Reilly book sed & awk to help me with this process. Awk is a text-processing language that reads a file line by line, splits each line into fields, and can generate formatted reports; for column data like mine, it works much like a command-line spreadsheet. I have studied the basics of writing awk scripts and am currently trying to design one that will combine all of my text files and perform basic mathematical analysis on them to help predict the average trajectory of a bat.
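
One simple form this analysis could take is an average per-frame displacement across all tracks. The sketch below is written in Python rather than awk and is only an illustration of the idea, not the script under development. Two excerpts of the track shown earlier stand in for separate files, and "average trajectory" is assumed here to mean the mean (dx, dy) step per frame.

```python
# Two excerpts of the ground-truth track from this report, standing in for
# two separate track files. In practice each string would be read from disk.
TRACK_A = """\
X Y Frame
314.25 80.5 295
301 85.5 296
287.75 87.5 297
272.75 90.25 298
"""

TRACK_B = """\
X Y Frame
65 127 310
44.25 134.5 311
23.5 146.75 312
3.5 156.75 313
"""


def parse_track(text):
    """Return (x, y, frame) tuples, skipping the header and malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] == "X":
            continue
        rows.append((float(parts[0]), float(parts[1]), int(parts[2])))
    return rows


def mean_step(track_texts):
    """Average (dx, dy) per frame over every consecutive pair in every track."""
    dxs, dys = [], []
    for text in track_texts:
        track = parse_track(text)
        for (x0, y0, f0), (x1, y1, f1) in zip(track, track[1:]):
            gap = f1 - f0
            if gap <= 0:
                continue  # ignore repeated or out-of-order frames
            dxs.append((x1 - x0) / gap)
            dys.append((y1 - y0) / gap)
    if not dxs:
        raise ValueError("no consecutive frame pairs found")
    return sum(dxs) / len(dxs), sum(dys) / len(dys)


mean_dx, mean_dy = mean_step([TRACK_A, TRACK_B])

if __name__ == "__main__":
    print(f"mean step per frame: dx={mean_dx:.2f} px, dy={mean_dy:.2f} px")
```

Dividing by the frame gap keeps the average meaningful even when a track skips frames.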

I am also trying to modify code that Steve Crampton has written to perform a linear regression on the bat data we have so far. Once this regression is done and an average trajectory has been computed in some way, it will help to make EcoTracker much more efficient.
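
For reference, an ordinary least-squares fit of position against frame number can be written in a few lines. This is a generic sketch and not Steve Crampton's code. Fitting X and Y separately as linear functions of the frame number is an assumption about what the regression would model. The data are the ground-truth track shown earlier.

```python
# (x, y, frame) rows of the ground-truth track shown in this report.
TRACK = [
    (314.25, 80.5, 295), (301.0, 85.5, 296), (287.75, 87.5, 297),
    (272.75, 90.25, 298), (258.5, 92.5, 299), (243.0, 95.0, 300),
    (228.25, 99.75, 301), (212.25, 104.75, 302), (194.5, 108.25, 303),
    (177.75, 112.25, 304), (159.5, 113.5, 305), (140.5, 114.25, 306),
    (126.5, 116.25, 307), (100.75, 118.0, 308), (85.75, 122.5, 309),
    (65.0, 127.0, 310), (44.25, 134.5, 311), (23.5, 146.75, 312),
    (3.5, 156.75, 313),
]


def fit_line(ts, values):
    """Least-squares fit values ~ slope * t + intercept; returns (slope, intercept)."""
    n = len(ts)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired samples")
    mean_t = sum(ts) / n
    mean_v = sum(values) / n
    s_tt = sum((t - mean_t) ** 2 for t in ts)
    if s_tt == 0:
        raise ValueError("all t values are identical")
    s_tv = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, values))
    slope = s_tv / s_tt
    return slope, mean_v - slope * mean_t


frames = [f for _, _, f in TRACK]
slope_x, intercept_x = fit_line(frames, [x for x, _, _ in TRACK])
slope_y, intercept_y = fit_line(frames, [y for _, y, _ in TRACK])


def predict(frame):
    """Predicted (x, y) position at a given frame under the two linear fits."""
    return slope_x * frame + intercept_x, slope_y * frame + intercept_y


if __name__ == "__main__":
    print(f"x changes by {slope_x:.2f} px/frame, y by {slope_y:.2f} px/frame")
    print("predicted position at frame 314:", predict(314))
```

A prediction like `predict(314)` is the kind of estimate a tracker could use to narrow its search window in the next frame.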


In February, I had the opportunity to present Schweikard and Glosser's "Robotic Motion Compensation for Respiratory Movement during Radiosurgery" to the weekly IVC Reading Group with two fellow undergraduates.  We learned how difficult it is to determine a tumor's spatial position in real time, mostly due to movement caused by patient respiration.  The authors of this paper proposed a new method that combines stereo X-ray internal imaging with external infrared tracking to guide the radiation source, an Accuray Cyberknife.  X-ray images are taken every ten seconds and synchronized by timestamp with real-time infrared images.  A motion pattern specific to the patient is then developed and used to guide the Cyberknife during radiation beam activation.
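
The timestamp synchronization can be illustrated with a toy example. The sketch pairs each sparse X-ray measurement with the nearest infrared sample in time, then fits a straight line from external marker position to internal tumor position. The linear model, the one-dimensional positions, and all numbers are assumptions for illustration only. The model the paper actually uses is not described in this report.

```python
from bisect import bisect_left


def nearest_sample(timestamps, values, t):
    """Value whose timestamp is closest to t (timestamps sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return values[0]
    if i == len(timestamps):
        return values[-1]
    before, after = timestamps[i - 1], timestamps[i]
    return values[i] if after - t < t - before else values[i - 1]


def fit_line(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("external positions are all identical")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Invented data: frequent infrared samples, sparse X-ray measurements.
ir_times = [0.0, 5.0, 10.0, 15.0, 20.0]
ir_marker_pos = [0.0, 4.0, 1.0, 5.0, 2.0]
xray_times = [0.1, 9.9, 20.2]
xray_tumor_pos = [1.0, 3.0, 5.0]

# Synchronize by timestamp, then learn the external-to-internal relation.
paired_marker_pos = [nearest_sample(ir_times, ir_marker_pos, t) for t in xray_times]
slope, intercept = fit_line(paired_marker_pos, xray_tumor_pos)


def estimate_tumor_pos(marker_pos):
    """Estimate the internal position from a new infrared reading."""
    return slope * marker_pos + intercept


if __name__ == "__main__":
    print(paired_marker_pos, slope, intercept, estimate_tumor_pos(4.0))
```

Between X-ray exposures, only the infrared readings are available, so a fitted relation of this kind is what lets the system keep estimating the tumor position continuously.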


This meeting gave me a chance to explore a new area of study and to work on my presentation skills, which I know will prove useful in the near future as I take on my first job.




I have found this past year to be quite a unique experience. Even though I could not work CS585 into my schedule, I have had the opportunity to learn from those around me and see how computer vision techniques can be applied not only in research, but in the world around us. By completing some of my daily tasks, I also feel I became more proficient in using Linux and Unix commands. The IVC reading group gave me a chance to get to know some of the graduate students in the Computer Science department and to see how they work. Being a graduate student is quite different from being an undergrad, which is something I don't think I quite understood before. The freedom to choose your own projects, set your own timetables, and produce deliverables that you are proud of is something that I would like to have some day. This project has renewed my interest in computer science. After working for a year, I plan to look into graduate schools for computer science, rather than pursue an advanced business degree as I had always planned. Perhaps I will end up in the world of computer vision again, tackling my own projects and working with new undergraduate students to teach them some of the same things I have learned.