Boston University Computer Science Department
Student: Sarah Dubauskas
B.A. Computer Science, Political Science - May 2003
CREW Final Report
Advisor: Margrit Betke
Project Title: "Data Mining of Medical Image Databases"
Project Description and Background from CREW Proposal:
Data mining uses database queries to
search for hidden patterns in data. Little work has been done in searching
medical image databases for hidden patterns [Brodley 1999]. A large number of
computed tomography (CT) scans are produced regularly to follow the 8.2 million
patients with a history of cancer in the US. Lung cancer screening of smokers
is still controversial. If accepted, it would result in an explosion of the
number of chest CT scans to be analyzed.
Preliminary computer-aided diagnosis (CAD) systems have been developed that
attempt to copy the rules that radiologists use in evaluating chest CT scans and
detecting pulmonary nodules. However, a "gold standard" for these rules has
not been established. More sophisticated and advanced database and data mining
systems may be able to optimally use the information and knowledge stored in CAD
systems and potentially improve the diagnostic capabilities of radiologists.
We plan to design indexing and data mining algorithms for a database of chest CT
scans. Database searches will be based on spatial and temporal properties of
nodules, such as location, shape, and volumetric changes in consecutive CT
studies. Queries such as "Where are the majority of stable nodules located?"
and "Find a patient with a nodule that has a similar growth pattern" would be
run on the database. These queries may reveal information about the differences
between malignant and benign nodules. Our long-term goal is to discover
properties and characteristics that can be used to assist
physicians in interpreting diagnostic imaging studies.
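The two example queries above can be sketched in a few lines of Python. This is only an illustration of the idea, not the indexing scheme the proposal had in mind: the `Nodule` record, the location labels, the 20% stability tolerance, and all of the volumes are made-up assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Nodule:
    patient_id: str
    location: str       # hypothetical label, e.g. a lung lobe
    volumes_mm3: list   # volume measured in each consecutive CT study


def growth_ratios(nodule):
    """Volume ratio between each pair of consecutive studies."""
    v = nodule.volumes_mm3
    return [later / earlier for earlier, later in zip(v, v[1:])]


def is_stable(nodule, tolerance=0.2):
    """Assumed rule: stable if no ratio differs from 1 by more than tolerance."""
    return all(abs(r - 1.0) <= tolerance for r in growth_ratios(nodule))


def most_common_stable_location(nodules):
    """Answer 'Where are the majority of stable nodules located?'"""
    counts = Counter(n.location for n in nodules if is_stable(n))
    return counts.most_common(1)[0][0] if counts else None


def most_similar_growth(query, nodules):
    """Nodule of another patient whose growth ratios are closest to the query's."""
    def distance(n):
        pairs = zip(growth_ratios(query), growth_ratios(n))
        return sum((a - b) ** 2 for a, b in pairs)
    candidates = [n for n in nodules if n.patient_id != query.patient_id]
    return min(candidates, key=distance) if candidates else None


# Made-up records, for illustration only.
nodules = [
    Nodule("P1", "right upper lobe", [100.0, 105.0, 102.0]),
    Nodule("P2", "right upper lobe", [80.0, 82.0, 79.0]),
    Nodule("P3", "left lower lobe", [60.0, 90.0, 140.0]),
    Nodule("P4", "left lower lobe", [50.0, 76.0, 115.0]),
]

stable_location = most_common_stable_location(nodules)
similar = most_similar_growth(nodules[2], nodules)
```

A linear scan like this is fine for a handful of records; a real database would need an index over the growth features so that similarity queries do not have to touch every nodule.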
First Semester Work: Lung Segmentation
For most of the first semester, I
spent a lot of time being "busy", but I learned a lot along the way. I took the
Medical Image Processing program currently under development by different
members of the IVC group and used it to segment lungs out of numerous data files
provided to us by New York University. While at first this seemed a simple task,
I encountered numerous obstacles along the way. The MIP program was constantly
under revision and I had to start over many times to get accurate data. When I
finally got these problems solved, I learned that segmentation wasn't just
about sitting there and pressing the buttons. Many times the program became
"confused", mislabeling the trachea as part of a lung or simply missing a lung
entirely. I then had to adjust the Hounsfield unit thresholds the program was
using until I got an accurate segmentation. Even when I recorded what worked
best on the previous frame, this proved to be a long process. I was surprised at
how varied the values could be between successive frames. Once an entire header
file was accurately segmented, I converted the data into a text file of the lung
contour.
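To give a sense of what the threshold adjustment involves, here is a minimal sketch of threshold-based segmentation on a single slice. It is not the MIP program's algorithm: the function names, the default Hounsfield range of -1000 to -400, and the tiny test slice are all assumptions made for illustration. On the Hounsfield scale air is about -1000 and water is 0, so lung regions show up as large air-like areas inside the body.

```python
from collections import deque


def threshold_mask(slice_hu, lo, hi):
    """Mark pixels whose Hounsfield value lies in [lo, hi]."""
    return [[lo <= v <= hi for v in row] for row in slice_hu]


def neighbours(y, x):
    return ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))


def connected_regions(mask):
    """Return 4-connected regions of True pixels as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in neighbours(y, x):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def segment_lungs(slice_hu, lo=-1000, hi=-400, keep=2):
    """Keep the largest air-like regions that do not touch the image border."""
    rows, cols = len(slice_hu), len(slice_hu[0])

    def touches_border(region):
        return any(y in (0, rows - 1) or x in (0, cols - 1) for y, x in region)

    regions = connected_regions(threshold_mask(slice_hu, lo, hi))
    inner = [reg for reg in regions if not touches_border(reg)]
    return sorted(inner, key=len, reverse=True)[:keep]


def contour(region):
    """Pixels of a region that have at least one 4-neighbour outside it."""
    inside = set(region)
    return sorted(p for p in region
                  if any(q not in inside for q in neighbours(*p)))


# Made-up slice: outside air, soft tissue, two lungs, and a one-pixel trachea.
A, T, L, TR = -1000, 40, -800, -950
slice_hu = [
    [A] * 11,
    [A, T, T, T, T, T, T, T, T, T, A],
    [A, T, L, L, T, TR, T, L, L, T, A],
    [A, T, L, L, T, T, T, L, L, T, A],
    [A, T, L, L, T, T, T, L, L, T, A],
    [A, T, T, T, T, T, T, T, T, T, A],
    [A] * 11,
]

lungs = segment_lungs(slice_hu)
# One "x y" line per contour pixel, ready to be written to a text file.
contour_lines = [f"{x} {y}" for y, x in contour(lungs[0])]
```

Keeping only the two largest interior regions is what stops the small trachea region from being labeled as lung in this toy case. On real scans the trachea can merge with a lung when the range is too wide, which is one reason the thresholds had to be tuned frame by frame.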
The above screenshot shows a set of lungs before and after the segmentation
process was run.
References:
M. Ankerst, H.-P. Kriegel, and T. Seidl, "A Multistep Approach for Shape
Similarity Search in Image Databases." IEEE Transactions on Knowledge and Data
Engineering, Vol. 10, No. 6, pp. 996-1004, 1998.
M. Betke and J. P. Ko, "Detection of Pulmonary Nodules on CT and Volumetric
Assessment over Time." In C. Taylor and A. Colchester, editors, Proceedings of
the International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 245--252, Cambridge, UK, September 1999, Springer-Verlag,
Berlin.
C. Brodley, A. Kak, C. Shyu, J. Dy, "Content-Based Retrieval from Medical Image
Databases: A Synergy of Human Interaction, Machine Learning, and Computer
Vision," Proceedings of the Sixteenth National Conference on Artificial
Intelligence, July 18-22, 1999, Orlando, FL, pp. 760-767.
G. Kollios, D. Gunopulos, N. Koudas and S. Berchtold, "An Efficient
Approximation Scheme for Data Mining Tasks." Proc. of the 17th IEEE
International Conference on Data Engineering, Heidelberg, Germany, April 2-6,
2001.
E.G.M. Petrakis and C. Faloutsos: "Similarity Searching in Medical Image
Databases", IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 3,
pp. 435-447, May/June 1997.
Second Semester Work: Bat Tracking
Project Description and Background:
Millions of Brazilian free-tailed bats form colonies that feed on enormous quantities of insects each night throughout the summer in south-central Texas. These bats likely constitute one of North America's most important pest control services. The goal is to develop computer vision methods that compute a bat census and thus facilitate quantification of the natural pest control service that millions of bats provide in the summer.
--excerpted from Steve Crampton's ICCV 2003 Submission
The above screenshot shows a typical frame of data as analyzed by Steve Crampton's tracking program: EcoTracker.
These are close-up shots of a bat in flight. As you can see, the bat appears to change size, and therefore distance from the camera, as its wings open and close, making accurate tracking quite difficult. My first task was to manually track, or "ground truth", numerous bats as they moved from frame to frame. This proved to be more daunting than I anticipated, as the film sequences got quite busy at times and it was hard to follow bat trajectories even with the human eye. I had to restart the program many times, sometimes looking at the same bat over and over again before I was sure I had it right. I saved all of my tracking information into text files like the one shown below:
X | Y | Frame |
314.25 | 80.5 | 295 |
301 | 85.5 | 296 |
287.75 | 87.5 | 297 |
272.75 | 90.25 | 298 |
258.5 | 92.5 | 299 |
243 | 95 | 300 |
228.25 | 99.75 | 301 |
212.25 | 104.75 | 302 |
194.5 | 108.25 | 303 |
177.75 | 112.25 | 304 |
159.5 | 113.5 | 305 |
140.5 | 114.25 | 306 |
126.5 | 116.25 | 307 |
100.75 | 118 | 308 |
85.75 | 122.5 | 309 |
65 | 127 | 310 |
44.25 | 134.5 | 311 |
23.5 | 146.75 | 312 |
3.5 | 156.75 | 313 |
For the data sequences that I did not manually track, I marked key frames that I thought would be important for future students who undertake the ground-truthing project.
Right now, I am still working on how
to analyze these text files. I have purchased the O'Reilly book sed & awk
to help me with this process. Awk is a text-processing language well suited to
generating formatted reports from column-oriented data; it works somewhat like
a command-line spreadsheet. I have studied the basics of writing awk scripts
and am currently trying to design one that will combine all of my text files
and perform basic mathematical analysis on them
to help predict the average trajectory of a bat.
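As a rough sketch of the kind of analysis described here, the following Python reads tracks in the "X | Y | Frame |" layout shown above and computes an average per-frame velocity. It is written in Python rather than awk purely for illustration; the function names are my own, and it assumes each file holds one bat's track in increasing frame order. The sample is the first five rows of the table above.

```python
def parse_track(text):
    """Parse 'X | Y | Frame |' rows into (x, y, frame) tuples, skipping the header."""
    track = []
    for line in text.strip().splitlines():
        fields = [f.strip() for f in line.split("|") if f.strip()]
        if len(fields) != 3 or fields[0] == "X":
            continue  # header line or malformed row
        track.append((float(fields[0]), float(fields[1]), int(fields[2])))
    return track


def mean_velocity(track):
    """Net displacement divided by elapsed frames (image units per frame)."""
    (x0, y0, f0), (x1, y1, f1) = track[0], track[-1]
    frames = f1 - f0
    return (x1 - x0) / frames, (y1 - y0) / frames


def average_velocity(tracks):
    """Average the per-track mean velocities over several bats."""
    velocities = [mean_velocity(t) for t in tracks if len(t) > 1]
    n = len(velocities)
    return (sum(v[0] for v in velocities) / n,
            sum(v[1] for v in velocities) / n)


sample = """
X | Y | Frame |
314.25 | 80.5 | 295 |
301 | 85.5 | 296 |
287.75 | 87.5 | 297 |
272.75 | 90.25 | 298 |
258.5 | 92.5 | 299 |
"""

track = parse_track(sample)
vx, vy = mean_velocity(track)
```

To combine several files, each file's contents would be passed through `parse_track` and the resulting list of tracks handed to `average_velocity`.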
I am also trying to modify code that Steve Crampton has written to perform a
linear regression on the bat data we have so far. Once this regression is done
and an average trajectory has been computed, it should help to make
EcoTracker much more efficient.
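One way such a regression could look is sketched below. This is not Steve Crampton's code: it is an ordinary least-squares fit of x and y against the frame number, and treating each coordinate as a linear function of time is my own assumption. The data are the nineteen ground-truth points from the table above.

```python
def fit_line(ts, vs):
    """Ordinary least-squares fit v = intercept + slope * t."""
    n = len(ts)
    mean_t, mean_v = sum(ts) / n, sum(vs) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope


def predict(fit, t):
    """Evaluate a fitted line at frame t."""
    intercept, slope = fit
    return intercept + slope * t


frames = list(range(295, 314))
xs = [314.25, 301, 287.75, 272.75, 258.5, 243, 228.25, 212.25, 194.5, 177.75,
      159.5, 140.5, 126.5, 100.75, 85.75, 65, 44.25, 23.5, 3.5]
ys = [80.5, 85.5, 87.5, 90.25, 92.5, 95, 99.75, 104.75, 108.25, 112.25,
      113.5, 114.25, 116.25, 118, 122.5, 127, 134.5, 146.75, 156.75]

x_fit = fit_line(frames, xs)
y_fit = fit_line(frames, ys)

# Extrapolated position one frame past the end of the track.
next_position = (predict(x_fit, 314), predict(y_fit, 314))
```

A tracker could use a prediction like `next_position` to narrow the region it searches in the following frame. For this track the bat speeds up toward the end, so a straight-line fit is only a first approximation.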
In February, I had the opportunity to present Schweikard and Glosser's "Robotic Motion Compensation for Respiratory Movement during Radiosurgery" to the weekly IVC Reading Group with two fellow undergraduates. We learned how difficult it is to determine a tumor's spatial position in real time, mostly because of movement caused by patient respiration. The authors of this paper proposed a new method that combines stereo X-ray internal imaging with external infrared tracking to guide the radiation source, an Accuray Cyberknife®. X-ray images are taken every ten seconds and synchronized by timestamp with real-time infrared images. A motion pattern specific to the patient is then developed and used to guide the Cyberknife during radiation beam activation.
This meeting gave me a chance to explore a new area of study, as well as work on my presentation skills, something which I know will prove useful in the near future as I take on my first job.
Conclusion:
I have found this past year to be quite a unique experience. Even though I could not work CS585 into my schedule, I have had the opportunity to learn from those around me and see how computer vision techniques can be applied not only in research, but in the world around us. By completing some of my daily tasks, I also feel I became more proficient in using Linux and Unix commands. The IVC reading group gave me a chance to get to know some of the graduate students in the Computer Science department and to see how they work. Being a graduate student is quite different from being an undergrad, which is something I don't think I quite understood before. The freedom to choose your own projects, set your own timetables, and produce deliverables that you are proud of is something that I would like to take part in some day. This project has renewed my interest in computer science. After working for a year, I plan to look into graduate schools for computer science, rather than pursue an advanced business degree as I had always planned. Perhaps I will end up in the world of computer vision again, tackling my own projects and working with new undergraduate students to teach them some of the same things I have learned.