Introduction
Project Description
Data mining uses database queries to search for hidden patterns in
data. Little work has been done in searching medical image
databases for hidden patterns [Brodley 1999]. A large number of
computed tomography (CT) scans are produced regularly to follow the
8.2 million patients with
a history of cancer in the US. Lung cancer screening of smokers
is still controversial. If accepted, it would result in an
explosion in the number of chest CT scans to be analyzed.
Preliminary
computer-aided diagnosis (CAD) systems have been developed that
attempt to copy the rules that radiologists use in evaluating chest CT
scans and detecting pulmonary nodules. However, a ``gold
standard'' for these rules has not been established. More
sophisticated and advanced database and data mining systems may
be able to optimally use
the information and knowledge stored in CAD systems and potentially improve the diagnostic
capabilities of radiologists.
We plan to design
indexing and data mining algorithms for a database of chest CT
scans. Database searches will be based on spatial and temporal properties of nodules,
such as location, shape, and volumetric changes in consecutive CT studies.
Queries such as "Where are the majority of stable nodules
located?" and "Find a patient with a nodule that has a similar
growth pattern" would be run on the database. These queries may reveal information
about the differences between malignant and benign nodules. Our
long-term goal is to discover properties and characteristics that can be
used to assist physicians in interpreting diagnostic imaging
studies.
Methods and Means to be Utilized
· Transfer of chest CT images from New York University Medical Center. Jane P. Ko, MD, is the collaborating radiologist on this project.
· Use of image analysis tools to characterize chest CT images [Betke 1999, textbooks]
· Definition of similarity measure [Petrakis, Faloutsos 1997]
· Design of database schema to allow fast retrieval based on similarity criterion [Ankerst 1998, Petrakis, Faloutsos 1997]
· Implementation of and experimentation with clustering and data mining algorithms [Kollios 2001] (Note that George Kollios is also a faculty member at Boston University and interested in collaborating)
Purpose
Segmentation is the process of dividing an image into meaningful regions. In this project, the goal was to segment the ribs from a chest CT scan. The first objective was to isolate the rib cage from the rest of the image by determining the threshold at which only bone appears. This would make it possible to segment each rib individually, label it, and locate the same rib in each slice of the chest CT scan. Labeling each rib supports the overall goals of the project because the positions of other objects in the scan, such as a tumor, can then be described in relation to a specific rib. This could help physicians pinpoint objects within the scan, thereby improving the effectiveness and localization of treatments.
Overview
Thresholding
The first task was to isolate the ribs in the medical images by using a threshold. Bone has a characteristic density, so at a certain threshold only bone shows up in the image. Viewing the images with only bone visible gives a clear picture of the rib cage. This is the first step toward identifying what role the rib cage plays in the images and toward segmenting and identifying each rib as a single entity. Many factors had to be considered when choosing the threshold value. If the threshold was too high, some of the rib data was lost and not all of the ribs were visible in the image. Conversely, a low threshold caused a fair amount of noise to become visible. A program was developed that takes the thresholded data as input and outputs a 3D model of the rib cage.
Figure 1.1 3-D Image of the rib cage as compiled by a data
This 3D image of the rib cage was created from data thresholded at 1250. However, there are problems with this approach. As seen in the image, noise appears in numerous places. For example, the two bars in the bottom right and left corners are a direct result of noise in the original scan. As explained above, the threshold cannot be increased to erase this noise because doing so would cause some of the ribs themselves to disappear from the image. Another problem is that many of the ribs appear disconnected. Toward the top of the image, the rib cage appears patchy and scattered. This poses problems for future attempts to segment each rib individually, as any algorithm will read these patchy areas as separate objects. A new method had to be devised to counteract the noise while ensuring that the ribs in the image appear connected and smooth.
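The thresholding step can be illustrated with a minimal sketch. Only the value 1250 comes from the text; the function names, the list-of-lists slice layout, the `>=` comparison, and the toy intensities are assumptions made for illustration, not the program used in the project.

```python
# Hypothetical sketch: binarize one CT slice at a fixed intensity threshold.
# Only the value 1250 is taken from the text; everything else is assumed.

BONE_THRESHOLD = 1250


def threshold_slice(ct_slice, threshold=BONE_THRESHOLD):
    """Return a binary mask: 1 where intensity >= threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in ct_slice]


def count_foreground(mask):
    """Count the pixels kept by the threshold."""
    return sum(sum(row) for row in mask)


# Toy 3x4 "slice" with made-up intensities (not real CT data).
toy_slice = [
    [100, 1300, 1400, 90],
    [80, 1249, 1250, 70],
    [60, 50, 2000, 40],
]

mask = threshold_slice(toy_slice)

# Raising the threshold keeps fewer pixels, which mirrors the trade-off
# described above: less noise, but bone pixels can be lost as well.
stricter_mask = threshold_slice(toy_slice, threshold=1350)
```

Comparing `count_foreground(mask)` with `count_foreground(stricter_mask)` shows how quickly foreground pixels disappear as the threshold rises.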
Erosion/Dilation
Erosion, dilation and their combined uses are ways to add or remove pixels from the boundaries of features in order to smooth them, to join separated portions of features or separate touching features, and to remove isolated pixel noise from the image.
Figure 1.2 (b) The result after performing a closing operation.
These methods were applied to the binary medical image generated from the original CT scan in an effort to eliminate the noise while improving connectivity and smoothness. If a suitable structuring element can be found and a correct sequence of erosion and dilation operations applied, this can be an effective way to filter images and remove noise. The basic operations can be combined to achieve better results. For example, an erosion followed by a dilation using the same structuring element removes all the pixels in regions that are too small to contain the structuring element (the probe) and leaves the rest. This is known as an opening. A dilation followed by an erosion fills the holes in an image that are smaller than the probe. This is referred to as a closing. As seen in the closing example above (Figure 1.2), after the closing operation many of the lines in the image appear smoother, and many of the rib objects are now closed off and connected.
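Opening and closing can be sketched in a few lines. This is a minimal illustration, assuming a 3x3 square structuring element and treating everything outside the image as background; the text does not say which structuring element or border rule was used, and all function names and the toy images here are hypothetical.

```python
# Hypothetical sketch of binary erosion, dilation, opening, and closing.
# Assumptions: 3x3 square structuring element, background outside the image.


def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighbourhood is set."""
    h, w = len(img), len(img[0])
    return [
        [
            1 if any(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ) else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is set and in bounds."""
    h, w = len(img), len(img[0])
    return [
        [
            1 if (
                1 <= r < h - 1 and 1 <= c < w - 1
                and all(
                    img[rr][cc]
                    for rr in range(r - 1, r + 2)
                    for cc in range(c - 1, c + 2)
                )
            ) else 0
            for c in range(w)
        ]
        for r in range(h)
    ]


def opening(img):
    """Erosion followed by dilation: removes regions smaller than the probe."""
    return dilate(erode(img))


def closing(img):
    """Dilation followed by erosion: fills holes smaller than the probe."""
    return erode(dilate(img))


def make_block():
    """7x7 toy image containing a solid 3x3 block in the middle."""
    img = [[0] * 7 for _ in range(7)]
    for r in range(2, 5):
        for c in range(2, 5):
            img[r][c] = 1
    return img


block = make_block()

noisy = make_block()
noisy[0][6] = 1  # isolated noise pixel

holed = make_block()
holed[3][3] = 0  # one-pixel hole inside the block

opened = opening(noisy)  # noise pixel removed, block preserved
closed = closing(holed)  # hole filled, block preserved
```

On real slices the size and shape of the structuring element would need tuning, since a probe that is too large can merge neighbouring ribs or erase thin ones.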
Distance Algorithm and Manual Editing
Though the closing and opening operations did help reduce the noise and increase the smoothness of the image, a way still had to be found to segment each rib individually. A distance algorithm was developed. The algorithm finds objects in the scan by computing the distance between pixels in the slices. The position of a pixel in a given slice is determined, and its distance to the surrounding pixels in the slices indicates whether the pixels are part of the same object. In this case, objects are pieces of bone such as the sternum, ribs, and vertebrae. Many problems were encountered with this method. It proved difficult to determine when a new rib starts in a slice; often noise pixels situated between ribs caused different ribs to be identified as one object. Another major problem was that in many of the slices the ribs were in close proximity to the vertebrae and sternum, and the algorithm would identify a rib and the sternum together as one object. To assess the accuracy of the distance algorithm, its automatic output was compared to a manually edited version in which the sternum and vertebrae were removed.
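One way to picture distance-based grouping is the sketch below. The text does not give the algorithm's details, so the Euclidean distance, the cutoff `max_dist`, the transitive grouping, and the `(slice, row, col)` point layout are all assumptions for illustration rather than the project's actual method.

```python
# Hypothetical sketch: group foreground pixels into objects by distance.
# Two pixels share an object if a chain of pixels connects them in which
# each step is at most max_dist long (Euclidean distance, assumed).
from collections import deque
from math import dist


def group_by_distance(points, max_dist=1.5):
    """Return a sorted list of groups, each a sorted list of points."""
    unvisited = set(points)
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group = [seed]
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            # Collect every unvisited point close enough to the current one.
            near = [p for p in unvisited if dist(p, current) <= max_dist]
            for p in near:
                unvisited.remove(p)
                group.append(p)
                queue.append(p)
        groups.append(sorted(group))
    return sorted(groups)


# Toy data as (slice, row, col): two short "ribs" separated by a gap.
ribs = [(0, 0, 0), (0, 0, 1), (0, 0, 3), (0, 0, 4)]
separate = group_by_distance(ribs)

# A single noise pixel in the gap bridges the two ribs into one object,
# which is the failure mode described in the text.
bridged = group_by_distance(ribs + [(0, 0, 2)])
```

The brute-force neighbour search is quadratic in the number of pixels, which is fine for a toy example but would need a spatial index or a grid-based search on full CT volumes.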
Figure 1.3 (a) The original binary medical image, compiled from data filtered with a threshold of 1250. (b) The image after manual editing removed the sternum, vertebrae, and any unnecessary noise.
This comparison produced two data sets: a ground truth and an automatic one. As shown in Figure 1.3, the image with the sternum, vertebrae, and any additional noise manually removed was input into the distance algorithm, enabling the ribs to be individually identified as objects. Later analysis can show the accuracy of the automatic segmentation, which would help determine the steps that need to be taken towards complete rib segmentation.
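The text does not say how the automatic result would be scored against the ground truth. As one possible approach, the sketch below computes the Dice overlap coefficient between two binary masks; the choice of metric, the function name, and the toy masks are assumptions, not part of the original analysis.

```python
# Hypothetical sketch: score an automatic mask against a ground-truth mask
# with the Dice coefficient, 2*|A and B| / (|A| + |B|). The metric is an
# assumption; the source does not specify one.


def dice_coefficient(mask_a, mask_b):
    """Return the Dice overlap of two equally sized binary masks (0.0 to 1.0)."""
    size_a = sum(sum(row) for row in mask_a)
    size_b = sum(sum(row) for row in mask_b)
    overlap = sum(
        a * b
        for row_a, row_b in zip(mask_a, mask_b)
        for a, b in zip(row_a, row_b)
    )
    if size_a + size_b == 0:
        return 1.0  # two empty masks agree completely
    return 2 * overlap / (size_a + size_b)


# Toy masks (made-up values, not real segmentation output).
automatic = [
    [1, 1, 0],
    [0, 1, 0],
]
ground_truth = [
    [1, 0, 0],
    [0, 1, 1],
]

score = dice_coefficient(automatic, ground_truth)
```

A score near 1 would indicate close agreement between the automatic and manual results, while a low score would point to slices where the sternum or vertebrae were merged with a rib.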
Results Achieved
Possible Future Research
Determine a method for performing an automatic segmentation of the ribs so as to avoid the problems that the sternum and vertebrae pose.
Achieve rib labeling, where the ribs are labeled and locatable in each slice of the Chest CT scan.
References
M. Ankerst, H.P. Kriegel, and T. Seidl, "A Multistep Approach for
Shape Similarity Search in Image Databases." IEEE Transactions on
Knowledge and Data Engineering, Vol. 10, No. 6, pp. 996-1004, 1998.
M. Betke and J. P. Ko, "Detection of Pulmonary Nodules on CT and
Volumetric Assessment over Time." In C. Taylor and A. Colchester,
editors, Proceedings of the International Conference on Medical Image
Computing and Computer-Assisted Intervention, pp. 245--252, Cambridge,
UK, September 1999, Springer-Verlag, Berlin.
C. Brodley, A. Kak, C. Shyu, J. Dy, "Content-Based Retrieval from
Medical Image Databases: A Synergy of Human Interaction, Machine
Learning, and Computer Vision," Proceedings of the Sixteenth National
Conference on Artificial Intelligence July 18-22, 1999, Orlando, FL,
pp. 760-767
R. Jain, R. Kasturi, and B. G. Schunck. Machine Vision, McGraw-Hill,
Inc., New York, NY, 1995.
G. Kollios, D. Gunopulos, N. Koudas and S. Berchtold, "An Efficient
Approximation Scheme for Data Mining Tasks." Proc. of the 17th IEEE
International Conference on Data Engineering, Heidelberg, Germany,
April 2-6, 2001.
E.G.M. Petrakis and C. Faloutsos, "Similarity Searching in Medical
Image Databases", IEEE Transactions on Knowledge and Data
Engineering, Vol. 9, No. 3, pp. 435-447, May/June 1997.