ImageRover Approach

1. General Approach

ImageRover is an image content navigation tool for the World Wide Web that unifies visual and textual statistics associated with HTML documents. To gather HTML documents quickly, the collection subsystem uses a distributed fleet of WWW robots running on different computers. The robots gather information about the images they find, computing the appropriate image decompositions and textual associations. The extracted information is then stored in vector form for searches based on image and text content. At search time, users can iteratively guide the search by selecting relevant examples. The system employs a novel relevance feedback algorithm that selects the distance metrics appropriate for a particular query.

2. Document Collection Subsystem

The document collection subsystem uses a distributed fleet of WWW robots running on different computers. These robots can be run on a number of computers at a single site (as has been the case in the development of our initial system) or across a number of geographically distributed computers at volunteer sites.

HTML Document Collection Subsystem Diagram

Robots can contain gathering modules and digestion modules. The gathering modules recursively parse and traverse WWW documents, collecting images. The digestion modules then process these documents to extract the needed image indexing information and to compute a reduced-resolution thumbnail image. Textual statistics are then associated with each image by processing the containing HTML document. The robots are dispatched and coordinated via a separate coordination layer, which also manages updates of the image index database.
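As an illustration of the gathering step, the following is a minimal sketch, not the ImageRover implementation. It assumes a caller-supplied `fetch` function (hypothetical) that maps a URL to HTML text, and it uses an iterative breadth-first queue in place of explicit recursion. The names `LinkAndImageParser` and `gather` are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndImageParser(HTMLParser):
    """Collects absolute URLs of <a href> links and <img src> images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def gather(start_url, fetch, max_pages=100):
    """Traverse documents reachable from start_url.

    Returns a list of (image_url, containing_page_url) pairs.
    `fetch` is an assumed helper: URL -> HTML text, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    found = []
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages += 1
        parser = LinkAndImageParser(url)
        parser.feed(html)
        # Keep the containing page so text can later be associated with the image.
        found.extend((img, url) for img in parser.images)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found
```

Passing `fetch` in as a parameter keeps the traversal logic testable without network access; a real robot would also need politeness rules (robots.txt, rate limits), which are omitted here.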

2.1 Image Digestion

Each image digestion module processes an input stream of image URLs. A reduced-resolution image thumbnail is computed for use as an icon during search. With preprocessing completed, the digestor then executes a series of analysis submodules that calculate information about the distributions of visual, textual, or other properties associated with the image. Each submodule except the text processor computes distributions over N subimages. In the current implementation, N = 6.

The current implementation of the system includes submodules for analysis of color, orientation and word associations. The following is a brief overview of these modules. For more details, readers are directed to [1].

Color Analysis
Image color histograms are computed in the CIE L*u*v* color space, which has been shown to correspond closely to the human perception of color. The color distribution is quantized into 64 bins (4 for each axis).
Texture Orientation Analysis
The texture direction distribution is calculated using steerable pyramids. For this application, a steerable pyramid of four levels was found to be sufficient. At each level, texture direction and strength at each pixel are calculated using the outputs of seven X-Y separable, steerable quadrature pair basis filters. Orientation histograms are then computed for each level in the pyramid.
Text Analysis
The text of the HTML document containing a specific image is extracted and indexed using latent semantic indexing (LSI). Words appearing in the document are weighted according to their proximity to the image under consideration and their importance in that document. A weighted word frequency histogram is created for each image.
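The 64-bin quantization described under Color Analysis (4 bins on each of the three L*u*v* axes) can be sketched as follows. This is an illustrative sketch only: it assumes pixels have already been converted to (L, u, v) triples, the axis ranges are approximate values chosen for the example, and normalizing the histogram to sum to 1 is an assumption rather than something stated above. The function name `luv_histogram` is hypothetical.

```python
def luv_histogram(pixels, bins_per_axis=4,
                  ranges=((0.0, 100.0), (-134.0, 220.0), (-140.0, 122.0))):
    """Normalized histogram with bins_per_axis**3 bins over (L, u, v) pixels.

    `ranges` gives assumed (low, high) bounds for the L, u and v axes.
    """
    hist = [0.0] * (bins_per_axis ** 3)
    for pixel in pixels:
        index = 0
        for value, (lo, hi) in zip(pixel, ranges):
            # Map the value to a bin in [0, bins_per_axis - 1], clamping outliers.
            b = int((value - lo) / (hi - lo) * bins_per_axis)
            b = min(max(b, 0), bins_per_axis - 1)
            # Combine the three per-axis bins into one flat index.
            index = index * bins_per_axis + b
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

To obtain distributions over subimages, the same function would be called once per subimage with that region's pixels.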

3. Image Query Subsystem

The image feature vectors stored by the robots have rather high dimension. As a preliminary step, it is therefore useful to perform a dimensionality reduction via principal components analysis (PCA) for each of the image subvector spaces. Similarly, all word frequency histograms are subject to dimensionality reduction by latent semantic indexing.
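For reference, the standard PCA projection for one subvector space can be written as below. The notation is introduced here for illustration: x_i is the d-dimensional feature subvector of image i, n is the number of images, and U_k holds the k eigenvectors of the covariance matrix with the largest eigenvalues. The choice of k is not specified in this description.

```latex
\[
\begin{aligned}
\mu &= \frac{1}{n}\sum_{i=1}^{n} x_i, \\
C &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\mathsf T}, \\
C &= U \Lambda U^{\mathsf T}, \\
y_i &= U_k^{\mathsf T}\,(x_i - \mu).
\end{aligned}
\]
```

Each reduced vector y_i has k components, with k at most d.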

Query Server Subsystem Diagram

3.1 Query Server

The image query subsystem is based on a client-server architecture. At startup, the server first performs a dimensionality reduction on the visual feature vectors. The LSI vectors are computed for each image and appended to the visual feature vectors. Once initialized, the index server runs as a process separate from the database query server, possibly on a different computer. For each query, a client connects to the server, sends the query data, and then waits for the resulting k nearest neighbors.
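The k-nearest-neighbor lookup performed by the server can be sketched with a brute-force scan. This is a minimal sketch under stated assumptions: the index is an in-memory dictionary from image id to combined feature vector, Euclidean distance stands in for whichever metric the query actually uses, and the names `combine` and `k_nearest` are hypothetical. The client-server transport is not shown.

```python
import heapq
import math


def combine(visual_vector, lsi_vector):
    """Append the LSI vector to the reduced visual feature vector."""
    return list(visual_vector) + list(lsi_vector)


def k_nearest(query, index, k):
    """Return the k closest entries as (distance, image_id) pairs, nearest first.

    `index` maps image_id -> feature vector of the same length as `query`.
    """
    scored = ((math.dist(query, vec), image_id) for image_id, vec in index.items())
    return heapq.nsmallest(k, scored)
```

A linear scan costs time proportional to the index size for every query; a larger index would likely call for a spatial or approximate search structure, which this description does not specify.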

3.2 Relevance Feedback

The ImageRover system employs a relevance feedback algorithm that selects appropriate L_m Minkowski distance metrics on the fly. The resulting relevance feedback mechanism allows the user to perform queries by example based on more than one sample image. The user can collect the images he or she finds during the search, refining the result at each iteration. The main idea is to give more importance to the elements of the feature vectors with the lowest variances.
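The variance-based weighting idea can be sketched as follows. This is an illustration under assumptions, not the published algorithm: variance is taken across the user's relevant examples, weights are the inverse of the per-feature standard deviation, and the Minkowski order m is passed in as a parameter because the rule for selecting it is not given here. All function names are hypothetical.

```python
import statistics


def inverse_std_weights(examples, eps=1e-6):
    """One weight per feature: larger where the relevant examples agree."""
    columns = zip(*examples)
    # eps avoids division by zero when all examples share the same value.
    return [1.0 / (statistics.pstdev(col) + eps) for col in columns]


def weighted_minkowski(x, y, weights, m=2.0):
    """Weighted L_m (Minkowski) distance between two equal-length vectors."""
    return sum(w * abs(a - b) ** m for w, a, b in zip(weights, x, y)) ** (1.0 / m)


def rank_by_examples(examples, index, m=2.0):
    """Rank image ids by mean weighted distance to the relevant examples."""
    weights = inverse_std_weights(examples)

    def score(vec):
        total = sum(weighted_minkowski(vec, e, weights, m) for e in examples)
        return total / len(examples)

    return sorted(index, key=lambda image_id: score(index[image_id]))
```

With examples that agree on one feature and differ on another, candidates matching the agreed feature rank first, even if they are far away on the feature where the examples disagree.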

4. References

[1] S. Sclaroff, L. Taycher, and M. La Cascia. ImageRover: A Content-Based Image Browser for the World Wide Web. In Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, June 1997.
[2] L. Taycher, M. La Cascia, and S. Sclaroff. Image Digestion and Relevance Feedback in the ImageRover WWW Search Engine. In Proc. Visual 97, December 1997.
[3] M. La Cascia, S. Sethi, and S. Sclaroff. Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web. In Proc. IEEE Workshop on Content-based Access of Image and Video Libraries, June 1998.
[4] M. La Cascia, S. Sethi, and S. Sclaroff. A Quantitative Evaluation of Image Search Engine Performance When Textual and Visual Cues are Combined. Submitted to Proc. IEEE Workshop on Applications of Computer Vision, October 1998.


Last Modified: April 20, 1998