Author: Vassilis Athitsos
Title: Learning Embeddings for Indexing, Retrieval, and Classification, with Applications to Object and Shape Recognition in Image Databases
Date: June 14, 2006
Abstract:
Nearest neighbor retrieval is the task of identifying, given a
database of objects and a query object, the objects in the database
that are the most similar to the query. Retrieving nearest neighbors
is a necessary component of many practical applications, in fields as
diverse as computer vision, pattern recognition, multimedia databases,
bioinformatics, and computer networks. At the same time, finding
nearest neighbors accurately and efficiently can be challenging,
especially when the database contains a large number of objects, and
when the underlying distance measure is computationally expensive.
This thesis proposes new methods for improving the efficiency and
accuracy of nearest neighbor retrieval and classification in spaces
with computationally expensive distance measures. The proposed methods
are domain-independent, and can be applied in arbitrary spaces,
including non-Euclidean and non-metric spaces. In this thesis
particular emphasis is given to computer vision applications related
to object and shape recognition, where expensive non-Euclidean
distance measures are often needed to achieve high accuracy.
The first contribution of this thesis is the BoostMap algorithm for
embedding arbitrary spaces into a vector space with a computationally
efficient distance measure. Using this approach, an approximate set of
nearest neighbors can be retrieved efficiently - often orders of
magnitude faster than retrieval using the exact distance measure in
the original space. The BoostMap algorithm has two key distinguishing
features with respect to existing embedding methods. First, embedding
construction explicitly maximizes the amount of nearest neighbor
information preserved by the embedding. Second, embedding construction
is treated as a machine learning problem, in contrast to existing
methods that are based on geometric considerations.
The second contribution is a method for constructing query-sensitive
distance measures for the purposes of nearest neighbor retrieval and
classification. In high-dimensional spaces, query-sensitive distance
measures allow for automatic selection of the dimensions that are the
most informative for each specific query object. It is shown
theoretically and experimentally that query-sensitivity increases the
modeling power of embeddings, allowing embeddings to capture a larger
amount of the nearest neighbor structure of the original space.
The third contribution is a method for speeding up nearest neighbor
classification by combining multiple embedding-based nearest neighbor
classifiers in a cascade. In a cascade, computationally efficient
classifiers are used to quickly classify easy cases, and classifiers
that are more computationally expensive and also more accurate are
only applied to objects that are harder to classify. An interesting
property of the proposed cascade method is that, under certain
conditions, classification time actually decreases as the size of the
database increases, a behavior that is in stark contrast to the
behavior of typical nearest neighbor classification systems.
The proposed methods are evaluated experimentally in several different
applications: hand shape recognition, off-line character recognition,
online character recognition, and efficient retrieval of time series.
In all datasets, the proposed methods lead to significant improvements
in accuracy and efficiency compared to existing state-of-the-art
methods. In some datasets, the general-purpose methods introduced in
this thesis even outperform domain-specific methods that have been
custom-designed for such datasets.