BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Shape I



William Klippgen

The paper by Grosky and Mehrotra suggests an efficient method for finding the closest match between an extracted feature and a set of well-defined features in a feature index. The construction of an index tree and the data-driven approach are reasonable when the number of objects and features is high. Since the focus of the paper is on index construction and traversal, it might be unfair to criticize it for the simple feature representation. However, there is reason to ask whether the static implementation of the index traversals and the lack of support for any learning mechanism in the index are weaknesses. A problem mentioned in the paper is the case where the difference in similarity between the sample feature and the reference features of two sub-tree root nodes is too small. It could possibly be resolved faster if some kind of correlation technique guided the index traversal based on previously found or hypothesized features. I am, in other words, not convinced that the similarity measure should be fully independent of the hypothesized range of possible objects, or the object space. It is also a very interesting question whether the sequence of extracted features in the image might yield faster conclusions using a priori knowledge of the set of previously found features.

The paper by Taubin and Cooper, "Object Recognition Based on Moment (or Algebraic) Invariants", presents a general way of computing an invariant measure for any 2D curve or 3D surface. Not only can objects in an arbitrary position be recognized, but their coordinate transformations are also recovered, so that their location and orientation in the input image can be calculated. By treating a number of sub-parts of an object, in other words local features, partially occluded objects can also be recognized.
A combination of the feature index proposed by Grosky and Mehrotra, extended with a learning or smart traversal mechanism, and the invariant representation scheme based on moment invariants would probably prove to be a good object recognition scheme.
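The index traversal discussed above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the similarity measure (negative Euclidean distance), the tuple-valued features, and the tree contents are all invented for the example.

```python
import math

# Sketch of a feature index tree: model features live at the leaves, each
# interior node holds a reference feature, and search descends toward the
# subtree whose reference feature is more similar to the query.

class Node:
    def __init__(self, ref, left=None, right=None):
        self.ref = ref          # reference feature (here: a plain tuple)
        self.left = left
        self.right = right      # leaves have left == right == None

def similarity(a, b):
    # Illustrative similarity: negative Euclidean distance (higher = closer).
    return -math.dist(a, b)

def best_match(node, query):
    # Descend toward the more similar reference feature until a leaf is hit.
    while node.left is not None:
        if similarity(query, node.left.ref) >= similarity(query, node.right.ref):
            node = node.left
        else:
            node = node.right
    return node.ref

# Tiny index over four invented 2-D features.
tree = Node((0.5, 0.5),
            Node((0.2, 0.2), Node((0.1, 0.1)), Node((0.3, 0.3))),
            Node((0.8, 0.8), Node((0.7, 0.9)), Node((0.9, 0.7))))
print(best_match(tree, (0.12, 0.08)))  # descends to the nearest stored feature
```

A "smart" traversal of the kind suggested above could replace `similarity` with one biased by previously hypothesized objects.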

Lars Liden

"Object Recognition Based on Moment (or Algebraic) Invariants" by Taubin and Cooper: This paper lost me on some of the mathematical formalism (e.g., I never quite understood the workings of an affine transformation), but the basic idea of computing moment invariants seemed fairly interesting. Perhaps the most useful aspect of this method is its ability to handle the occlusion of objects by computing moment invariants for what the authors term 'interest regions'. The eigen-approach examined last week was not able to handle such occlusion. As long as one of the interest regions is not in the occluded area, the authors suggest that a useful level of accuracy is still possible. Unfortunately, the authors don't make it clear how one might choose what should be used as an "interest region". They only give the vague suggestion that interest regions should be easily identified and provide good object discriminatory power. It is not clear a priori what these regions might be. One difficulty with this approach is that, like many others, it does not handle object segmentation. The objects used in the examples were thresholded and binary (black or white). Additionally, my impression is that this method is MUCH more sensitive to accurate segmentation than the eigen-approach. Whereas the eigen-approach would still be reasonably accurate for not-so-well segmented objects, it seems the moment invariant approach would quickly lose applicability as the quality of segmentation decreases.

"Index-Based Object Recognition in Pictorial Data Management" by Grosky and Mehrotra: This paper was clearly written from the traditional computer science / artificial intelligence perspective and operates on a grossly simplified set of image data. The objective of the paper turned out to be more or less to find an efficient search algorithm for a database of objects represented by selected features in a polygonal approximation of object boundaries.
Unfortunately, the article didn't show much awareness of the difficulties of image processing, as the polygonal feature representation seemed inappropriate. As in the previous paper, this method ignores the problem of achieving accurate image segmentation, and it would seem to require an accurately segmented object to perform well. On the other hand, this method does handle rotation, translation and occlusion.

Gregory Ganarz

Object Recognition Based on Moment (or Algebraic) Invariants by G. Taubin and D. Cooper: This paper proposes a moment-based approach, meaning that the "features" used in recognition are regions of mass concentration (where mass means the pixels composing the object). As an example, a barbell has mass separated from the primary object center, while a sphere has its mass localized near the primary center. These mass moments can be made invariant to view. By computing a number of these moments, occluded objects can be recognized with some success. One of the difficulties with this approach is that the computation of algebraic invariants is computationally expensive. Also, it is unclear how this method would fare under more realistic image conditions (textures, uneven lighting, etc.).

Index-Based Object Recognition in Pictorial Data Management by W. Grosky and R. Mehrotra: In this paper, the authors use a polygonal approximation of the boundary of each object. For each vertex of this polygon, four features are computed: the internal angle, the distance to the next vertex, and the x,y coordinates of the vertex. This representation is invariant to rotation, translation, and, with a transform, scale. Oddly, only a few paragraphs of the twenty-page paper are devoted to discussing this representation. The rest deals with how to create a tree to search for object matches efficiently. In biology, I believe a more parallel approach has been taken (as you learn more, it does not take longer to access data). On the topic of whether the polygon representation is a good one: while it does seem to do well on the outlines of objects, the polygon approach does not address how to model internal features such as the eyes on a face. Furthermore, it does not easily generalize to 3D rotations.
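The four-feature vertex representation described above can be sketched as follows. This is a guess at the spirit of the representation, not the paper's implementation; the sample polygon, the use of degrees, and the feature ordering are choices made purely for illustration.

```python
import math

# For each vertex of a closed polygon, compute the four features the
# commentary lists: internal angle, distance to the next vertex, and the
# vertex's x,y coordinates. The angle/distance pair is invariant to rotation
# and translation; the raw coordinates are not.

def vertex_features(poly):
    n = len(poly)
    feats = []
    for i in range(n):
        p_prev, p, p_next = poly[i - 1], poly[i], poly[(i + 1) % n]
        # Vectors from this vertex to its neighbors.
        a = (p_prev[0] - p[0], p_prev[1] - p[1])
        b = (p_next[0] - p[0], p_next[1] - p[1])
        dot = a[0] * b[0] + a[1] * b[1]
        ang = math.acos(dot / (math.hypot(*a) * math.hypot(*b)))
        dist = math.hypot(p_next[0] - p[0], p_next[1] - p[1])
        feats.append((math.degrees(ang), dist, p[0], p[1]))
    return feats

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for f in vertex_features(square):
    print(f)   # every internal angle is 90 degrees, every side length is 1
```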

Shrenik Daftary

Synopsis of Index-Based Object Recognition in Pictorial Data Management by Grosky and Mehrotra: This paper presents an efficient access method for an image database. Most recognition techniques that analyze scenes with touching or occluded objects rely on feature-based analysis. The process of identification relies on hypothesis generation and verification. Hypotheses can be generated by different methods: one assumes the identity of the object and determines its location (the model-by-model approach), which has a high cost of hypothesis generation; another determines which features are present in the object and compares them against the list of objects sharing those features. The speed of these approaches can be enhanced by using binary search trees. The collection of features can be organized into a feature index tree. One essential element of this tree is a similarity measure between different features. The tree is a binary tree with the following properties: 'The model features are stored at the leaf nodes'; 'Each of the interior nodes contains a feature, called a reference feature'; 'The members of any subtree are more similar to the reference feature stored at the root of that subtree than to the reference feature stored at the root of the sibling subtree.' Such a tree can be constructed by an algorithm that first determines the model features and then finds a reference feature, which becomes the root. Several methods of finding a reference feature are mentioned. Next the feature collection is sorted into two groups, and the reference feature for each of the groups is determined. The subcollections are adjusted so that the last requirement of a feature tree is met. Finally, the last three steps are applied recursively until each collection has a size of 1. The best-match algorithm uses a modified binary search that makes use of the similarity between features. These algorithms are best represented using a flow chart.
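The recursive construction summarized above can be sketched roughly as follows. This is a loose illustration with scalar features; the farthest-pair reference selection is a naive stand-in for (and not one of) the selection methods the paper actually proposes.

```python
# Rough sketch of recursive feature index tree construction: pick two
# reference features, split the collection by which reference each feature
# is more similar to, and recurse until every subcollection has size 1.

def build(features):
    # Base case: a single model feature becomes a leaf.
    if len(features) == 1:
        return {"leaf": features[0]}
    # Naive reference choice (an invented stand-in): the two most
    # dissimilar features in the collection.
    r1, r2 = max(((a, b) for a in features for b in features),
                 key=lambda p: abs(p[0] - p[1]))
    # Split: each feature joins the reference it is more similar to.
    g1 = [f for f in features if abs(f - r1) <= abs(f - r2)]
    g2 = [f for f in features if abs(f - r1) > abs(f - r2)]
    return {"ref": (r1, r2), "left": build(g1), "right": build(g2)}

def leaves(tree):
    # Collect the model features stored at the leaf nodes.
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaves(tree["left"]) + leaves(tree["right"])

tree = build([1.0, 1.2, 5.0, 5.3, 9.0])
print(sorted(leaves(tree)))  # every input feature ends up at a leaf
```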
Each object is compared with the feature tree, and if the similarity to some known feature is above a threshold then the feature is recognized. A method of adding a feature is presented, as well as a method of replacing a feature. A couple of experiments using the algorithm are presented. The first series used 11 objects to build up the feature space. If the difference in a feature's similarity to the two reference features was less than 0.2, the feature was placed in both subtrees. According to the paper, objects were successfully recognized even when occlusion occurred. Finally, a method based on feature indexing with a multiway search tree is presented. The problems I saw with this paper are its reliance on purely black & white models with mostly sharp edges. There was not much testing on curved objects. Another possible problem is its use of a specific subset of objects (car mechanic's tools) that may not be representative. In general, though, this method seems like it would be able to find objects of similar characteristics.

Synopsis of Object Recognition Based on Moment Invariants by Taubin and Cooper: This paper presents a method for object recognition that can cope with arbitrarily shaped objects in cluttered environments and arbitrary positions. The general procedure first computes moment invariants that are invariant to affine transformations of the data set. The techniques provide a computationally efficient way to identify objects based on best-fitting polynomial analysis of the object, which allows the computation of algebraic invariants. The premise of this technique is to provide a method that is completely independent of the viewer's coordinate system. The first definition presented in the paper is based on moments: the normalized moment is the integral of Phi(x) times the density function, divided by the integral of the density function. This rids M of its dependence on viewer distance.
A similar method of moment calculation using summation is presented for more sparsely sampled objects. Next the definition of centered moments is presented. Finally, moment invariants are defined as functions of the moments of a shape that are independent of the coordinate system. Next the need to find a sufficient number of invariants is mentioned, along with the need to determine the effects of affine transformations. Lexicographical ordering of multiindices is presented. Then the matrix associated with a multiindex is defined, for both vectors M[d] and matrices M[j,k]. Next a definition is presented that deals with coordinate transformations of the same object. The polynomials are expanded into their Taylor series. Forms are defined: a form is a polynomial in which every term has the same degree. Next a mapping of the viewpoint parameter matrix A of the viewer is presented, which provides a d-th degree representation matrix of A. It is a linear representation, it preserves transposition, it is lower triangular, and its determinant equals the determinant of A raised to a power given by a binomial coefficient. Next the relation between moment invariants and algebraic invariants is presented, followed by the computation of moment invariants. The first fact mentioned is that the computation of the determinant of A is on the order of n^4. The matrix C[j,k] is defined as a function of the moments M, and a C[j,k] satisfying C[j,k](M') = A[j]C[j,k](M)A[k]^t is called a covariant matrix. If C and C' are related in a certain other way, C is called contravariant, or left covariant and right contravariant. If the matrix A is orthogonal (the Euclidean case), then the matrices are both covariant and contravariant and A[d] is orthogonal as well. New Euclidean covariant matrices can be constructed by multiplying other Euclidean covariant matrices together. Next a technique to deal with more complex affine moment invariants is presented.
Next, methods of normalizing a shape with respect to affine transformations are presented. The center has already been defined. The orientation in the Euclidean case corresponds to one of the 2^n orthonormal sets that diagonalize a symmetric nxn covariant matrix; this provides a technique to calculate an orientation matrix. Next a covariant vector is computed. In the affine case, since there are an infinite number of nonsingular matrices that diagonalize the symmetric matrix, the process of coordinate system determination is different. The application of these procedures is shown in the next part of the paper, which mentions that the simplest way to use invariants is to compute them for an entire object. Problems may occur if segmentation is not possible, or if the object is occluded. The alternative of using interest regions is presented; this provides the ability to deal with occlusion. The actual implementation of the process for affine shapes involves the following steps: compute the center; compute the second-degree moments; find the lower triangular matrix L such that M' = LML^t = I, which is computed in two steps (find the lower triangular matrix R such that RR^t = M, then invert it to satisfy the condition). The benefits of the techniques presented in this paper are the small number of invariants needed and the ability to fit a polynomial curve to an object using standard matrix computation techniques. Problems with this technique are its inability to deal with 3D rotation, its use of non-color models, and the relatively large required distance of the camera from the object. The method does provide the ability to identify objects that are rotated, scaled, or translated.
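The implementation steps listed above (center, second-degree moments, lower triangular L with LML^t = I) can be sketched for 2-D point sets. This is an illustrative reconstruction, not the authors' code; the sample points are invented, and the 2x2 Cholesky factorization is written out by hand.

```python
import math

# Normalize a 2-D point set: translate its center to the origin, form the
# second-degree moment matrix M = [[mxx, mxy], [mxy, myy]], take the Cholesky
# factor R (lower triangular, R R^t = M), and apply L = R^{-1} so that the
# normalized second-degree moments become the identity.

def normalize(points):
    n = len(points)
    # Step 1: compute the center and translate it to the origin.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Step 2: second-degree (centered) moments.
    mxx = sum(x * x for x, _ in centered) / n
    mxy = sum(x * y for x, y in centered) / n
    myy = sum(y * y for _, y in centered) / n
    # Step 3a: Cholesky factor R of M, 2x2 case.
    r11 = math.sqrt(mxx)
    r21 = mxy / r11
    r22 = math.sqrt(myy - r21 * r21)
    # Step 3b: L = R^{-1}, also lower triangular; then L M L^t = I.
    l11 = 1 / r11
    l21 = -r21 / (r11 * r22)
    l22 = 1 / r22
    return [(l11 * x, l21 * x + l22 * y) for x, y in centered]

# Any nondegenerate point set comes out with identity second-degree moments,
# so the result is unchanged (up to rotation) by affine transforms of the input.
pts = normalize([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
```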

John Isidoro

The first reading (Mehrotra and Gary) was a reasonably simple introduction to shape recognition. The explanation of global vs. local properties and data-driven vs. model-based approaches served as a good introduction to the next two papers. The next paper (Grosky and Mehrotra) was about using a data-driven approach to shape indexing, namely the feature index tree. I think this concept is powerful, but it has one main weakness: it assumes the image you are analyzing contains one of the shapes stored in the feature index tree. The Taubin and Cooper paper was mathematically overpowering. However, the concept of constructing a set of properties of a shape that are invariant to affine transforms is a very useful one, though I think there might be other ways to find such properties. I'm kind of surprised that there was no mention of using histograms in shape analysis. I think that angle and/or line-segment length histograms for shapes represented in vector form could be as powerful a tool as moment invariants. Another possibility is arc-radius histograms for shapes represented as curves.
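The histogram idea suggested above can be sketched quickly. Everything here (the 10-degree bin width, folding edge directions modulo 180 degrees, the sample square) is an assumption made purely for illustration, not a method from either paper.

```python
import math
from collections import Counter

# Histogram of edge directions for a polygon given in vector form: each edge
# contributes one count to the bin holding its (undirected) orientation.

def angle_histogram(poly, bin_deg=10):
    hist = Counter()
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        # Fold directions into [0, 180) so an edge and its reverse match.
        ang = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180
        hist[int(ang // bin_deg)] += 1
    return hist

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(angle_histogram(square))  # two horizontal and two vertical edges
```

Such a histogram is translation-invariant by construction; rotation invariance would need a further step, e.g. aligning the histogram to its dominant bin.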

John Petry

INDEX-BASED OBJECT RECOGNITION IN PICTORIAL DATA MANAGEMENT, by Grosky and Mehrotra: The authors suggest a data-driven method of organizing and searching images rather than the traditional model-driven method. In the latter case, an image is searched for each model instance or key model feature, which is computationally quite expensive at runtime, especially as the list of possible models grows. Instead, they recommend running a small number of feature detectors (perhaps only one) on the image, then searching for matches between each detected feature and a binary tree of features found in the set of models. For each feature at a node in the tree there exists a list of models and locations containing instances of the feature. Once sufficient features from the image have been compared with this tree, a viable set of candidate models for the object in the image exists. These candidates can be tested more exhaustively against the image to determine which, if any, matches the object. This is certainly a fast approach, assuming the feature detector can run accurately on the image at high speed. To help with accuracy, the authors' tests involved presegmenting the image by binarizing it and by limiting its content to pretrained objects. However, they did permit multiple, occluding objects in the scene. They restricted their models to a single feature space (the interior angle of a polygon at a vertex combined with the distance to the next clockwise vertex). It appears that this approach would work just as well with a combination of feature spaces, depending on the application. For instance, if object overlap and scale were not concerns, models could be indexed into separate trees based on the ratio of principal moments, perimeter, and area. For each new object inspected, these values could be computed and three separate tree searches performed. So in that sense, it's easy to expand the approach.
One limitation of the method is its handling of measurement accuracy. The authors say there are two approaches: either search down both branches of a tree if the nodes on both sides are close to the value of the search key, or duplicate information on both sides of a branch when values are within some limit. The latter suggestion strikes me as strange, since it violates the authors' fundamental data-driven approach. When searching, the data should properly be thought of as consisting of the key and a tolerance (the maximum measurement inaccuracy). Instead, the authors would embed the tolerance in the feature tree, effectively hard-coding it. The former suggestion, searching both branches from a node when the subnodes are similar, fits their paradigm better. It is effectively the same as doing two searches, one on (key - tolerance) and one on (key + tolerance), and considering all nodes found between the two to be valid candidates. This is conceptually somewhat simpler than what they argue for, though it can obviously be up to twice as slow. (I'm not sure if it can be faster.) Their basic design approach is good, in that it permits offline computation of model data in order to enhance realtime performance. I think they go too far in their example, in that they mix a particular embodiment of their approach (identifying binarized scale-invariant rigid polygons by principal angle/distance nodes) with their theory. I'd rather see more discussion of the data-driven organization itself: for instance, how about alternatives to the binary tree, e.g. hash tables? Hashing is a reasonable approach if a universal feature test exists (that is, one that can be applied to any image shown to the system), if the scores of models against that feature span a range rather than a binary found / not-found result, and if the number of feature measurements performed is small compared to the number of models in the system.
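The two-boundary search described above can be sketched with a sorted list of scalar keys standing in for the feature tree. The keys and tolerance are invented for the example.

```python
import bisect

# Key-plus-tolerance matching: instead of duplicating near-boundary features
# in both subtrees, treat the query as a key with a tolerance and gather
# every stored key inside [key - tol, key + tol] via two boundary searches.

def candidates(sorted_keys, key, tol):
    lo = bisect.bisect_left(sorted_keys, key - tol)   # search on key - tol
    hi = bisect.bisect_right(sorted_keys, key + tol)  # search on key + tol
    return sorted_keys[lo:hi]                         # everything in between

keys = [0.10, 0.30, 0.55, 0.60, 0.90]
print(candidates(keys, 0.57, 0.05))  # [0.55, 0.6]
```

Note the tolerance lives with the query, not in the index, which is the point made above: the stored structure stays fixed while each search supplies its own measurement inaccuracy.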
Finally, the feature test itself must be known well in advance, in order to organize the feature space appropriately.

OBJECT RECOGNITION BASED ON MOMENT (OR ALGEBRAIC) INVARIANTS, by Taubin and Cooper: There is less to say about this paper, since the bulk of it is mathematical in nature. Essentially, the authors have an improved method of computing invariant object moments based on the edge pixels of a 2-D object or the surface pixels of a 3-D object. Since most objects are not truly 2-D, the effective constraint is that the camera-to-subject distance be much greater than the height of the object. This is a fast and straightforward method once the source pixels (edge or surface) have been determined. That determination may not be trivial, however: it assumes the object has been segmented. Also, the system claims to be able to handle occlusion by running on the moments of sub-elements of objects (which again requires a certain segmentation) or by having a separate process interpolate/extrapolate missing pixels (quite a challenge). Within the authors' parameters, though, it seems like a very practical approach.


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: