BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Shape II



Paul Dell

A. Evans, N. Thacker, and J. Mayhew. "The Use of Geometric Histograms for Model-Based Object Recognition." Proc. British Machine Vision Conference, 429--438, 1993.

A shape recognition system for rigid objects in 2D grey scale images is presented. The system utilizes geometric histograms measuring the relative angle and perpendicular distance of various features in the object. An advantage of this feature-based system is that partially occluded objects can still be identified, and that the system lends itself to a parallel algorithm: the matching step requires only "simple array multiplication". There is very limited experimental data presented in the paper, so it is difficult to judge the real-life performance of the system. A few shortcomings: only edge features are measured, so no internal features, greyscale or otherwise, are compared. To detect the edges of the object, the background is solid and at fairly high contrast to the object; it would be interesting to see how the edge detection would work if the objects were of varying color (or shades of grey) and the background were close to the color of some of the objects. One last criticism is that no cost-performance data is presented. Even though this is a parallel algorithm, there is little mention of the cost of the computations.

S. Sclaroff. "Deformable Prototypes for Encoding Shape Categories in Image Databases." Submitted to Pattern Recognition, special issue on image databases. Boston University TR95-017, September 1995. (Did not have time to read; I will before next class.)
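For concreteness, the histogram construction the paper describes (relative angle and perpendicular distance of nearby segments, accumulated per base edge) can be sketched roughly as follows. This is only an illustration: the bin counts, distance range, and the coarse "splat" sampling along each segment are my own assumptions, not the authors' parameters.

```python
import numpy as np

def perp_dist(p, a, b):
    """Signed perpendicular distance of point p from the line through a and b."""
    d = b - a
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    return float(np.dot(p - a, n))

def geometric_histogram(base, segments, n_angle=16, n_dist=16, max_dist=100.0):
    """Histogram over (relative angle, perpendicular distance) of the other
    segments as seen from the base segment (illustrative bin sizes)."""
    a, b = base
    hist = np.zeros((n_angle, n_dist))
    base_angle = np.arctan2(*(b - a)[::-1])
    for p, q in segments:
        # relative orientation, folded into [0, pi)
        ang = (np.arctan2(*(q - p)[::-1]) - base_angle) % np.pi
        ai = min(int(ang / np.pi * n_angle), n_angle - 1)
        # spread the entry over the distance range spanned by the endpoints,
        # which is what makes fragmented edges land in the same bins
        d1, d2 = sorted((perp_dist(p, a, b), perp_dist(q, a, b)))
        for d in np.linspace(d1, d2, 5):  # coarse "splat" along the segment
            di = int(min(abs(d), max_dist - 1e-9) / max_dist * n_dist)
            hist[ai, di] += 1
    return hist
```

Because each fragment of a broken edge contributes entries in the same angle/distance bins as the whole edge, the histogram of a fragmented contour stays close to that of the complete one, which is the robustness property the paper claims.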

William Klippgen

The paper "The Use of Geometric Histograms for Model-Based Object Recognition" by Evans, Thacker and Mayhew introduces a histogram-based approach for representing shape properties. The geometric relationships used in the representation are edge angles and the minimum and maximum distance between line segments. The interesting point of the representation is that a histogram is made for every edge, with the edge in question as the base line. By doing this, the method is claimed to degrade gracefully under occlusion. The other noteworthy feature of the scheme is that broken edges (due to occlusion, noise or varying lighting conditions) will be represented very similarly to the complete edge thanks to the histogram properties. As long as objects are both modeled and captured from approximately the same perspective, this method seems robust, making use of local properties that are quite insensitive to fragmentation and occlusion. The matching method is very primitive and tries to match all recognised features against all modelled features in a so-called "parallel matching strategy". While the matching lacks any kind of feature organization, the authors suggest a pure hardware implementation of histogram correlation in parallel. The method seems to work well in a controlled environment where the viewing perspective and object orientation can be controlled. Still, I find that the system demonstration results could have been more thoroughly presented, and the graceful-degradation claim better documented under various conditions.

Stan Sclaroff's paper, "Deformable Prototypes for Encoding Shape Categories in Image Databases", takes a more realistic approach to general shape detection. The suggested modeling of various poses and deformations of an object or a class of objects defines matching as projecting a sample shape into the "prototype space". This projection simultaneously seeks to recognize the object and its deformation.
The approach, called "modal matching", starts with modeling a set of shapes of an object based on a set of feature point locations. The points are used as nodes in building a finite element model of the shape. A subset of significant feature points should be chosen to reduce the workload. By finding the "modes of free vibration" of the model set, an orthogonal, object-centered coordinate system of eigenvectors is found. Each feature point can then be described by its participation in the various modes of deformation. By comparing two sets of feature vectors, we find strong correspondence between two corresponding feature points.

Faced with a large number of object poses and deformations (shapes), there is a need to represent the object as a small number of "characteristic views", by selecting a few representative prototypes for each category. Each shape in the database is then aligned with each of the prototypes, and the strain needed to transform it into the prototype is stored. Given the feature point matching, the paper presents a method for finding the modal deformation parameters. The similarity measure is the amount of "strain" needed to transform a shape A into a shape B. The measure disregards translation and rotation by not taking the rigid-body motion modes into account.

The main advantage of this method seems to be that its similarity measure approximates the human one. The various modes of deformation probably correspond to human perception's partitioning of shape variation. It is also "selectively" insensitive to camera viewpoint, as it is insensitive to affine transformations. It would be interesting to see how a combination of image content recognition (eigenface computation, texture, colour histograms), motion matching (motion characteristics like speed, direction, etc.) and image shape recognition could work together to provide a generalized image recognition tool. By using the feature point matching in image content analysis, the matching of content (i.e. with eigenfaces) could possibly improve.
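The eigen-decomposition at the heart of modal matching can be illustrated as a generalized symmetric eigenproblem K·phi = lambda·M·phi, solved here by whitening with a Cholesky factor of M. The toy mass and stiffness matrices below (a 1-D chain of unit masses and springs) are stand-ins for the paper's 2-D finite element assembly, so this is a sketch of the linear algebra only, not Sclaroff's implementation.

```python
import numpy as np

def modal_basis(K, M, n_modes):
    """Solve K phi = lambda M phi; return the n_modes lowest modes.
    Rows of Phi describe each feature point's participation in the modes."""
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    A = Linv @ K @ Linv.T                # symmetric standard eigenproblem
    evals, Y = np.linalg.eigh(A)         # eigenvalues in ascending order
    Phi = Linv.T @ Y                     # back-transform the eigenvectors
    return evals[:n_modes], Phi[:, :n_modes]

# toy example: 4 unit masses joined by unit springs, free at both ends
n = 4
M = np.eye(n)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K[0, 0] = K[-1, -1] = 1.0
evals, Phi = modal_basis(K, M, n_modes=3)
# evals[0] is ~0: the rigid-body (translation) mode that the similarity
# measure discards; the remaining columns are the deformation modes
```

In the method itself, correspondences between two shapes come from comparing rows of the two Phi matrices: feature points with similar modal participation are matched.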

Lars Liden

"The Use of Geometric Histograms for Model-Based Object Recognition", Evans, Thacker & Mayhew

This paper struck me as a very interesting approach to shape recognition, and the geometric histogram appears to be very robust to occlusion of objects. Two issues come to mind which aren't clear to me and weren't discussed in the paper.

First, I'm not clear whether this method requires segmentation for the recognition of objects (obviously the models must be formed from segmented objects). The occlusion examples seem to indicate that segmentation isn't necessary, but it isn't clear that this would be true if the same objects were placed over a texture such as a checkerboard, or were themselves formed of textures.

Secondly, although this method handles rotation and translation of modeled objects with ease, it doesn't seem like the representational method would be able to handle scale, and I see no easy way of introducing scale invariance into this method. Normalization in an image with multiple objects would not achieve scale invariance. One could use several models for each object at different scales, but this would be computationally unattractive.

As with other shape-based methods using linear and/or polygonal approximations, this method would not work as well with natural objects such as grass, hair, trees, etc.

Finally, it is interesting to note that this method is doing more of a feature extraction than a shape extraction, and is critically affected by the size of the window chosen for measuring the geometric relationships. For example, it seems to me these two figures would have very similar histograms, as they share similar features:

    |--/\/\/\--|   |--^+^+^---|
    |          |   |          |
    |--^+^+^---|   |--/\/\/\--|
    |          |   |          |
    |--^+^+^---|   |--/\/\/\--|
    |          |   |          |
    |--/\/\/\--|   |--^+^+^---|

(Hard to show in ascii!!)
"Deformable Prototypes for Encoding Shape Categories", Sclaroff

This paper was original among the shape recognition papers we have read so far in that it is able to handle non-rigid shapes: it obtains a point-to-point correspondence between an object in question and prototype objects in a database, morphs one shape into another, and measures the amount of deformation between the object being examined and the models in the database.

It isn't clear to me that there is a definite correspondence between the measurement of deformation and the perceptual distance between two objects. For (a bad) example, the deformation between two people, a bald male with a top hat and a female with long curly hair, and the deformation between a male human and an ape would seem quite different perceptually, but not so with this type of deformation measurement.

It would also be interesting to see a larger database. In the current database the Red Fin Needle Fish is likely to be closer in shape to a missile or a pen than to another type of fish. Although the method would appear to work in most cases, there would seem to be cases for which it will fail. Perhaps it would work best when combined with other methods, including the weighting of relevant shape features for shape similarity; for example, a fish should have x-number of fin-like features.

Gregory Ganarz

In the paper "Deformable Prototypes for Encoding Shape Categories in Image Databases", S. Sclaroff describes a method for image database search which uses deformable prototypes to represent shape categories. While the utility of the method has been demonstrated on actual image databases, the deformable shape method suffers from several problems. Since the method relies on establishing point correspondences between the test object and the prototype, it does not handle occlusion or partial objects well. Further, establishing this correspondence is computationally expensive. Also, the number of strain computations scales with the number of remembered prototypes; this is in contrast to human memory, whose speed seems to be independent of the number of memories.

In "The Use of Geometric Histograms for Model-Based Object Recognition", A. Evans et al. propose a method for object recognition which is rotation and translation invariant. Basically a structural approach, the model encodes shape by creating "geometric" histograms, which encode both the distance and the angle between edge features of objects. The complexity of this method scales with the number of edge features. While the model could be implemented by a neural network, it suffers from a combinatorial explosion when creating cells sensitive to certain distances and angles between line features. The model also proposes a large number of geometric histograms, one per encoded feature. It is unclear whether these could be combined into one geometric map, and how much information would be lost.

Shrenik Daftary

Synopsis of "The Use of Geometric Histograms for Model-Based Object Recognition" by Evans et al.

This paper shows a method to recognize and locate multiple RIGID objects from their 2D projections in grey-level images. The authors consider the case where occlusion may occur, either through self-occlusion or through blocking by another object. The technique presented is based on geometric relationships within a shape. Data for the geometric histogram is based on features that help determine the existence of an object. An example is presented of two line segments, where the two features represented are the relative angle between the segments and the distance between the two lines (at the maximum and minimum points). In order to determine complete representations of shape, a coordinate frame is drawn around each line feature. This is done for all line features, and geometric relationships are determined for each feature within a certain radius of the line. This feature-based method is demonstrated to be robust to the blocking out of a segment of a line, since the histogram remains essentially the same. Some of the benefits claimed are redundancy (the histograms take up more space than a minimal representation), locality (good if not dealing with an object whose ends define it), robustness, and ease of matching.

The matching algorithm for this technique first requires the ability to generate histograms for each line feature in all images. The histograms can each be considered vectors, and the match to an image feature is the correlation between the image feature and the model feature. The metric is scale invariant and robust against spurious features. The object is determined by comparing each of the features of the new image to the model histograms. The closest model is the one with the highest correlation, and a threshold throws out even this closest model if the correlation is too low.
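The matching step summarized above (histograms treated as vectors, compared by normalized correlation, with the best match rejected below a threshold) might look like this in outline; the threshold value here is an assumption for illustration:

```python
import numpy as np

def normalized_correlation(h1, h2):
    """Cosine-style correlation between two histograms; normalizing by the
    vector magnitudes makes the score scale invariant."""
    a, b = h1.ravel(), h2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def best_model(image_hist, model_hists, threshold=0.7):
    """Return the index of the best-matching model histogram, or None if
    even the best correlation falls below the rejection threshold
    (i.e. the feature belongs to a novel object)."""
    scores = [normalized_correlation(image_hist, m) for m in model_hists]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None
```

Returning None for sub-threshold matches is what lets the system flag novel objects rather than forcing every image feature onto some model.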
This technique allows matching on only a few of the features in order to determine object existence, and can detect the presence of novel objects when a feature correlation falls below a threshold. The system demonstrated excellent functioning on their test set; however, questions about its performance can be raised when unknown objects with geometric characteristics similar to known objects are introduced. Also, there are times when the relationship between distant points is significant. Finally, this technique does seem to be good for many cases where the object is hidden by something in the middle. The technique would be more useful if combined with a database of multiple viewpoints - at least 2/3 to cope with different angles - though this would lead to a huge memory requirement.

Synopsis of "Deformable Prototypes for Encoding Shape Categories in Image Databases" by Sclaroff

The paper begins by describing morphing, and mentions the technique of modal matching that allows users to specify a few example shapes for the computer to sort based on their similarity. The requirements for the view-based parameterization of a shape are prototype views, point correspondences between the new shape and the prototype views, and a method to measure deformation. Modal matching is a method that:

* determines point correspondences using an energy-based model
* warps or morphs one shape into another using energy-based interpolants
* measures the amount of deformation between an object's shape and prototype views

A new image is then compared to potential shapes in terms of distance to a prototype similarity metric. This technique allows the use of perceptual and semantic information that is discarded by invariant statistics, and also allows comparison of non-rigid objects. The technique uses a finite element model that does not require a priori parameterization of the images. The FEM used provides interpolation that reduces problems from poor sampling.
The modal representation is described in detail in terms of the mass, damping, and stiffness matrices. The new modal representation avoids the a priori use of a single prototype object, and instead uses the data to define the object's deformability. The technique works, for each object, by:

* determining the FEM mass and stiffness matrices
* solving the generalized eigenvalue problem
* matching the low-order nonrigid modes for both shapes
* using the matched modes as a coordinate system

When the number of feature points exceeds a threshold, the process can be sped up by using a lower-resolution FEM. Once two images are matched up, the point correspondences can be used to determine the deformations required to morph one image into the other (that is, to determine the modal deformation parameters that take the set of points from one image to the corresponding points in the other image). Data representation using this technique is compressed, since only a few characteristic views are needed to represent the data. The relative distance between a new object and the current views can be determined, but a metric-space distance between objects needs to be used instead of strain energy, which fails the symmetry axiom of a metric space; a different metric that uses areas as a scaling factor is introduced. The selection of modal prototype shapes is discussed, in terms of both human selection and an untested automated version. Next, an alternative method is presented that uses the low-order mode vectors of both the new shape and the prototype to compute the distance.

The technique was tested on series of tools, fishes, and rabbits. It appeared to work well in these cases, since the highest matches at least concurred with my perceptions. It is most beneficial compared to other systems when an object is moving or changing shape. Part of the limitation of this system is the lack of controlled tests for the objects.
Although it is suggested that objects would be identified in cases of minor occlusion, the amount of the object that can be hidden is not mentioned. As the author mentions, this system can be combined with other metrics in order to get a more complete tool.
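Given matched feature points, recovering the modal deformation parameters mentioned above reduces to a least-squares fit of the displacement vector onto the mode shapes, with strain energy as the resulting deformation score. This is a schematic sketch only: the mode matrix Phi and the eigenvalues are assumed to come from a prior eigen-decomposition, and the paper's actual formulation (including the area-scaled metric) is more involved.

```python
import numpy as np

def modal_amplitudes(Phi, u):
    """Least-squares modal amplitudes a with Phi @ a ~= u, where u stacks
    the displacements carrying shape A's points onto shape B's."""
    a, *_ = np.linalg.lstsq(Phi, u, rcond=None)
    return a

def strain_energy(a, evals):
    """Deformation energy: sum of lambda_i * a_i^2. Rigid-body modes have
    lambda ~ 0, so pure translation and rotation contribute no strain."""
    return float(np.sum(evals * a**2))
```

Because the rigid-body eigenvalues are (near) zero, the energy automatically disregards translation and rotation, which is why the similarity measure responds only to genuine deformation.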

John Petry

THE USE OF GEOMETRIC HISTOGRAMS FOR MODEL-BASED OBJECT RECOGNITION,
__________________________________________________________________
by Evans, Thacker and Mayhew

This is a simple and elegant approach to a bin-picking application. It shares properties of a general Hough transform, in that it relates fixed object edges to each other; unlike Hough, it is rotation independent. It appears to share the limitations of Hough as well, namely that it is extremely dependent on object rigidity and is susceptible to noise. It also doesn't scale well; the modification mentioned, whereby the splat size is increased, doesn't work beyond a narrow range. It is good to see that it can tolerate the fragmentation of line segments, since that is probably a frequent occurrence. Curves would probably cause it much more trouble, though, unlike Hough, which only looks at collections of individual edge pixels. It looks like it might be computationally expensive for complex shapes in busy scenes.

DEFORMABLE PROTOTYPES FOR ENCODING SHAPE CATEGORIES IN IMAGE DATABASES,
______________________________________________________________________
by Sclaroff

This is a very interesting technique that differs from most of the others we've seen so far. In particular, its ability to handle natural deformations is quite useful, as are the scale invariance and the absence of any a priori shape assumptions. As with almost all other techniques to date, it requires segmentation before running. It also doesn't work well if the object is partially occluded, which is a severe drawback if the database isn't well-constrained.

I'm puzzled about the way the feature points are selected. I can understand how a user would choose them, or how they could be automatically selected by the system. The latter approach seems to limit the algorithm to edge points, since interior points could well be due to shadows or non-repeatable noise. But how are the model feature points selected?
If I understand it correctly, they should correspond to the object points -- but what makes that so? Either I'm missing an important detail, or it has been left out. This technique seems limited to 2-D objects, in that rotation of an object out of the image plane will produce a very different figure, which will not match its original model. That is, the rotation will manifest itself as a deformation, which it really isn't; the image is deformed, but not the object. The technique also appears limited to edge pixels, since I don't see how it could reliably choose interior feature points, as mentioned above. I have to admit this is one of the few techniques that can easily solve the example cases presented in the paper, other than perhaps the eigen-object approach. Unlike that approach, this one does not make use of internal intensity patterns, but it's undoubtedly better at matching figures based on their outline.


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: