BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Shape II



Paul Dell

A. Evans, N. Thacker, and J. Mayhew. "The Use of Geometric Histograms for Model-Based Object Recognition." Proc. British Machine Vision Conference, 429--438, 1993.

A shape recognition system for rigid objects in 2D grey scale images is presented. The system utilizes geometric histograms measuring the relative angle and perpendicular distance of various features in the object. An advantage of this feature-based system is that partially occluded objects can still be identified, and that the system lends itself to a parallel algorithm: the matching step requires only "simple array multiplication". There is very limited experimental data presented in the paper, so it is difficult to judge the real-life performance of the system. A few shortcomings: only edge features are measured, so no internal features, greyscale or otherwise, are compared. To detect the edges of the object, the background is solid and at fairly high contrast to the object; it would be interesting to see how the edge detection would work if the objects were of varying color (or shades of grey) and the background were close to the color of some of the objects. One last criticism is that no cost-performance data is presented. Even though this is a parallel algorithm, there is little mention of the cost of the computations.

S. Sclaroff. "Deformable Prototypes for Encoding Shape Categories in Image Databases." Submitted to Pattern Recognition, special issue on image databases. Boston University TR95-017, September 1995. (Did not have time to read; I will before next class.)
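For concreteness, the histogram construction the paper describes (relative angle and perpendicular distance of nearby segments, accumulated per base edge) can be sketched roughly as follows. This is only an illustration: the bin counts, distance range, and the coarse "splat" sampling along each segment are my own assumptions, not the authors' parameters.

```python
import numpy as np

def perp_dist(p, a, b):
    """Signed perpendicular distance of point p from the line through a and b."""
    d = b - a
    n = np.array([-d[1], d[0]]) / np.linalg.norm(d)  # unit normal to the line
    return float(np.dot(p - a, n))

def geometric_histogram(base, segments, n_angle=16, n_dist=16, max_dist=100.0):
    """Histogram over (relative angle, perpendicular distance) of the other
    segments as seen from the base segment (illustrative bin sizes)."""
    a, b = base
    hist = np.zeros((n_angle, n_dist))
    base_angle = np.arctan2(*(b - a)[::-1])
    for p, q in segments:
        # relative orientation, folded into [0, pi)
        ang = (np.arctan2(*(q - p)[::-1]) - base_angle) % np.pi
        ai = min(int(ang / np.pi * n_angle), n_angle - 1)
        # spread the entry over the distance range spanned by the endpoints,
        # which is what makes fragmented edges land in the same bins
        d1, d2 = sorted((perp_dist(p, a, b), perp_dist(q, a, b)))
        for d in np.linspace(d1, d2, 5):  # coarse "splat" along the segment
            di = int(min(abs(d), max_dist - 1e-9) / max_dist * n_dist)
            hist[ai, di] += 1
    return hist
```

Because each fragment of a broken edge contributes entries in the same angle/distance bins as the whole edge, the histogram of a fragmented contour stays close to that of the complete one, which is the robustness property the paper claims.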

William Klippgen

The paper "The Use of Geometric Histograms for Model-Based Object Recognition" by Evans, Thacker and Mayhew introduces a histogram-based approach for representing shape properties. The geometric relationships used in the representation are edge angles and the minimum and maximum distance between line segments. The interesting point of the representation is that a histogram is made for every edge, with the edge in question as the base line. By doing this, the method is claimed to degrade gracefully under occlusion. The other noteworthy feature of the scheme is that broken edges (due to occlusion, noise or varying lighting conditions) will be represented very similarly to the complete edge thanks to the histogram properties. As long as objects are both modeled and captured from approximately the same perspective, this method seems robust, making use of local properties that are quite insensitive to fragmentation and occlusion. The matching method is very primitive and tries to match all recognised features against all modelled features in a so-called "parallel matching strategy". While the matching lacks any kind of feature organization, the authors suggest a pure hardware implementation of histogram correlation in parallel. The method seems to work well in a controlled environment where the viewing perspective and object orientation can be controlled. Still, I find that the system demonstration results could have been more thoroughly presented, and the graceful-degradation claim better documented under various conditions.

Stan Sclaroff's paper, "Deformable Prototypes for Encoding Shape Categories in Image Databases", takes a more realistic approach to general shape detection. The suggested modeling of various poses and deformations of an object or a class of objects defines matching as projecting a sample shape into the "prototype space". This projection simultaneously seeks to recognize the object and its deformation.
The approach, called "modal matching", starts with modeling a set of shapes of an object based on a set of feature point locations. The points are used as nodes in building a finite element model of the shape. A subset of significant feature points should be chosen to reduce the workload. By finding the "modes of free vibration" of the model set, an orthogonal, object-centered coordinate system of eigenvectors is found. Each feature point can then be described by its participation in the various modes of deformation. By comparing two sets of feature vectors, we find strong correspondence between two corresponding feature points.

Faced with a large number of object poses and deformations (shapes), there is a need to represent the object as a small number of "characteristic views", by selecting a few representative prototypes for each category. Each shape in the database is then aligned with each of the prototypes, and the strain needed to transform it into the prototype is stored. Given the feature point matching, the paper presents a method for finding the modal deformation parameters. The similarity measure is the amount of "strain" needed to transform a shape A into a shape B. The measure disregards translation and rotation by not taking the rigid-body motion modes into account.

The main advantage of this method seems to be that its similarity measure approximates the human one. The various modes of deformation probably correspond to human perception's partitioning of shape variation. It is also "selectively" insensitive to camera viewpoint, as it is insensitive to affine transformations. It would be interesting to see how a combination of image content recognition (eigenface computation, texture, colour histograms), motion matching (motion characteristics like speed, direction, etc.) and image shape recognition could work together to provide a generalized image recognition tool. By using the feature point matching in image content analysis, the matching of content (i.e. with eigenfaces) could possibly improve.
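The eigen-decomposition at the heart of modal matching can be illustrated as a generalized symmetric eigenproblem K·phi = lambda·M·phi, solved here by whitening with a Cholesky factor of M. The toy mass and stiffness matrices below (a 1-D chain of unit masses and springs) are stand-ins for the paper's 2-D finite element assembly, so this is a sketch of the linear algebra only, not Sclaroff's implementation.

```python
import numpy as np

def modal_basis(K, M, n_modes):
    """Solve K phi = lambda M phi; return the n_modes lowest modes.
    Rows of Phi describe each feature point's participation in the modes."""
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    A = Linv @ K @ Linv.T                # symmetric standard eigenproblem
    evals, Y = np.linalg.eigh(A)         # eigenvalues in ascending order
    Phi = Linv.T @ Y                     # back-transform the eigenvectors
    return evals[:n_modes], Phi[:, :n_modes]

# toy example: 4 unit masses joined by unit springs, free at both ends
n = 4
M = np.eye(n)
K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K[0, 0] = K[-1, -1] = 1.0
evals, Phi = modal_basis(K, M, n_modes=3)
# evals[0] is ~0: the rigid-body (translation) mode that the similarity
# measure discards; the remaining columns are the deformation modes
```

In the method itself, correspondences between two shapes come from comparing rows of the two Phi matrices: feature points with similar modal participation are matched.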

Lars Liden

"The Use of Geometric Histograms for Model-Based Object Recognition", Evans, Thacker & Mayhew

This paper struck me as a very interesting approach to shape recognition, and the geometric histogram appears to be very robust to occlusion of objects. Two issues come to mind which aren't clear to me and weren't discussed in the paper.

First, I'm not clear whether this method requires segmentation for the recognition of objects (obviously the models must be formed from segmented objects). The occlusion examples seem to indicate that segmentation isn't necessary, but it isn't clear that this would be true if the same objects were placed over a texture such as a checkerboard, or were themselves formed of textures.

Secondly, although this method handles rotation and translation of modeled objects with ease, it doesn't seem like the representational method would be able to handle scale, and I see no easy way of introducing scale invariance into this method. Normalization in an image with multiple objects would not achieve scale invariance. One could use several models for each object at different scales, but this would be computationally unattractive.

As with other shape-based methods using linear and/or polygonal approximations, this method would not work as well with natural objects such as grass, hair, trees, etc.

Finally, it is interesting to note that this method is doing more of a feature extraction than a shape extraction, and is critically affected by the size of the window chosen for measuring the geometric relationships. For example, it seems to me these two figures would have very similar histograms, as they share similar features:

    |--/\/\/\--|   |--^+^+^---|
    |          |   |          |
    |--^+^+^---|   |--/\/\/\--|
    |          |   |          |
    |--^+^+^---|   |--/\/\/\--|
    |          |   |          |
    |--/\/\/\--|   |--^+^+^---|

(Hard to show in ascii!!)
"Deformable Prototypes for Encoding Shape Categories", Sclaroff

This paper was original among the shape recognition papers we have read so far in that it is able to handle non-rigid shapes: it obtains a point-to-point correspondence between an object in question and prototype objects in a database, morphs one shape into another, and measures the amount of deformation between the object being examined and the models in the database.

It isn't clear to me that there is a definite correspondence between the measurement of deformation and the perceptual distance between two objects. For (a bad) example, the deformation between two people, a bald male with a top hat and a female with long curly hair, and the deformation between a male human and an ape would seem quite different perceptually, but not so with this type of deformation measurement.

It would also be interesting to see a larger database. In the current database the Red Fin Needle Fish is likely to be closer in shape to a missile or a pen than to another type of fish. Although the method would appear to work in most cases, there would seem to be cases for which it will fail. Perhaps it would work best when combined with other methods, including the weighting of relevant shape features for shape similarity; for example, a fish should have x-number of fin-like features.

Gregory Ganarz

In the paper "Deformable Prototypes for Encoding Shape Categories in Image Databases", S. Sclaroff describes a method for image database search which uses deformable prototypes to represent shape categories. While the utility of the method has been demonstrated on actual image databases, the deformable shape method suffers from several problems. Since the method relies on establishing point correspondences between the test object and the prototype, it does not handle occlusion or partial objects well. Further, establishing this correspondence is computationally expensive. Also, the number of strain computations scales with the number of remembered prototypes; this is in contrast to human memory, whose speed seems to be independent of the number of memories.

In "The Use of Geometric Histograms for Model-Based Object Recognition", A. Evans et al. propose a method for object recognition which is rotation and translation invariant. Basically a structural approach, the model encodes shape by creating "geometric" histograms, which encode both the distance and the angle between edge features of objects. The complexity of this method scales with the number of edge features. While the model could be implemented by a neural network, it suffers from a combinatorial explosion when creating cells sensitive to certain distances and angles between line features. The model also proposes a large number of geometric histograms, one per encoded feature. It is unclear whether these could be combined into one geometric map, and how much information would be lost.

Shrenik Daftary

Synopsis of "The Use of Geometric Histograms for Model-Based Object Recognition" by Evans et al.

This paper shows a method to recognize and locate multiple RIGID objects from their 2D projections in grey-level images. The authors consider the case where occlusion may occur, either through self-occlusion or through blocking by another object. The technique presented is based on geometric relationships within a shape. Data for the geometric histogram is based on features that help determine the existence of an object. An example is presented of two line segments, where the two features represented are the relative angle between the segments and the distance between the two lines (at the maximum and minimum points). In order to determine complete representations of shape, a coordinate frame is drawn around each line feature. This is done for all line features, and geometric relationships are determined for each feature within a certain radius of the line. This feature-based method is demonstrated to be robust to the blocking out of a segment of a line, since the histogram remains essentially the same. Some of the benefits claimed are redundancy (the histograms take up more space than a minimal representation), locality (good if not dealing with an object whose ends define it), robustness, and ease of matching.

The matching algorithm for this technique first requires the ability to generate histograms for each line feature in all images. The histograms can each be considered vectors, and the match to an image feature is the correlation between the image feature and the model feature. The metric is scale invariant and robust against spurious features. The object is determined by comparing each of the features of the new image to the model histograms. The closest model is the one with the highest correlation, and a threshold throws out even this closest model if the correlation is too low.
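The matching step summarized above (histograms treated as vectors, compared by normalized correlation, with the best match rejected below a threshold) might look like this in outline; the threshold value here is an assumption for illustration:

```python
import numpy as np

def normalized_correlation(h1, h2):
    """Cosine-style correlation between two histograms; normalizing by the
    vector magnitudes makes the score scale invariant."""
    a, b = h1.ravel(), h2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def best_model(image_hist, model_hists, threshold=0.7):
    """Return the index of the best-matching model histogram, or None if
    even the best correlation falls below the rejection threshold
    (i.e. the feature belongs to a novel object)."""
    scores = [normalized_correlation(image_hist, m) for m in model_hists]
    best = int(np.argmax(scores))
    return best if scores[best] >= threshold else None
```

Returning None for sub-threshold matches is what lets the system flag novel objects rather than forcing every image feature onto some model.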
This technique allows matching on only a few of the features in order to determine object existence, and can detect the presence of novel objects when a feature correlation falls below a threshold. The system demonstrated excellent functioning on their test set; however, questions about its performance can be raised when unknown objects with geometric characteristics similar to known objects are introduced. Also, there are times when the relationship between distant points is significant. Finally, this technique does seem to be good for many cases where the object is hidden by something in the middle. The technique would be more useful if combined with a database of multiple viewpoints - at least 2/3 to cope with different angles - though this would lead to a huge memory requirement.

Synopsis of "Deformable Prototypes for Encoding Shape Categories in Image Databases" by Sclaroff

The paper begins by describing morphing, and mentions the technique of modal matching that allows users to specify a few example shapes for the computer to sort based on their similarity. The requirements for the view-based parameterization of a shape are prototype views, point correspondences between the new shape and the prototype views, and a method to measure deformation. Modal matching is a method that:

* determines point correspondences using an energy-based model
* warps or morphs one shape into another using energy-based interpolants
* measures the amount of deformation between an object's shape and prototype views

A new image is then compared to potential shapes in terms of distance to a prototype similarity metric. This technique allows the use of perceptual and semantic information that is discarded by invariant statistics, and also allows comparison of non-rigid objects. The technique uses a finite element model that does not require a priori parameterization of the images. The FEM used provides interpolation that reduces problems from poor sampling.
The modal representation is described in detail in terms of the mass, damping, and stiffness matrices. The new modal representation avoids the a priori use of a single prototype object, and instead uses the data to define the object's deformability. The technique works, for each object, by:

* determining the FEM mass and stiffness matrices
* solving the generalized eigenvalue problem
* matching the low-order nonrigid modes for both shapes
* using the matched modes as a coordinate system

When the number of feature points exceeds a threshold, the process can be sped up by using a lower-resolution FEM. Once two images are matched up, the point correspondences can be used to determine the deformations required to morph one image into the other (that is, to determine the modal deformation parameters that take the set of points from one image to the corresponding points in the other image). Data representation using this technique is compressed, since only a few characteristic views are needed to represent the data. The relative distance between a new object and the current views can be determined, but a metric-space distance between objects needs to be used instead of strain energy, which fails the symmetry axiom of a metric space; a different metric that uses areas as a scaling factor is introduced. The selection of modal prototype shapes is discussed, in terms of both human selection and an untested automated version. Next, an alternative method is presented that uses the low-order mode vectors of both the new shape and the prototype to compute the distance.

The technique was tested on series of tools, fishes, and rabbits. It appeared to work well in these cases, since the highest matches at least concurred with my perceptions. It is most beneficial compared to other systems when an object is moving or changing shape. Part of the limitation of this system is the lack of controlled tests for the objects.
Although it is suggested that objects would be identified in cases of minor occlusion, the amount of the object that can be hidden is not mentioned. As the author mentions, this system can be combined with other metrics in order to get a more complete tool.
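Given matched feature points, recovering the modal deformation parameters mentioned above reduces to a least-squares fit of the displacement vector onto the mode shapes, with strain energy as the resulting deformation score. This is a schematic sketch only: the mode matrix Phi and the eigenvalues are assumed to come from a prior eigen-decomposition, and the paper's actual formulation (including the area-scaled metric) is more involved.

```python
import numpy as np

def modal_amplitudes(Phi, u):
    """Least-squares modal amplitudes a with Phi @ a ~= u, where u stacks
    the displacements carrying shape A's points onto shape B's."""
    a, *_ = np.linalg.lstsq(Phi, u, rcond=None)
    return a

def strain_energy(a, evals):
    """Deformation energy: sum of lambda_i * a_i^2. Rigid-body modes have
    lambda ~ 0, so pure translation and rotation contribute no strain."""
    return float(np.sum(evals * a**2))
```

Because the rigid-body eigenvalues are (near) zero, the energy automatically disregards translation and rotation, which is why the similarity measure responds only to genuine deformation.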

John Petry

THE USE OF GEOMETRIC HISTOGRAMS FOR MODEL-BASED OBJECT RECOGNITION,
__________________________________________________________________
by Evans, Thacker and Mayhew

This is a simple and elegant approach to a bin-picking application. It shares properties of a general Hough transform, in that it relates fixed object edges to each other; unlike Hough, it is rotation independent. It appears to share the limitations of Hough as well, namely that it is extremely dependent on object rigidity and is susceptible to noise. It also doesn't scale well; the modification mentioned, whereby the splat size is increased, doesn't work beyond a narrow range. It is good to see that it can tolerate the fragmentation of line segments, since that is probably a frequent occurrence. Curves would probably cause it much more trouble, though, unlike Hough, which only looks at collections of individual edge pixels. It looks like it might be computationally expensive for complex shapes in busy scenes.

DEFORMABLE PROTOTYPES FOR ENCODING SHAPE CATEGORIES IN IMAGE DATABASES,
______________________________________________________________________
by Sclaroff

This is a very interesting technique that differs from most of the others we've seen so far. In particular, its ability to handle natural deformations is quite useful, as are the scale invariance and the absence of any a priori shape assumptions. As with almost all other techniques to date, it requires segmentation before running. It also doesn't work well if the object is partially occluded, which is a severe drawback if the database isn't well-constrained.

I'm puzzled about the way the feature points are selected. I can understand how a user would choose them, or how they could be automatically selected by the system. The latter approach seems to limit the algorithm to edge points, since interior points could well be due to shadows or non-repeatable noise. But how are the model feature points selected?
If I understand it correctly, they should correspond to the object points -- but what makes that so? Either I'm missing an important detail, or it has been left out. This technique seems limited to 2-D objects, in that rotation of an object out of the image plane will produce a very different figure, which will not match its original model. That is, the rotation will manifest itself as a deformation, which it really isn't; the image is deformed, but not the object. The technique also appears limited to edge pixels, since I don't see how it could reliably choose interior feature points, as mentioned above. I have to admit this is one of the few techniques that can easily solve the example cases presented in the paper, other than perhaps the eigen-object approach. Unlike that approach, this one does not make use of internal intensity patterns, but it's undoubtedly better at matching figures based on their outline.


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: