BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Shape I



William Klippgen

The paper by Grosky and Mehrotra suggests an efficient method for finding the closest match between an extracted feature and a set of well-defined features in a feature index. The construction of an index tree and the data-driven approach are reasonable when the number of objects and features is high. Since the focus of the paper is on index construction and traversal, it might be unfair to criticize it for the simple feature representation. However, there is reason to ask whether the static implementation of the index traversals and the lack of support for any learning mechanism in the index are weaknesses. A problem mentioned in the paper is the case where the difference in similarity between the sample feature and the reference features of two sub-tree root nodes is too small. It could possibly be resolved faster if some kind of correlation technique guided the index traversal based on previously found or hypothesized features. I am, in other words, not convinced that the similarity measure should be fully independent of the hypothesized range of possible objects, or the object space. It is also a very interesting question whether the sequence of extracted features in the image might yield faster conclusions using a priori knowledge of the set of previously found features.

The paper by Taubin and Cooper, "Object Recognition Based on Moment (or Algebraic) Invariants", presents a general way of computing an invariant measure for any 2D curve or 3D surface. Not only can objects in an arbitrary position be recognized, but their coordinate transformations are also recovered, so that their location and orientation in the input image can be calculated. By treating a number of sub-parts of an object, in other words local features, partially occluded objects can also be recognized.
A combination of the feature index proposed by Grosky and Mehrotra, extended with a learning or smart traversal mechanism, and the invariant representation scheme based on moment invariants would probably prove to be a good object recognition scheme.
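The index traversal discussed above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the similarity measure (negative Euclidean distance), the tuple-valued features, and the tree contents are all invented for the example.

```python
import math

# Sketch of a feature index tree: model features live at the leaves, each
# interior node holds a reference feature, and search descends toward the
# subtree whose reference feature is more similar to the query.

class Node:
    def __init__(self, ref, left=None, right=None):
        self.ref = ref          # reference feature (here: a plain tuple)
        self.left = left
        self.right = right      # leaves have left == right == None

def similarity(a, b):
    # Illustrative similarity: negative Euclidean distance (higher = closer).
    return -math.dist(a, b)

def best_match(node, query):
    # Descend toward the more similar reference feature until a leaf is hit.
    while node.left is not None:
        if similarity(query, node.left.ref) >= similarity(query, node.right.ref):
            node = node.left
        else:
            node = node.right
    return node.ref

# Tiny index over four invented 2-D features.
tree = Node((0.5, 0.5),
            Node((0.2, 0.2), Node((0.1, 0.1)), Node((0.3, 0.3))),
            Node((0.8, 0.8), Node((0.7, 0.9)), Node((0.9, 0.7))))
print(best_match(tree, (0.12, 0.08)))  # descends to the nearest stored feature
```

A "smart" traversal of the kind suggested above could replace `similarity` with one biased by previously hypothesized objects.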

Lars Liden

"Object Recognition Based on Moment (or Algebraic) Invariants" by Taubin and Cooper: This paper lost me on some of the mathematical formalism (e.g., I never quite understood the workings of an affine transformation), but the basic idea of computing moment invariants seemed fairly interesting. Perhaps the most useful aspect of this method is its ability to handle the occlusion of objects by computing moment invariants for what the authors term 'interest regions'. The eigen-approach examined last week was not able to handle such occlusion. As long as one of the interest regions is not in the occluded area, the authors suggest that a useful level of accuracy is still possible. Unfortunately, the authors don't make it clear how one might choose what should be used as an "interest region". They only give the vague suggestion that interest regions should be easily identified and provide good object discriminatory power. It is not clear a priori what these regions might be. One difficulty with this approach is that, like many others, it does not handle object segmentation. The objects used in the examples were thresholded and binary (black or white). Additionally, my impression is that this method is MUCH more sensitive to accurate segmentation than the eigen-approach. Whereas the eigen-approach would still be reasonably accurate for not-so-well segmented objects, it seems the moment invariant approach would quickly lose applicability as the quality of segmentation decreases.

"Index-Based Object Recognition in Pictorial Data Management" by Grosky and Mehrotra: This paper was clearly written from the traditional computer science / artificial intelligence perspective and operates on a grossly simplified set of image data. The objective of the paper turned out to be more or less to find an efficient search algorithm for a database of objects represented by selected features in a polygonal approximation of object boundaries.
Unfortunately, the article didn't show much awareness of the difficulties of image processing, as the polygonal feature representation seemed inappropriate. As in the previous paper, this method ignores the problem of achieving accurate image segmentation, and it would seem to require an accurately segmented object to perform well. On the other hand, this method does handle rotation, translation and occlusion.

Gregory Ganarz

Object Recognition Based on Moment (or Algebraic) Invariants by G. Taubin and D. Cooper: This paper proposes a moment-based approach, meaning that the "features" used in recognition are regions of mass concentration (where mass means the pixels composing the object). As an example, a barbell has mass separated from the primary object center, while a sphere has its mass localized near the primary center. These mass moments can be made invariant to view. By computing a number of these moments, occluded objects can be recognized with some success. One of the difficulties with this approach is that the computation of algebraic invariants is computationally expensive. Also, it is unclear how this method would fare under more realistic image conditions (textures, uneven lighting, etc.).

Index-Based Object Recognition in Pictorial Data Management by W. Grosky and R. Mehrotra: In this paper, the authors use a polygonal approximation of the boundary of each object. For each vertex of this polygon, four features are computed: the internal angle, the distance to the next vertex, and the x,y coordinates of the vertex. This representation is invariant to rotation, translation, and, with a transform, scale. Oddly, only a few paragraphs of the twenty-page paper are devoted to discussing this representation. The rest deals with how to create a tree to search for object matches efficiently. In biology, I believe a more parallel approach has been taken (as you learn more, it does not take longer to access data). On the topic of whether the polygon representation is a good one: while it does seem to do well on the outlines of objects, the polygon approach does not address how to model internal features such as the eyes on a face. Furthermore, it does not easily generalize to 3D rotations.
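The four-feature vertex representation described above can be sketched as follows. This is a guess at the spirit of the representation, not the paper's implementation; the sample polygon, the use of degrees, and the feature ordering are choices made purely for illustration.

```python
import math

# For each vertex of a closed polygon, compute the four features the
# commentary lists: internal angle, distance to the next vertex, and the
# vertex's x,y coordinates. The angle/distance pair is invariant to rotation
# and translation; the raw coordinates are not.

def vertex_features(poly):
    n = len(poly)
    feats = []
    for i in range(n):
        p_prev, p, p_next = poly[i - 1], poly[i], poly[(i + 1) % n]
        # Vectors from this vertex to its neighbors.
        a = (p_prev[0] - p[0], p_prev[1] - p[1])
        b = (p_next[0] - p[0], p_next[1] - p[1])
        dot = a[0] * b[0] + a[1] * b[1]
        ang = math.acos(dot / (math.hypot(*a) * math.hypot(*b)))
        dist = math.hypot(p_next[0] - p[0], p_next[1] - p[1])
        feats.append((math.degrees(ang), dist, p[0], p[1]))
    return feats

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
for f in vertex_features(square):
    print(f)   # every internal angle is 90 degrees, every side length is 1
```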

Shrenik Daftary

Synopsis of Index-Based Object Recognition in Pictorial Data Management by Grosky and Mehrotra: This paper presents an efficient access method for an image database. Most recognition techniques that analyze scenes with touching or occluded objects rely on feature-based analysis. The process of identification relies on hypothesis generation and verification. Hypotheses can be generated by different methods: one assumes the identity of the object and determines its location (the model-by-model approach), which has a high cost of hypothesis generation; another determines which features are present in the object and compares them against the list of objects sharing those features. The speed of these approaches can be enhanced by using binary search trees. The collection of features can be organized into a feature index tree. One essential element of this tree is a similarity measure between different features. The tree is a binary tree with the following properties: 'The model features are stored at the leaf nodes'; 'Each of the interior nodes contains a feature, called a reference feature'; 'The members of any subtree are more similar to the reference feature stored at the root of that subtree than to the reference feature stored at the root of the sibling subtree.' Such a tree can be constructed by an algorithm that first determines the model features and then finds a reference feature, which becomes the root. Several methods of finding a reference feature are mentioned. Next the feature collection is sorted into two groups, and the reference feature for each of the groups is determined. The subcollections are adjusted so that the last requirement of a feature tree is met. Finally, the last three steps are applied recursively until each collection has a size of 1. The best-match algorithm uses a modified binary search that makes use of the similarity between features. These algorithms are best represented using a flow chart.
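The recursive construction summarized above can be sketched roughly as follows. This is a loose illustration with scalar features; the farthest-pair reference selection is a naive stand-in for (and not one of) the selection methods the paper actually proposes.

```python
# Rough sketch of recursive feature index tree construction: pick two
# reference features, split the collection by which reference each feature
# is more similar to, and recurse until every subcollection has size 1.

def build(features):
    # Base case: a single model feature becomes a leaf.
    if len(features) == 1:
        return {"leaf": features[0]}
    # Naive reference choice (an invented stand-in): the two most
    # dissimilar features in the collection.
    r1, r2 = max(((a, b) for a in features for b in features),
                 key=lambda p: abs(p[0] - p[1]))
    # Split: each feature joins the reference it is more similar to.
    g1 = [f for f in features if abs(f - r1) <= abs(f - r2)]
    g2 = [f for f in features if abs(f - r1) > abs(f - r2)]
    return {"ref": (r1, r2), "left": build(g1), "right": build(g2)}

def leaves(tree):
    # Collect the model features stored at the leaf nodes.
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaves(tree["left"]) + leaves(tree["right"])

tree = build([1.0, 1.2, 5.0, 5.3, 9.0])
print(sorted(leaves(tree)))  # every input feature ends up at a leaf
```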
Each object is compared with the feature tree, and if the similarity to some known feature is above a threshold then the feature is recognized. A method of adding a feature is presented, as well as a method of replacing a feature. A couple of experiments using the algorithm are presented. The first series used 11 objects to build up the feature space. If the difference in a feature's similarity to the two reference features was less than 0.2, the feature was placed in both subtrees. According to the paper, objects were successfully recognized even when occlusion occurred. Finally, a method based on feature indexing with a multiway search tree is presented. The problems I saw with this paper are its reliance on purely black & white models with mostly sharp edges. There was not much testing on curved objects. Another possible problem is its use of a specific subset of objects (car mechanic's tools) that may not be representative. In general, though, this method seems like it would be able to find objects of similar characteristics.

Synopsis of Object Recognition Based on Moment Invariants by Taubin and Cooper: This paper presents a method for object recognition that can cope with arbitrarily shaped objects in cluttered environments and arbitrary positions. The general procedure first computes moment invariants that are invariant to affine transformations of the data set. The techniques provide a computationally efficient way to identify objects based on best-fitting polynomial analysis of the object, which allows the computation of algebraic invariants. The premise of this technique is to provide a method that is completely independent of the viewer's coordinate system. The first definition presented in the paper is based on moments: the normalized moment is the integral of Phi(x) times the density function, divided by the integral of the density function. This rids M of its dependence on viewer distance.
A similar method of moment calculation using summation is presented for more sparsely sampled objects. Next the definition of centered moments is presented. Finally, moment invariants are defined as functions of the moments of a shape that are independent of the coordinate system. Next the need to find a sufficient number of invariants is mentioned, along with the need to determine the effects of affine transformations. Lexicographical ordering of multiindices is presented. Then the matrix associated with a multiindex is defined, for both vectors M[d] and matrices M[j,k]. Next a definition is presented that deals with coordinate transformations of the same object. The polynomials are expanded into their Taylor series. Forms are defined: a form is a polynomial in which every term has the same degree. Next a mapping of the viewpoint parameter matrix A of the viewer is presented, which provides a d-th degree representation matrix of A. It is a linear representation, it preserves transposition, it is lower triangular, and its determinant equals the determinant of A raised to a power given by a binomial coefficient. Next the relation between moment invariants and algebraic invariants is presented, followed by the computation of moment invariants. The first fact mentioned is that the computation of the determinant of A is on the order of n^4. The matrix C[j,k] is defined as a function of the moments M, and a C[j,k] satisfying C[j,k](M') = A[j]C[j,k](M)A[k]^t is called a covariant matrix. If C and C' are related in a certain other way, C is called contravariant, or left covariant and right contravariant. If the matrix A is orthogonal (the Euclidean case), then the matrices are both covariant and contravariant and A[d] is orthogonal as well. New Euclidean covariant matrices can be constructed by multiplying other Euclidean covariant matrices together. Next a technique to deal with more complex affine moment invariants is presented.
Next, methods of normalizing a shape with respect to affine transformations are presented. The center has already been defined. The orientation in the Euclidean case corresponds to one of the 2^n orthonormal sets that diagonalize a symmetric nxn covariant matrix; this provides a technique to calculate an orientation matrix. Next a covariant vector is computed. In the affine case, since there are an infinite number of nonsingular matrices that diagonalize the symmetric matrix, the process of coordinate system determination is different. The application of these procedures is shown in the next part of the paper, which mentions that the simplest way to use invariants is to compute them for an entire object. Problems may occur if segmentation is not possible, or if the object is occluded. The alternative of using interest regions is presented; this provides the ability to deal with occlusion. The actual implementation of the process for affine shapes involves the following steps: compute the center; compute the second-degree moments; find the lower triangular matrix L such that M' = LML^t = I, which is computed in two steps (find the lower triangular matrix R such that RR^t = M, then invert it to satisfy the condition). The benefits of the techniques presented in this paper are the small number of invariants needed and the ability to fit a polynomial curve to an object using standard matrix computation techniques. Problems with this technique are its inability to deal with 3D rotation, its use of non-color models, and the relatively large required distance of the camera from the object. The method does provide the ability to identify objects that are rotated, scaled, or translated.
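The implementation steps listed above (center, second-degree moments, lower triangular L with LML^t = I) can be sketched for 2-D point sets. This is an illustrative reconstruction, not the authors' code; the sample points are invented, and the 2x2 Cholesky factorization is written out by hand.

```python
import math

# Normalize a 2-D point set: translate its center to the origin, form the
# second-degree moment matrix M = [[mxx, mxy], [mxy, myy]], take the Cholesky
# factor R (lower triangular, R R^t = M), and apply L = R^{-1} so that the
# normalized second-degree moments become the identity.

def normalize(points):
    n = len(points)
    # Step 1: compute the center and translate it to the origin.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Step 2: second-degree (centered) moments.
    mxx = sum(x * x for x, _ in centered) / n
    mxy = sum(x * y for x, y in centered) / n
    myy = sum(y * y for _, y in centered) / n
    # Step 3a: Cholesky factor R of M, 2x2 case.
    r11 = math.sqrt(mxx)
    r21 = mxy / r11
    r22 = math.sqrt(myy - r21 * r21)
    # Step 3b: L = R^{-1}, also lower triangular; then L M L^t = I.
    l11 = 1 / r11
    l21 = -r21 / (r11 * r22)
    l22 = 1 / r22
    return [(l11 * x, l21 * x + l22 * y) for x, y in centered]

# Any nondegenerate point set comes out with identity second-degree moments,
# so the result is unchanged (up to rotation) by affine transforms of the input.
pts = normalize([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)])
```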

John Isidoro

The first reading (Mehrotra and Gary) was a reasonably simple introduction to shape recognition. The explanation of global vs. local properties and data-driven vs. model-based approaches served as a good introduction to the next two papers. The next paper (Grosky and Mehrotra) was about using a data-driven approach to shape indexing, namely the feature index tree. I think this concept is powerful, but it has one main weakness: it assumes the image you are analyzing contains one of the shapes stored in the feature index tree. The Taubin and Cooper paper was mathematically overpowering. However, the concept of constructing a set of properties of a shape that are invariant to affine transforms is a very useful one, though I think there might be other ways to find such properties. I'm kind of surprised that there was no mention of using histograms in shape analysis. I think that angle and/or line-segment length histograms for shapes represented in vector form could be as powerful a tool as moment invariants. Another possibility is arc-radius histograms for shapes represented as curves.
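The histogram idea suggested above can be sketched quickly. Everything here (the 10-degree bin width, folding edge directions modulo 180 degrees, the sample square) is an assumption made purely for illustration, not a method from either paper.

```python
import math
from collections import Counter

# Histogram of edge directions for a polygon given in vector form: each edge
# contributes one count to the bin holding its (undirected) orientation.

def angle_histogram(poly, bin_deg=10):
    hist = Counter()
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        # Fold directions into [0, 180) so an edge and its reverse match.
        ang = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180
        hist[int(ang // bin_deg)] += 1
    return hist

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(angle_histogram(square))  # two horizontal and two vertical edges
```

Such a histogram is translation-invariant by construction; rotation invariance would need a further step, e.g. aligning the histogram to its dominant bin.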

John Petry

INDEX-BASED OBJECT RECOGNITION IN PICTORIAL DATA MANAGEMENT, by Grosky and Mehrotra: The authors suggest a data-driven method of organizing and searching images rather than the traditional model-driven method. In the latter case, an image is searched for each model instance or key model feature, which is computationally quite expensive at runtime, especially as the list of possible models grows. Instead, they recommend running a small number of feature detectors (perhaps only one) on the image, then searching for matches between each detected feature and a binary tree of features found in the set of models. For each feature at a node in the tree there exists a list of models and locations containing instances of the feature. Once sufficient features from the image have been compared with this tree, a viable set of candidate models for the object in the image exists. These candidates can be tested more exhaustively against the image to determine which, if any, matches the object. This is certainly a fast approach, assuming the feature detector can run accurately on the image at high speed. To help with accuracy, the authors' tests involved presegmenting the image by binarizing it and by limiting its content to pretrained objects. However, they did permit multiple, occluding objects in the scene. They restricted their models to a single feature space (the interior angle of a polygon at a vertex combined with the distance to the next clockwise vertex). It appears that this approach would work just as well with a combination of feature spaces, depending on the application. For instance, if object overlap and scale were not concerns, models could be indexed into separate trees based on the ratio of principal moments, perimeter, and area. For each new object inspected, these values could be computed and three separate tree searches performed. So in that sense, it's easy to expand the approach.
One limitation of the method is its handling of measurement accuracy. The authors say there are two approaches: either search down both branches of a tree if the nodes on both sides are close to the value of the search key, or duplicate information on both sides of a branch when values are within some limit. The latter suggestion strikes me as strange, since it violates the authors' fundamental data-driven approach. When searching, the data should properly be thought of as consisting of the key and a tolerance (the maximum measurement inaccuracy). Instead, the authors would embed the tolerance in the feature tree, effectively hard-coding it. The former suggestion, searching both branches from a node when the subnodes are similar, fits their paradigm better. It is effectively the same as doing two searches, one on (key - tolerance) and one on (key + tolerance), and considering all nodes found between the two to be valid candidates. This is conceptually somewhat simpler than what they argue for, though it can obviously be up to twice as slow. (I'm not sure if it can be faster.) Their basic design approach is good, in that it permits offline computation of model data in order to enhance realtime performance. I think they go too far in their example, in that they mix a particular embodiment of their approach (identifying binarized scale-invariant rigid polygons by principal angle/distance nodes) with their theory. I'd rather see more discussion of the data-driven organization itself: for instance, how about alternatives to the binary tree, e.g. hash tables? Hashing is a reasonable approach if a universal feature test exists (that is, one that can be applied to any image shown to the system), if the scores of models against that feature span a range rather than a binary found / not-found result, and if the number of feature measurements performed is small compared to the number of models in the system.
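The two-boundary search described above can be sketched with a sorted list of scalar keys standing in for the feature tree. The keys and tolerance are invented for the example.

```python
import bisect

# Key-plus-tolerance matching: instead of duplicating near-boundary features
# in both subtrees, treat the query as a key with a tolerance and gather
# every stored key inside [key - tol, key + tol] via two boundary searches.

def candidates(sorted_keys, key, tol):
    lo = bisect.bisect_left(sorted_keys, key - tol)   # search on key - tol
    hi = bisect.bisect_right(sorted_keys, key + tol)  # search on key + tol
    return sorted_keys[lo:hi]                         # everything in between

keys = [0.10, 0.30, 0.55, 0.60, 0.90]
print(candidates(keys, 0.57, 0.05))  # [0.55, 0.6]
```

Note the tolerance lives with the query, not in the index, which is the point made above: the stored structure stays fixed while each search supplies its own measurement inaccuracy.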
Finally, the feature test itself must be known well in advance, in order to organize the feature space appropriately.

OBJECT RECOGNITION BASED ON MOMENT (OR ALGEBRAIC) INVARIANTS, by Taubin and Cooper: There is less to say about this paper, since the bulk of it is mathematical in nature. Essentially, the authors have an improved method of computing invariant object moments based on the edge pixels of a 2-D object or the surface pixels of a 3-D object. Since most objects are not truly 2-D, the effective constraint is that the camera-to-subject distance be much greater than the height of the object. This is a fast and straightforward method once the source pixels (edge or surface) have been determined. That determination may not be trivial, however: it assumes the object has been segmented. Also, the system claims to be able to handle occlusion by running on the moments of sub-elements of objects (which again requires a certain segmentation) or by having a separate process interpolate/extrapolate missing pixels (quite a challenge). Within the authors' parameters, though, it seems like a very practical approach.


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: