BU CAS CS 585: Image and Video Computing

Appearance-Based Recognition
November 5, 1996


Readings:


Kriss Bryan
Bin Chen
Jeffrey Considine
Cameron Fordyce
Timothy Frangioso
Jason Golubock
Jeremy Green
Daniel Gutchess
John Isidoro
Tong Jin
Leslie Kuczynski
Hyun Young Lee
Ilya Levin
Yong Liu
Nagendra Mishr
Romer Rosales
Natasha Tatarchuk
Leonid Taycher
Alex Vlachos


Kriss Bryan

View-Based and Modular Eigenspaces for Face Recognition

This article is fascinating in that it shows methods of successfully recognizing a face given the eigenvectors and the eigenspace. Eigenvectors are used to create regions for the eigenfeatures and the eigenface. In the few tests that were performed, the eigenfeatures detected the person more accurately than the eigenface, and the combination of the two resulted in even more accurate detection.

This would be a very useful device for tracking people, if that is desired. The article mentions two types of eigenspace methods: the parametric and the view-based. The view-based method always seemed more accurate; this is because it contained the independent subspaces which described the regions of the face space. What would happen if a face was looked for and no matches were found? Would it be placed in the database? I did not realize how much humans take for granted as far as recognition.


Feature Extraction from Faces Using Deformable Templates

This article is also quite interesting. It contrasts with the eigenfeature approach of the article above. Deformable templates are used to get as close as possible to the actual feature. These features include eyes and mouths. A parametric equation is used for each template. The eye and mouth templates follow the valleys, peaks, and edges in images in order to deform themselves into the shape of that feature.

As stated before, I did not realize how difficult feature recognition could be. This article is good in that the equations are interspersed throughout the article, which makes the argument easier to understand. It seems to me that a combination of this method and the methods above might make a more robust face recognizer.


Bin Chen


Jeffrey Considine

View-Based and Modular Eigenspaces for Face Recognition

Pentland, Moghaddam and Starner present a series of experiments using various eigenvectors for the recognition of faces within a large database. They focus on generalizing the problem from just "mug shots" to multiple views, and compare the parametric eigenspace method, which parameterizes the eigenspace by identity and view, with a view-based method, which uses a set of views (parallel "observers") to select the most appropriate eigenspace (the closest view). They find the view-based method to be more accurate.

From here, they extend the technique to higher-level features: eigenfeatures such as eigeneyes, eigennoses, and eigenmouths. Since they already have an eigenspace formulation from the problem of multiple views (from the parametric or view-based method), they are able to use eigentemplates for the placement of features and can check the error using the DFFS, the distance from feature/face space. Use of these eigenfeatures is much more accurate than eigenfaces when a significant portion of the face is covered. Combining the two layers increases accuracy, giving very accurate recognition of faces within the database.
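The DFFS idea the review mentions is simple to sketch: project an image onto the eigenface subspace and measure what the subspace cannot represent. The following is a minimal illustration with synthetic data; the image size, subspace dimension, and data are made-up assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "face" data: 50 training images (flattened to 256 pixels) that
# lie near a 10-dimensional subspace, as face images roughly do.
basis = rng.normal(size=(10, 256))
faces = rng.normal(size=(50, 10)) @ basis + 0.01 * rng.normal(size=(50, 256))
mean = faces.mean(axis=0)

# Eigenfaces: the top principal components of the mean-centered set.
_, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
eigenfaces = vt[:10]                    # 10 x 256, orthonormal rows

def dffs(image):
    """Distance-from-face-space: the norm of the part of an image
    that the eigenface subspace cannot represent."""
    centered = image - mean
    coeffs = eigenfaces @ centered      # projection onto face space
    reconstruction = eigenfaces.T @ coeffs
    return np.linalg.norm(centered - reconstruction)

# A training face scores low; an arbitrary image scores much higher.
print(dffs(faces[0]), dffs(rng.normal(size=256)))
```

The same residual, computed with an eigentemplate swept over image positions, is what lets the paper localize features: the position with the lowest DFFS is the best feature candidate.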


Cameron Fordyce

Visual Learning and Recognition of 3-D Objects from Appearance

by Hiroshi Murase and Shree Nayar


Automatic machine learning is an important new area of research in many areas of pattern recognition. Currently, speech recognition and synthesis, as well as the field of computational linguistics, use methods that allow the automatic training of models for the domain under study. So it is no surprise that pattern recognition of images would also benefit from automatic learning of models of images, and of faces in particular (see the following review of A. Pentland, B. Moghaddam, and T. Starner's paper). The authors of this paper also depart from standard recognition techniques by proposing to match objects by appearance rather than by geometry.

In this paper, object appearance is learned by taking a sequence of images in which changes in pose and illumination are strictly controlled. The resulting images are then "compressed" into an eigenspace for compact representation using the Karhunen-Loeve transform. The authors limit themselves in this paper to considering only object pose and illumination.

While the technique described is extremely interesting and the authors show its utility in object recognition, they mention some assumptions that limit its usefulness. First, they assume that the object can be segmented from the surrounding scene; second, that it is not partially occluded by other objects. These assumptions are natural, but hardly realistic for 'real world' recognition applications like robot navigation, where recognition may need to be more robust. Further, the controlled conditions under which they create image models might cause problems when applied to objects that cannot be so easily and precisely moved, such as trees and automobiles.

Finally, having proposed a method of automatically creating models of images, this technique now faces problems that are common in the other areas of research mentioned above, such as data or objects that fall outside the domain on which the universal eigenspace or model was created, and differences between the training images and the actual recognition images (mentioned as a problem by the authors). That is, if the object model is created with only one source of illumination, the recognition system will probably fail if the object to be recognized is illuminated with two sources.

View-Based and Modular Eigenspaces for Face Recognition,

by A. Pentland, B. Moghaddam, and T. Starner

The authors present a technique for further refining the method outlined in the paper reviewed above. This method adds specific feature representations of faces (i.e. representations of eyes, mouths and noses) that augment the more general approach of eigenvectors for the whole image. Given the difficulty of the problem, near 95% recognition over such a large database (roughly 3,000 people) is impressive.

Problems with the standard eigen method are also presented, including occlusion and differences in illumination. However, it is clear that this technique, especially when augmented with the parts-based description of the face, is much better than a standard template matching algorithm.


Timothy Frangioso

Alan L. Yuille, Peter W. Hallinan, And David S. Cohen, "Feature Extraction from Faces Using Deformable Templates", February 11, 1992.

This paper details a method for finding salient features in images; specifically, it demonstrates a way to detect salient facial features. This is accomplished by using particular knowledge about what the image actually is and how it is going to behave, which is used to predict what the various features will do, that is, what properties they will have. The authors claim this method is fairly robust and "will work despite variations in scale, tilt, and rotation of the head, and lighting conditions."

The process fits a template over the various features of the face. The paper details the eyes and closed and open mouths, though the authors state that an analogous method could be used for finding eyebrows, ears and other features. Some preprocessing is performed on the image to get three different representations of it; these are then used to find particular properties in the image such as "peaks, valleys and rapid intensity changes." Surprisingly, these representations do not have to be precise, and one can use simple methods to extract this information, such as morphological filters.
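The morphological extraction of a valley field can be sketched simply: a grey-scale closing fills in dark pits, so the closing minus the image highlights valleys. This is a minimal numpy illustration with a hand-made 3x3 neighborhood and a toy image, not the paper's actual preprocessing.

```python
import numpy as np

def grey_dilate(img):
    """3x3 grey-scale dilation: each pixel becomes its neighborhood max."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.max(stack, axis=0)

def grey_erode(img):
    """3x3 grey-scale erosion: each pixel becomes its neighborhood min."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.min(stack, axis=0)

def valley_field(img):
    """Valleys: morphological closing (dilate, then erode) minus the
    image. Dark pits that the closing fills in become large values."""
    return grey_erode(grey_dilate(img)) - img

# A flat image with one dark pixel: the pit is the strongest valley.
img = np.full((7, 7), 100.0)
img[3, 3] = 10.0
v = valley_field(img)
print(np.unravel_index(np.argmax(v), v.shape))
```

Peaks are extracted symmetrically, as the image minus its morphological opening.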

This method seemed to work well when the correct coefficients were found. I am wondering how sensitive this method is to thresholding. If the feature changes drastically, will the process recover? An example would be the feature being hidden for some period of time, like a child playing peek-a-boo, or only half of the feature being seen.

Alex Pentland, Baback Moghaddam, Thad Starner, "View-Based and Modular Eigenspaces for Face Recognition", 1994

This paper describes a series of experiments that show how face recognition performs with a number of different techniques. The eigenface, view-based and eigenspace approaches are discussed. The major problems for any face recognition system are variations in orientation, scale and illumination, which cause the face to look very different in different pictures. For any recognition to work, these will have to be overcome.

The first set of experiments uses the eigenface method and shows that with this system the recognition accuracy is 95%. From this approach, the authors go on to detail both the view-based and eigenspace techniques by means of comparison.


Jason Golubock

Visual Learning and Recognition of 3-D Objects from Appearance

The problem here is that I have only the last few pages of this article included in my cs585 readings packet, which I just discovered last night. I was able to read only about the experiments the authors did with their recognition techniques. Unfortunately I don't know what they're talking about. It sounds very interesting, though.

View-Based and Modular Eigenspaces for Face Recognition

This article describes the work the authors did using eigenfaces for face searching and recognition. Searches are done in very large databases (O(10^3)) which the authors believe has never been done before. They describe the use of view-based and modular eigenspaces. The article assumes previous knowledge of eigenfaces and eigenspaces which I simply do not have. I am thus unable to comment on their experiments. The authors point out that they achieved good recognition using a large database of faces, which seems like a good thing.


Jeremy Green


Daniel Gutchess

View-Based and Modular Eigenspaces for Face Recognition

Authors Pentland, Moghaddam, and Starner present experimental results of face recognition in a large database using eigenfaces and eigenfeatures. They use a database with over seven thousand face images. Ninety-five percent of the time, their Photobook application picked out the same person as in the input image. Two methods of recognition across different viewing angles are discussed: one using a parametric eigenspace, and a view-based approach using multiple eigenspaces. The view-based method is more accurate (and slower) since a set of projections is computed for each view of the face. Experimental results show that the view-based method consistently performs better at both interpolation and extrapolation. It is also shown that the eigenface technique can be used on individual facial features, and further, that a layered representation using both the eigenface and eigenfeatures performs extremely well. Eigenfeatures can be very useful, since keying on the eyes and nose would be invariant to beardedness.

I thought the paper did a good job explaining some of the issues (problems and trade-offs) in creating a face-recognition technique. Although I was a little confused by the section explaining view-based versus parametric techniques, the rest of the paper was very clear. It would probably help to read Pentland's paper on Eigenfaces to understand the technical details.

Feature Extraction from Faces Using Deformable Templates

Authors Yuille, Hallinan, and Cohen show in detail how to use deformable templates to find features on the human face. The parameters of the deformed templates then describe qualitatively the appearance of the particular facial feature. Much like snakes, templates change their size and shape to minimize an energy function. Several functions of the original image are used to compute the energy: edges, valleys, peaks, and the original image intensity. Eye and mouth templates are given (each a collection of geometric parameters), as well as energy functions to draw the templates to the correct places on the face. The search algorithms are divided into "epochs", in each of which a subset of the forces is allowed to act upon the template. Problems that the algorithm typically faces are discussed, and several examples illustrate how well it performs. The authors suggest that inter-template relationships might improve detection, since facial features are always aligned a certain way.
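The energy-minimization idea can be sketched with a drastically reduced template: a circular contour with only its center as a free parameter, pulled by steepest descent into a synthetic valley field. The field, radius, step size, and step count here are all made up for the illustration; the paper's templates have many more parameters and several energy terms.

```python
import numpy as np

# Smooth synthetic "valley potential": strongest near the true feature.
TRUE = np.array([10.0, 14.0])
def potential(p):
    return np.exp(-np.sum((p - TRUE) ** 2) / 20.0)

def energy(center, radius=3.0, n=16):
    """Template energy: negative average potential along the template
    contour, so the minimum is where the contour sits in the valley."""
    angles = np.linspace(0, 2 * np.pi, n, endpoint=False)
    pts = center + radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    return -np.mean([potential(p) for p in pts])

def steepest_descent(center, steps=300, lr=5.0, h=1e-4):
    """Numerical-gradient steepest descent on the template parameters."""
    center = center.astype(float)
    for _ in range(steps):
        grad = np.array([
            (energy(center + h * e) - energy(center - h * e)) / (2 * h)
            for e in np.eye(2)])
        center -= lr * grad
    return center

fit = steepest_descent(np.array([4.0, 5.0]))
print(np.round(fit, 1))   # converges near TRUE = [10, 14]
```

The "epochs" the paper describes amount to running descent like this several times while enabling different energy terms and freeing different parameters at each stage.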

This paper was very clear. It was helpful that they provided examples of both eye and mouth templates, and showed diagrams and mathematical formulas for each. The technique itself seems like it would perform very well when the camera direction and lighting are not known, although maybe too slowly(?). Tracking with it seems like a formidable task if it is indeed inefficient.


John Isidoro


Tong Jin


Leslie Kuczynski


Hyun Young Lee

The intelligent vision system addressed consists of two stages. First, it learns object models: it scale-normalizes the images, computes eigenspaces of the normalized image sets, and obtains the appearance representation of objects in a universal eigenspace using the parametric eigenspace representation. Second, it recognizes objects in an image and estimates their pose by projecting the input image into the eigenspace and finding the closest manifold.
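The second stage can be sketched as a nearest-manifold search. Below, each object's "manifold" is a 1-parameter curve of pose samples in a tiny 3-D eigenspace; a query is identified, and its pose estimated, by the closest stored manifold point. The curves, object names, and pose sampling are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
poses = np.arange(0, 360, 5)            # pose sampled every 5 degrees

def manifold(phase, degs):
    """A toy 1-D pose manifold in a 3-D eigenspace."""
    t = np.radians(degs)[:, None]
    return np.hstack([np.cos(t + phase), np.sin(t + phase), 0.3 * t])

manifolds = {'obj_a': manifold(0.0, poses),
             'obj_b': manifold(1.5, poses)}

def recognize(point):
    """The nearest stored manifold point gives identity and pose."""
    best = None
    for name, m in manifolds.items():
        d = np.linalg.norm(m - point, axis=1)
        i = int(d.argmin())
        if best is None or d[i] < best[0]:
            best = (d[i], name, int(poses[i]))
    return best[1], best[2]

# A query "rendered" from obj_a at 90 degrees, slightly perturbed.
query = manifold(0.0, np.array([90]))[0] + 0.01 * rng.normal(size=3)
print(recognize(query))
```

In the paper the manifolds are interpolated continuously from the pose/illumination samples rather than searched as a discrete point list, but the nearest-point principle is the same.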

It is interesting that correlation in the eigenspace yields a very good indication of similarity between images. This makes the scheme powerful for recognition of objects based on their appearance rather than their shape (the model on which many earlier recognition methods are based), so that 2-D aspects can be employed to recognize 3-D objects.

As indicated by the authors, occlusion of objects in the input image causes this method to fail, and segmentation against a varying background is also an unresolved problem. However, if the objects have some salient features (this thought is related to the eigenfeatures of the other paper, "View-Based..."), the vision system could guess the occluded object to recognize, within a permissible range of error probability.

View-Based and Modular Eigenspaces for Face Recognition

Starting from an exploration of the scalability of the eigenface technique, the view-based approach is proposed for face recognition under various poses. Such a multiple-observer approach yields a more accurate representation of the underlying geometry, whereas the parametric one is simpler. Furthermore, modular eigenspaces combine eigenfaces with eigenfeatures so as to employ salient facial features, such as the eyes and nose, to accomplish more accurate and robust face recognition.

These methods show very good experimental results (98%) and also good scalability of the eigenface technique (7,562 images). It is amazing that the eigenfeature match identified the correct individual even when the face was presented with and without a mustache (at first glance I couldn't tell they were the same person).


Ilya Levin


Yong Liu


Nagendra Mishr

Visual Learning and Recognition of 3-D Objects from Appearance

The article by Murase and Nayar describes a method for representing 2-D images of objects for use in machine vision. A major problem in machine vision is that input images are 2-D and may not be aligned at the same angle at which the database images were scanned.

The algorithm first reads in images of the object from a multitude of input angles and lighting conditions and computes the eigenvectors of the set. The images are normalized so that all of them have identical lighting conditions, and the views at all the angles are consolidated into one compact representation: each image is projected into the eigenspace and stored as a small set of coefficients. The entire process of learning an object entails scanning it under all the different possibilities and takes a long time.

The lookup process does not take as long and is pretty reliable for objects which do not look much alike. However, the technique relies on learning images and matching against them at a later time. Scaling the images at lookup time, or using lighting angles or viewpoints outside the learned set, will mess things up in the lookup process. The authors do say that they can extend their algorithm to different spaces and that different spaces should be used based on the specific application; by different spaces, I mean different lighting conditions and different viewing angles. I am not sure whether to classify this as a top-down or a bottom-up approach, since the same technique works for different objects as long as the database has been trained. But at the same time, since the input angles, lighting angles and query images are constrained to certain predefined values, I think that the algorithm is always looking for a particular match, which makes it top-down.

View-based and Modular Eigenspaces for Face Recognition

This article by Pentland and Moghaddam talks about applying the eigenface technique to recognizing a specific set of faces. The image database is constructed using eigenvectors, and later the database is scanned for similar images.

The idea is pretty simple, and can be stated as a sorting algorithm for faces. Given an input image, the algorithm looks for the 20 most likely faces which match the given input. They had pretty decent performance of 20 faces in one second. The database is configured by looking at the eyes and noses of people.

As in the previous article, face images were scanned from different viewing angles and stored as eigenvector coefficients. The authors implement a DFFS technique which automatically detects facial features. They do not describe the technique in detail, but I think it looks at peaks and valleys to determine eyes, noses and mouths. The input images do have to be aligned by the eyes to get an accurate distance measure for the nose and mouth, so it is not fully automated. (I can't imagine that everyone's eyes are equally spaced and the same size.) The authors did note that they had some problems with certain ethnic populations. Also, noticing from figure 4 that the faces seem to be the same shape (oval), I have to assume that the study was not complete in terms of people with varying face sizes. Can this technique detect twins accurately? Can it detect smiles or frowns or yawns? In defense, the authors do state that the application assumes controlled environments such as police lineups.


Romer Rosales

View-Based and Modular Eigenspaces for Face Recognition

(Article Review)

This work illustrates experiments in recognition and interactive search in (according to its authors) a large-scale database by using eigenfaces. It also presents a view-based, multiple-observer eigenspace approach for the problem of recognition under general viewing orientation and variable pose. A solution for the problem of feature extraction is also studied.

In previous research, good 2-D results were obtained using template matching and matching using eigenfaces (a kind of template matching using a transformation of a set of pictures, in this case faces). This work extends these approaches to larger databases with more general viewing conditions. Specifically, it studies the scalability of, and extends, the eigenface technique of Turk and Pentland. The generalization of the previous approach to a view-based eigenspace formulation allows recognition under different face orientations and the incorporation of facial features (eyes, nose and mouth).

The experiments with the face database estimate recognition performance on larger databases (7,562 images from 3,000 people are used), with eigenfaces obtained from a sample of 128 faces. This database is interactively searched using an image database tool (Photobook). Briefly, it works as follows: a non-graphic description is specified in order to filter the universe of faces, then the faces that match this description are presented. One of these faces can be selected; the eigenvector description of that face is then used to sort the set of faces according to their similarity to the selected model.

One important result of this work is that, because each face is described using only a small number of eigenvector coefficients, the entire searching and sorting is considerably fast. Another important point: it was able to find the same person 95% of the time, despite variations in expression, extra facial components, and illumination.

The second part of this work compares two ways of approaching the problem of face recognition, given N individuals and M different views. The first consists of combining the NxM images in a universal eigenspace, which encodes identity as well as viewing conditions. The second is based on the construction of M separate eigenspaces, each one capturing the variation of the N individuals in a common view. This is an extension of the first that uses a view-based architecture (one eigenspace per view). This last approach brings an extra processing problem: determining the proper viewspace for a given image, which then needs to be encoded (and recognized) using the eigenvectors of that viewspace.
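The viewspace-selection step can be sketched with the same residual idea used for DFFS: each candidate view has its own eigenspace, and the image is assigned to the view whose eigenspace reconstructs it best. The view names, dimensions, and random bases below are assumptions for the illustration, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, k = 64, 5

def orthonormal(rows):
    """A random eigenspace: `rows` orthonormal basis vectors in `dim`-D."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, rows)))
    return q.T[:rows]

# One small eigenspace per view (e.g. frontal vs. profile faces).
viewspaces = {'frontal': orthonormal(k), 'profile': orthonormal(k)}

def best_view(image):
    """Pick the view whose eigenspace reconstructs the image best
    (smallest distance-from-face-space residual)."""
    def residual(basis):
        coeffs = basis @ image
        return np.linalg.norm(image - basis.T @ coeffs)
    return min(viewspaces, key=lambda v: residual(viewspaces[v]))

# A query synthesized inside the profile subspace, plus a little noise.
query = viewspaces['profile'].T @ rng.normal(size=k)
query += 0.01 * rng.normal(size=dim)
print(best_view(query))
```

Once the winning viewspace is known, recognition proceeds within it exactly as in the single-view eigenface case.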

The main difference between them is that the first one describes the whole set with a projection onto a single low-dimensional linear subspace, while the second achieves a more accurate representation by using M independent subspaces (views of a face) to describe the problem.

The performance of these approaches was evaluated, resulting in higher performance for the view-based method. (Refer to the paper for more information on the methodology used.)

I think that the fact that this technique can be easily extended to the description of facial features, which results in higher recognition accuracy (higher-resolution details are now being considered), gives it great importance in the field of machine recognition in general. It is really interesting how this approach could make an audio-visual communication system, for example, much more efficient.

But the addition of the eigenfeature approach brings some extra work, namely: how are these features to be located in an arbitrary facial image? This work illustrates that the so-called distance-from-feature-space in the eigenspace formulation can solve this problem.

Something that may be a weak assumption in this work is to say that only 7,562 images from 3,000 people can be considered a representative universe. A more formal description of the model would have given us a more complete understanding of this work.

Feature Extraction from Faces Using Deformable Templates.

(Article Review)

Edge detection is a basic problem that is constantly a topic of study in the field. Edge detection in faces is an important sub-area, but these techniques are weak or inconsistent at detecting features in faces. A lot of assumptions have to be made, and all the information is difficult to organize in a general approach.

One alternative way to achieve feature detection in faces is illustrated in this article. In general, this work concerns the detection of features on faces using templates that can modify their own shape (within given constraints), but not their essential structure. To achieve this, the feature that we want to detect has to be described by a parameterized template.

If we are to design a parameterized template, then it is necessary to have prior knowledge about the possible shapes that the feature can take, to guide the detection process. I think that this knowledge is also useful as a key to defining the degrees of freedom that the parameters allow, so we can guarantee that our template does not change too much and become an impossible representation of the feature.

The template needs to be guided in order to change its shape properly; that is why an energy function needs to be defined that relates sub-features (edges, peaks and valleys in the image) to the properties of the template. The template is deformable: it interacts with the image forces and modifies its parameters in order to minimize the energy.

The final set of parameters that best fits the real feature in the image can be used to describe the feature, which can be very useful for classification or matching within a set of similar features. Many applications can easily be found for it.

It is important to notice that the initial values of the parameters are determined by preprocessing. The templates this work uses act on three representations of the image and on the image itself. Due to this, the templates only need to be described in simple form. Some techniques are mentioned to perform this. (Refer to the article.)

Deformable eye and mouth templates were used to illustrate this detection technique. The eye and mouth templates were built after some experimentation to determine their important components. The algorithm uses a search technique based on steepest descent, attempting to find the most salient parts of the object.

Detection of edges is a basic component of the field of image processing. This article approaches the specific field of feature detection in faces, but the technique can be applied not only to faces but to many other real objects with features that can be defined by parameters.

A remark that I believe is important about this model is its ability (in theory) to detect features at different scales, levels of rotation, and lighting conditions. Also, tracking objects using this technique seems easy to model with the same concepts. A good property is that this model can be extended with other features to create a sophisticated model for recognition. The model can also be made as accurate as needed by simply changing some parameters, or even adding new (perhaps secondary) components, to describe the object more precisely.

Some future work and a more general formulation are described at the end of the article.


Natasha Tatarchuk

Visual Learning and Recognition of 3D Objects from Appearance

In this paper Murase and Nayar present a novel technique for automatic recognition of three-dimensional objects from their two-dimensional representations. I enjoyed reading this paper because, besides being very coherently written, the authors presented it in a fairly intuitive way, with thorough explanations of the theoretical basis for the work, and a good discussion of the practical work done to support their approach. The brief overview of the workings of the biological vision system, with comparison to machine vision systems, also helped in finding the intuition for this technique.

This paper does a great job of explaining image compression by computing the eigenvectors of an image set. A new compact representation for object appearance, representing a whole family of images, is also introduced in this paper: the parametric eigenspace. After an image set of the object is obtained by varying pose and illumination and normalizing in brightness and scale, the eigenspace for the image set is constructed by computing the most prominent eigenvectors of the set. This technique is particularly attractive because of the compactness of the representation for a large image set, which is crucial in this work.

One main problem that stands out right away from this paper (and I must say that the authors don't claim to have solved it at all!) is that the approach for recognizing the object is not robust at all when the viewing direction is changed or the object is rotated outside the learned range. That might not be a problem for some purposes, but it makes this approach less general. Also, though the authors have simplified the object-learning part in order to optimize the time performance, it seems not entirely robust.

View-Based and Modular Eigenspaces for Face Recognition

The authors of this paper describe their experiments with the eigenface technique of Turk and Pentland (and I guess, in order to understand all the details of this approach, it would've been useful to read that paper); however, the purpose of this particular work is to apply the above technique (or modify it if necessary) to a large recognition problem. One definite source of pride for the authors is the large database of faces that they've attained: about 7,000 images of 3,000 people.

Two different viewing geometries are described in this paper: first, the parametric eigenspace introduced in the previous paper (above), and second, the view-based method. The main difference between the two approaches is that the parametric eigenspace is computed to generalize M different views in one universal image space, while the view-based approach constructs M separate eigenspaces, one for each orientation and scale. The comparison that the authors make for this technique is that of 'parallel observers'. The view-based approach seems to yield a more accurate representation of the underlying geometry of the face.

In order to have a better recognition approach, eigenfeatures were developed. Now, instead of trying to match the face as a whole, the individual features are compared instead, which allows more robust results. Overall, the authors demonstrate the success of the eigenspace technique (though with some modifications) for object search and recognition throughout a large image database.


Leonid Taycher

Visual Learning and Recognition of 3-D Objects from Appearance

In this paper Murase and Nayar discuss an algorithm for recognizing 3-D objects based not on their 3-D model but on a large number of 2-D views with varying lighting and orientation. They treat each whole image (an object view) as a feature vector and group the vectors into two kinds of sets. One is the set of all the views of all the objects, which is used for determining to which object an unknown sample belongs. The other is the set of all the views of one object; this set is used for approximating orientation and lighting for the sample once it is known which object it belongs to. They perform Principal Component Analysis on both kinds of sets and use only a few of the largest components to differentiate between images. After that, they compute parametric manifolds (curves or surfaces which go through all the samples belonging to one object, depending only on two parameters: lighting and orientation) for objects in the universal set to ease the identification.

The algorithm presented does seem to work, but it has several problems. First of all, it requires samples taken in a controlled environment, which is not always possible. Second, it recognizes only objects it has models for and requires the images for recognition to be segmented from the background, which is not always possible. It also uses whole images as starting feature vectors, which does not influence the search time, but makes the database creation very time- and space-consuming.

View-Based and Modular Eigenspaces for Face Recognition

This article talks about research which seems to build on the previous paper, but goes further in that it attempts not only to recognize objects (faces in this case) but, primarily, to recognize the positions of the parts of the object (in this case eyes, nose and mouth). This is done the same way as the face recognition, using DFFS.

Again this approach seems to work (it is reported to have 90% accuracy) and has a small wait time, but it requires a large amount of preprocessing and depends on human intervention (selection of the facial features in the sample faces).


Alex Vlachos

Reading #15
View-Based and Modular Eigenspaces for Face Recognition

Face recognition is a very interesting subject. They present the topic by using a mug-shot example. Up until this paper was written, other attempts at solving this problem have only taken a set of a few hundred faces. Their experiments took over seven thousand faces of about three thousand different people.

They discuss the difference between a parametric eigenspace and a view-based method of searching through images. The view-based method seems like the more logical solution to the problem. The database, consisting of many pictures of the same individuals, is separated according to facial expressions, effectively dividing the images into categories of facial features/expressions. The distance-from-feature-space measure used with the view-based method is very successful in face recognition.

According to the paper, this is the first method of face recognition that has been successful on a large database of images. The paper does a good job of explaining the methods it discusses.

Reading #16
Feature extraction from faces using deformable templates

Deformable templates use three different versions of an image to detect different features, such as the eyes, nose, and mouth. Each of the different representations of the image is used to find different facial features. By looking at the intensities in one of the images, you can find the perimeter of the eye, for example.

They give a template for finding an eye. The template has eleven parameters that define the eye, including the center of the pupil, the upper and lower eyelids, and the size of the pupil. There are constraints on these features, such as the maximum height of the parabolas that define the eyelid shape.
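A drastically simplified version of such an eye template can be sketched in code: two parabolic eyelid arcs and an iris circle, with a bound on the eyelid heights. The parameter names, the reduced parameter set, and the constraint values below are illustrative assumptions, not the paper's actual eleven-parameter template.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class EyeTemplate:
    """A simplified eye template: the eyelids are two parabolic arcs
    and the iris is a circle, in a frame centered at (xc, yc)."""
    xc: float; yc: float      # eye center
    width: float              # half-width of the eye opening
    a_up: float; a_dn: float  # peak heights of upper/lower eyelid parabolas
    r: float                  # iris radius

    def eyelid(self, x, upper=True):
        # Parabola through the two eye corners at (xc - width, yc)
        # and (xc + width, yc).
        a = self.a_up if upper else -self.a_dn
        return self.yc + a * (1 - ((x - self.xc) / self.width) ** 2)

    def valid(self, max_height=0.6):
        # Constraints in the spirit of the paper's: eyelid peaks bounded
        # relative to the eye width, iris fitting inside the opening.
        return (self.a_up <= max_height * self.width
                and self.a_dn <= max_height * self.width
                and self.r <= self.width)

eye = EyeTemplate(xc=0, yc=0, width=10, a_up=4, a_dn=3, r=5)
xs = np.linspace(-10, 10, 5)
print(np.round(eye.eyelid(xs), 2), eye.valid())
```

During fitting, these parameters would be the quantities adjusted by the energy-minimizing descent, with `valid()` acting as the admissible-shape check.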

They give a similar template for the mouth. The mouth is more difficult to find a template for because of the possibility of showing teeth. They give a few examples of mouth detection which are very interesting. Overall, the method described in this paper offers an interesting way of classifying facial features using mathematical representations.


Stan Sclaroff
Created: Nov 5, 1996
Last Modified: Nov 8, 1996