BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Eigenmethods



Lars Liden

"Eigenfaces for Recognition" Turk and Pentland This paper examined a two-dimensional approach to facial recognition. The authors used the eigenvectors capturing the greatest variation of a set of faces to form what they call "eigenfaces" - the significant features of a face. Face recognition was performed by mapping a novel face into face space and finding the face class is space that is nearest to the novel face. If this distance is greater than some threshold the face is classified as an unknown face. The difficulty with this type of method lies in segmentation of the face from the background, and the placement of the face in the center of the image and at the right scale. Suggestions were made for using spatiotemporal filtering, in video for recognition of head position and scale. The end of the paper briefly mentioned the use of neural networks with respect to eigenfaces. There is an article from Gary Cotrell's lab a few years back which uses exactly this technique to recognize faces and identify their sex under various conditions including occlusion of various parts of the face (a problem in the Murase & Nayar). The paper shows that an auto-associative network basically learns the principle components of a set of faces. I can't seem to find the reference at the moment, but I'll look for it and see if I can bring it in. "Visual Learning and Recognition of 3-D Objects from Appearance" Murase & Nayar Problem: Automatically learning object models for object recognition. Traditional Approach: Generate a 3-D model, and use geometric shape for recognition Disadvantages: 3-D models not readily available, must be generated by programmer Although shape and reflectance are intrinsic properties, pose and illumination vary from scene to scene New Approach: Matching of 2-D appearance rather than shape Some support from psychophysical findings that humans use this Problem: Must somehow find a way to compress a large set of images into a low dimensional representation of object appearance Technique: Object first scanned (automatically) in a number of poses and illumination directions. Each digitized image is segmented, re-sampled so the larger of its two dimensions fits a pre-selected image size, and the overall intensity of illumination is normalized Use Karhunen-Loeve transform (principle component analysis) used to find the eigenvectors of an image set Note: Average of all images subtracted out The eigenvectors with the largest eigenvalue are chosen to form two different eigenspaces (account for most variance) The "universal eigenspace" for images of all objects The "object eigenspace" for images of each object Each image of the object is projected into eigenspace, the points representing the object are connected using the cubic spline interpolation method, to form a manifold in eigenspace which represents the image. To identify a new image its image is projected into eigenspace. Neat proof show that the closer projects are in eigenspace, the more highly correlated are the images Note that comparison is made to manifold not original image projections. Allows for object identification in universal space and pose estimation in object space. Problems: Segmentation of image from background is vital and a problem that still hasn't been adequately addressed A simplified segmentation algorithm was presented Method can not handle occlusion of objects Seems to require having seen all combinations of an image that wish to be recognized, (e.g. 
pose, lighting direction) Humans can recognize known objects in novel views, there must be some structural information available The number of object used in the given examples is rather small (20 with limited views). As the number of object increases, the size of the universal eigenspace is likely to get huge. Only three parameters can be used without difficulty (e.g. two types of rotation and one illumination direction). System wouldn't work for an arbitrary number of rotations, illuminations (gives very limited applicability) Universal eigenspace is must be recomputed from the set of all images Note: Required used 1.6 Gbyte hard disk to learn just 4 objects rotating in one direction and with one illumination direction A better method of adding objects must be discovered, a few are suggested, but they aren't as reliable Illumination conditions have been oversimplified. Assumes same ambient lighting with one additional directional component
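A minimal sketch (not from either paper) of the shared recognition step described above: project a novel image onto a precomputed eigenbasis and take the nearest stored class, rejecting matches beyond a threshold. All names and data here are illustrative stand-ins; numpy is assumed.

```python
import numpy as np

def classify_face(x, mean_face, eigenfaces, class_means, threshold):
    """Project image vector x into face space and find the nearest class.

    x           : flattened novel image, shape (N,)
    mean_face   : average training face, shape (N,)
    eigenfaces  : top-k eigenvectors as rows, shape (k, N)
    class_means : dict of person id -> mean weight vector, shape (k,)
    threshold   : maximum face-space distance for a "known" match
    """
    omega = eigenfaces @ (x - mean_face)          # weights of x in face space
    dists = {p: np.linalg.norm(omega - mu) for p, mu in class_means.items()}
    best = min(dists, key=dists.get)
    # Beyond the threshold the face is classified as unknown.
    return best if dists[best] < threshold else "unknown"

# Illustrative use with random stand-ins for real data.
rng = np.random.default_rng(0)
N, k = 64, 4
eigenfaces = np.linalg.qr(rng.normal(size=(N, k)))[0].T   # orthonormal rows
mean_face = rng.normal(size=N)
class_means = {"alice": rng.normal(size=k), "bob": rng.normal(size=k)}
probe = mean_face + eigenfaces.T @ class_means["alice"]   # exactly "alice"
print(classify_face(probe, mean_face, eigenfaces, class_means, threshold=1.0))
```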

Gregory Ganarz

The two papers "Eigenfaces for Recognition" by M. Turk and A. Pentland and "Visual Learning and Recognition of 3-D Objects from Appearance" by H. Murase and S. Nayar are both based on a similar computational process of finding eigenvectors. Thus, the papers both suffer the same shortcomings: the images that the algorithms operate on must be segmented, scaled, brightness normalized, and centered. Only after all this pre-processing can the algorithms produce decent performance. In both papers, learning is done off-line, and no mention of whether an on-line update algorithm exists. Both papers claim to be "biologically plausable" but clearly organisms learn incrementally, which is not what the papers propose. Further, while it might be true that early vision analyzes an image into principle components (orientation, color, etc), it certainly does not go about it by the processes proposed in these two papers (matrix manipulation). There are "neural network" algorithms which find principle components (e.g. oja's rule and sangre's (spelling?) algorithm), but these learning rules are not used in the eigen-papers. Other problems with the proposed eigen-algorithms is that they are sensitive to orientation of the objects they are operating on.
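The point about incremental learning can be made concrete: Oja's rule, which Ganarz names, estimates a principal component with purely local, online updates, unlike the batch matrix decompositions used in the papers. A toy sketch under assumed data and learning rate, not anything from the papers themselves:

```python
import numpy as np

def oja_first_component(X, lr=0.005, epochs=50, seed=0):
    """Estimate the first principal component of X with online updates.

    Oja's rule: w <- w + lr * y * (x - y * w), where y = w . x.
    X is an (n_samples, n_features) array of zero-mean data vectors.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += lr * y * (x - y * w)   # Hebbian term with self-normalization
    return w / np.linalg.norm(w)

# Compare against batch PCA on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) * np.array([3.0, 1.0, 0.5, 0.2, 0.1])
X -= X.mean(axis=0)
w = oja_first_component(X)
u = np.linalg.svd(X, full_matrices=False)[2][0]  # top principal direction
print(abs(w @ u))                                # close to 1 when aligned
```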

Shren Daftary

"Visual Learning and Recognition of 3-D Objects from Appearance" by Murase and Nayar

This paper presents an algorithm for object recognition and pose estimation, using learning techniques that represent a 3-dimensional object as a set of 2-dimensional images. The general approach is to vary the orientation of the object and the illumination and then to compress the resulting set of images into a low-dimensional representation. One such technique, principal component analysis, computes the orthogonal eigenvectors of an image set and provides a simple Euclidean distance metric that correlates with the similarity of the images. The authors chose a variation of this technique, the parametric eigenspace, to process the images. Each object is then represented both in the universal eigenspace, which is determined by the set of all objects of interest, and in its own object eigenspace.

The process begins by normalizing the images into background and object regions. The background part of the image is cut out by reducing all of its pixels' brightness to 0, and the object region is normalized to fit a predetermined scale. Each image is acquired with a different rotation parameter and lighting direction; in this particular setup the authors test 5 different lighting sources and an angular precision of 4 degrees. Such a large set of images would be difficult to store and compare against other images, so the next step is to compute the eigenspace of the image set (sketched in code below):

- Compute c, the average of all images in the set.
- Form X, the set of images with the average c subtracted from each.
- Compute Q, the covariance matrix of X.
- Calculate the eigenvectors e_i and eigenvalues lambda_i by solving the eigenvector problem for Q.

Since Q is an NxN matrix there will be N eigenvalues, but not all of these are significant, so only k are stored. The best way to determine how many is to make sure a certain percentage of the image set's variance is contained in the first k eigenvectors. Each eigenvector is the size of an image and must be stored completely. In cases where pose or illumination variation causes dramatic changes in brightness, the variation across the eigenspace will not be smooth; for smoother cases, however, the method works well even with small k. Distance and correlation measures are then defined.

Next, the procedure to recognize an object and its pose is given. The brute-force method of comparing against all stored images is mentioned, but the better way is to compute the distance between images in eigenspace. This is done with respect to the manifolds: if the distance to object p's manifold is below some threshold, the image is considered to belong to object p. The manifold method determines which object a new image is closest to, and the pose is then computed by comparing the new image to previously stored images. Tests were performed by varying the number of dimensions k stored for an image set. In the first set the pose of the objects was known, and with only 4 dimensions the error approached 0, while only 20 poses are needed to get a reasonable amount of data for recognition. I could not find figure 10, but the algorithm performs robustly, although it requires the initial data set to be captured with many different poses. The main faults of this paper are that it examined fairly small data sets of only 4 objects, which do lend themselves to minimal storage space; I doubt that only 4 dimensions would be sufficient for high accuracy on larger sets.
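A small numpy sketch of the eigenspace construction summarized above, including the variance-fraction rule for choosing k. Variable names are illustrative, and a tiny synthetic image set stands in for real data:

```python
import numpy as np

def build_eigenspace(images, var_fraction=0.95):
    """Eigenspace of an image set, following the algorithm above.

    images       : (M, N) array of M flattened images with N pixels
    var_fraction : fraction of total variance the kept basis must capture
    Returns the average image c, the top-k eigenvectors as rows, and k.
    """
    c = images.mean(axis=0)             # average of all images in the set
    X = images - c                      # average c subtracted out
    Q = (X.T @ X) / len(images)         # N x N covariance matrix of X
    lam, E = np.linalg.eigh(Q)          # eigenvalues come out ascending
    lam, E = lam[::-1], E[:, ::-1]      # re-order to descending

    # Keep the smallest k whose eigenvalues cover var_fraction of the variance.
    k = int(np.searchsorted(np.cumsum(lam) / lam.sum(), var_fraction)) + 1
    return c, E[:, :k].T, k

# Tiny synthetic run; for real 128x128 images N = 16384, and Q becomes
# too large to diagonalize naively (see the shortcut sketched further below).
imgs = np.random.default_rng(0).normal(size=(30, 16))   # 30 images, 16 pixels
c, basis, k = build_eigenspace(imgs)
print(k, basis.shape)
```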
The amount of data that needs to be stored also seems excessive, and it would continue to grow as a function of the number of objects in the set; some improvement in coordinating the different objects may reduce the amount of data that must be stored.

"Eigenfaces for Recognition" by Turk and Pentland

This paper deals with computer recognition of faces. Unfortunately, previous computer vision research does not relate well to the problem of facial recognition, so new techniques needed to be developed. This paper suggests a technique for facial recognition that is fast, simple, and accurate in CONSTRAINED environments, and that is expandable to allow the learning of new faces when that becomes necessary. Unlike previous studies of facial images, this paper attempts to determine which aspects of the face are significant for identification in an informational sense. The technique makes use of the observation that pictures of faces can be efficiently stored using principal components. The initial part of the algorithm involves acquiring an initial set of faces, calculating the eigenfaces from this set, and keeping only the M eigenfaces that correspond to the highest eigenvalues (these span face space). After that, the corresponding distribution in M-dimensional weight space is calculated for each known individual. To recognize an image: calculate the projection of the input image into face space, determine whether it lies near the existing face space, and if it does, classify it as a known or an unknown face.

The calculation of the eigenfaces starts from the simple observation that a 256x256 image is a point in 256^2-dimensional space. However, since faces are similar to one another, they do not need so many dimensions to be defined; they can be stored in a relatively low-dimensional space. All the images in the training set are averaged, and the difference from the calculated average is determined for each face. Next, M eigenvectors are calculated using the standard technique (see the sketch below), and for each image in the initial set the weights needed to reconstruct it are determined. In a reconstruction test using this technique on 115 Caucasian males, only 40 eigenfaces were necessary to reduce the reconstruction error to 2%. A new face is projected into face space by simply calculating the appropriate coefficients. The method runs in about 400 msec on what the authors considered poor equipment and software, and they suggest that a 90% performance improvement is possible with specialized equipment.

Next, a technique to locate a face in video is presented. The idea behind this is that faces will not be too far off when projected into face space, while other objects will be. The difference between the projection and the original image is measured by a simple correlation; if this falls below a threshold, the object is determined not to be a face. Finally, techniques to deal with the problem of background effects, from hairstyle to environmental surroundings, are presented, and methods to deal with changing facial characteristics such as beard growth are mentioned.

One problem with this paper is the limitation of the test to Caucasian males: if another test were done with a standard sampling of the population, what would the best method be? Divide the population into males and females, with separate spaces for persons based on their skin tone, or not divide at all? Another problem mentioned is the inability to deal with changes in facial features such as hair growth; this might be dealt with by subtraction techniques that rid the face of such growth before it is put through the eigenface technique.
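The "standard technique" for the eigenvectors deserves a sketch of its own: with 256^2-dimensional image vectors, the full covariance matrix is too large to diagonalize directly, so Turk and Pentland instead diagonalize the small MxM matrix of inner products among the M training images and map its eigenvectors back to image space. A numpy sketch with illustrative names and data:

```python
import numpy as np

def eigenfaces_inner_product_trick(faces):
    """Eigenfaces via the small M x M matrix instead of the N x N covariance.

    faces : (M, N) array of M flattened face images, with M << N.
    If v is an eigenvector of L = A^T A (M x M), then A v is an
    eigenvector of the covariance A A^T (N x N) with the same eigenvalue,
    so only an M x M eigenproblem ever has to be solved.
    """
    mean = faces.mean(axis=0)
    A = (faces - mean).T                 # N x M matrix of difference images
    L = A.T @ A                          # M x M inner-product matrix
    lam, V = np.linalg.eigh(L)
    lam, V = lam[::-1], V[:, ::-1]       # descending eigenvalues
    keep = lam > 1e-10 * lam[0]          # drop the ~0 mode from mean removal
    U = A @ V[:, keep]                   # back to image space: N x k
    U /= np.linalg.norm(U, axis=0)       # unit-length eigenfaces
    return mean, U.T, lam[keep]

# 10 "images" of 10,000 pixels: the direct route would need a
# 10^4 x 10^4 covariance matrix, while L here is just 10 x 10.
faces = np.random.default_rng(2).normal(size=(10, 10_000))
mean, eigenfaces, lam = eigenfaces_inner_product_trick(faces)
print(eigenfaces.shape)                  # at most M-1 eigenfaces
```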

Paul Dell

H. Murase and S. Nayar. "Visual Learning and Recognition of 3-D Objects from Appearance." International Journal of Computer Vision, 14(1):5--24, January 1995.

The Murase paper introduces a recognition technique for 3-D objects that utilizes the appearance of the object rather than its shape. A number of 2-D images are taken of an object to capture a sufficient amount of its "appearance" characteristics, i.e., shape, reflectance, pose, and illumination. These images are then compressed into a low-dimensional (e.g., 20-dimensional) representation. (Note that the number of dimensions needed will vary, and the authors give no good estimate of the number needed for a very large and varied data set.) The representation the authors use is parameterized by object pose and illumination and is called the "parametric eigenspace." The advantage of using object appearance is that no prior knowledge of the object's shape or reflectance is needed, and an automated object learning system can be constructed to aid in "learning" various images. The learning phase of the system can take a significant amount of time (e.g., 20 objects with 72 poses each took 12 hours on a Sun SPARC 2), but the recognition process is quick (e.g., under 1 second for the same set of objects). There are limitations to this approach. First, it assumes that there exists a segmentation algorithm to separate the object of interest from the scene. Second, objects are assumed not to be occluded. Third, the "universal eigenspace" needs to be recalculated whenever a new object is added to the set, though the authors do discuss some ways around this.

M. Turk and A. Pentland. "Eigenfaces for recognition." Journal of Cognitive Neuroscience, 3(1):71--86, 1991.

The eigenfaces paper presents a system to recognize faces in an image. The eigenface approach does not try to model a face (e.g., 2 eyes, nose, mouth) as other work has done; instead, images are reduced to the principal components that characterize the face. Eigenvectors are calculated from a training set of images and a set of eigenfaces is selected. In the experiment presented in the paper, 115 images of Caucasian males were used and about 40 eigenfaces were taken from this set. The system can be used both to detect faces in images and to identify them. To detect faces, each point in the image could be treated as a candidate face center, but this is very computationally expensive; instead the authors use motion to filter out humans (because "people are constantly moving") and then run the calculations on the moving segment. For a set of known faces, the system achieves 96% accuracy over lighting changes, 85% accuracy over orientation changes, and 64% over size changes. Along with other improvements, the authors suggest a neural network implementation of their system. Work is continuing to improve the system and expand its capabilities to identify gender and facial expressions. Overall, the system appears to do well with centered, segmented facial images. There is some study of varying head orientation, scale, and occlusion: the system works fairly well (85% accuracy) over limited orientation changes, but it does not do well with changes in scale. Also, no data is given about the accuracy of the system on a face set with people of different genders, ages, and ethnic backgrounds; the system may not fare well under these variations.
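The motion filtering mentioned above is described only loosely in the paper; purely as an illustration of the general idea, here is a crude frame-differencing sketch (the threshold and all names are invented for this example, not taken from the paper):

```python
import numpy as np

def moving_region(prev_frame, frame, diff_thresh=25):
    """Bounding box of motion between two grayscale frames.

    A crude stand-in for the paper's motion analysis: threshold the
    absolute frame difference and box the changed pixels; the boxed
    region would then be scaled and handed to the eigenface stage.
    """
    diff = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    if not diff.any():
        return None                       # nothing moved
    rows, cols = np.nonzero(diff)
    return rows.min(), rows.max(), cols.min(), cols.max()

# Illustrative frames: a bright square "moves" into view.
a = np.zeros((240, 320), dtype=np.uint8)
b = a.copy()
b[60:120, 100:160] = 200
print(moving_region(a, b))                # (60, 119, 100, 159)
```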

John Isidoro

This week the readings were about recognizing a face, or an object and its pose, by computing its position within an eigenspace. Theoretically, the image in question should be reconstructable as a linear combination of the eigenvectors, each weighted by the object's position along the corresponding dimension. I think this technique is really only useful in a very controlled environment, i.e., mug shots of people's faces under a certain lighting which are aligned perfectly. However, it is the only technique we have learned so far that can distinguish between images that are very similar in color and texture, like faces. Another nice feature of using eigen-analysis is that you can theoretically reconstruct your base image from the eigenvectors; none of the other techniques we have learned allowed us to actually reconstruct an image. I think this is a very good clue as to why eigen-analysis can be so accurate: there is a tremendous amount of information contained within the eigenvector images themselves. One area where the eigen-analysis papers are not as thorough as the other papers we have read in previous weeks is in the description of the actual implementation of the algorithms. I think the reason for this is that the math involved in reducing the dimensionality of the eigenspace and building the eigenvectors is more complex than what we have seen before. On a side note, the idea of being able to reduce a large set of orientation images to a few eigenvectors is very similar to the concept of steerable filters: just a few filters can be used to simulate the effects of many more filters, just as a few eigenvector images can be used to recognize a multitude of orientations or faces.
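The reconstruction property noted above can be shown in a few lines: project an image onto the top k eigenvectors and rebuild it as a weighted sum, watching the error fall as k grows. A toy numpy sketch on synthetic data, with all names illustrative and the basis assumed orthonormal (as PCA provides):

```python
import numpy as np

def reconstruct(x, mean, basis, k):
    """Rebuild image vector x from its first k eigenspace coefficients.

    basis : (K, N) orthonormal eigenvectors as rows, with K >= k.
    The reconstruction is mean + sum_i w_i * basis_i; its error shrinks
    as k grows, since the eigenvector images carry the information.
    """
    w = basis[:k] @ (x - mean)           # position in eigenspace
    return mean + basis[:k].T @ w        # weighted sum of eigenvectors

# Illustrative: PCA basis from random data via SVD, then reconstruct.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 64))            # 40 "images", 64 pixels
mean = X.mean(axis=0)
basis = np.linalg.svd(X - mean, full_matrices=False)[2]  # rows = components
x = X[0]
for k in (1, 5, 20):
    err = np.linalg.norm(x - reconstruct(x, mean, basis, k))
    print(k, round(float(err), 3))       # error falls as k increases
```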

John Petry

EIGENFACES FOR RECOGNITION, by Turk and Pentland
________________________________________________

This is a statistical approach to face recognition and verification (recognition meaning to answer the question "whose face is this?", verification "is this X's face?"). It is computationally straightforward and can run at roughly frame rate, but it requires control of scaling, lighting, and segmentation.

The approach starts by collecting standardized training images. Consider each as a point in an N^2-dimensional space (for NxN images). Average the images belonging to each contributor to form an average value for that person, and average all contributions to form an "average face." Compute eigenvectors and eigenvalues for the space. While perfect correctness would require N^2 eigenvectors, only M training images exist, so only the M most important eigenvectors are available. Using these, or more likely a subset of the M' most important, any face image can be described as a linear combination of the M' eigenfaces, using appropriate weights for each eigenface. These weights themselves form a vector which uniquely describes each face (or average face) in terms of the eigenfaces.

When an image of an unknown face is presented to the system for recognition, first convert it to its eigenface representation, i.e., compute its weights. Compare these to those of the average face to determine whether the image really is of a face. If so, compare it to each known face's weight vector to see which it is closest to. If it is close to one particular face, call it a match; if it is more than a certain amount different from any known face, it is probably a new face, which can be added to the list of known faces if desired. For verification, use the same process, but compare only to the average face and one other face. Since all faces presented to the system are first converted into eigenface weights, this method is also very useful as an image compression technique, storing facial data in a small fraction of the space required by image pixels with only minimal degradation.

According to the paper, very good results can be achieved if scale, orientation, and lighting are carefully controlled; the scores drop off quickly if this is not the case. An addition to the original system, described in the follow-up paper, presents a method to handle orientation changes in one dimension using interpolation techniques.

Perhaps the largest limitation is segmentation: the method assumes that a face has been segmented from the background and then scaled correctly. If this is not true, if the person is wearing a hat, for instance, which confuses the scaling function, or if part of the background is included in the image containing the face, then I'd expect the approach to fail quickly. The lighting limits mean that this technique can probably only work well under fairly controlled conditions, which implies that it may not be possible to use it for recognition or verification on uncontrolled image databases. It might be possible to use the average face vector to search for faces in general at different scales, though. Also, this approach may break down when trying to match a face against a very large database of known faces, say more than 1000. My suspicion is that differences in imaging between training time and runtime may outweigh the variation between images of different people created under identical conditions. The follow-up paper supports this; the authors recommend applying the same approach to facial features as is applied to the faces themselves to enhance discrimination, i.e., using eigenfeatures. This may add considerably to computation time, and resolution issues become more important, but I can see that it would help.

In general, this is a good approach that offers a significant improvement over other techniques such as correlation, but it still falls well short of being a general-purpose tool. Within strict limits, I suspect it is pretty good.

VISUAL LEARNING AND RECOGNITION OF 3-D OBJECTS FROM APPEARANCE, by Murase and Nayar
___________________________________________________________________________________

This paper takes an approach similar to the first, with two main enhancements: it is generalized to 3-D objects rather than 2-D faces, and it attempts to deal with lighting and orientation by building these parameters into the training database itself. Rather than having one or two training images of each object, as the original eigenface project did, the authors create an image at each of a controlled set of orientations and with controlled lighting changes. This forms a complex set of representational vectors whose values define a manifold. Handled correctly, it is possible to present the system with uncontrolled images and have it interpolate between known orientations and lighting conditions (a toy sketch of this interpolation appears at the end of this commentary).

In addition, the system can handle more than one object. It does this by creating two separate eigenspaces: one formed from the images of all the objects (the "universal" space), used to distinguish between objects, and one formed from the different appearances of a single object under the range of orientations and lighting created at training time (the "object" space). The universal space is used to select which object most closely resembles a new image presented to the system; the object space is then used to determine its orientation and lighting and to verify the identification.

The system is useful in that it identifies a method to deal with some of the variables that caused problems for the eigenface system, namely lighting and orientation. The two-phase search method need not be limited to the authors' choice of universal and object spaces, either; it could be generalized for other training choices.

The training step itself seems overwhelming: it involves a special setup with a 2- or 3-D rotational device plus specialized movable lighting, and considerable training time. It is unlikely the method could be used for many objects, and it certainly can't handle the addition of new images to its database the way the eigenface system can when a new face is shown to it. To be fair, though, many (most?) objects have predominant orientations, which would preclude the necessity of training from every conceivable vantage point. The same can't be said of lighting, though. In addition, the number of sample points means this is probably much more computationally expensive than eigenfaces, once the number of views and lighting differences is multiplied by the number of objects. The average eigenobject may also be fairly meaningless; at least faces have a common organization. This may mean that the dimensionality of the eigenspace needs to be much higher than for faces. Also, unlike faces, many typical objects have non-linear components such as textural features or highlights that may not be well represented in the database, or which may be very susceptible to scale. Nonetheless, if the input can be sufficiently controlled (a big if), this could be quite powerful. For instance, it offers a way to handle facial rotation in all three dimensions for the eigenface problem. I seriously doubt whether this would work as a general interactive object finder, though, most importantly because of the training issues.
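As a toy illustration of the manifold idea described above (not the authors' implementation, which parameterizes two rotations and an illumination direction), here is a sketch that splines eigenspace projections over a single pose parameter and estimates pose by the nearest point on the manifold. scipy and all names here are assumptions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Suppose each training pose theta_i was projected to a point g_i in a
# k-dimensional object eigenspace. A cubic spline through those points
# gives a continuous appearance manifold g(theta).
thetas = np.arange(0, 360, 30)                       # training poses, degrees
rng = np.random.default_rng(4)
points = rng.normal(size=(len(thetas), 3))           # illustrative k=3 projections

manifold = CubicSpline(thetas, points, axis=0)       # g: pose -> eigenspace point

def estimate_pose(q, samples=3600):
    """Pose of a new projection q = nearest point on the sampled manifold."""
    t = np.linspace(thetas[0], thetas[-1], samples)
    d = np.linalg.norm(manifold(t) - q, axis=1)      # distance to dense samples
    return t[d.argmin()], d.min()                    # pose estimate, manifold distance

pose, dist = estimate_pose(points[2] + 0.01)         # probe near a training pose
print(round(float(pose), 1), round(float(dist), 4))  # expect roughly 60 degrees
```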


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: