BU CLA CS 835: Seminar on Image and Video Computing

Class commentary on articles: Eigenmethods



Lars Liden

"Eigenfaces for Recognition" Turk and Pentland This paper examined a two-dimensional approach to facial recognition. The authors used the eigenvectors capturing the greatest variation of a set of faces to form what they call "eigenfaces" - the significant features of a face. Face recognition was performed by mapping a novel face into face space and finding the face class is space that is nearest to the novel face. If this distance is greater than some threshold the face is classified as an unknown face. The difficulty with this type of method lies in segmentation of the face from the background, and the placement of the face in the center of the image and at the right scale. Suggestions were made for using spatiotemporal filtering, in video for recognition of head position and scale. The end of the paper briefly mentioned the use of neural networks with respect to eigenfaces. There is an article from Gary Cotrell's lab a few years back which uses exactly this technique to recognize faces and identify their sex under various conditions including occlusion of various parts of the face (a problem in the Murase & Nayar). The paper shows that an auto-associative network basically learns the principle components of a set of faces. I can't seem to find the reference at the moment, but I'll look for it and see if I can bring it in. "Visual Learning and Recognition of 3-D Objects from Appearance" Murase & Nayar Problem: Automatically learning object models for object recognition. Traditional Approach: Generate a 3-D model, and use geometric shape for recognition Disadvantages: 3-D models not readily available, must be generated by programmer Although shape and reflectance are intrinsic properties, pose and illumination vary from scene to scene New Approach: Matching of 2-D appearance rather than shape Some support from psychophysical findings that humans use this Problem: Must somehow find a way to compress a large set of images into a low dimensional representation of object appearance Technique: Object first scanned (automatically) in a number of poses and illumination directions. Each digitized image is segmented, re-sampled so the larger of its two dimensions fits a pre-selected image size, and the overall intensity of illumination is normalized Use Karhunen-Loeve transform (principle component analysis) used to find the eigenvectors of an image set Note: Average of all images subtracted out The eigenvectors with the largest eigenvalue are chosen to form two different eigenspaces (account for most variance) The "universal eigenspace" for images of all objects The "object eigenspace" for images of each object Each image of the object is projected into eigenspace, the points representing the object are connected using the cubic spline interpolation method, to form a manifold in eigenspace which represents the image. To identify a new image its image is projected into eigenspace. Neat proof show that the closer projects are in eigenspace, the more highly correlated are the images Note that comparison is made to manifold not original image projections. Allows for object identification in universal space and pose estimation in object space. Problems: Segmentation of image from background is vital and a problem that still hasn't been adequately addressed A simplified segmentation algorithm was presented Method can not handle occlusion of objects Seems to require having seen all combinations of an image that wish to be recognized, (e.g. 
pose, lighting direction) Humans can recognize known objects in novel views, there must be some structural information available The number of object used in the given examples is rather small (20 with limited views). As the number of object increases, the size of the universal eigenspace is likely to get huge. Only three parameters can be used without difficulty (e.g. two types of rotation and one illumination direction). System wouldn't work for an arbitrary number of rotations, illuminations (gives very limited applicability) Universal eigenspace is must be recomputed from the set of all images Note: Required used 1.6 Gbyte hard disk to learn just 4 objects rotating in one direction and with one illumination direction A better method of adding objects must be discovered, a few are suggested, but they aren't as reliable Illumination conditions have been oversimplified. Assumes same ambient lighting with one additional directional component
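A minimal sketch (not from either paper) of the shared recognition step described above: project a novel image onto a precomputed eigenbasis and take the nearest stored class, rejecting matches beyond a threshold. All names and data here are illustrative stand-ins; numpy is assumed.

```python
import numpy as np

def classify_face(x, mean_face, eigenfaces, class_means, threshold):
    """Project image vector x into face space and find the nearest class.

    x           : flattened novel image, shape (N,)
    mean_face   : average training face, shape (N,)
    eigenfaces  : top-k eigenvectors as rows, shape (k, N)
    class_means : dict of person id -> mean weight vector, shape (k,)
    threshold   : maximum face-space distance for a "known" match
    """
    omega = eigenfaces @ (x - mean_face)          # weights of x in face space
    dists = {p: np.linalg.norm(omega - mu) for p, mu in class_means.items()}
    best = min(dists, key=dists.get)
    # Beyond the threshold the face is classified as unknown.
    return best if dists[best] < threshold else "unknown"

# Illustrative use with random stand-ins for real data.
rng = np.random.default_rng(0)
N, k = 64, 4
eigenfaces = np.linalg.qr(rng.normal(size=(N, k)))[0].T   # orthonormal rows
mean_face = rng.normal(size=N)
class_means = {"alice": rng.normal(size=k), "bob": rng.normal(size=k)}
probe = mean_face + eigenfaces.T @ class_means["alice"]   # exactly "alice"
print(classify_face(probe, mean_face, eigenfaces, class_means, threshold=1.0))
```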

Gregory Ganarz

The two papers "Eigenfaces for Recognition" by M. Turk and A. Pentland and "Visual Learning and Recognition of 3-D Objects from Appearance" by H. Murase and S. Nayar are both based on a similar computational process of finding eigenvectors. Thus, the papers both suffer the same shortcomings: the images that the algorithms operate on must be segmented, scaled, brightness normalized, and centered. Only after all this pre-processing can the algorithms produce decent performance. In both papers, learning is done off-line, and no mention of whether an on-line update algorithm exists. Both papers claim to be "biologically plausable" but clearly organisms learn incrementally, which is not what the papers propose. Further, while it might be true that early vision analyzes an image into principle components (orientation, color, etc), it certainly does not go about it by the processes proposed in these two papers (matrix manipulation). There are "neural network" algorithms which find principle components (e.g. oja's rule and sangre's (spelling?) algorithm), but these learning rules are not used in the eigen-papers. Other problems with the proposed eigen-algorithms is that they are sensitive to orientation of the objects they are operating on.
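The point about incremental learning can be made concrete: Oja's rule, which Ganarz names, estimates a principal component with purely local, online updates, unlike the batch matrix decompositions used in the papers. A toy sketch under assumed data and learning rate, not anything from the papers themselves:

```python
import numpy as np

def oja_first_component(X, lr=0.005, epochs=50, seed=0):
    """Estimate the first principal component of X with online updates.

    Oja's rule: w <- w + lr * y * (x - y * w), where y = w . x.
    X is an (n_samples, n_features) array of zero-mean data vectors.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += lr * y * (x - y * w)   # Hebbian term with self-normalization
    return w / np.linalg.norm(w)

# Compare against batch PCA on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) * np.array([3.0, 1.0, 0.5, 0.2, 0.1])
X -= X.mean(axis=0)
w = oja_first_component(X)
u = np.linalg.svd(X, full_matrices=False)[2][0]  # top principal direction
print(abs(w @ u))                                # close to 1 when aligned
```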

Shren Daftary

"Visual Learning and Recognition of 3-D Objects from Appearance" by Murase and Nayar

This paper presents an algorithm for object recognition and pose estimation, using learning techniques that represent a 3-dimensional object as a set of 2-dimensional images. The general approach is to vary the orientation of the object and the illumination and then to compress the resulting set of images into a low-dimensional representation. One such technique, principal component analysis, computes the orthogonal eigenvectors of an image set and provides a simple Euclidean distance metric that correlates with the similarity of the images. The authors chose a variation of this technique, the parametric eigenspace, to process the images. Each object is then represented both in the universal eigenspace, which is determined by the set of all objects of interest, and in its own object eigenspace.

The process begins by normalizing the images into background and object regions. The background part of the image is cut out by reducing all of its pixels' brightness to 0, and the object region is normalized to fit a predetermined scale. Each image is acquired with a different rotation parameter and lighting direction; in this particular setup the authors test 5 different lighting sources and an angular precision of 4 degrees. Such a large set of images would be difficult to store and compare against other images, so the next step is to compute the eigenspace of the image set (sketched in code below):

- Compute c, the average of all images in the set.
- Form X, the set of images with the average c subtracted from each.
- Compute Q, the covariance matrix of X.
- Calculate the eigenvectors e_i and eigenvalues lambda_i by solving the eigenvector problem for Q.

Since Q is an NxN matrix there will be N eigenvalues, but not all of these are significant, so only k are stored. The best way to determine how many is to make sure a certain percentage of the image set's variance is contained in the first k eigenvectors. Each eigenvector is the size of an image and must be stored completely. In cases where pose or illumination variation causes dramatic changes in brightness, the variation across the eigenspace will not be smooth; for smoother cases, however, the method works well even with small k. Distance and correlation measures are then defined.

Next, the procedure to recognize an object and its pose is given. The brute-force method of comparing against all stored images is mentioned, but the better way is to compute the distance between images in eigenspace. This is done with respect to the manifolds: if the distance to object p's manifold is below some threshold, the image is considered to belong to object p. The manifold method determines which object a new image is closest to, and the pose is then computed by comparing the new image to previously stored images. Tests were performed by varying the number of dimensions k stored for an image set. In the first set the pose of the objects was known, and with only 4 dimensions the error approached 0, while only 20 poses are needed to get a reasonable amount of data for recognition. I could not find figure 10, but the algorithm performs robustly, although it requires the initial data set to be captured with many different poses. The main faults of this paper are that it examined fairly small data sets of only 4 objects, which do lend themselves to minimal storage space; I doubt that only 4 dimensions would be sufficient for high accuracy on larger sets.
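A small numpy sketch of the eigenspace construction summarized above, including the variance-fraction rule for choosing k. Variable names are illustrative, and a tiny synthetic image set stands in for real data:

```python
import numpy as np

def build_eigenspace(images, var_fraction=0.95):
    """Eigenspace of an image set, following the algorithm above.

    images       : (M, N) array of M flattened images with N pixels
    var_fraction : fraction of total variance the kept basis must capture
    Returns the average image c, the top-k eigenvectors as rows, and k.
    """
    c = images.mean(axis=0)             # average of all images in the set
    X = images - c                      # average c subtracted out
    Q = (X.T @ X) / len(images)         # N x N covariance matrix of X
    lam, E = np.linalg.eigh(Q)          # eigenvalues come out ascending
    lam, E = lam[::-1], E[:, ::-1]      # re-order to descending

    # Keep the smallest k whose eigenvalues cover var_fraction of the variance.
    k = int(np.searchsorted(np.cumsum(lam) / lam.sum(), var_fraction)) + 1
    return c, E[:, :k].T, k

# Tiny synthetic run; for real 128x128 images N = 16384, and Q becomes
# too large to diagonalize naively (see the shortcut sketched further below).
imgs = np.random.default_rng(0).normal(size=(30, 16))   # 30 images, 16 pixels
c, basis, k = build_eigenspace(imgs)
print(k, basis.shape)
```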
The amount of data that needs to be stored also seems excessive, and it would continue to grow as a function of the number of objects in the set; some improvement in coordinating the different objects may reduce the amount of data that must be stored.

"Eigenfaces for Recognition" by Turk and Pentland

This paper deals with computer recognition of faces. Unfortunately, previous computer vision research does not relate well to the problem of facial recognition, so new techniques needed to be developed. This paper suggests a technique for facial recognition that is fast, simple, and accurate in CONSTRAINED environments, and that is expandable to allow the learning of new faces when that becomes necessary. Unlike previous studies of facial images, this paper attempts to determine which aspects of the face are significant for identification in an informational sense. The technique makes use of the observation that pictures of faces can be efficiently stored using principal components. The initial part of the algorithm involves acquiring an initial set of faces, calculating the eigenfaces from this set, and keeping only the M eigenfaces that correspond to the highest eigenvalues (these span face space). After that, the corresponding distribution in M-dimensional weight space is calculated for each known individual. To recognize an image: calculate the projection of the input image into face space, determine whether it lies near the existing face space, and if it does, classify it as a known or an unknown face.

The calculation of the eigenfaces starts from the simple observation that a 256x256 image is a point in 256^2-dimensional space. However, since faces are similar to one another, they do not need so many dimensions to be defined; they can be stored in a relatively low-dimensional space. All the images in the training set are averaged, and the difference from the calculated average is determined for each face. Next, M eigenvectors are calculated using the standard technique (see the sketch below), and for each image in the initial set the weights needed to reconstruct it are determined. In a reconstruction test using this technique on 115 Caucasian males, only 40 eigenfaces were necessary to reduce the reconstruction error to 2%. A new face is projected into face space by simply calculating the appropriate coefficients. The method runs in about 400 msec on what the authors considered poor equipment and software, and they suggest that a 90% performance improvement is possible with specialized equipment.

Next, a technique to locate a face in video is presented. The idea behind this is that faces will not be too far off when projected into face space, while other objects will be. The difference between the projection and the original image is measured by a simple correlation; if this falls below a threshold, the object is determined not to be a face. Finally, techniques to deal with the problem of background effects, from hairstyle to environmental surroundings, are presented, and methods to deal with changing facial characteristics such as beard growth are mentioned.

One problem with this paper is the limitation of the test to Caucasian males: if another test were done with a standard sampling of the population, what would the best method be? Divide the population into males and females, with separate spaces for persons based on their skin tone, or not divide at all? Another problem mentioned is the inability to deal with changes in facial features such as hair growth; this might be dealt with by subtraction techniques that rid the face of such growth before it is put through the eigenface technique.
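The "standard technique" for the eigenvectors deserves a sketch of its own: with 256^2-dimensional image vectors, the full covariance matrix is too large to diagonalize directly, so Turk and Pentland instead diagonalize the small MxM matrix of inner products among the M training images and map its eigenvectors back to image space. A numpy sketch with illustrative names and data:

```python
import numpy as np

def eigenfaces_inner_product_trick(faces):
    """Eigenfaces via the small M x M matrix instead of the N x N covariance.

    faces : (M, N) array of M flattened face images, with M << N.
    If v is an eigenvector of L = A^T A (M x M), then A v is an
    eigenvector of the covariance A A^T (N x N) with the same eigenvalue,
    so only an M x M eigenproblem ever has to be solved.
    """
    mean = faces.mean(axis=0)
    A = (faces - mean).T                 # N x M matrix of difference images
    L = A.T @ A                          # M x M inner-product matrix
    lam, V = np.linalg.eigh(L)
    lam, V = lam[::-1], V[:, ::-1]       # descending eigenvalues
    keep = lam > 1e-10 * lam[0]          # drop the ~0 mode from mean removal
    U = A @ V[:, keep]                   # back to image space: N x k
    U /= np.linalg.norm(U, axis=0)       # unit-length eigenfaces
    return mean, U.T, lam[keep]

# 10 "images" of 10,000 pixels: the direct route would need a
# 10^4 x 10^4 covariance matrix, while L here is just 10 x 10.
faces = np.random.default_rng(2).normal(size=(10, 10_000))
mean, eigenfaces, lam = eigenfaces_inner_product_trick(faces)
print(eigenfaces.shape)                  # at most M-1 eigenfaces
```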

Paul Dell

H. Murase and S. Nayar. "Visual Learning and Recognition of 3-D Objects from Appearance." International Journal of Computer Vision, 14(1):5--24, January 1995.

The Murase paper introduces a recognition technique for 3-D objects that utilizes the appearance of the object rather than its shape. A number of 2-D images are taken of an object to capture a sufficient amount of its "appearance" characteristics, i.e., shape, reflectance, pose, and illumination. These images are then compressed into a low-dimensional (e.g., 20-dimensional) representation. (Note that the number of dimensions needed will vary, and the authors give no good estimate of the number needed for a very large and varied data set.) The representation the authors use is parameterized by object pose and illumination and is called the "parametric eigenspace." The advantage of using object appearance is that no prior knowledge of the object's shape or reflectance is needed, and an automated object learning system can be constructed to aid in "learning" various images. The learning phase of the system can take a significant amount of time (e.g., 20 objects with 72 poses each took 12 hours on a Sun SPARC 2), but the recognition process is quick (e.g., under 1 second for the same set of objects). There are limitations to this approach. First, it assumes that there exists a segmentation algorithm to separate the object of interest from the scene. Second, objects are assumed not to be occluded. Third, the "universal eigenspace" needs to be recalculated whenever a new object is added to the set, though the authors do discuss some ways around this.

M. Turk and A. Pentland. "Eigenfaces for recognition." Journal of Cognitive Neuroscience, 3(1):71--86, 1991.

The eigenfaces paper presents a system to recognize faces in an image. The eigenface approach does not try to model a face (e.g., 2 eyes, nose, mouth) as other work has done; instead, images are reduced to the principal components that characterize the face. Eigenvectors are calculated from a training set of images and a set of eigenfaces is selected. In the experiment presented in the paper, 115 images of Caucasian males were used and about 40 eigenfaces were taken from this set. The system can be used both to detect faces in images and to identify them. To detect faces, each point in the image could be treated as a candidate face center, but this is very computationally expensive; instead the authors use motion to filter out humans (because "people are constantly moving") and then run the calculations on the moving segment. For a set of known faces, the system achieves 96% accuracy over lighting changes, 85% accuracy over orientation changes, and 64% over size changes. Along with other improvements, the authors suggest a neural network implementation of their system. Work is continuing to improve the system and expand its capabilities to identify gender and facial expressions. Overall, the system appears to do well with centered, segmented facial images. There is some study of varying head orientation, scale, and occlusion: the system works fairly well (85% accuracy) over limited orientation changes, but it does not do well with changes in scale. Also, no data is given about the accuracy of the system on a face set with people of different genders, ages, and ethnic backgrounds; the system may not fare well under these variations.
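The motion filtering mentioned above is described only loosely in the paper; purely as an illustration of the general idea, here is a crude frame-differencing sketch (the threshold and all names are invented for this example, not taken from the paper):

```python
import numpy as np

def moving_region(prev_frame, frame, diff_thresh=25):
    """Bounding box of motion between two grayscale frames.

    A crude stand-in for the paper's motion analysis: threshold the
    absolute frame difference and box the changed pixels; the boxed
    region would then be scaled and handed to the eigenface stage.
    """
    diff = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    if not diff.any():
        return None                       # nothing moved
    rows, cols = np.nonzero(diff)
    return rows.min(), rows.max(), cols.min(), cols.max()

# Illustrative frames: a bright square "moves" into view.
a = np.zeros((240, 320), dtype=np.uint8)
b = a.copy()
b[60:120, 100:160] = 200
print(moving_region(a, b))                # (60, 119, 100, 159)
```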

John Isidoro

This week the readings were about recognizing a face, or an object and its pose, by computing its position within an eigenspace. Theoretically, the image in question should be reconstructable as a linear combination of the eigenvectors, each weighted by the object's position along the corresponding dimension. I think this technique is really only useful in a very controlled environment, i.e., mug shots of people's faces under a certain lighting which are aligned perfectly. However, it is the only technique we have learned so far that can distinguish between images that are very similar in color and texture, like faces. Another nice feature of using eigen-analysis is that you can theoretically reconstruct your base image from the eigenvectors; none of the other techniques we have learned allowed us to actually reconstruct an image. I think this is a very good clue as to why eigen-analysis can be so accurate: there is a tremendous amount of information contained within the eigenvector images themselves. One area where the eigen-analysis papers are not as thorough as the other papers we have read in previous weeks is in the description of the actual implementation of the algorithms. I think the reason for this is that the math involved in reducing the dimensionality of the eigenspace and building the eigenvectors is more complex than what we have seen before. On a side note, the idea of being able to reduce a large set of orientation images to a few eigenvectors is very similar to the concept of steerable filters: just a few filters can be used to simulate the effects of many more filters, just as a few eigenvector images can be used to recognize a multitude of orientations or faces.
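The reconstruction property noted above can be shown in a few lines: project an image onto the top k eigenvectors and rebuild it as a weighted sum, watching the error fall as k grows. A toy numpy sketch on synthetic data, with all names illustrative and the basis assumed orthonormal (as PCA provides):

```python
import numpy as np

def reconstruct(x, mean, basis, k):
    """Rebuild image vector x from its first k eigenspace coefficients.

    basis : (K, N) orthonormal eigenvectors as rows, with K >= k.
    The reconstruction is mean + sum_i w_i * basis_i; its error shrinks
    as k grows, since the eigenvector images carry the information.
    """
    w = basis[:k] @ (x - mean)           # position in eigenspace
    return mean + basis[:k].T @ w        # weighted sum of eigenvectors

# Illustrative: PCA basis from random data via SVD, then reconstruct.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 64))            # 40 "images", 64 pixels
mean = X.mean(axis=0)
basis = np.linalg.svd(X - mean, full_matrices=False)[2]  # rows = components
x = X[0]
for k in (1, 5, 20):
    err = np.linalg.norm(x - reconstruct(x, mean, basis, k))
    print(k, round(float(err), 3))       # error falls as k increases
```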

John Petry

EIGENFACES FOR RECOGNITION, by Turk and Pentland
________________________________________________

This is a statistical approach to face recognition and verification (recognition meaning to answer the question "whose face is this?", verification "is this X's face?"). It is computationally straightforward and can run at roughly frame rate, but it requires control of scaling, lighting, and segmentation.

The approach starts by collecting standardized training images. Consider each as a point in an N^2-dimensional space (for NxN images). Average the images belonging to each contributor to form an average value for that person, and average all contributions to form an "average face." Compute eigenvectors and eigenvalues for the space. While perfect correctness would require N^2 eigenvectors, only M training images exist, so only the M most important eigenvectors are available. Using these, or more likely a subset of the M' most important, any face image can be described as a linear combination of the M' eigenfaces, using appropriate weights for each eigenface. These weights themselves form a vector which uniquely describes each face (or average face) in terms of the eigenfaces.

When an image of an unknown face is presented to the system for recognition, first convert it to its eigenface representation, i.e., compute its weights. Compare these to those of the average face to determine whether the image really is of a face. If so, compare it to each known face's weight vector to see which it is closest to. If it is close to one particular face, call it a match; if it is more than a certain amount different from any known face, it is probably a new face, which can be added to the list of known faces if desired. For verification, use the same process, but compare only to the average face and one other face. Since all faces presented to the system are first converted into eigenface weights, this method is also very useful as an image compression technique, storing facial data in a small fraction of the space required by image pixels with only minimal degradation.

According to the paper, very good results can be achieved if scale, orientation, and lighting are carefully controlled; the scores drop off quickly if this is not the case. An addition to the original system, described in the follow-up paper, presents a method to handle orientation changes in one dimension using interpolation techniques.

Perhaps the largest limitation is segmentation: the method assumes that a face has been segmented from the background and then scaled correctly. If this is not true, if the person is wearing a hat, for instance, which confuses the scaling function, or if part of the background is included in the image containing the face, then I'd expect the approach to fail quickly. The lighting limits mean that this technique can probably only work well under fairly controlled conditions, which implies that it may not be possible to use it for recognition or verification on uncontrolled image databases. It might be possible to use the average face vector to search for faces in general at different scales, though. Also, this approach may break down when trying to match a face against a very large database of known faces, say more than 1000. My suspicion is that differences in imaging between training time and runtime may outweigh the variation between images of different people created under identical conditions. The follow-up paper supports this; the authors recommend applying the same approach to facial features as is applied to the faces themselves to enhance discrimination, i.e., using eigenfeatures. This may add considerably to computation time, and resolution issues become more important, but I can see that it would help.

In general, this is a good approach that offers a significant improvement over other techniques such as correlation, but it still falls well short of being a general-purpose tool. Within strict limits, I suspect it is pretty good.

VISUAL LEARNING AND RECOGNITION OF 3-D OBJECTS FROM APPEARANCE, by Murase and Nayar
___________________________________________________________________________________

This paper takes an approach similar to the first, with two main enhancements: it is generalized to 3-D objects rather than 2-D faces, and it attempts to deal with lighting and orientation by building these parameters into the training database itself. Rather than having one or two training images of each object, as the original eigenface project did, the authors create an image at each of a controlled set of orientations and with controlled lighting changes. This forms a complex set of representational vectors whose values define a manifold. Handled correctly, it is possible to present the system with uncontrolled images and have it interpolate between known orientations and lighting conditions (a toy sketch of this interpolation appears at the end of this commentary).

In addition, the system can handle more than one object. It does this by creating two separate eigenspaces: one formed from the images of all the objects (the "universal" space), used to distinguish between objects, and one formed from the different appearances of a single object under the range of orientations and lighting created at training time (the "object" space). The universal space is used to select which object most closely resembles a new image presented to the system; the object space is then used to determine its orientation and lighting and to verify the identification.

The system is useful in that it identifies a method to deal with some of the variables that caused problems for the eigenface system, namely lighting and orientation. The two-phase search method need not be limited to the authors' choice of universal and object spaces, either; it could be generalized for other training choices.

The training step itself seems overwhelming: it involves a special setup with a 2- or 3-D rotational device plus specialized movable lighting, and considerable training time. It is unlikely the method could be used for many objects, and it certainly can't handle the addition of new images to its database the way the eigenface system can when a new face is shown to it. To be fair, though, many (most?) objects have predominant orientations, which would preclude the necessity of training from every conceivable vantage point. The same can't be said of lighting, though. In addition, the number of sample points means this is probably much more computationally expensive than eigenfaces, once the number of views and lighting differences is multiplied by the number of objects. The average eigenobject may also be fairly meaningless; at least faces have a common organization. This may mean that the dimensionality of the eigenspace needs to be much higher than for faces. Also, unlike faces, many typical objects have non-linear components such as textural features or highlights that may not be well represented in the database, or which may be very susceptible to scale. Nonetheless, if the input can be sufficiently controlled (a big if), this could be quite powerful. For instance, it offers a way to handle facial rotation in all three dimensions for the eigenface problem. I seriously doubt whether this would work as a general interactive object finder, though, most importantly because of the training issues.
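As a toy illustration of the manifold idea described above (not the authors' implementation, which parameterizes two rotations and an illumination direction), here is a sketch that splines eigenspace projections over a single pose parameter and estimates pose by the nearest point on the manifold. scipy and all names here are assumptions:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Suppose each training pose theta_i was projected to a point g_i in a
# k-dimensional object eigenspace. A cubic spline through those points
# gives a continuous appearance manifold g(theta).
thetas = np.arange(0, 360, 30)                       # training poses, degrees
rng = np.random.default_rng(4)
points = rng.normal(size=(len(thetas), 3))           # illustrative k=3 projections

manifold = CubicSpline(thetas, points, axis=0)       # g: pose -> eigenspace point

def estimate_pose(q, samples=3600):
    """Pose of a new projection q = nearest point on the sampled manifold."""
    t = np.linspace(thetas[0], thetas[-1], samples)
    d = np.linalg.norm(manifold(t) - q, axis=1)      # distance to dense samples
    return t[d.argmin()], d.min()                    # pose estimate, manifold distance

pose, dist = estimate_pose(points[2] + 0.01)         # probe near a training pose
print(round(float(pose), 1), round(float(dist), 4))  # expect roughly 60 degrees
```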


Stan Sclaroff
Created: Oct 1, 1995
Last Modified: