The system has the advantages of a planar face tracker (reasonable simplicity and robustness to initial positioning) but not its disadvantages (difficulty in tracking large rotations). The main differences are that (a) self-occlusion can be managed and (b) better tracking of the face can be achieved through the use of a texture map mosaic acquired via view integration as the head moves.
The following is a brief overview of the head tracking approach. Readers are referred to the technical report for a detailed description.
Our technique is based directly on the incoming image stream; no optical flow estimation is required. The basic idea is to approximate the head with a texture-mapped surface model, which both accounts for self-occlusion and approximates the head shape. We then use an image registration technique to fit the model to the incoming data.
To explain how our technique works, we will assume that the head is exactly a cylinder with a 360-degree-wide image (or, more precisely, a movie, because facial expressions change) texture mapped onto its surface. Only a 180-degree-wide slice of this texture is visible in each frame. If we know the initial position of the cylinder, we can use the incoming image to compute the texture map for the currently visible portion, as shown in Fig. 1. The transformation that projects part of the incoming frame onto the corresponding cylindrical surface depends only on the 3D parameters of the cylinder and on the camera model.
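The mapping between a texel on the cylinder and a pixel in the incoming frame can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not taken from the technical report: a pinhole camera at the origin looking along +z, a cylinder whose axis is parallel to the image y axis, and rotation about that axis only. The function name `texel_to_pixel` and all of its parameters are hypothetical.

```python
import math

def texel_to_pixel(theta, h, radius, yaw, center, focal):
    """Project one cylinder texel into the image (illustrative sketch).

    theta  : angular texture coordinate of the texel, in radians
    h      : height of the texel along the cylinder axis
    radius : cylinder radius
    yaw    : rotation of the cylinder about its (vertical) axis, in radians
    center : (x, y, z) position of the cylinder axis in camera coordinates
    focal  : focal length of the assumed pinhole camera, in pixels

    Returns (u, v, visible), with (u, v) measured from the principal point.
    """
    cx, cy, cz = center
    a = theta + yaw
    # Outward unit normal of the surface at this texel (axis parallel to y).
    nx, nz = math.sin(a), -math.cos(a)
    # 3D position of the texel in camera coordinates.
    x = cx + radius * nx
    y = cy + h
    z = cz + radius * nz
    # The texel faces the camera when its normal points back toward the
    # camera center; this is how self-occlusion is handled in this sketch.
    visible = (nx * x + nz * z) < 0.0
    # Perspective projection with the pinhole model.
    u = focal * x / z
    v = focal * y / z
    return u, v, visible
```

Sampling the incoming frame at `(u, v)` for every visible texel fills in the currently visible portion of the texture map; texels flagged as not visible are left empty.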
As a new frame is acquired, it is possible to find a set of cylinder parameters such that the texture extracted from the incoming frame best matches the reference texture. In other words, the 3D head parameters are recovered by performing image registration in the model's texture map. Because the head rotates, the visible part of the texture can be shifted with respect to the reference texture; the registration procedure should therefore consider only the intersection of the two textures.
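The role of the intersection can be illustrated with a toy one-dimensional example. The sketch below is not the registration technique of the technical report: it replaces it with a brute-force search over a single parameter (a horizontal shift in texture space, standing in for rotation about the vertical axis) and scores each candidate by the mean squared difference over texels present in both textures. The names `registration_cost`, `register`, and `min_overlap` are hypothetical.

```python
def registration_cost(reference, strip, shift, min_overlap=8):
    """Mean squared difference between a reference texture and a new strip.

    reference : list covering the full 360 degrees, with None for texels
                that have not been observed yet
    strip     : list of values extracted from the incoming frame
    shift     : candidate index of the reference texel matching strip[0]
    Only texels present in both textures (the intersection) contribute.
    """
    n = len(reference)
    total, count = 0.0, 0
    for i, value in enumerate(strip):
        ref = reference[(shift + i) % n]  # the texture wraps around the cylinder
        if ref is None:
            continue  # outside the intersection: ignore this texel
        total += (value - ref) ** 2
        count += 1
    if count < min_overlap:
        return float("inf")  # too little overlap to trust the comparison
    return total / count


def register(reference, strip, candidates):
    """Return the candidate shift with the lowest cost (exhaustive search)."""
    return min(candidates, key=lambda s: registration_cost(reference, strip, s))
```

Requiring a minimum overlap keeps the search from preferring candidates that match well only because almost nothing is being compared.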
The registration parameters determine the projection of input video onto the surface of the object. Taken as a sequence, the projected video images comprise a dynamic texture map, as shown in Fig. 2. This map provides a stabilized view of the face that is independent of the current orientation, position and scale of the surface model.
At this point the tracking capabilities of this system are only slightly better than those of a planar approach, because a cylinder approximates a face better than a plane does. The key to tracking large rotations is to build a mosaicked reference texture over a number of frames, as the head moves. In this way, assuming that there are no huge interframe rotations about the vertical axis, we always have enough information to keep the registration procedure working. The resulting mosaic can also be used as input to face recognition.
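One way to picture the mosaic is as a 360-degree texture that starts empty and is filled in as newly registered strips arrive. The sketch below is an illustration only: the overview does not say how overlapping observations are combined, so the running average used here is an assumption, and `update_mosaic` and its arguments are hypothetical names.

```python
def update_mosaic(mosaic, counts, strip, shift):
    """Blend a registered strip into the reference mosaic, in place.

    mosaic : list covering the full 360 degrees, with None for unseen texels
    counts : number of observations accumulated for each texel
    strip  : values extracted from the current frame
    shift  : index of the mosaic texel matching strip[0], from registration
    """
    n = len(mosaic)
    for i, value in enumerate(strip):
        k = (shift + i) % n  # the texture wraps around the cylinder
        if mosaic[k] is None:
            # First observation of this texel: it extends the mosaic.
            mosaic[k] = float(value)
            counts[k] = 1
        else:
            # Texel seen before: update its running average (an assumption).
            counts[k] += 1
            mosaic[k] += (value - mosaic[k]) / counts[k]
    return mosaic, counts
```

Because each new strip overlaps texels that are already in the mosaic, the reference keeps growing while still sharing enough texture with the next frame for registration to succeed, provided the interframe rotation is small.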
For details of the head tracking formulation, readers are referred to the technical report.
Last Modified: March 5, 1998