BU GRS CS 680
Graduate Introduction to Computer Graphics


Readings for April 2, 1997

  1. M. Gleicher and A. Witkin. Through-the-lens camera control. In Computer Graphics Proceedings, ACM SIGGRAPH, pages 331--340, 1992.
  2. S. M. Drucker and D. Zeltzer. CamDroid: A System for Implementing Intelligent Camera Control. In Proc. SIGGRAPH Symposium on Interactive 3D Graphics, 1995.

Participants


Commentary

Alia Atlas

Through-the-Lens Camera Control

This paper presents an intuitive method for user control of a camera. The user can specify that certain points appear at certain locations in the image and can control the motion of those points during animation, independent of the 3-D model motion of those points. This makes it significantly easier to compose and produce images and animations which meet the user's goals.

This system uses velocities and other first derivatives to solve the equations and determine the set of possible camera descriptions that will satisfy the constraints imposed by the user. It is necessary to use first derivatives because the system using actual positions is nonlinear and significantly harder, if even possible, to solve. If the user has not specified enough constraints to result in only one possible camera description, then the system will supply reasonable additional constraints. The user is prevented from overspecifying constraints which would result in no feasible camera description; this is implemented in the user interface, so that the user can't drag a point to a location where it would overconstrain the system. Naturally, a least-squares fit could be applied to overspecified systems, but preventing them from occurring is preferable.
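A minimal sketch of this velocity-space solve, assuming the Jacobian J relating camera-parameter velocities to image-point velocities is already available (the function and parameter names below are illustrative, not the authors'); a damped least-squares solve gives a reasonable answer whether the user's constraints under- or over-determine the camera:

```python
import numpy as np

def camera_velocity(J, p_dot, damping=1e-6):
    """Solve J q_dot = p_dot for the camera-parameter velocities q_dot.

    J       : (2m x n) Jacobian of the m image-space control points
              with respect to the n camera parameters.
    p_dot   : (2m,) desired image-space velocities set by dragging.
    damping : small regularizer so underconstrained systems pick a
              minimum-norm ("reasonable") solution.
    """
    # Damped least squares: (J^T J + damping*I) q_dot = J^T p_dot
    n = J.shape[1]
    A = J.T @ J + damping * np.eye(n)
    b = J.T @ p_dot
    return np.linalg.solve(A, b)

# Example: one control point (2 rows), 7 camera parameters.
J = np.random.rand(2, 7)
p_dot = np.array([1.0, 0.0])        # drag the point to the right
q_dot = camera_velocity(J, p_dot)
```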



CamDroid: A System for Implementing Intelligent Camera Control

This paper provides a method for breaking up desired camera motions into separate modules, each of which has a given task. Thus, one module may be specified to follow a specific character. The user can create the different modules, which are given a series of constraints that must be followed when that module is active. In addition to the constraints, each module has a controller and an initializer. The controller takes the constraints, the module's state, and the world state and determines the new camera description, which becomes its new state. The user can specify movement from one camera module to another. When a camera module is given control, its initial conditions are generally the final state of the previously controlling camera module.
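A rough sketch of how such a module might look in code, under a deliberately simplified interface (the class, field, and method names are invented for illustration and are not CamDroid's actual API):

```python
class CameraModule:
    """One encapsulated camera behavior: constraints + controller + initializer."""

    def __init__(self, constraints, controller):
        self.constraints = constraints   # e.g. "keep actor A centered in frame"
        self.controller = controller     # maps (constraints, state, world) -> new state
        self.state = None                # current camera description

    def initialize(self, previous_state):
        # When this module takes over, start from the previous module's final state.
        self.state = previous_state

    def step(self, world_state):
        self.state = self.controller(self.constraints, self.state, world_state)
        return self.state
```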

Once these modules have been created, the user can create a graph, directing how camera control should change among the modules based upon events in the model world. The authors have used this to create footage of virtual conversations and football games.


Timothy Frangioso

"Through-the-Lens Camera Control" by Michael Gleicher and Andrew Witkin

This paper describes a system that allows the user to control a virtual camera by constraining features in the image, as seen through the lens. Most formulations of camera models have been based on the perspective projection model. In these models the 3-D view is fully specified by giving all the associated parameters. The major problem is that every time new controls are desired, the parameters have to be respecified and a new set of controls has to be created. One cannot simply solve the problem by picking particular points in the image and moving them to new desired positions; there are too many degrees of freedom involved.

This paper claims to offer a general solution to this problem of having to specify all the parameters for every type of control that is desired. The problem is recast as one of creating a user interface that lets the user drag the desired object to its final position, rather than specifying the final position and having the camera move there. With the problem framed this way, the method consists of solving for the time derivatives of the camera parameters: the interface supplies velocity information to the system so that it can solve for the parameters "on the fly."
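A hedged sketch of this "on the fly" idea: solve for the parameter velocities from the user's dragging velocity and integrate them with a simple Euler step each frame. The callbacks jacobian_fn and read_mouse_velocity are placeholders for machinery the paper supplies analytically:

```python
import numpy as np

def drag_loop(q0, jacobian_fn, read_mouse_velocity, dt=1.0 / 30.0, steps=100):
    """Illustrative interaction loop: integrate solved camera velocities over time.

    q0                  : initial camera parameters (assumed given)
    jacobian_fn(q)      : returns the Jacobian of the controlled image points at q
    read_mouse_velocity : returns the user's desired image-space velocity p_dot
    """
    q = np.asarray(q0, dtype=float)
    for _ in range(steps):
        J = jacobian_fn(q)
        p_dot = read_mouse_velocity()
        # Least-squares velocity solve, then a simple Euler step.
        q_dot, *_ = np.linalg.lstsq(J, p_dot, rcond=None)
        q = q + dt * q_dot
    return q
```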

"CamDroid: A System for Implementing Intelligent Camera Control" by Steven M. Drucker and David Zeltzer.

This paper describes a system for encapsulating camera shots or motions into modules. The basic idea is that directly controlling the 6 degrees of freedom of the camera is not necessary in a system for controlling a camera. This method breaks the problem into small pieces and then provides a high-level user interface that the user can call to have the camera perform actions without specifying the camera parameters directly. The camera modules serve to control the actions that the camera can take; they are defined like shots in a movie, with the shots predefined as modules. The question that was left unanswered is how hard it is to create these modules in the first place. Can you create these modules on the fly, or do you need a well-defined problem like the football game or the conversation to use this technique?
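As an illustration of the shot-module idea, here is a small sketch of how modules might be sequenced by events; the module interface (initialize, step, state) and the transition table are assumptions made for the example, not CamDroid's actual implementation:

```python
def run_film(modules, transitions, events, start):
    """Hand camera control between shot modules as simulation events occur.

    modules     : dict of name -> module object with initialize(state), step(world)
                  methods and a .state attribute (see the sketch above)
    transitions : dict of (module_name, event) -> next_module_name
    events      : iterable of (event, world_state) pairs from the simulation
    """
    name = start
    modules[name].initialize(None)
    frames = []
    for event, world_state in events:
        next_name = transitions.get((name, event))
        if next_name is not None and next_name != name:
            # The new shot begins where the old one left off.
            modules[next_name].initialize(modules[name].state)
            name = next_name
        frames.append(modules[name].step(world_state))
    return frames
```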


Scott Harrison

Leslie Kuczynski

"Through-the-Lens Camera Control", Michael Gleicher and Andrew Witkin
In this paper the authors present a method, or as they call it a "body of techniques", by which a user can manipulate a virtual camera independently of camera parameters. The key benefit of the technique seems to be the decoupling of camera manipulation from camera parameters. A single camera parameterization will not support arbitrary control, and it is often necessary to change the parameterization (i.e., select a pre-existing model or define a new one). Clearly what is desired is a flexible model that offers a general solution.

Using through-the-lens camera control, a user can interactively move (drag) or "pin" image points within the "image-space" of the image. Additionally, a user is free to specify "world-space" controls, which include things such as camera orientation and position. The authors state that the hard problem is the non-linearity of the relationship between the controls and the underlying view specification (i.e., whichever initial camera parameterization was chosen). Their solution (the key point) is, instead of trying to solve for the camera parameters directly, to solve for the time derivatives of the camera parameters, given the time derivatives of the controls. Assuming an initial camera model, the time derivatives of the parameters are solved for such that the deviation from the desired values is minimized (i.e., they try to "fit" the camera parameters to the user-specified control points).

Once this has been done, the entire state of the camera can be recomputed and updated. This method limits instantaneous camera positioning, since the information needed depends upon time derivatives (i.e., change in camera position over time), but it lends itself nicely to applications such as animation and motion tracking.
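The paper derives the needed derivatives analytically; as a purely illustrative alternative, the Jacobian of the projection that this fit relies on can be approximated by finite differences (project here is a placeholder for whatever maps camera parameters to the controlled image points):

```python
import numpy as np

def numeric_jacobian(project, q, eps=1e-6):
    """Finite-difference Jacobian of an image-space projection.

    project(q) : maps camera parameters q to the stacked 2D positions of the
                 controlled points (the paper uses analytic derivatives; this
                 numerical version is only a sketch).
    """
    q = np.asarray(q, dtype=float)
    p0 = project(q)
    J = np.zeros((p0.size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (project(q + dq) - p0) / eps
    return J
```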

A problem arises when a user desires more degrees of control than the camera has degrees of freedom. The authors suggest two solutions: (1) using a least-squares method to fit the degrees of control into the domain of the camera, and (2) constraining image points or features based on the camera's degrees of freedom. Another possible method could be to drop user input controls that are outside of the camera's "ability", or to introduce control priorities specified by the user. If priorities are introduced, the least-squares problem could be made into a weighted least-squares problem, and dropping user input could be priority-based instead of arbitrary.
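A minimal sketch of the weighted least-squares variant suggested here, assuming one user-assigned priority weight per image-space constraint row (the names are illustrative, not from the paper):

```python
import numpy as np

def weighted_velocity_solve(J, p_dot, weights):
    """Weighted least-squares version of the velocity solve.

    weights : one weight per image-space constraint row; higher weight means
              higher priority when the controls overdetermine the camera.
    """
    W = np.diag(weights)
    A = J.T @ W @ J
    b = J.T @ W @ p_dot
    return np.linalg.lstsq(A, b, rcond=None)[0]
```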


Geoffry Meek

Romer Rosales

The ALIVE System: Wireless, Full-body Interaction with Autonomous Agents

Maes, Darrell, Blumberg, Pentland
(Article Review)

This paper discusses the design and implementation of a system for wireless, full-body interaction between a human and a virtual world inhabited by autonomous agents. A video camera is used to obtain a color image of the person, which is composited into the 3D world; the result is projected onto a large screen showing the person together with all the objects and agents.

To model the agents, they use a combination of behavioral and motivational elements; they also use a vision interface that allows the agents to use the user's location, body pose, gestures, and other sensory inputs to perform certain activities. The system incorporates a behavior modeling toolkit, which is the result of other research and allows for the development of semi-intelligent autonomous agents. It is necessary to keep in mind that there are problems with sensing and perception, so the agents' perception of the world is inaccurate most of the time. They list the specifications needed to build the model, which I think can achieve a high level of complexity. A special 3D agent is the user, who is computed using the vision system. The advantage of this is that the agents can use the same sensors to perceive the user in the same way they perceive objects.

The vision interface works with no special tools or markers, using the mirror paradigm. It represents 3D worlds and interactions in them and allows users to interact with virtual agents, using recognition techniques to understand body gestures and patterns in time and space.

They developed vision routines to perceive body actions; for this, the system needs to find the locations of the head, hands, and other important features. They use a calibrated camera and assume that the background is fixed. To establish the connection to the real world, the system uses figure-ground processing, scene projection, hand tracking, and gesture interpretation.

They use connected components, morphological analysis, and the computation of the mean and variance of the background, which is necessary to compute pixel class membership. A color classification is used to compute the figure/ground segmentation, shadow regions are identified, and a normalization process takes place. Markov statistics are also used.
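A generic sketch of per-pixel background statistics and figure/ground classification of the kind described above; the specific thresholding rule here is an assumption, not the authors' exact classifier:

```python
import numpy as np

def background_model(frames):
    """Per-pixel mean and variance computed from frames of the empty scene."""
    stack = np.stack(frames).astype(float)            # shape (T, H, W, 3)
    return stack.mean(axis=0), stack.var(axis=0) + 1e-6

def figure_mask(frame, mean, var, threshold=9.0):
    """Mark pixels whose variance-normalized squared distance from the
    background exceeds a threshold (roughly a 3-sigma test per channel)."""
    d2 = ((frame.astype(float) - mean) ** 2 / var).sum(axis=-1)
    return d2 > threshold
```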

Basically, a seed placed at the centroid location of the user in the previous frame is used to grow a region; if this fails, random seed points are selected. I think that they could use a more robust estimation technique in this case.
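A simple, illustrative version of this seeding strategy: grow a 4-connected region over the foreground mask from the previous centroid, and fall back to random seeds if that fails (function names are invented for the example):

```python
import numpy as np
from collections import deque

def grow_region(mask, seed):
    """4-connected region growing over a boolean foreground mask."""
    h, w = mask.shape
    region = np.zeros_like(mask, dtype=bool)
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        if 0 <= y < h and 0 <= x < w and mask[y, x] and not region[y, x]:
            region[y, x] = True
            queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return region

def find_user(mask, previous_centroid, tries=20, rng=None):
    """Seed at last frame's centroid; fall back to random seeds on failure."""
    rng = rng or np.random.default_rng()
    region = grow_region(mask, previous_centroid)
    while region.sum() == 0 and tries > 0:
        seed = (int(rng.integers(mask.shape[0])), int(rng.integers(mask.shape[1])))
        region = grow_region(mask, seed)
        tries -= 1
    return region
```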

Once this is done, a 3D estimate of the location of the user in the world is computed, using knowledge of the camera geometry. They paid special attention to hand position as an important feature to be tracked, because of its importance in interacting with the agents, so they implemented a feature localizer to determine hand location. In general, the 2D positions of the hands, the depth of the user's body, and gesture information serve as the user input used to manipulate objects. These work most of the time, but there are cases in which they give erroneous estimates.

I think that there are a lot of good implications of using no interface device (wireless sensing): more freedom, ease of use, safety, comfort, etc., although this makes some problems more difficult to solve, especially because we need to design algorithms that perceive what an interface device could perceive easily. These algorithms are still inaccurate and take a lot of computation time.

The idea of a third-person point of view is also very interesting; they said that users felt more comfortable this way.

In general, this work is a nice combination of vision techniques for tracking and segmentation, 3D graphics modeling techniques, camera modeling, and behavior modeling. I think a lot of applications can be derived from this work.

Although it is necessary to use domain knowledge, I think that this is a general approach to this kind of system and a good working example that can be improved with new techniques.

Superior Augmented Reality Registration by Integrating Landmark Tracking and Magnetic Tracking

State, Hirota, Chen, Garrett, Livingston
(Article Review)

This paper deals with the problem of accurately registering real and virtual objects for augmented reality applications. They present a technique that combines the capabilities of a vision-based tracker and a magnetic tracker.

The problem of registration in AR refers basically to how to make a virtual object appear in the proper place in the real world and, in general, interact with it. How to perform dynamic registration while moving through the environment is a related problem.

They developed a hybrid tracking mechanism that combines magnetic and vision-based tracking: video tracking of landmarks (color-coded, which improves performance and robustness) is used to determine camera position and orientation. Three landmarks are used, which are enough to determine the camera's position and orientation. They used a non-linear equation solver and a local least-squares method for minimization. The magnetic tracker is calibrated in real time using the vision-based tracker.
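A tiny, hedged sketch of the kind of non-linear least-squares iteration involved, using Gauss-Newton with a finite-difference Jacobian; residual is a placeholder for the landmark reprojection error, and the real system's solver and pose parameterization differ:

```python
import numpy as np

def gauss_newton(residual, x0, iters=20, eps=1e-6):
    """Minimal Gauss-Newton loop with a finite-difference Jacobian.

    residual(x) : reprojection error of the observed landmarks for pose x
                  (e.g. x = [position, orientation parameters]); illustrative only.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = np.zeros((r.size, x.size))
        for i in range(x.size):
            dx = np.zeros_like(x)
            dx[i] = eps
            J[:, i] = (residual(x + dx) - r) / eps
        # Least-squares step toward reducing the residual.
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        x = x + step
        if np.linalg.norm(step) < 1e-9:
            break
    return x
```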

They used the magnetic tracker for narrowing the landmark search, for selecting among multiple solutions (of the non-linear equations), as the primary tracking technique when landmarks cannot be located by the vision-based tracker, and to avoid some problems with the instability of the vision tracker.

Assuming that the geometry of the devices is known, that the calibration parameters have been established, and that the locations of the landmarks are calibrated, the system can perform well enough.

The system tries to determine the head pose from the landmarks, then computes the error-correcting transformation between the magnetic tracker reading and the head pose computed by the vision tracker. This is used to predict the head pose in the next frame (using the temporal coherence assumption).
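A minimal sketch of this correct-then-predict idea, with poses reduced to plain vectors for illustration (the actual system corrects full position-and-orientation poses):

```python
def correction(vision_pose, magnetic_pose):
    """Offset that maps the magnetic reading onto the vision-derived pose
    (poses treated as simple translation vectors here, for illustration)."""
    return vision_pose - magnetic_pose

def predict_next(magnetic_next, last_correction):
    """Temporal coherence: reuse the last frame's correction for the next frame."""
    return magnetic_next + last_correction
```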

They consider two cases once landmarks are found. If the number of detected landmarks is not enough to give a solution, the system uses a local, heuristic head-pose adjuster; if the pose is well determined or over-determined, its global solver is called, but it may compute multiple solutions, in which case the magnetic tracker helps choose one. A local least-squares optimizer also helps in finding the optimal head pose. They describe in more detail the landmark tracking and the head-pose determination in each case.

Their system gives good results in their tests, which used both simple and more complicated geometric objects. The camera position and orientation error is difficult to determine due to the lack of ground-truth values, so they developed a simulator for the camera video images, which showed small errors.

I think that the use of landmarks decreases the magnitude of the problem a lot. Prediction could be better than just using previous frames.
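One simple step beyond reusing the previous frame, along the lines suggested here, would be a constant-velocity extrapolation of the pose; a hypothetical one-liner:

```python
def constant_velocity_predict(pose_prev, pose_curr):
    """Extrapolate the next head pose assuming constant velocity between frames
    (a simple alternative to reusing the previous frame's pose unchanged)."""
    return pose_curr + (pose_curr - pose_prev)
```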

In general I think that this is a very helpful model of a tracking system used for AR; it provides good ideas about the problems that we may find in such applications. It is a real implementation, and I think that it is not very easy to build working applications in this area. According to them, current AR systems cannot really meet the requirement of accurate registration.


Lavanya Viswanathan

Stan Sclaroff
Created: Jan 21, 1997
Last Modified: Jan 30, 1997