This paper combines vision and magnetic tracking to produce a quicker and more reliable system. Rather than requiring arbitrary feature recognition and a full reconstruction of the real scene, the computer is restricted to recognizing certain landmark features. When the computer sees enough landmarks, it can resolve the probable head location and orientation. The magnetic tracking helps determine the most probable solution and aids in locating the landmarks. To find a landmark, the area most likely to hold it must be searched, pixel by pixel. The magnetic tracking determines the first areas to search, until the landmarks already found can resolve the likely locations of the others.
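The prediction-guided search just described can be sketched as follows. This is a hedged toy illustration, not the paper's implementation: the expanding-ring scan, the function names, and the single-value "landmark color" are all assumptions.

```python
# Sketch of prediction-guided landmark search: the magnetic tracker
# supplies a predicted image position, and the pixel-by-pixel search
# starts there, widening outward until a landmark-colored pixel is found.

def is_landmark_color(image, x, y):
    """Toy classifier: treat pixel value 1 as 'landmark-colored'."""
    return image[y][x] == 1

def find_landmark(image, predicted_xy, max_radius=10):
    """Scan square rings of growing radius around the prediction,
    so pixels nearest the magnetic-tracker estimate are tested first."""
    px, py = predicted_xy
    h, w = len(image), len(image[0])
    for r in range(max_radius + 1):
        for y in range(py - r, py + r + 1):
            for x in range(px - r, px + r + 1):
                # only the ring border: the interior was already checked
                if max(abs(x - px), abs(y - py)) != r:
                    continue
                if 0 <= x < w and 0 <= y < h and is_landmark_color(image, x, y):
                    return (x, y)
    return None  # caller falls back to a wider scan or magnetic-only pose

# 8x8 frame with a single landmark pixel at (5, 2), predicted near (4, 3)
frame = [[0] * 8 for _ in range(8)]
frame[2][5] = 1
print(find_landmark(frame, (4, 3)))  # (5, 2)
```

The ring ordering is what makes the magnetic prior pay off: a good prediction means the landmark is found after testing only a handful of pixels.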
The resulting pictures of combined real and virtual objects are, I think, well done. Because the computer knows the proper mappings from its model world into the real world, virtual shadows can be imposed on real objects, and vice versa. Real and virtual objects convincingly interpenetrate each other. Of course, the current limitations of the method are obvious, such as needing landmarks painted on all real objects, and the smallness of the world thus modeled. However, it seems like a good start.
Using this information, the system creates a virtual agent in the
virtual world, which does exactly what the user does. The other
objects and agents in the virtual world interact with this user agent
via the same sensory input/output as with any other virtual object or
agent. The physical interface, and the ability to reasonably locate a
person's body in the physical and thus the virtual world, seem very
useful.
Beyond that, the system is dressed up with somewhat intelligent agents
who can watch the user's agent, and determine if the user has made
various gestures. Similarly, speech-processing software has been used
to allow the user to issue verbal commands. These are both very good
in terms of user interface.
I do have some doubts about the system, however. Much of what is
described seems like, not exactly quick hacks, but irrelevant window
dressing. Basically, by making the view third-person on a large flat
screen, the problem has been greatly reduced. While this solution is
evidently well received and popular, much of the desired realism seems
like it may be missing. Also, though the paper states that the world is
3D, it seems as if this is not done in a perspective or natural manner,
since UP on the screen corresponds to farther z coordinates and DOWN to
closer ones. I do wonder why they didn't choose to do even some simple
form of perspective viewing, using the body's z value to derive the
necessary camera coordinates. Also, while interesting, the paper seemed
much more concerned with touting the system's superiority than with
providing enough technical detail to let one judge it fully. Perhaps
this is because of the intended audience.
"The ALIVE System - Wireless, Full-body Interaction with Autonomous
Agents"
This paper details an interface to a virtual reality system that does not
use any cumbersome input devices. There are no gloves, goggles, or other
attached devices. Instead the system uses a feedback screen to let the
user interact within the virtual environment. The user's actions are
captured by a video feed into the system. The user's image is then
displayed on the screen, giving the user a third-person view of
themselves. This lets the user respond to the environment better, because
it avoids the disorientation that accompanies other user-centric views.
(This approach is called the "magic mirror" technique.) This interface
has a major advantage: it allows the user to interact with the agents in
the system using simple gestures. The strength of this system is that it
allows the user to interact in a free and natural way with the
environment. These simple gestures can have very complicated meanings.
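How simple gestures might map to meanings can be sketched from tracked body points. This is a minimal illustration under my own assumptions; the rules, thresholds, and names here are invented, not the system's actual gesture pipeline.

```python
# Toy gesture classifier over tracked body heights (y increases upward).
# The thresholds and gesture labels are illustrative assumptions only.

def classify_gesture(head_y, left_hand_y, right_hand_y):
    if left_hand_y > head_y and right_hand_y > head_y:
        return "both hands raised"   # could mean e.g. "stop"
    if left_hand_y > head_y or right_hand_y > head_y:
        return "one hand raised"     # could mean e.g. a pointing command
    return "neutral"

print(classify_gesture(1.6, 1.8, 1.0))  # one hand raised
```

Even a rule table this crude shows how a cheap geometric test on a few tracked points can carry rich meaning once agents interpret it in context.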
"Superior Augmented Reality Registration by Integrating Landmark Tracking
and Magnetic Tracking"
This paper discusses a system for object tracking within an augmented
reality system. That is, it details a system for merging real and virtual
objects into one user-centric view, using landmarks, or flags, placed in
the scene to recover the camera orientation.
The system uses both vision-based tracking and magnetic tracking to
accomplish this registration and orientation. The magnetic tracking
assists the vision-based tracker in four ways: image analysis, selection
from multiple solutions, backup tracking, and sanity checks. Taking in a
stereo image, the tracking system determines the head position by finding
the landmarks, calculating the difference from the last frame, and
adjusting.
The problem with this system is that it will not handle unexpected
movement well, and there are some problems that will occur when the
landmarks within the image cannot be detected properly.

In this paper the authors present a "hybrid" tracking system
that integrates computer vision techniques with magnetic tracking. The
motivation for the system stems from the need to have "accurate registration
between synthetic and real objects" in an augmented reality environment.
In other words, virtual objects should appear convincingly real from a
user's perspective, as if they were actual objects in the environment. Additionally,
interaction with these objects should mirror that of interaction with objects
in our own environment. That is, we would not want to reach for a virtual
object only to find that we need to grasp the air to the left of the object
to move it. The basic idea that drives the system is the use of "landmarks"
within the scene to facilitate accurate registration. For each frame of
video the system searches for landmarks based on predictions made as to
their position. If found head pose is computed based on the landmark positions.
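Given three or more landmark correspondences, the head-pose computation can be sketched as a least-squares rigid fit. The Kabsch/Procrustes solution below is a standard technique shown for illustration; the paper's actual solver differs in detail, and all names here are mine.

```python
import numpy as np

def rigid_fit(model_pts, observed_pts):
    """Least-squares rotation R and translation t minimizing
    ||(model @ R.T + t) - observed||^2 (Kabsch / Procrustes)."""
    cm = model_pts.mean(axis=0)
    co = observed_pts.mean(axis=0)
    H = (model_pts - cm).T @ (observed_pts - co)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # guard against reflection
    t = co - R @ cm
    return R, t

# Four landmarks: an over-determined case (more than 3 correspondences).
model = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
a = np.deg2rad(30)
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a),  np.cos(a), 0.],
                   [0.,         0.,        1.]])
t_true = np.array([0.2, -0.1, 0.5])
observed = model @ R_true.T + t_true               # simulated landmark hits
R, t = rigid_fit(model, observed)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

With noiseless data the fit recovers the pose exactly; with extra (over-determined) landmarks it averages out measurement noise, which is exactly why more than three landmarks help.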
Coupling of the magnetic tracker and the vision-based tracker comes into
play when the vision tracker cannot find enough landmarks to compute head
pose (i.e., three non-collinear points are needed for triangulation).
Information from the magnetic tracker is then used to correct and adjust
the head pose; this case is considered underdetermined. Clearly, this
will not always be the case: we also encounter well-determined and
over-determined cases. The underdetermined and well-determined cases can
produce many different solutions for head pose, among which the best one
must be chosen. A least-squares optimization technique is applied in the
over-determined case to converge to a solution. However, this approach is
not suitable for the under- and well-determined cases, because solutions
would be excluded before being checked. Thus, heuristics are employed to
choose the best solution. Some limitations discussed involve (1) lack of
synchronization between the magnetic tracker and the vision-based
subsystem, (2) the magnetic tracker lags behind the video camera images
(causing problems with abrupt movement), (3) the time difference between
top and bottom scanlines of an image is not taken into account, and
(4) poor performance under harsh or changing lighting conditions. Future
work discussed included (1) correcting the above limitations and
(2) attaching the landmarks to moving objects.

I can't help remembering attending a local sporting event and trying
out a system that used some of the same ideas as the ALIVE system. I stood
against a black backdrop facing a large screen on which my image was projected.
However, the projected me was not standing against a black backdrop. Rather,
my image was standing in front of a soccer goal. Suddenly soccer balls
were bombarding my image, and for a brief while I could almost believe
that I was a full-fledged soccer goalie. I didn't think much about the
technology at the time but found myself revisiting my experience while
reading about the ALIVE system.

The main difference that separates the ALIVE system from the system
I used is the ALIVE system's use of intelligent agents. Additionally, the
ALIVE system has support for tracking much more complicated behavior (e.g.,
hand gestures) than the system I used. I assume that the soccer system
performed simple collision detection between my body and the ball. It did
not care which part of my body hit the ball, only that it did.

Many of the techniques used by the ALIVE system to track a user and
to animate the creatures were ones that we have read about before. It was
interesting to see everything brought together into a world in which we
could then be participants.

Although it does have some benefits, I question the amount of attention
drawn to the wireless interface. The main thing we lose is the ability
to receive physical feedback (i.e., we push a button and actually feel it).
This paper presents a tracking method for augmented reality
applications. The field of Augmented Reality is closely related to
that of Virtual Reality in that while Virtual Reality deals with
ways in which to immerse the user in a synthetic computer-generated world,
augmented reality is concerned with the complementary problem of
positioning computer-generated "objects" or images in the user's world (or
the "real" world). Similar issues are of importance in both fields, such
as how to ensure that the relative positions of the user, the object, and
other parts of the world remain consistent at all times (except in the
case of relative motion, which should then be smooth), as well as
user-object interaction issues, in terms of how realistic the interaction
is and whether or not it can be done in real time. For instance, it would be
undesirable to have a delay between the performance of an action by the
user and the updating of the display. In augmented reality applications,
the major issue is that of accurate registration of the position of
objects in the real-world and the user with respect to that of the
computer-generated object(s). The errors that are thus created are mainly
due to limitations of the tracking system. The authors attempt to solve
this problem by proposing a hybrid tracking mechanism that combines the
accuracy of vision-based tracking systems and the robustness of magnetic
tracking systems.
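That combination can be reduced to a simple fallback sketch. The three-landmark threshold follows from the triangulation requirement; everything else below (names, structure) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of hybrid-tracker fallback logic: trust the accurate
# vision-based pose when enough landmarks are visible, otherwise fall
# back on the robust (but less accurate) magnetic pose.

def fuse_pose(vision_pose, n_landmarks, magnetic_pose):
    """vision_pose may be None when image analysis fails entirely."""
    if vision_pose is not None and n_landmarks >= 3:
        return vision_pose    # well- or over-determined: vision wins
    return magnetic_pose      # under-determined: magnetic backup

print(fuse_pose("vision pose", 4, "magnetic pose"))  # vision pose
print(fuse_pose(None, 0, "magnetic pose"))           # magnetic pose
```

The real system blends the two sources more gradually than this binary switch, but the accuracy-versus-robustness trade is the same.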
This paper is in the field of Virtual Reality and proposes a system that
allows full-body interactions between a user and a graphical
computer-generated world. The interesting thing about the system proposed
by the authors is that it is completely wireless. The system is called the
ALIVE ("Artificial Life Interactive Video Environment") system. I found
this paper especially interesting because the authors model autonomous
semi-intelligent agents with sensory-motor interactions with their
environment and a repertoire of behaviors to perform given the internal
state of the system. Of course, this is nowhere close to completely
modeling a "creature," because it assumes that one has prior knowledge of
all possible states the agent could find itself in during its lifetime.
So in this sense, the agent's behavior is deterministic; no adaptation is
allowed. Now, for a real "creature" in the real world, adaptation is
crucial for survival, and no organism has prior knowledge of all the
internal states that it can have. Besides, it becomes extremely difficult
to model the internal state of such an organism. Even whether there is
some internal representation of the external world (coded in some sense
by the internal state of the organism) is debatable. However, the methods
proposed in this paper are interesting
and the system has several possible applications, though I would still
maintain that modeling a virtual world would require the existence of
adaptive and completely autonomous agents, agents that rely completely on
their own intelligence for survival, rather than on the intelligence of
the scientists that model them. Evolution must also play a role here.
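The kind of deterministic, repertoire-driven behavior criticized above can be sketched as a fixed priority list over internal state and stimuli. All names, priorities, and rules here are invented for illustration; ALIVE's actual behavior model is far richer.

```python
# Toy behavior selection: a fixed repertoire of (priority, trigger,
# behavior) rules, evaluated against internal state and a stimulus.
# Because the repertoire is enumerated in advance, the agent is fully
# deterministic -- exactly the limitation discussed in the review.

def select_behavior(internal_state, stimulus):
    """Pick the highest-priority behavior whose trigger fires."""
    repertoire = [
        (3, lambda s, x: x == "hand raised", "sit"),
        (2, lambda s, x: s["hunger"] > 0.7,  "seek food"),
        (1, lambda s, x: True,               "wander"),  # default
    ]
    for _, trigger, behavior in sorted(repertoire, key=lambda r: r[0],
                                       reverse=True):
        if trigger(internal_state, stimulus):
            return behavior

print(select_behavior({"hunger": 0.9}, "none"))         # seek food
print(select_behavior({"hunger": 0.9}, "hand raised"))  # sit
```

Nothing in this structure can produce a behavior outside the enumerated repertoire, which is the sense in which such agents cannot adapt.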
The ALIVE System: Wireless, Full-body Interaction with
Autonomous Agents
Timothy Frangioso
Scott Harrison
Leslie Kuczynski
"Superior Augmented Reality Registration by Integrating Landmark
Tracking and Magnetic Tracking", A. State, G. Hirota, D. T. Chen,
W. F. Garrett, and M. A. Livingston
"The ALIVE System: Wireless, Full-body Interaction with Autonomous
Agents", P. Maes, T. Darrell, B. Blumberg, and A. Pentland
Geoffry Meek
Superior Augmented Reality Registration by Integrating Landmark Tracking
and Magnetic Tracking
State, Hirota, Chen, Garrett, Livingston
Observations:
1. Replace landmarks with textures underneath
2. Certainly an image-space algorithm
3. The visual tracker is a lot like a visual motion capture system
4. Besides landmark occlusion, the magnetic system doesn't seem that
useful
5. What is the frame rate? (15 Hz stereo?) It seems nauseatingly low.
6. How are the polygonal models mapped to the real-world stuff?
Hardware
Heavy-duty: an Onyx, a head-mounted display, etc. -- $750,000 - $1.25MM
Landmark Predictor
Computes the expected position of a landmark in image space, which
determines the search area for the image analyzer
Image Analyzer
Every pixel is classified as belonging to one of the landmark colors.
The search starts in the area determined by the landmark predictor, then
gradually widens (time consuming)
The circular landmarks are tested for:
* Correct area ratio (8:1)
* Concentricity of the centers of mass (for detecting clipping or
partial occlusion)
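A minimal sketch of those two tests follows. The 8:1 target ratio comes from the notes above; the tolerances, pixel-set representation, and function names are illustrative assumptions.

```python
# Sketch of the two landmark acceptance tests: outer/inner color-region
# area ratio, and concentricity of the two centers of mass (an offset
# between them flags clipping or partial occlusion).

def centroid_and_area(pixels):
    """pixels: list of (x, y) coordinates classified as one color."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return (cx, cy), n

def looks_like_landmark(outer_pixels, inner_pixels,
                        ratio=8.0, ratio_tol=0.25, center_tol=1.5):
    (ox, oy), oa = centroid_and_area(outer_pixels)
    (ix, iy), ia = centroid_and_area(inner_pixels)
    ratio_ok = abs(oa / ia - ratio) <= ratio * ratio_tol
    concentric = ((ox - ix) ** 2 + (oy - iy) ** 2) ** 0.5 <= center_tol
    return ratio_ok and concentric

# Intact landmark: 8 outer pixels ringing 1 inner pixel (ratio 8:1).
outer = [(x, y) for x in range(-1, 2) for y in range(-1, 2) if (x, y) != (0, 0)]
inner = [(0, 0)]
print(looks_like_landmark(outer, inner))                   # True
clipped = [(x, y) for (x, y) in outer if x >= 0]           # half occluded
print(looks_like_landmark(clipped, inner))                 # False
```

The clipped case fails the area-ratio test (5:1 instead of 8:1), which is how partial occlusion gets rejected before it can corrupt the pose estimate.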
Head Pose Determination
Under-determined case:
Three landmarks are needed to completely determine head pose; with fewer,
there is dependence on the magnetic tracking system
Well-determined case:
Out of 8 solutions, only two tend to be useful (positive, real); the final
one is picked by checking results against landmarks NOT used in the
calculations. If there is still a problem, they use the magnetic tracker.
Romer Rosales
Lavanya Viswanathan
1) A. State, G. Hirota, D. Chen, W. Garrett, and M. Livingston.
Superior augmented reality registration by integrating landmark tracking
and magnetic tracking. In Computer Graphics Proceedings, ACM SIGGRAPH,
pages 429--438, 1996.
2) P. Maes, T. Darrell, B. Blumberg, and A. Pentland. The ALIVE
system: full-body interaction with autonomous agents. In Proc. of Computer
Animation Conference, Switzerland, April 1995.
Stan Sclaroff
Created: Jan 21, 1997
Last Modified: Jan 30, 1997