This paper combines vision and magnetic tracking to produce a quicker and more reliable system. Rather than requiring arbitrary feature recognition and a full reconstruction of the real scene, the computer is restricted to recognizing certain landmark features. When the computer sees enough landmarks, it can resolve the probable head location and orientation. The magnetic tracking helps determine the most probable solution and aids in locating the landmarks. To find a landmark, the area most likely to hold it must be searched, pixel by pixel. The magnetic tracking determines the first areas to search, until the landmarks already found can resolve the likely locations of the others.
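The prediction-guided search just described can be sketched as follows. This is a hedged toy illustration, not the paper's implementation: the expanding-ring scan, the function names, and the single-value "landmark color" are all assumptions.

```python
# Sketch of prediction-guided landmark search: the magnetic tracker
# supplies a predicted image position, and the pixel-by-pixel search
# starts there, widening outward until a landmark-colored pixel is found.

def is_landmark_color(image, x, y):
    """Toy classifier: treat pixel value 1 as 'landmark-colored'."""
    return image[y][x] == 1

def find_landmark(image, predicted_xy, max_radius=10):
    """Scan square rings of growing radius around the prediction,
    so pixels nearest the magnetic-tracker estimate are tested first."""
    px, py = predicted_xy
    h, w = len(image), len(image[0])
    for r in range(max_radius + 1):
        for y in range(py - r, py + r + 1):
            for x in range(px - r, px + r + 1):
                # only the ring border: the interior was already checked
                if max(abs(x - px), abs(y - py)) != r:
                    continue
                if 0 <= x < w and 0 <= y < h and is_landmark_color(image, x, y):
                    return (x, y)
    return None  # caller falls back to a wider scan or magnetic-only pose

# 8x8 frame with a single landmark pixel at (5, 2), predicted near (4, 3)
frame = [[0] * 8 for _ in range(8)]
frame[2][5] = 1
print(find_landmark(frame, (4, 3)))  # (5, 2)
```

The ring ordering is what makes the magnetic prior pay off: a good prediction means the landmark is found after testing only a handful of pixels.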
The resulting pictures of combined real and virtual objects are, I think, well done. Because the computer knows the proper mappings from its model world into the real world, virtual shadows can be imposed on real objects, and vice versa. Real and virtual objects convincingly interpenetrate each other. Of course, the current limitations of the method are obvious, such as needing landmarks painted on all real objects, and the smallness of the world thus modeled. However, it seems like a good start.
Using this information, the system creates a virtual agent in the
virtual world, which does exactly what the user does. The other
objects and agents in the virtual world interact with this user agent
via the same sensory input/output as with any other virtual object or
agent. The physical interface, and the ability to reasonably locate a
person's body in the physical and thus the virtual world, seem very
useful.
Beyond that, the system is dressed up with somewhat intelligent agents
who can watch the user's agent, and determine if the user has made
various gestures. Similarly, speech-processing software has been used
to allow the user to issue verbal commands. These are both very good
in terms of user interface.
I do have some doubts about the system, however. Much of what is
described seems like, not exactly quick hacks, but irrelevant window
dressing. Basically, by making the view third-person on a large flat
screen, the problem has been greatly reduced. While this solution is
evidently well received and popular, much of the desired realism seems
like it may be missing. Also, though the paper states that the world is
3D, it seems as if this is not done in a perspective or natural manner,
since UP on the screen corresponds to farther z coordinates and DOWN to
closer ones. I do wonder why they didn't choose to do even some simple
form of perspective viewing, using the body's z value to derive the
necessary camera coordinates. Also, while interesting, the paper seemed
much more concerned with touting the system's superiority than with
providing enough technical detail to let one judge it fully. Perhaps
this is because of the intended audience.
"The ALIVE System - Wireless, Full-body Interaction with Autonomous
Agents"
This paper details an interface to a virtual reality system that does not
use any cumbersome input devices. There are no gloves, goggles, or other
attached devices. Instead the system uses a feedback screen to let the
user interact within the virtual environment. The user's actions are
captured by a video feed into the system. The user's image is then
displayed on the screen, giving the user a third-person view of
themselves. This lets the user respond to the environment better, because
it avoids the disorientation that accompanies other user-centric views.
(This approach is called the "magic mirror" technique.) This interface
has a major advantage: it allows the user to interact with the agents in
the system using simple gestures. The strength of this system is that it
allows the user to interact in a free and natural way with the
environment. These simple gestures can have very complicated meanings.
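How simple gestures might map to meanings can be sketched from tracked body points. This is a minimal illustration under my own assumptions; the rules, thresholds, and names here are invented, not the system's actual gesture pipeline.

```python
# Toy gesture classifier over tracked body heights (y increases upward).
# The thresholds and gesture labels are illustrative assumptions only.

def classify_gesture(head_y, left_hand_y, right_hand_y):
    if left_hand_y > head_y and right_hand_y > head_y:
        return "both hands raised"   # could mean e.g. "stop"
    if left_hand_y > head_y or right_hand_y > head_y:
        return "one hand raised"     # could mean e.g. a pointing command
    return "neutral"

print(classify_gesture(1.6, 1.8, 1.0))  # one hand raised
```

Even a rule table this crude shows how a cheap geometric test on a few tracked points can carry rich meaning once agents interpret it in context.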
"Superior Augmented Reality Registration by Integrating Landmark Tracking
and Magnetic Tracking"
This paper discusses a system for object tracking within an augmented
reality system. That is, it details a system for merging real and virtual
objects into one user-centric view, using landmarks, or flags, placed in
the scene to recover the camera orientation.
The system uses both vision-based tracking and magnetic tracking to
accomplish this registration and orientation. The magnetic tracking
assists the vision-based tracker in four ways: image analysis, selection
from multiple solutions, backup tracking, and sanity checks. Taking in a
stereo image, the tracking system determines the head position by finding
the landmarks, calculating the difference from the last frame, and
adjusting.
The problem with this system is that it will not handle unexpected
movement well, and there are some problems that will occur when the
landmarks within the image cannot be detected properly.

In this paper the authors present a "hybrid" tracking system
that integrates computer vision techniques with magnetic tracking. The
motivation for the system stems from the need to have "accurate registration
between synthetic and real objects" in an augmented reality environment.
In other words, virtual objects should appear convincingly real from a
user's perspective, as if they were actual objects in the environment. Additionally,
interaction with these objects should mirror that of interaction with objects
in our own environment. That is, we would not want to reach for a virtual
object only to find that we need to grasp the air to the left of the object
to move it. The basic idea that drives the system is the use of "landmarks"
within the scene to facilitate accurate registration. For each frame of
video the system searches for landmarks based on predictions made as to
their position. If found head pose is computed based on the landmark positions.
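Given three or more landmark correspondences, the head-pose computation can be sketched as a least-squares rigid fit. The Kabsch/Procrustes solution below is a standard technique shown for illustration; the paper's actual solver differs in detail, and all names here are mine.

```python
import numpy as np

def rigid_fit(model_pts, observed_pts):
    """Least-squares rotation R and translation t minimizing
    ||(model @ R.T + t) - observed||^2 (Kabsch / Procrustes)."""
    cm = model_pts.mean(axis=0)
    co = observed_pts.mean(axis=0)
    H = (model_pts - cm).T @ (observed_pts - co)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # guard against reflection
    t = co - R @ cm
    return R, t

# Four landmarks: an over-determined case (more than 3 correspondences).
model = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
a = np.deg2rad(30)
R_true = np.array([[np.cos(a), -np.sin(a), 0.],
                   [np.sin(a),  np.cos(a), 0.],
                   [0.,         0.,        1.]])
t_true = np.array([0.2, -0.1, 0.5])
observed = model @ R_true.T + t_true               # simulated landmark hits
R, t = rigid_fit(model, observed)
print(np.allclose(R, R_true), np.allclose(t, t_true))  # True True
```

With noiseless data the fit recovers the pose exactly; with extra (over-determined) landmarks it averages out measurement noise, which is exactly why more than three landmarks help.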
Coupling of the magnetic tracker and the vision-based tracker comes into
play when the vision tracker cannot find enough landmarks to compute head
pose (i.e., three non-collinear points are needed for triangulation).
Information from the magnetic tracker is then used to correct and adjust
the head pose; this case is considered underdetermined. Clearly, this
will not always be the case: we also encounter well-determined and
over-determined cases. The underdetermined and well-determined cases can
produce many different solutions for head pose, among which the best one
must be chosen. A least-squares optimization technique is applied in the
over-determined case to converge to a solution. However, this approach is
not suitable for the under- and well-determined cases, because solutions
would be excluded before being checked. Thus, heuristics are employed to
choose the best solution. Some limitations discussed involve (1) lack of
synchronization between the magnetic tracker and the vision-based
subsystem, (2) the magnetic tracker lags behind the video camera images
(causing problems with abrupt movement), (3) the time difference between
top and bottom scanlines of an image is not taken into account, and
(4) poor performance under harsh or changing lighting conditions. Future
work discussed included (1) correcting the above limitations and
(2) attaching the landmarks to moving objects.

I can't help remembering attending a local sporting event and trying
out a system that used some of the same ideas as the ALIVE system. I stood
against a black backdrop facing a large screen on which my image was projected.
However, the projected me was not standing against a black backdrop. Rather,
my image was standing in front of a soccer goal. Suddenly soccer balls
were bombarding my image, and for a brief while I could almost believe
that I was a full-fledged soccer goalie. I didn't think much about the
technology at the time but found myself revisiting my experience while
reading about the ALIVE system.

The main difference that separates the ALIVE system from the system
I used is the ALIVE system's use of intelligent agents. Additionally, the
ALIVE system has support for tracking much more complicated behavior (e.g.,
hand gestures) than the system I used. I assume that the soccer system
performed simple collision detection between my body and the ball. It did
not care which part of my body hit the ball, only that it did.

Many of the techniques used by the ALIVE system to track a user and
to animate the creatures were ones that we have read about before. It was
interesting to see everything brought together into a world in which we
could then be participants.

Although it does have some benefits, I question the amount of attention
drawn to the wireless interface. The main thing we lose is the ability
to receive physical feedback (i.e., we push a button and actually feel it).
This paper presents a tracking method for augmented reality
applications. The field of Augmented Reality is closely related to
that of Virtual Reality in that while Virtual Reality deals with
ways in which to immerse the user in a synthetic computer-generated world,
augmented reality is concerned with the complementary problem of
positioning computer-generated "objects" or images in the user's world (or
the "real" world). Similar issues are of importance in both fields, such
as how to ensure that the relative positions of the user, the object, and
other parts of the world remain consistent at all times (except in the
case of relative motion, which should then be smooth), as well as
user-object interaction issues, in terms of how realistic the interaction
is and whether or not it can be done in real time. For instance, it would be
undesirable to have a delay between the performance of an action by the
user and the updating of the display. In augmented reality applications,
the major issue is that of accurate registration of the position of
objects in the real-world and the user with respect to that of the
computer-generated object(s). The errors that are thus created are mainly
due to limitations of the tracking system. The authors attempt to solve
this problem by proposing a hybrid tracking mechanism that combines the
accuracy of vision-based tracking systems and the robustness of magnetic
tracking systems.
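That combination can be reduced to a simple fallback sketch. The three-landmark threshold follows from the triangulation requirement; everything else below (names, structure) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of hybrid-tracker fallback logic: trust the accurate
# vision-based pose when enough landmarks are visible, otherwise fall
# back on the robust (but less accurate) magnetic pose.

def fuse_pose(vision_pose, n_landmarks, magnetic_pose):
    """vision_pose may be None when image analysis fails entirely."""
    if vision_pose is not None and n_landmarks >= 3:
        return vision_pose    # well- or over-determined: vision wins
    return magnetic_pose      # under-determined: magnetic backup

print(fuse_pose("vision pose", 4, "magnetic pose"))  # vision pose
print(fuse_pose(None, 0, "magnetic pose"))           # magnetic pose
```

The real system blends the two sources more gradually than this binary switch, but the accuracy-versus-robustness trade is the same.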
This paper is in the field of Virtual Reality and proposes a system that
allows full-body interactions between a user and a graphical
computer-generated world. The interesting thing about the system proposed
by the authors is that it is completely wireless. The system is called the
ALIVE ("Artificial Life Interactive Video Environment") system. I found
this paper especially interesting because the authors model autonomous
semi-intelligent agents with sensory-motor interactions with their
environment and a repertoire of behaviors to perform given the internal
state of the system. Of course, this is nowhere close to completely
modeling a "creature," because it assumes that one has prior knowledge of
all possible states the agent could find itself in during its lifetime.
So in this sense, the agent's behavior is deterministic; no adaptation is
allowed. Now, for a real "creature" in the real world, adaptation is
crucial for survival, and no organism has prior knowledge of all the
internal states that it can have. Besides, it becomes extremely difficult
to model the internal state of such an organism. Even whether there is
some internal representation of the external world (coded in some sense
by the internal state of the organism) is debatable. However, the methods
proposed in this paper are interesting
and the system has several possible applications, though I would still
maintain that modeling a virtual world would require the existence of
adaptive and completely autonomous agents, agents that rely completely on
their own intelligence for survival, rather than on the intelligence of
the scientists that model them. Evolution must also play a role here.
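The kind of deterministic, repertoire-driven behavior criticized above can be sketched as a fixed priority list over internal state and stimuli. All names, priorities, and rules here are invented for illustration; ALIVE's actual behavior model is far richer.

```python
# Toy behavior selection: a fixed repertoire of (priority, trigger,
# behavior) rules, evaluated against internal state and a stimulus.
# Because the repertoire is enumerated in advance, the agent is fully
# deterministic -- exactly the limitation discussed in the review.

def select_behavior(internal_state, stimulus):
    """Pick the highest-priority behavior whose trigger fires."""
    repertoire = [
        (3, lambda s, x: x == "hand raised", "sit"),
        (2, lambda s, x: s["hunger"] > 0.7,  "seek food"),
        (1, lambda s, x: True,               "wander"),  # default
    ]
    for _, trigger, behavior in sorted(repertoire, key=lambda r: r[0],
                                       reverse=True):
        if trigger(internal_state, stimulus):
            return behavior

print(select_behavior({"hunger": 0.9}, "none"))         # seek food
print(select_behavior({"hunger": 0.9}, "hand raised"))  # sit
```

Nothing in this structure can produce a behavior outside the enumerated repertoire, which is the sense in which such agents cannot adapt.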
The ALIVE System: Wireless, Full-body Interaction with
Autonomous Agents
Timothy Frangioso
Scott Harrison
Leslie Kuczynski
"Superior Augmented Reality Registration by Integrating Landmark
Tracking and Magnetic Tracking", A. State, G. Hirota, D. T. Chen,
W. F. Garrett, and M. A. Livingston
"The ALIVE System: Wireless, Full-body Interaction with Autonomous
Agents", P. Maes, T. Darrell, B. Blumberg, and A. Pentland
Geoffry Meek
Superior Augmented Reality Registration by Integrating Landmark Tracking
and Magnetic Tracking
State, Hirota, Chen, Garrett, Livingston
Observations:
1. Replace landmarks with textures underneath
2. Certainly an image-space algorithm
3. The visual tracker is a lot like a visual motion capture system
4. Besides landmark occlusion, the magnetic system doesn't seem that
useful
5. What is the frame rate? (15 Hz stereo?) It seems nauseatingly low.
6. How are the polygonal models mapped to the real-world stuff?
Hardware
Heavy-duty: an Onyx, a head-mounted display, etc. -- $750,000 - $1.25MM
Landmark Predictor
Computes the expected position of a landmark in image space, which
determines the search area for the image analyzer
Image Analyzer
Every pixel is classified as belonging to one of the landmark colors.
The search starts in the area determined by the landmark predictor, then
gradually widens (time consuming)
The circular landmarks are tested for:
* Correct area ratio (8:1)
* Concentricity of the centers of mass (for detecting clipping or
partial occlusion)
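A minimal sketch of those two tests follows. The 8:1 target ratio comes from the notes above; the tolerances, pixel-set representation, and function names are illustrative assumptions.

```python
# Sketch of the two landmark acceptance tests: outer/inner color-region
# area ratio, and concentricity of the two centers of mass (an offset
# between them flags clipping or partial occlusion).

def centroid_and_area(pixels):
    """pixels: list of (x, y) coordinates classified as one color."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    return (cx, cy), n

def looks_like_landmark(outer_pixels, inner_pixels,
                        ratio=8.0, ratio_tol=0.25, center_tol=1.5):
    (ox, oy), oa = centroid_and_area(outer_pixels)
    (ix, iy), ia = centroid_and_area(inner_pixels)
    ratio_ok = abs(oa / ia - ratio) <= ratio * ratio_tol
    concentric = ((ox - ix) ** 2 + (oy - iy) ** 2) ** 0.5 <= center_tol
    return ratio_ok and concentric

# Intact landmark: 8 outer pixels ringing 1 inner pixel (ratio 8:1).
outer = [(x, y) for x in range(-1, 2) for y in range(-1, 2) if (x, y) != (0, 0)]
inner = [(0, 0)]
print(looks_like_landmark(outer, inner))                   # True
clipped = [(x, y) for (x, y) in outer if x >= 0]           # half occluded
print(looks_like_landmark(clipped, inner))                 # False
```

The clipped case fails the area-ratio test (5:1 instead of 8:1), which is how partial occlusion gets rejected before it can corrupt the pose estimate.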
Head Pose Determination
Under-determined case:
Three landmarks are needed to completely determine head pose; with fewer,
there is dependence on the magnetic tracking system
Well-determined case:
Out of 8 solutions, only two tend to be useful (positive, real); the final
one is picked by checking results against landmarks NOT used in the
calculations. If there is still a problem, they use the magnetic tracker.
Romer Rosales
Lavanya Viswanathan
1) A. State, G. Hirota, D. Chen, W. Garrett, and M. Livingston.
Superior augmented reality registration by integrating landmark tracking
and magnetic tracking. In Computer Graphics Proceedings, ACM SIGGRAPH,
pages 429--438, 1996.
2) P. Maes, T. Darrell, B. Blumberg, and A. Pentland. The ALIVE
system: full-body interaction with autonomous agents. In Proc. of Computer
Animation Conference, Switzerland, April 1995.
Stan Sclaroff
Created: Jan 21, 1997
Last Modified: Jan 30, 1997