Objects accessed through today's "Cyber World"
(i.e., the Web) are virtual. They can be controlled and replicated. They
are served through well-defined procedures. Each object has a name, the
Uniform Resource Locator (URL) that allows its retrieval. On the other
hand, objects in the "Real World" are physical. They can only be
accessed through sensory means (e.g., web-cams that monitor physical
spaces and transmit live video over the internet). Currently, objects in
these physical spaces can only be passively observed through the
Web. The RealityWeb will enable an active understanding of the physical
world. It will give physical objects, which include things and places,
as well as people and their activities, unique identities. An object's
identity is not simply tied to a physical space and accessed through the
web-cam's URL; instead, each object has its own "Uniform Resource Identity"
(URI), which can be used to search for the object in all physical spaces
that are accessible through the Web. The object can then be uniquely
retrieved, and actively monitored and tracked.

The existing infrastructure of web-cams is the result
of an explosive and ad-hoc growth of camera installations all over the
world. In addition to web-cams, a vast number of digital video cameras
have been deployed for surveillance purposes, for example, of stores,
ATMs, airports, and other public and commercial facilities.
These cameras are untapped resources that provide the opportunity to
merge the physical and cyber worlds in an integrated, well-defined, and
privacy-protecting manner. There are also large-scale research projects
that propose the use of video sensors in intelligent rooms, homes,
instrumented classrooms, roadways, and vehicles. In addition, there has
been extensive work in the area of visual surveillance of human
activity. Most of these projects focus on the computer vision aspects of
the problem and disregard the complex networking, systems, and data
management issues. None have tried to embed the monitored physical space
into the Web. Similarly, much of the research in sensor networking has
abstracted away specific application-level issues that arise in this
context.

At any instant in time, we refer to the finite
spatial extent visible to a particular video camera as that sensor's
view volume; objects within that volume are recorded at a resolution
which can vary due to pixel and frame rate sampling. For a group of
video cameras, the union of view volumes can be used to define a
domain volume for a monitored room or public space. The video
cameras are attached to PCs that process and index the sensory
information. A user with her mobile unit can ask the Sensorium to
perform high-level, automated sensing tasks, such as locating and
tracking a human moving through the Sensorium. Queries can be sent over
a backbone network either through a wireless access point or through
another user within communication range (as in ad-hoc networks). In
response to user queries, or in cooperation with other sensors, cameras
perform basic operations such as adjusting their resolution or altering
their view volume. Responding to a single such query is already a
challenging research problem from a networking and computer vision
standpoint, but we are interested in an even broader challenge:
designing a scalable Sensorium system architecture that could be easily
reproduced to create a Web of domain volumes (e.g., various
rooms in various buildings on campus) that together make up a RealityWeb
accessible to users through an appropriate RealityWeb Browser.
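
The view-volume and domain-volume definitions above can be illustrated with a minimal sketch. It assumes, purely for simplicity, that each camera's view volume is an axis-aligned box (a real view volume would be closer to a frustum and could change over time). The class names, camera identifiers, and coordinates are hypothetical and do not come from the Sensorium design.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ViewVolume:
    """Axis-aligned box standing in for one camera's view volume (an assumption)."""
    camera_id: str
    lo: Point  # minimum (x, y, z) corner
    hi: Point  # maximum (x, y, z) corner

    def contains(self, p: Point) -> bool:
        # A point is visible if it lies inside the box on every axis.
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


@dataclass
class DomainVolume:
    """Union of the view volumes of a group of cameras."""
    views: List[ViewVolume]

    def cameras_seeing(self, p: Point) -> List[str]:
        # Cameras whose view volume currently covers the point.
        return [v.camera_id for v in self.views if v.contains(p)]

    def contains(self, p: Point) -> bool:
        # The point is in the domain volume if any camera can see it.
        return any(v.contains(p) for v in self.views)


# Hypothetical room covered by two overlapping cameras.
room = DomainVolume([
    ViewVolume("cam-a", (0.0, 0.0, 0.0), (4.0, 3.0, 2.5)),
    ViewVolume("cam-b", (3.0, 0.0, 0.0), (8.0, 3.0, 2.5)),
])
```

Under these assumptions, `room.cameras_seeing(p)` tells a tracking query which cameras could be asked to follow an object at position `p`, and an empty result means the object has left the domain volume.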

To build the RealityWeb, we will develop vision,
database, and network services that are capable of gathering,
interpreting, routing, and storing data from distributed video sensors.
These services will be used to answer queries about the physical world
on the Web; in other words, they will enable us to "surf the physical world".

In contrast to current sensory systems, which are
conceived (and built) as special-purpose systems with
custom-developed architectures, we intend to build our architecture from
commodity hardware and with freely available software, opening up much
wider access to sensory systems to the computer science research
community. However, our desire to develop a commodity sensory system for a
broad class of environments also highlights some of the current
impediments to building such a system. To overcome them, we believe that (1)
multi-resolution encoding of sensory data, and (2) cognizance of spatio-temporal
and resource constraints at every level of a system's architecture will
be critical core technologies. These two threads are common to all
Sensorium research projects supporting the RealityWeb.
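
As one illustration of what multi-resolution encoding of sensory data could look like, the sketch below builds a simple image pyramid by averaging 2x2 pixel blocks. This is a common generic technique chosen here only as an example; the text does not say which encoding the Sensorium projects use, and the function names and the sample frame are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale frame as rows of pixel intensities


def downsample(img: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [full, half, quarter, ...], stopping early once a side is below 2."""
    pyramid = [img]
    while (len(pyramid) < levels
           and len(pyramid[-1]) >= 2
           and len(pyramid[-1][0]) >= 2):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


# Hypothetical 4x4 frame with intensities 0..15.
frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(frame, levels=3)
```

With a representation like this, a resource-constrained client could be served a coarse level first and request finer levels only when bandwidth and the query's deadline allow, which is one way the two threads named above might interact.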