BU CAS CS 585: Image and Video Computing --- Class commentary on articles

BU CAS CS 585: Image and Video Computing

Image Mosaics
November 21, 1996

Readings:

M. Irani, P. Anandan, and S. Hsu, Mosaic Based Representations of Video Sequences and Their Applications, Proc. 5th Int. Conf. Computer Vision, 1995, 605-611.
R. Szeliski, Video Mosaics for Virtual Environments, IEEE Computer Graphics and Applications, 16(2):22-30, 1996.

Kriss Bryan
Bin Chen
Jeffrey Considine
Cameron Fordyce
Timothy Frangioso
Jason Golubock
Jeremy Green
Daniel Gutchess
John Isidoro
Tong Jin
Leslie Kuczynski
Hyun Young Lee
Ilya Levin
Yong Liu
Nagendra Mishr
Romer Rosales
Natasha Tatarchuk
Leonid Taycher
Alex Vlachos

Kriss Bryan

The article Mosaic Based Representations of Video Sequences and Their Applications is very interesting. The paper discusses several types of mosaics and how they are used. Each type uses a different technique to represent the image sequence, and the resulting representations differ accordingly.

In static mosaics, moving objects tend to appear as a blur or a ghost-like image, whereas dynamic mosaics capture moving objects much better. It is interesting that a pyramid, in fact a Laplacian pyramid, can be used in the implementation of a mosaic. Construction of the mosaic requires alignment with the previous images. It seems as though templates are used here, somewhat as in object recognition programs, to determine the alignment or location of the latest image.
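The ghosting effect can be illustrated with a small sketch. It assumes the frames are already aligned and are one-dimensional rows of pixels; the function name `integrate` and the choice of mean versus median are illustrative assumptions, not the paper's implementation.

```python
from statistics import mean, median

def integrate(frames, combine):
    """Combine pre-aligned frames pixel by pixel with the given function."""
    return [combine(pixels) for pixels in zip(*frames)]

# A flat background with a bright object that moves one pixel per frame.
background = [10, 10, 10, 10, 10]
frames = []
for position in range(5):
    frame = list(background)
    frame[position] = 100
    frames.append(frame)

# Averaging smears the object over every pixel it visited (a "ghost").
average_mosaic = integrate(frames, mean)
# A temporal median keeps only the value seen most of the time (the background).
median_mosaic = integrate(frames, median)

print(average_mosaic)
print(median_mosaic)
```

In the averaged result every pixel is brighter than the background, which is the ghost; the median result contains the background only, so the moving object is dropped entirely.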

The Mosaics seem very useful in that they could be used to represent an actual 3-D image for use in a virtual reality game. I was wondering whether both the static mosaic and the dynamic mosaic can be used together in that the dynamic mosaic could use the static mosaic as a template since the static mosaic is good at tracking non-moving objects.

Bin Chen

Mosaic Based Representations of Video Sequences and Their Applications

There has been growing interest in the use of mosaic images to represent video sequences completely and efficiently. This paper discusses the various issues related to developing mosaic representations as well as their applications.

There are several kinds of mosaic representations: the most common static mosaic, the dynamic mosaic, and others built on these two, such as the temporal pyramid and the multiresolution mosaic. The paper presents the advantages and disadvantages of these techniques and gives example pictures to aid understanding. The major steps of constructing a mosaic are then shown one by one: image alignment, image integration, and computation of significant residuals. The two most common applications of mosaic representations are video compression and visualization. A comparison between mosaic-based video compression and the existing standard video compression method, MPEG, shows that the former performs much better than the latter. Other applications, such as video enhancement and managing large digital libraries, are also introduced in the paper.

In conclusion, this paper gives a good overview of mosaic representations, with their advantages and future applications. However, the paper gives no details on how to implement these ideas. Further topics, such as how to extend 2D mosaic representations to 3D, could be discussed.

Video Mosaics for Virtual Environments

Virtual reality is an exciting area to explore today. It has many applications, from computer games to simulations. The technique on which virtual reality is based is the video mosaic representation. A video mosaic can give a spherical image of the environment. This article provides the fundamental algorithms for computing it, and most of the algorithms are related to the computer graphics field.

To align images, the basic imaging equations use homogeneous coordinates. The discussion of 2D planar transformations covers rigid transformations, similarity transformations, affine transformations, and full projective transformations. Several approaches to building an environment map and recovering projective depth are described, with mathematical details given in the paper. Simple situations are handled first with simple algorithms; problems are then identified and solved by more complex ones. The test results show that the algorithm performs well on the examples, but the author is careful to note that its performance in the general case is still uncertain. Broad areas of application for video mosaic models are discussed at the end of the paper. However, the algorithm is not perfect and may produce some erroneous results. Further research is still needed on more challenging topics, such as truly realistic virtual environments, which could open up even more applications.
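As a rough illustration of how these transformation classes relate, each can be written as a 3x3 matrix acting on homogeneous coordinates. The sketch below is an assumption-laden example rather than the paper's code; the function names and parameter layout are hypothetical.

```python
import math

def apply_transform(M, x, y):
    """Map (x, y) through the 3x3 matrix M using homogeneous coordinates."""
    xh = M[0][0] * x + M[0][1] * y + M[0][2]
    yh = M[1][0] * x + M[1][1] * y + M[1][2]
    w = M[2][0] * x + M[2][1] * y + M[2][2]
    return xh / w, yh / w  # divide out the homogeneous scale

def rigid(theta, tx, ty):
    """Rotation plus translation: 3 degrees of freedom."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def similarity(scale, theta, tx, ty):
    """Rigid motion plus uniform scale: 4 degrees of freedom."""
    c, s = scale * math.cos(theta), scale * math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def affine(a, b, c, d, tx, ty):
    """Arbitrary linear part plus translation: 6 degrees of freedom."""
    return [[a, b, tx], [c, d, ty], [0.0, 0.0, 1.0]]

def projective(a, b, c, d, tx, ty, p, q):
    """Full projective transformation: 8 degrees of freedom."""
    return [[a, b, tx], [c, d, ty], [p, q, 1.0]]

# A pure translation by (2, 3), expressed as a rigid transform.
moved = apply_transform(rigid(0.0, 2.0, 3.0), 1.0, 1.0)
# A projective transform whose bottom row makes the scale depend on x.
warped = apply_transform(projective(1, 0, 0, 1, 0, 0, 0.5, 0.0), 2.0, 4.0)
print(moved, warped)
```

Only the projective case has a bottom row other than (0, 0, 1), which is why it is the only one that needs the final division to do real work.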

Jeffrey Considine

Video Mosaics for Virtual Environments

In this paper, Szeliski presents a method for combining overlapping similar images into a mosaic to produce a larger image of the scene. To align the images and find the transform, he uses the method of gradient descent or the Levenberg-Marquardt iterative nonlinear minimization algorithm to find local solutions. This seems rather expensive and, as he notes, does not always find the best solution. For this reason, hierarchical matching (alignment of smaller subsampled versions of the image) and phase correlation (alignment through comparison of the Fourier spectra) are used to focus in on the correct alignment.
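The idea behind hierarchical matching can be sketched on one-dimensional signals standing in for images. This is a simplified illustration under stated assumptions: it estimates an integer shift only, and it uses an exhaustive squared-difference search at each level in place of the gradient-based minimization described in the paper. All function names are hypothetical.

```python
def downsample(signal):
    """Halve the resolution by averaging neighbouring pairs."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal) - 1, 2)]

def mean_squared_difference(a, b, shift):
    """Compare a[i] with b[i + shift] over the region where both exist."""
    total, count = 0.0, 0
    for i in range(len(a)):
        j = i + shift
        if 0 <= j < len(b):
            total += (a[i] - b[j]) ** 2
            count += 1
    return total / count if count else float("inf")

def best_shift(a, b, candidates):
    """Pick the candidate shift with the smallest mean squared difference."""
    return min(candidates, key=lambda s: mean_squared_difference(a, b, s))

def coarse_to_fine_shift(a, b, levels=2, search=4):
    """Estimate the integer shift such that a[i] is close to b[i + shift]."""
    pyramid = [(a, b)]
    for _ in range(levels):
        a, b = downsample(a), downsample(b)
        pyramid.append((a, b))
    # Wide search only at the coarsest (smallest) level.
    coarse_a, coarse_b = pyramid[-1]
    shift = best_shift(coarse_a, coarse_b, range(-search, search + 1))
    # At each finer level, double the estimate and refine by one pixel.
    for fine_a, fine_b in reversed(pyramid[:-1]):
        shift *= 2
        shift = best_shift(fine_a, fine_b, [shift - 1, shift, shift + 1])
    return shift

first = [max(0, 5 - abs(i - 10)) for i in range(32)]
second = [max(0, 5 - abs(i - 16)) for i in range(32)]  # same bump, moved by 6
print(coarse_to_fine_shift(first, second))
```

The wide search happens only on the eight-sample coarse signals; the full-resolution level checks just three candidates, which is where the saving over a brute-force search comes from.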

This method for combining images seems like it would be most useful in its simplest application - combining images into a mosaic, not building worlds for the much hyped virtual reality. It could be used to combine the output of several cameras into a single larger video stream with more detail but less bandwidth than the sum of the inputs. Alignment could be sped up by using the previous transform as a starting point much like the way snakes are used with video streams. It also seems that this method would be more useful if the alignment process involved less brute force.

Mosaic Based Representations of Video Sequences and Their Applications

In this paper, Irani, Anandan, and Hsu discuss the uses of image mosaics for video sequences. The basic premise is that much of a video sequence, especially the background, will be essentially the same. This allows the construction of image mosaics that collapse this redundancy into a more compact form. The basic forms they use are the static mosaic - a single still combining the whole video sequence, and the dynamic mosaic - an initial mosaic plus the changes to the transform and the residuals needed to iteratively construct the later frames. There are other types of video mosaics but these seem to be the most common and useful.

The authors discuss applications such as video compression, low data rate transmission, and visualization. Low data rates can be achieved by using the dynamic mosaic, since only the differences between frames are transmitted. Visualization can be aided by use of the static mosaic, since it will contain more information than any single frame and will have information from all the frames. One can also construct a video sequence of the whole mosaic though only the section corresponding to the current frame will actively be changing and the cost will be less than that of a dynamic mosaic since only the residuals need to be transmitted (differences in the transform can be dropped). The third major application they discuss seems to be the most useful - video enhancement. Since most parts of a scene are visible in most of the frames, the redundant information can be combined to form a higher resolution image.
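The low data rate argument can be made concrete with a small sketch. It assumes frames that are already aligned to the mosaic and represented as flat lists of pixel values; the threshold and the function names are illustrative assumptions rather than anything specified in the paper.

```python
def significant_residuals(mosaic, frame, threshold):
    """Return {index: new_value} for pixels that differ noticeably from the mosaic."""
    return {
        index: value
        for index, (old, value) in enumerate(zip(mosaic, frame))
        if abs(value - old) > threshold
    }

def apply_residuals(mosaic, residuals):
    """Update a copy of the mosaic with the transmitted residuals."""
    updated = list(mosaic)
    for index, value in residuals.items():
        updated[index] = value
    return updated

mosaic = [10, 10, 10, 10]
new_frame = [10, 11, 50, 10]  # small noise at index 1, a real change at index 2

# Sender side: only the significant change needs to be transmitted.
residuals = significant_residuals(mosaic, new_frame, threshold=5)
# Receiver side: the same mosaic is brought up to date from the residuals alone.
updated_mosaic = apply_residuals(mosaic, residuals)
print(residuals, updated_mosaic)
```

Here one value is sent instead of four, and the receiver ends up with a mosaic that reflects the current frame wherever the scene actually changed.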

Cameron Fordyce

Video Mosaics for Virtual Environments by R. Szeliski

This paper presents a procedure for automatically aligning a large number of overlapping or relatively similar images into an image mosaic. An application to creating virtual reality environments using this technique with the output of continuous scene recording with a video camera is suggested. The author also suggests other applications for this method such as the alignment of hand-held scans of documents and automatic navigation. In short, this technique promises to increase the resolution of the recording device by allowing the images to be recorded in successive frames and automatically aligned while taking into account variables such as the camera model and the movement of the camera.

This paper seems a logical extension to one of the first papers that we read concerning the splining of image mosaics using multi-resolution techniques (see P.J. Burt and E. H. Adelson's paper, "A Multiresolution Spline with Application to Image Mosaics"). Naturally, this paper differs in that the images here are largely overlapping and the method relies on minimizing intensity differences at each pixel and on image warping to achieve a unified spline.

The discussion of the technique for warping or transforming an image was particularly informative.

The only lack that I noticed in this paper was a discussion of computational time and hardware requirements. Is it now possible to do such splining in real-time?

Mosaic Based Representations of Video Sequences and Their Applications by M. Irani, P. Anandan, and S. Hsu

This paper covers some of the same topics as the paper reviewed above but differs significantly. This paper also deals with the image mosaic that might result from a stream of video images but is more concerned with utilizing this stream to represent a scene more efficiently, whether in compression of the image stream (by reducing redundancy of information) or simply by being able to highlight relevant characteristics of the image stream. The idea that a video stream of images is likely to provide a lot of redundant information about a scene is relatively intuitive and a logical extension from simply thinking about splining overlapping static images. These ideas are broken down into two representations: a static image mosaic and a dynamic image mosaic. The former is related to simply splining related images together to provide a more complete view of a scene. The latter is related to updating a single scene (not extending the resolution of the camera as was discussed in the previous paper) with new information such as motion information or trajectories.

The authors then go on to outline the potential applications for this technique, which include such diverse applications as video compression, enhancement, enhanced visualization (which means that an image can be enhanced to provide more information than possible with a single image or a stream of images in sequence), and indexing, search and manipulation of video images.

The list of possible applications seems overly ambitious until one reads further. The paper describes a new way to visualize image mosaics and video image streams that is very powerful. I remain very impressed by this paper.

Timothy Frangioso

Jason Golubock

Jeremy Green

Daniel Gutchess

John Isidoro

The latest batch of papers (image mosaics) have been some of the most interesting for me. I think the idea of using multiple frames to increase image quality was very thought provoking. When I read this it made me think about high definition TV. One of the problems facing HDTV is how to generate a high resolution picture without increasing transmission bandwidth while remaining compatible with the existing NTSC standard.

Using image mosaics to generate higher definition and higher resolution descriptions of scenes would be a way to "fill in" the additional pixels provided by a high definition TV set. Also, since the extra information would be generated from previous frames, there would be no need to transmit extra data other than the standard television signal.

Tong Jin

Leslie Kuczynski

Mosaic Based Representations of Video Sequences and Their Applications

Video Mosaics for Virtual Environments

Both of the above papers deal with issues regarding the building of mosaics from video sequences (still camera shots can also be used). A number of practical applications are presented. Some examples include: creation of virtual reality environments, still panorama pictures, compression schemes, and low-bit rate transmission of video. I have been fortunate enough to have seen demos of such technology and to have worked first-hand with currently available image mosaic tools. Steve Mann, of MIT's Media Lab, spoke on November 20, 1996 at the 4th Annual ACM MM Conference in Boston, MA. He is currently involved in developing such technologies and presented demos of image mosaic work. The algorithms he presented for composing video mosaics are similar to those presented in the Szeliski paper (Video Mosaics for Virtual Environments). I have also used Apple's Quick-Time VR application to develop virtual environments. However, I did not use real video sequences or real still camera shots. All testing was done with fabricated video sequences composed from computer graphics generated images. This eliminated many problems that would have otherwise come into play (e.g., lighting changes within an image sequence, changes in exposure, etc.). Additionally, I did not deal with the notion that objects in the video sequence might be moving. Irani et al. (Mosaic Based Representations of Video Sequences and Their Applications) present algorithms to handle moving objects within a video sequence. The basic idea (assuming rather slow movement) is that of a dynamic mosaic, which is a sequence of evolving mosaic images, where the content of each new mosaic image is updated with the most current information from the most recent frame. Clearly, a technique which requires only an incremental update to the image sequence and not an entire update would be ideal for compression, storage and delivery of video.

Hyun Young Lee

Video Mosaics for Virtual Environments

In constructing virtual environments with a panoramic view, video mosaics can be used. The frames of the input video sequence are composited into large mosaics, so higher resolution can be achieved.

Based on the basic imaging equations for alignment transformations, local and global image registration techniques are introduced. Beyond 2D projective images, depth information can also be recovered to give the illusion of 3D, either by assuming the scene is piecewise planar or, more generally, by recovering a 3D depth map.

Such techniques can support many interesting applications such as telepresentation and virtual reality. The approach introduced in this article is efficient in the sense that it produces high-resolution scenes from video images by automatically registering and simultaneously recovering 3D information.

Mosaic Based Representations of ...

To represent video sequences efficiently, a mosaic image is constructed from all frames in the scene. Two different types of mosaics, called static and dynamic mosaics, are used for the different needs of applications. The static mosaic is for still images, without considering dynamic information, and is useful for efficient storage of video sequences. The dynamic mosaic, as a sequence of evolving mosaic images, is ideal for low bitrate transmission in real time. The algorithm consists of three parts: image alignment, image integration, and significant residual computation.

Many interesting applications are introduced. Especially in the transmission of compressed video images, it is very interesting that this mosaic-based method gives much better quality than standard MPEG (which I had thought of as a universal solution for transmitting video images up to...). Ref. [2] may show the details, and I hope, after reading that paper, to see exactly how.

Ilya Levin

Yong Liu

Video sequences can be represented in the form of mosaics. This technique has been found useful in mosaic-based video compression, very low bitrate transmission, and compression for storage. Michal Irani, P. Anandan, and Steve Hsu of the David Sarnoff Research Center proposed the idea in their paper, Mosaic Based Representations of Video Sequences and Their Applications (0-8186-7042-8/95, 1995 IEEE).

In their paper, they classified the traditional mosaic representation as the static mosaic. Their video sequence derived mosaic was classified as the dynamic mosaic. The difference is in the way the mosaics are represented. In addition, they also proposed the ideas of the temporal pyramid and the multiresolution mosaic. The former is a hierarchy of static mosaics whose levels correspond to different amounts of temporal integration. The latter captures information from each new frame at its closest corresponding resolution level in a mosaic pyramid.

Mosaic construction includes three steps:

Image alignment. The authors provided three alternatives for accomplishing the image alignment.
Image integration. The mosaic image is produced from the aligned frames. The authors provided five different approaches.
Significant residual estimation. This information is necessary if you want to reconstruct frames from the mosaic.

Although many approaches were discussed for alignment and integration, no discussion of their strengths and weaknesses was seen in the article. In this context, Richard Szeliski's article, Video Mosaics for Virtual Environments, offers an algorithm for aligning images and compositing scenes of increasing complexity ( 0272-17-16/96 1996 IEEE).

Of particular importance is the image registration method mentioned in this article. There are some key points to remember here. First, the transformation is written in a form that provides the basis for interpolation (the transformed coordinates generally do not fall on integer pixel positions, so I' must be interpolated). Second, the error is minimized using the Levenberg-Marquardt iterative nonlinear minimization. Third, the equation for the motion parameters is solved. Fourth, the error term is checked to decide whether to iterate further.
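These four points can be sketched for the simplest possible case: a single translation parameter on one-dimensional signals. This is a minimal sketch under those assumptions, not the eight-parameter method of the article; the damping schedule, the function names, and the test signals are all hypothetical.

```python
import math

def sample(signal, position):
    """Linearly interpolate at a fractional position; also return the local slope."""
    j = int(math.floor(position))
    slope = signal[j + 1] - signal[j]
    return signal[j] + (position - j) * slope, slope

def error_terms(reference, target, shift):
    """Accumulate squared error, gradient and approximate Hessian for one shift."""
    error = gradient = hessian = 0.0
    for x, ref_value in enumerate(reference):
        position = x + shift
        if 0 <= position < len(target) - 1:
            value, slope = sample(target, position)
            e = value - ref_value          # intensity error at this pixel
            error += e * e
            gradient += e * slope          # derivative of the error w.r.t. shift
            hessian += slope * slope       # Gauss-Newton approximation
    return error, gradient, hessian

def register_shift(reference, target, iterations=30, damping=1e-3):
    """Find shift so that target(x + shift) matches reference(x)."""
    shift = 0.0
    error, gradient, hessian = error_terms(reference, target, shift)
    for _ in range(iterations):
        if hessian == 0.0:
            break
        step = -gradient / (hessian * (1.0 + damping))  # damped update
        trial = error_terms(reference, target, shift + step)
        if trial[0] < error:               # error went down: accept, trust more
            shift += step
            error, gradient, hessian = trial
            damping /= 10.0
        else:                              # error went up: reject, damp harder
            damping *= 10.0
    return shift

reference = [math.exp(-((x - 32) ** 2) / 50.0) for x in range(64)]
target = [math.exp(-((x - 34) ** 2) / 50.0) for x in range(64)]  # moved by 2
print(register_shift(reference, target))
```

The accept-or-reject test on the error term is the part that distinguishes this from plain Gauss-Newton: a step that makes the match worse is discarded and the damping is increased before trying again.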

Nagendra Mishr

Video Mosaics for Virtual Environments

The author presents a method for constructing single large image mosaics which consist of individual pictures of the scene taken at different angles.

He describes a technique which can be used to detect image alignment and which allows the images to be distorted in the perspective, rigid, or affine domains. The basis of the work is matrix multiplication, and so it is pretty efficient. The author then extends his technique to construct panoramic views using ordinary cameras. The work concludes with a foundation for future work, which states that these panoramic views would be really useful if 3D information could be constructed from the static information. The author also describes a technique by which some 3D depth information can be extracted and used for motion video.

I think for realistic 3D, though, you need more sophisticated imaging algorithms, and those are currently not available.

Mosaic Based Representations of Video Sequences and Their Applications

The authors divide the realm of mosaics into two categories, "still" and "dynamic". Subsequently they extend their analysis to say that dynamic image mosaics can be used to efficiently compress the information. The compressed information can then be used to efficiently transmit mosaic information over low bandwidth pipes.

As part of their dynamic discussion they mention techniques where interval differences are used to capture the dynamic aspect of the mosaic. The more interesting technique uses a Laplacian pyramid to convey the change in information.

They present one particular technique which I think is interesting: the notion that by using multiple frames of video captured at low resolution, we can construct a high resolution image from them. I think there is a limit to the amount of detail one can gather from the technique; certainly you cannot get a 320 x 200 resolution image from a 1x1 pixel input image.

Romer Rosales

Mosaic Based Representations of Video Sequences

(Article Review)

This work is based on the approach of still mosaics and creates an extension of the concept for an efficient representation of video sequences.

The approach is also based on the fact that video is a source of information that can be modeled in different ways in order to provide a more robust representation of the captured environment.

The use of mosaics to represent a collection of frames has created a more efficient way to present a scene; among other things, it provides a significant reduction in the amount of data needed. This work goes beyond the representation of still scenes and creates a technique to represent video sequences in a more efficient way.

This work also studies different kinds of mosaic representations and their use in different types of applications.

A static mosaic is normally built from an input video sequence that is segmented into contiguous scene sequences. A static image is then formed by aligning all frames to a fixed coordinate system. A static mosaic contains the common information for a set of frames.

The creation of dynamic mosaics defined as a sequence of evolving mosaic images becomes very efficient if each new mosaic image is updated with the current information from recent frames.

The approach consists of representing the first dynamic mosaic, followed by the incremental alignment and the incremental residuals (which represent the changes). These residuals are very small, so an efficient dynamic scene representation can be achieved.

In order to create an implementation, a pyramid representation for dynamic mosaics is also discussed. This structure is composed of static mosaics at every level, and each level corresponds to a different time scale representation of the scene. The terminal level corresponds to the temporal sampling of the input sequence. Lower levels are based on successive increases in temporal integration and downsampling. The bottom of the pyramid represents a single static mosaic.
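A toy version of such a temporal pyramid is sketched below. It assumes pre-aligned frames of equal size and uses simple pairwise averaging as the integration step, which is an illustrative choice and may differ from what the paper does; the function name is hypothetical.

```python
def temporal_pyramid(frames):
    """Build levels from the input sampling down to a single static mosaic.

    Each new level merges neighbouring pairs from the previous level, so it
    integrates over twice the time span with half as many images.
    """
    levels = [[list(frame) for frame in frames]]
    while len(levels[-1]) > 1:
        previous = levels[-1]
        merged = []
        for start in range(0, len(previous), 2):
            group = previous[start:start + 2]  # one or two neighbouring images
            merged.append([sum(pixels) / len(group) for pixels in zip(*group)])
        levels.append(merged)
    return levels

frames = [[0, 0], [2, 4], [4, 0], [6, 4]]  # four aligned two-pixel frames
pyramid = temporal_pyramid(frames)
for level in pyramid:
    print(len(level), level)
```

With four input frames the levels hold four, two, and one image respectively, and the last level is the single static mosaic of the whole sequence.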

The properties of a multiresolution mosaic are also explained, so that it is possible to handle variations in image resolution. The resolution levels are not complete: sometimes it is possible to present high resolution mosaics, at other times only low resolution mosaics.

The steps for the construction of a mosaic are explained with some detail, in general: alignment of the images in the sequence, integration of images into a mosaic, the calculation of residuals between the mosaic and each frame.

Some applications of this approach are discussed, among them: video compression, scene change detection, video search and indexing, video editing...

In general this approach can be very helpful in many video applications. Low bit transfer makes it excellent for simple video transmission, but it is important to notice that it is an incremental approach: it is necessary to have every previous frame (of the sequence) in order to show an image, so it can create some problems for easy editing. I think that most of these problems can be solved with an appropriate technique for storage and retrieval of related still mosaics.

Video Mosaics for Virtual Environments

(Article Review)

This article presents some techniques for automatically generating high resolution, photorealistic imagery based on low resolution media (with a limited field of view). It seems that in theory, it could achieve unlimited resolution. Also this technique can be invariant with respect to range or scale.

In practice, a set of low resolution inputs (video, imagery) can be obtained, for example, by panning a camera over a scene. This technique can composite the video frames and create panoramic images of arbitrary characteristics. Planar, panoramic and depth-variation scenes are used to describe this technique.

The approach is based on the fact that it is possible to automatically align different tiles from a scene into a larger mosaic of the same scene (higher resolution) and seamlessly blend the images together. In order to achieve this, it is necessary to derive the alignment or warping transform directly from the images.

These transformations can be obtained without knowledge about camera calibration parameters or about the relative motion between frames.

The importance of this technique is that it does not require feature points to be identified. It also estimates the transformation well once it is in the vicinity of the true solution.

In general the algorithm works as follows: using a 2D transformation, it computes the position (x'i,y'i) in the other image of every point (xi,yi), and then it calculates the error in the intensity level between these points; the intensity gradient (with respect to x and y) is also calculated. At the end it generates a system of equations and iterates until it finds the best transformation M ((x,y)->(x',y')).
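For reference, the mapping and the intensity error described above can be written out as below. This is a sketch in the usual eight-parameter notation for a planar projective transformation; the symbols m_0 ... m_7 and e_i are naming assumptions made here, with I the first image and I' the other image.

```latex
x'_i = \frac{m_0 x_i + m_1 y_i + m_2}{m_6 x_i + m_7 y_i + 1}, \qquad
y'_i = \frac{m_3 x_i + m_4 y_i + m_5}{m_6 x_i + m_7 y_i + 1}

E = \sum_i \bigl[ I'(x'_i, y'_i) - I(x_i, y_i) \bigr]^2 = \sum_i e_i^2
```

The iteration adjusts the parameters of M to reduce E, using the intensity gradient of I' to decide in which direction each parameter should move.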

To blend the resampled images, a bilinear weighting function is used. It hides the edges of the component images. We should expect that a low frequency mottling remains if the individual tiles have different exposures.
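A one-dimensional sketch of this kind of weighted blending is given below. It assumes tiles that are already resampled onto a common pixel grid and uses a linear "hat" weight per tile (a two-dimensional bilinear weight would be the product of two such hats); the function names and the exact weight formula are illustrative assumptions.

```python
def hat_weight(index, length):
    """Weight that peaks at the tile centre and falls linearly towards the edges."""
    centre = (length - 1) / 2.0
    return 1.0 - abs(index - centre) / (centre + 1.0)

def blend(tiles, offsets, width):
    """Weighted average of overlapping tiles placed at the given offsets."""
    total = [0.0] * width
    weight = [0.0] * width
    for tile, offset in zip(tiles, offsets):
        for index, value in enumerate(tile):
            w = hat_weight(index, len(tile))
            total[offset + index] += w * value
            weight[offset + index] += w
    # Pixels covered by no tile are left at zero.
    return [t / w if w > 0 else 0.0 for t, w in zip(total, weight)]

# Two tiles with different exposures (10 and 20) overlapping by two pixels.
mosaic = blend([[10, 10, 10, 10], [20, 20, 20, 20]], offsets=[0, 2], width=6)
print(mosaic)
```

In the overlap the values step gradually from the first tile's level towards the second's instead of jumping at a hard seam, which is how the edges are hidden; the remaining slow ramp is the low frequency exposure difference mentioned above.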

This approach was tested on an image sequence of a whiteboard and on a panoramic view or environment map. Some details of the process are given in this work. Projective depth recovery is also analyzed, which I think is very helpful, especially for real-world scenes.

The article also discusses applications of these techniques, ranging from scanning whiteboards, creating game scenarios, special effects, virtual environments and experiences such as virtual travel, home walkthroughs, and home supermarket shopping.

In general this approach has some advantages over others based on feature detection and tracking. Because of this, it can work where features cannot be seen due to highly textured patterns or other image complexities. However, it assumes an ideal pinhole camera. It can be highly sensitive to intensity changes, and a good alignment of rotational axes is necessary in panoramic compositing. It may have problems with temporal events that occur at a given instant in part of the scanned image. A possible extension could be the representation of motion.

Natasha Tatarchuk

Video Mosaics for Virtual Environments

Richard Szeliski

Richard Szeliski in this paper presents a technique for automatically registering video frames into 2D and limited 3D scene mosaics. An algorithm is described to create planar image mosaics using transformations. The author also describes a method of creating panoramic image mosaics. To do that, you can rotate a camera around its optical center, which is similar to the still photographic camera methods, and then use image alignment to compose the mosaics. Also, it is noted that some research is being done on recovering the projective depth of the video sequence, in order to construct full 3D mosaics. As of now, only limited 3D scene models can be constructed.

This technique is a very useful one, and we can see many different applications arising that use this method. As the title of the paper suggests, these recovered mosaics in 3D can be used as environment maps for virtual reality scenes, or for any application that requires knowledge about the scene around it, something like a computer for cruise control in a car. Also, as the author suggests, this can be used for videoconferencing, in scanning the scenery or whiteboards or blackboards and then composing mosaics of the transferred image, which seems to produce higher resolution results than just taking a shot with a wide-lens camera.

Mosaic Based Representations of Video Sequences and Their Applications

Michal Irani, P. Anandan, and Steve Hsu

In this paper the authors investigate how to use mosaics as the basis for efficient video compression, along with other applications of this method, such as video visualization, video enhancement, etc. Two different types of mosaics are presented here, namely, the static mosaic and the dynamic mosaic. A mosaic image is constructed from all frames in a scene sequence, giving a panoramic view of the scene. One of the types is the static mosaic, which is the most common mosaic representation. It is constructed by aligning all frames of a contiguous scene subsequence (which is taken from the input video sequence) to a fixed coordinate system.

The dynamic mosaic is a sequence of evolving mosaic images, where the content of each new mosaic is updated with the most current information from the most recent frame. Also, the authors developed the so-called Multiresolution Mosaic, which is very useful for handling the variations in image resolution that occur due to camera zooming and other reasons.

Leonid Taycher

Mosaic Based Representations of Video Sequences and Their Applications

This paper is mainly an overview of different types of image mosaics and their applications. It introduces the notion of the Dynamic Mosaic, which is actually a sequence of evolving mosaic frames, updated with each new frame. Objects which appear differently in different frames of the video sequence will therefore appear normally in the dynamic mosaic, while in the static mosaic they will appear transparent or not appear at all. It also introduces the notion of a temporal pyramid, which is a hierarchy of static mosaics created on different time scales (from single-frame mosaics on the finest level to a single static mosaic on the coarsest level). One of the problems with the dynamic mosaic is that it is going to break when only a part of a moving object appears in the frame: only that part's updated position will be represented in the mosaic, not the whole object. For example, in fig. 3, when the camera moves to the right so that only the player's hand is visible, the hand's position will be updated while the rest of the player's will not, so a discrepancy will occur.

Video Mosaics for Virtual Environments

This paper discusses the registration of (that is, the approximation of the movement between) the frames in the video sequence, and the creation of what are called static mosaics in the previous paper.

Alex Vlachos



Stan Sclaroff

Created:  Nov 21, 1996

Last Modified: Nov 21, 1996