Readings:
This article, "Extracting Shape from Shading," introduces the idea of extracting surface shape from the shading within homogeneous regions of an image. Two methods were developed for this: one called local shape-from-shading and the other global shape-from-shading. The global method combines the local method with boundary conditions to solve the problem. The article says that the local algorithm provides robust estimates of surface shape, while the global algorithm produces accurate estimates of the 3D surface shape.
This article was quite informative; however, it was like a double-edged sword. On one hand the authors give all the math involved in each process, which is helpful but can get slightly confusing, especially if you start to ignore the equations and they are then referred to five pages later. Despite the overall nature of the article, the introduction was not as informative as it should have been; in fact, if I had not read the title before reading the article I might not have had a clue as to what was going on. As the reading continued, though, the ideas became clearer.
It is interesting how the Gaussian seems to be appearing everywhere. I would never have assumed it to be related to shading. The article did not explain what Lambertian reflectance is. It was fortunate for me that my professor showed the class a Lambertian image prior to my reading the article. It is also interesting that for the local algorithm, concepts from the early human visual system are employed. One would assume the model to be more complex. What is an isointensity line?
Many objects have relatively smooth, homogeneous surfaces, and shading is important in perceiving opaque and solid bodies, so we can model images as being composed of homogeneous patches. Existing vision techniques do well at edges but not in the interior of a homogeneous region. Other techniques are needed to recover 3D structure from an image.
This paper reviews several previous works by other researchers, such as Horn's thesis and the work of Oliensis and Dupuis. Horn's idea is to fill in the surface between edges detected by the usual vision techniques. This approach is too general for most real situations. Oliensis and Dupuis's work made the problem feasible by using a conservative minimization process to successively approximate the solution surface, which leads to a stable solution. The authors combine the main idea of Oliensis and Dupuis's method with the simple characteristic-strips method originally used by Horn to produce a stable, accurate, and efficient solution.
There are two classes of algorithms: local algorithms and global algorithms. A local algorithm is similar to human visual processing; it attempts to estimate shape from local variations in image intensity, but it cannot recover metrically accurate estimates of surface shape. Global algorithms attempt to propagate information across a shaded surface starting from points with known surface orientation.
In the paper, the authors use the reflectance map to represent the reflectance properties of a surface. In a small region, the reflectance map can be approximated by a linear function of the partial derivatives. A filter is used to remove noise and nonlinear components of the image to improve the recovery process. This is a local algorithm similar to the human visual system; that is, it recovers surface shape from a filter set. To get a global solution, the authors link adjoining patches together using the reflectance map, integrating the information along the direction of steepest descent on the reflectance map. That is the method of characteristic strips. A conservative "minimum downhill" rule is used to deal with singular-point ambiguity. Both general and discrete implementations are given, and the Lambertian reflectance law is applied in the test cases.
In the experiments, different amounts of noise and erroneous light-source tilt are added to test the sensitivity of the algorithm. However, the resulting images are hard to interpret with the human visual system.
In this paper, Pentland and Bichsel discuss the problem of extracting shape from shading given homogeneous, uniformly lit regions and present two solutions, one for local estimation and one for combining local estimates into global estimates. While these assumptions are usually not accurate for real-world images, this is an important first step in extracting 3D data from an image, as most images can be segmented into regions for which these assumptions are reasonably accurate.
The first method presented, the local one, is initially based on a planar approximation of the reflectance map. It is then transformed into the Fourier domain and eventually an equation for the Fourier transform of the z (depth) data is obtained. This equation is then improved with some further assumptions they mention. While the math seems complex, it is apparently more efficient than the planar form. It also seems that this could be expanded over a much larger homogeneous region, though the increase in calculation would probably not be worth it.
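The Fourier-domain idea can be illustrated with a toy sketch (my own, not the authors' code; the surface, the coefficients k2 and k3, and the image are all made up here): under a linear reflectance model E ≈ k2·∂z/∂x + k3·∂z/∂y, the image spectrum is the depth spectrum times a known transfer function H, so dividing by H wherever it is nonzero recovers the depth.

```python
import numpy as np

# Hypothetical test surface: a single sinusoidal corrugation.
N = 64
y, x = np.mgrid[0:N, 0:N]
z_true = np.cos(2 * np.pi * (3 * x + 2 * y) / N)

# Assumed linear reflectance-map coefficients (e.g. from a Taylor
# expansion about the mean surface gradient); the constant term
# carries no shape information and is dropped.
k2, k3 = 1.0, 0.5

# Transfer function from depth to image for E = k2*dz/dx + k3*dz/dy,
# with spatial frequencies in FFT ordering (cycles per sample).
U, V = np.meshgrid(np.fft.fftfreq(N), np.fft.fftfreq(N))
H = 1j * 2 * np.pi * (k2 * U + k3 * V)

# Synthesize the image spectrally, then invert H to recover depth.
Z = np.fft.fft2(z_true)
E = np.real(np.fft.ifft2(H * Z))

mask = np.abs(H) > 1e-8
Z_rec = np.zeros_like(Z)
Z_rec[mask] = np.fft.fft2(E)[mask] / H[mask]
z_rec = np.real(np.fft.ifft2(Z_rec))

# Recovery is exact except at frequencies where H = 0 (here none of
# the surface's energy lies on that null line).
print(np.max(np.abs(z_rec - z_true)))
```

In a real image the frequencies where H vanishes are genuinely unrecoverable from shading alone, which is one reason the local estimates are only qualitatively correct.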
The second method presented is for the integration of patches obtained from local methods. They build on the "method of characteristic strips" for the extrapolation of the surfaces, but introduce the "minimum downhill" rule to add stability, by only allowing changes to propagate in one direction while also trying to stay in regions of minimum change.
These methods seem to work well and a lot of information towards implementation is given. At some points though, they seem to skip steps in the analysis, especially in the math, but enough clues are given so that someone familiar with the literature and the necessary mathematics could probably reproduce it.
"The problem is defined as follows: Solve the brightness equation R(n(x,y)) = E(x,y), assuming that brightness can be represented by a function R(n) = R(n1,n2,n3)." In other words, to find the correct normal one has to correlate neighboring pixels and find which normal applies to the brightness over this particular surface. The key to the working of the first algorithm is the idea that, in the relevant areas of the reflectance map, the isointensity lines are almost parallel. "So they provide a good way to approximate the reflectance map by a linear function of the partial derivatives (p,q)." The second algorithm is not as simple. It tries to find the correct way of correlating the various patterns by using the method of characteristic strips. The key to making this method work while avoiding the ambiguities that can arise is to follow the "downhill rule". This rule allows one to avoid ambiguities by guaranteeing that there will be a smooth, continuous surface.
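For concreteness, here is a minimal sketch (mine, not the paper's code) of evaluating the right-hand side of the brightness equation in the Lambertian case, where R(n) is just the cosine of the angle between the unit surface normal and the light direction:

```python
import numpy as np

def lambertian_brightness(normal, light):
    """Evaluate R(n) for a Lambertian surface: the cosine of the angle
    between the unit normal n = (n1, n2, n3) and the unit light
    direction, clamped at zero for patches facing away from the light."""
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)
    return max(0.0, float(np.dot(normal, light)))

# A patch facing the light exactly has brightness 1; a patch tilted
# 60 degrees away has brightness cos(60 deg) = 0.5.
L = np.array([0.0, 0.0, 1.0])
n_facing = np.array([0.0, 0.0, 1.0])
n_tilted = np.array([np.sin(np.pi / 3), 0.0, np.cos(np.pi / 3)])
print(lambertian_brightness(n_facing, L))   # → 1.0
print(lambertian_brightness(n_tilted, L))   # ≈ 0.5
```

The ambiguity the paper wrestles with is visible here: many different normals share the same cosine with L, so a single brightness value does not pin down the orientation.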
Both of these algorithms are good approximations of the shape of Lambertian surfaces. The strength of both of them is their ease of use and general simplicity. The code given in appendix B is fairly clear after working through the point notation. The major question that I have is why the experiments rely so heavily upon the Lambertian reflectance law. Granted, the assumption of ambient light is a good one, but it seems very restrictive to stress only this surface type. The one example given with a shiny surface seemed to work relatively well, and more experiments like this would have helped to more fully demonstrate the robustness of this technique.
The goal of shape-from-shading is to extract information about the shape and 3D orientation of homogeneous surfaces in images. Because the surfaces are smooth, the only visual cues to the shape of the surface in the image are given by the intensity of the light reflected from the surface. This article presents techniques for finding the shape of a surface based on the variations in intensity within the surface in a given image. Two general approaches are described. Local algorithms extract information about the surface shape based on variations in intensity within a small neighborhood. Global algorithms, starting from a point with a known orientation, sweep out the shape of the surface by finding the orientation of any one area based on the orientation of the area next to it. It is noted that the local method is thought to be the best model of the shape-from-shading process that takes place in the human visual system.
The article describes both a local and a global procedure for finding shape from shading. The mathematics of each approach are described in detail. Both are shown to have performed well in the experiments that were done.
The method discussed in the article seems like a reasonably good way to recover three-dimensional information from a 2D image by looking at the shading. Although the algorithm seems like it could be useful, it is also very limited in the sorts of images that it can interpret.
The method assumes that the object is a uniform color and does not produce any specular highlights. It assumes that the object is completely matte. This would be a problem if you were using it to reconstruct real world objects. In a controlled environment where you are trying to produce a 3D description of a clay model this algorithm should work well. If it were trying to create 3D information about something like a face or, even worse, a car it would fail.
The algorithm definitely has its uses, but it is by no means a universal way of creating 3D information from an image. I don't think the authors claim that it is.
Extracting the shape of a surface from an intensity
image is a very important, yet difficult problem in
machine vision.
The authors Alex Pentland and Martin Bichsel distinguish
two general approaches which have been used to solve the problem:
local algorithms and global algorithms.
Two specific techniques are outlined, the first taking
the local approach, and another using global methods.
1. Small regions of the reflectance map (gradient space)
are considered, so the isointensity lines can be approximated
as linear.
Multiplying the Fourier spectrum of this approximated region
by the inverse of the transfer function H gives the approximate
shape of the surface.
Some improvements are made, one of which is to
use Wiener filtering for noise removal.
Several examples show that the technique performs well
for a nice Lambertian surface, but is less accurate when
surfaces are shiny, or specular.
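The Wiener-filtering step mentioned above can be sketched in a toy 1D setting (entirely made up here; the signal, noise level, and spectra are my own assumptions, not the paper's): each frequency of a noisy observation is attenuated by S/(S + W), the ratio of signal power to signal-plus-noise power.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up smooth "shading" signal plus white noise.
N = 256
t = np.arange(N)
signal = np.sin(2 * np.pi * 3 * t / N) + 0.5 * np.cos(2 * np.pi * 7 * t / N)
observed = signal + 0.4 * rng.standard_normal(N)

# Wiener filter in the Fourier domain: attenuate each frequency by
# S / (S + W), where S is the signal power spectrum and W the noise
# power per bin.  Both spectra are assumed known here -- the idealized
# textbook setting.
S = np.abs(np.fft.fft(signal)) ** 2
W = N * 0.4**2          # expected white-noise power per frequency bin
gain = S / (S + W)
denoised = np.real(np.fft.ifft(gain * np.fft.fft(observed)))

err_noisy = np.mean((observed - signal) ** 2)
err_wiener = np.mean((denoised - signal) ** 2)
print(err_wiener < err_noisy)  # → True: filtering moves us toward the signal
```

In practice the signal spectrum is not known exactly, so a smooth model of it is substituted; the filter then trades a little bias for a large reduction in noise variance.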
2. The second technique is a modification to
Horn's method of characteristic strips. Two
rules are added to improve performance. For each point,
the line with the steepest slope passing through is selected
as the path for integration.
The downhill rule says that the direction of this line should be
chosen as moving away from the light (decreasing intensity) to
integrate along.
The minimum distance rule says to pick the height with
minimal distance to the light source for all angles. Doing so
will ensure the algorithm converges.
Examples are shown demonstrating that the algorithm is fairly robust
to noise, and its results are accurate.
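The propagation idea can be illustrated with a toy 1D sketch of my own (not the authors' 2D algorithm): with the light directly overhead, a Lambertian image fixes only the magnitude of the surface slope, and a downhill rule resolves the sign ambiguity by always descending as we move away from the brightest (singular) point.

```python
import numpy as np

# With the light overhead, a 1D Lambertian image is
# E(x) = 1 / sqrt(1 + p(x)^2), which fixes the slope magnitude |p|
# but not its sign -- the ambiguity the downhill rule resolves.
N = 101
x = np.linspace(-1.0, 1.0, N)
z_true = -x**2                        # a hill, brightest at its top (x = 0)
p_true = np.gradient(z_true, x)
E = 1.0 / np.sqrt(1.0 + p_true**2)    # synthetic image

i0 = int(np.argmax(E))                # singular point: surface faces the light
p_mag = np.sqrt(np.maximum(1.0 / E**2 - 1.0, 0.0))

# Propagate heights away from the singular point, always choosing the
# sign of the slope that moves downhill.
z = np.zeros(N)
dx = x[1] - x[0]
for i in range(i0 + 1, N):            # rightward sweep
    z[i] = z[i - 1] - p_mag[i] * dx
for i in range(i0 - 1, -1, -1):       # leftward sweep
    z[i] = z[i + 1] - p_mag[i] * dx

# The recovered profile matches the true hill up to first-order
# integration error (about 0.02 at this resolution).
print(np.max(np.abs(z - z_true)))
```

The 2D case is harder precisely because the sweeps can disagree, which is what the authors' minimum-distance bookkeeping is there to prevent.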
The experimental results were quite impressive. I liked the
fact that they provided code in the appendix, and the discussion
on implementation for the global algorithm. The comparison of
local methods with the human visual system was interesting.
However, this technique is very limited, as it can only be applied to solid-color Lambertian surfaces. I suppose if texture information and the Phong illumination model were provided for a particular surface, this technique could perhaps be applied to textured, shiny surfaces.
It's very nice how they provide code in their paper; it shows that they have nothing to hide and that their paper isn't purely theoretical, with specially tweaked examples to make it seem like it works.
Authors A. P. Pentland and M. Bichsel discuss two general approaches to recovering the shape of an object in an image from variations in image intensity. The two classes of algorithms presented are local algorithms and global algorithms.
The local algorithms concentrate on 'small' patches of an image. If the patch is sufficiently small, the computations become linear and thereby relatively simple to compute. However, the authors note that although local algorithms produce good qualitative estimates of shape, they do not recover metrically accurate estimates of surface shape. As an example, consider the recovered images of the nickels shown in the paper. We definitely see success in the fact that, even without access to the original image, we could still identify the recovered image; but notice the distortions in the recovered shape. If we had never seen a nickel before, we would not know that its shape should be a circle; we would assume that an oval was correct. Therefore, it appears that local algorithms require fundamental knowledge of the shape being recovered.
Global shape-from-shading is accomplished quite differently. Instead of concentrating on small patches of image intensity variation, the focus is on 'solving' the global intensity variation equation. The algorithm is rather clever in that it begins at one point (area) in the image and 'grows' outward from there. This is accomplished by choosing a step size and increasing it on each iteration through the algorithm. The recovered image is quite accurate in its representation of the actual 3D shape; the Elvis example in the paper demonstrates this. However, actual surface properties are not as successfully recovered as they were using local algorithms.
I will conjecture that global algorithms would be useful in applications where you are searching for something in an image, while local algorithms would be more useful if you are actually interested in the details of what you are searching for. It appears that the two algorithms are quite complementary; that is, use a global algorithm to locate something and then a local algorithm to identify its properties.
In modeling an image as a mosaic of homogeneous patches, recovering 3D structure based only on the most familiar vision techniques, such as stereo and motion analysis, is difficult. By employing the shading information from the varying intensity of a single homogeneous patch, not only the depth information at the edges but also the 3D structure of the image can be extracted.
Two classes of such algorithms are local and global:
a local algorithm estimates shape from local variations in image
intensity by use of linear filters and point nonlinearities, and so
provides robust estimates. A global algorithm propagates information
across a shaded surface starting from points with known surface
orientation, using boundary conditions and local shading information
together, and thus produces more accurate estimates.
Obtaining a local solution requires first approximating the
reflectance map by a linear function of the partial derivatives in
surface-orientation space. The surface shape can then be estimated
in closed form with the inverse transfer function. Wiener filtering
can be used to improve the recovery process. To obtain a global
solution, the method of characteristic strips can be used, together
with regularization methods that help produce a stable solution. In
the integration step, a minimum downhill rule is applied to avoid the
problem of singular-point ambiguity, and the algorithm is proven to
converge to a unique, correct surface.
It is interesting that this method is based on the intensity information produced by illuminants over an infinitesimal area of an image. Beyond the results shown in the book, some curious questions come up about the lighting conditions needed to get an optimal 3D structure from an image: how bright should the illuminant be, from which direction should the light come, and so on. There may be no universal answer to these questions; otherwise, exhaustive simulation could be performed.
This paper, by Alex P. Pentland and Martin Bichsel, describes algorithms that can extract shape from the variations in intensity that exist within a single homogeneous patch. The explanation of the mathematical foundation for the algorithms was very brief and lacked the details necessary to understand it at all. The examples and images provided by the authors did not help me to understand the methods used to obtain them. Basically, I found this paper very confusing and very difficult to read.
Images are composed of homogeneous patches. Depth information from stereo, motion, or focus is provided at the patch edges. Therefore, extracting shape from shading is a very important machine-vision problem. Alex P. Pentland and Martin Bichsel discuss their algorithm for extracting shape from the variations in intensity that exist within a single homogeneous patch in chapter six of 'Handbook of Pattern Recognition and Image Processing: Computer Vision' (Academic Press, 1994, pp. 161-183).
Two solutions were proposed in this chapter. The first solution uses linear filters and point nonlinearities to obtain a local estimate of shape. The second solution links boundary conditions and local shading information together to obtain a global solution.
In their discussion of the first solution, they link it to biological mechanisms. They specifically point out that the early stages of the human visual system can be regarded as composed of filters tuned to orientation, spatial frequency, and phase. They illustrate this linkage with three examples.
The article by Pentland and Bichsel describes two methods for retrieving the shape of a surface from a gray-scale image. The methods can be thought of as a bottom-up and a top-down approach; the authors call them local and global, respectively. The article contends that both methods are feasible and yield similar results, except for one ambiguity which I did not understand. They state that the local algorithm generates shape robustly, while the global algorithm generates 3D shape accurately. I did not understand why they didn't use the same terms for the comparison.
The local algorithm takes its inspiration from biological models which separate the incoming image into its component frequencies. These data are independent of each other and as such are calculated in parallel. The results are then normalized and phase-shifted so that all neighbors talk the same "language", and local shapes are estimated from this process. The biological model then shifts attention to different areas of the image to obtain a full 3D understanding.
The global algorithm is more synthetic and has stability problems which must be addressed by sophisticated ambiguity-resolution logic. The authors give basic code which contains this logic and also point out that one of the earlier approaches to the problem used this technique. The basic idea is that the shape of a surface can be estimated if we know the orientation of a particular patch on it. Given the orientation of the starting point, the image's illumination angle is used to plot a gradient line. The direction of the line is determined by using a "minimum downhill" hill-climbing algorithm. The authors go on to describe what happens in the computer world, where you have a discrete grid but still need to look at the neighbors.
The algorithms are deficient in dealing with multiple light sources and surfaces with mixed reflectivity properties. The objects are also assumed to be the same color all around. Multi-colored objects would throw off the global logic; the local logic might have hope in dealing with them.
Although many computer vision techniques are oriented toward analyzing images as a series of homogeneous texture patches, the three-dimensionality of the real world makes this approach work poorly in the interior of these homogeneous regions. This is why extraction of shape by considering image shading (or intensity) is likely to be a more useful model for recognition. Our perception makes us think of the real world as a mosaic of relatively smooth and homogeneous surfaces.
This work is based on the extraction of shape from variations in intensity that can exist within a single homogeneous patch.
Many assumptions have been made to simplify this very complicated problem; for example, the depth and orientation of some points have to be known (the brightest or the edge points), and there is no consideration of smooth variations in surface color, nearby illuminants, reflection from other surfaces, etc.
The main idea: this work defines the brightness E(x,y) at the image plane as dependent only on the orientation of the surface; to this end, it specifically assumes that surface patches are homogeneous and uniformly lit by distant light sources. The brightness is then represented as a function R(n) = R(n1,n2,n3).
In this way, the shape-from-shading problem is considered as the solution of the brightness equation R(n(x,y)) = E(x,y).
According to this work, there is a problem: an infinite number of normals (n1,n2,n3) satisfy this equation. Another assumption is made: we cannot observe brightness directly, so a measurement of image-plane intensity is used and considered a good approximation.
For sufficiently small image patches, the isointensity lines of the reflectance map can be considered parallel. This means that an approximation of the reflectance is possible using a linear function of the partial derivatives,
that is, E(x,y) = k1 + k2*p + k3*q. (Refer to the article.)
An approximation specific to the Lambertian reflectance function, for which the ki are known, is then given.
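The Lambertian case can be made concrete with a small sketch (a toy example of mine, with an assumed light direction, not the paper's numbers): Taylor-expanding the exact Lambertian reflectance map about (p,q) = (0,0) yields the linear coefficients k1, k2, k3 directly.

```python
import numpy as np

def lambertian(p, q, light):
    """Exact Lambertian reflectance map R(p, q): the surface normal for
    gradient (p, q) is (-p, -q, 1) normalized, dotted with the light."""
    lx, ly, lz = light
    return (-lx * p - ly * q + lz) / np.sqrt(1.0 + p**2 + q**2)

# An assumed light direction (unit vector), tilted a little off the
# viewing axis -- purely an example, not a value from the paper.
light = np.array([0.2, 0.1, 1.0])
light = light / np.linalg.norm(light)
lx, ly, lz = light

# Taylor expansion about (p, q) = (0, 0): the normalization factor has
# zero first derivative there, so E ~ k1 + k2*p + k3*q with
k1, k2, k3 = lz, -lx, -ly

# For small gradients the linear model tracks the exact map closely.
p, q = 0.05, -0.03
exact = lambertian(p, q, light)
approx = k1 + k2 * p + k3 * q
print(abs(exact - approx))  # small: only the neglected quadratic terms remain
```

The parallel-isointensity-lines observation above is the same statement geometrically: a linear function of (p, q) has straight, parallel level sets in gradient space.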
This approach was tested on an image with a very linear reflectance function, so, as we would expect, the resulting image gives an accurate impression of the real surface shape. A second image, from a more shiny metal surface (same object, different reflectance), was used and the recovery was somewhat less accurate; however, the differences were very small. A third, higher-detail image was used and the recovered surface was generally correct.
All these experiments were done over a sufficiently small area of the image. To obtain a global solution to the brightness equation, adjoining patches are linked together, i.e., boundary conditions and local shading information are combined.
For global recovery, a technique that grows the solution by integrating the information along the direction of steepest descent on the reflectance map was used: the method of characteristic strips.
This work tries to solve a very complicated problem, characterized by the need to control more variables and environmental distortions than we can actually represent in a model. Because of this, many assumptions are made to build it. The result is a model that may be very helpful under controlled situations, but weak in the unconstrained real-world environment.
We can expect that when these assumptions hold in the image, the method will provide a solution that can be considered a good recovery of the actual surface. This is what the experiments showed. In the case of the global algorithm, a good estimate of the 3D surface shape is provided, although in some cases, such as the Elvis bust, some deviations from the true surface (around the edge of the nose) appear, according to this work due to self-illumination of the surface.
Stan Sclaroff
Created: Sep 26, 1996
Last Modified: Nov 1, 1996