## CAS CS 585 Image and Video Computing - Spring 2021

## Assignment 4

This assignment is due at 11:59 PM on Friday, March 26th.
**Please scan your written assignment as a single PDF file and submit it
through Gradescope.**

**Exercise 1: Optical Flow**

(a) Draw an optical flow field that shows an object translation that is parallel
to the image plane. The motion is twice as fast horizontally as vertically.

(b) Draw an optical flow field that has a focus of expansion at the principal
point and describes the motion of an object toward the camera.

(c) Describe a scenario where the object is not moving but the optical flow
field is not zero.

(d) The Constant Brightness Assumption (CBA) is used in both the Lucas-Kanade
algorithm and the Horn-Schunck algorithm. Describe how each algorithm
handles the fact that the assumption might be violated.

(e) Explain what it means that "the validity of the CBA depends on the spatial
frequency of an image."

(f) Explain why optical flow perpendicular to the brightness gradient cannot be
computed by giving a mathematical argument that uses the CBA equation.
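
For reference, the CBA is commonly written as the linear constraint below. The notation is an assumption, since the assignment does not fix symbols: $I_x$, $I_y$, and $I_t$ stand for the partial derivatives of image brightness with respect to $x$, $y$, and time, and $(u, v)$ is the flow vector at a pixel.

```latex
I_x u + I_y v + I_t = 0
\quad\Longleftrightarrow\quad
\nabla I \cdot (u, v)^{\top} = -I_t
```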

(g) Spatial derivatives of the flow vector are computed to impose a smoothness
assumption in the Horn-Schunck algorithm. 1) Why is smoothness useful?
2) Give the equations for the discrete approximation of each spatial derivative.

**Exercise 2: Binocular Stereo (3/23 Lecture)**

Suppose you set up the cameras of a binocular stereo system so that the optical
axes are parallel and pointing in the same direction and the distance between
the centers of projection of the cameras is 20 cm. Both cameras have a focal
length of 50 mm and a pixel width and height of 13 micrometers.

How far away is an imaged object for which you measured a disparity of 12
pixels? (Hints: Sketch the geometry. Make sure not to drop the physical units in
your calculations.)
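
As an illustration of the unit bookkeeping the hint refers to, here is a minimal Python sketch of the standard similar-triangles relation for a parallel-axis stereo pair, Z = f * B / d. The function name `depth_from_disparity` and the example values are hypothetical and are deliberately not the numbers from this exercise; the point is that the disparity has to be converted from pixels to a length before it is combined with the focal length and baseline.

```python
def depth_from_disparity(focal_length_m, baseline_m, pixel_size_m, disparity_px):
    """Depth Z = f * B / d for a parallel-axis stereo pair.

    All lengths are in meters; the disparity is given in pixels and is
    converted to meters on the sensor before use.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    disparity_m = disparity_px * pixel_size_m  # pixels -> meters on the sensor
    return focal_length_m * baseline_m / disparity_m


# Illustrative values only (not the assignment's):
# f = 25 mm, B = 10 cm, pixel size = 10 micrometers, disparity = 5 pixels.
z = depth_from_disparity(0.025, 0.10, 10e-6, 5)
print(f"{z:.1f} m")  # prints "50.0 m"
```

Keeping every quantity in meters inside the function is one possible design choice; converting everything to millimeters would work equally well as long as it is done consistently.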

**Exercise 3: Deep Learning and Computer Vision**

(a) Explain concisely what it means for a neural network layer to perform a
convolution with filter size 5x5x3 and stride 2.
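
To make the effect of filter size and stride concrete, the following Python sketch computes the spatial output size of a convolution layer. The 32x32x3 input, the zero padding default, and the helper names are assumptions chosen for illustration; the exercise itself does not specify an input size or padding.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Number of filter positions along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_output_shape(height, width, kernel=5, stride=2, padding=0, num_filters=1):
    """Output shape (height, width, channels) of a convolution layer.

    The filter depth (3 in a 5x5x3 filter) must equal the number of input
    channels, so the filter slides only along height and width and each
    filter produces a single output channel.
    """
    return (
        conv_output_size(height, kernel, stride, padding),
        conv_output_size(width, kernel, stride, padding),
        num_filters,
    )


# Hypothetical 32x32x3 input, one 5x5x3 filter, stride 2, no padding.
print(conv_output_shape(32, 32))  # prints (14, 14, 1)
```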

(b) What is the difference between instance segmentation and semantic
segmentation? (1 sentence)

(c) What does "max unpooling" mean?
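
The term can be illustrated with a small pure-Python sketch, under the assumption of 2x2 windows with stride 2 on a grid with even height and width. The function names are hypothetical. The pooling step records where each maximum came from, and the unpooling step writes each pooled value back to that recorded position and fills every other position with zero.

```python
def max_pool_2x2(grid):
    """Return pooled values and the (row, col) position of each maximum."""
    pooled, positions = [], []
    for r in range(0, len(grid), 2):
        pooled_row, pos_row = [], []
        for c in range(0, len(grid[0]), 2):
            # Collect the four values of the window with their coordinates.
            window = [(grid[r + dr][c + dc], (r + dr, c + dc))
                      for dr in (0, 1) for dc in (0, 1)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            pos_row.append(pos)
        pooled.append(pooled_row)
        positions.append(pos_row)
    return pooled, positions


def max_unpool_2x2(pooled, positions, height, width):
    """Place each pooled value at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, pos_row in zip(pooled, positions):
        for value, (r, c) in zip(pooled_row, pos_row):
            out[r][c] = value
    return out


grid = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [6, 2, 3, 4]]
pooled, positions = max_pool_2x2(grid)
restored = max_unpool_2x2(pooled, positions, 4, 4)
print(pooled)    # prints [[4, 5], [6, 7]]
print(restored)  # prints [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```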

(d) What is meant by a "one-hot" representation? What is a word vector
representation? Which representation yields better results with deep networks
that address the task of automatically generating captions for images to help
the visually impaired?