CAS CS 585 - Spring 2021
The assignment is due at 11:59 PM EST on Wednesday, February 17, 2021.
Please scan your written assignment (Part 1) as
a single PDF file and use GradeScope to submit.
Instructions on how to submit the programming assignment will follow.
You are encouraged to work in teams of two or four students. Each
student must submit his or her own solutions and acknowledge any
Part 1: Paper-and-pencil Assignment: 2nd Moments, NCC, Evaluating Binary Classifiers of Videos
Please write legibly if you use paper and pencil. You may also type your solutions.
Exercise 1: Using 2nd moments for binary image analysis (Use material from lectures on Feb. 2 and 4)
- Given four objects with different shapes, calculate the following values for each object.
- Location: Centroid. Explain how you will assign object location when the centroid falls in between pixels.
- Circularity: E_min/E_max
- Orientation: Direction of axis of least inertia. Explain how you will assign object direction when this axis
- The coordinate origin of an image may be at the lower left corner or at the center of an image. Does the location of the coordinate origin change your answers above? Please answer yes or no and provide a 1-sentence explanation for your answer.
- The objects shown above are images with just a few pixels.
Assume your images have objects of the same shape but much larger
size. Are any of the three measures above "invariant to size," meaning
that they yield the same value (ignoring small round-off errors)? Please
answer yes or no and provide a short explanation.
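As a rough illustration of the three measures in Exercise 1, the sketch below computes centroid, orientation, and circularity from second moments for a small binary image. It is not the official solution method. It assumes each pixel is a point mass at integer coordinates (x = column index, y = row index, with y pointing down), and the function name `second_moment_features` and the two test images are hypothetical and are not the objects from the exercise.

```python
import math

def second_moment_features(image):
    """Return (centroid, orientation in radians, circularity) of a binary image.

    `image` is a list of rows holding 0/1 values. Assumption: x is the column
    index, y is the row index, and every object pixel is a unit point mass.
    """
    pixels = [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    if not pixels:
        raise ValueError("image contains no object pixels")
    area = len(pixels)
    x_bar = sum(x for x, _ in pixels) / area
    y_bar = sum(y for _, y in pixels) / area

    # Second moments about the centroid.
    a = sum((x - x_bar) ** 2 for x, _ in pixels)
    b = 2 * sum((x - x_bar) * (y - y_bar) for x, y in pixels)
    c = sum((y - y_bar) ** 2 for _, y in pixels)

    # Axis of least inertia: tan(2 * theta) = b / (a - c).
    theta = 0.5 * math.atan2(b, a - c)
    root = math.sqrt(b ** 2 + (a - c) ** 2)
    e_min = 0.5 * (a + c) - 0.5 * root
    e_max = 0.5 * (a + c) + 0.5 * root
    # Convention chosen here (an assumption): a single pixel counts as circular.
    circularity = e_min / e_max if e_max > 0 else 1.0
    return (x_bar, y_bar), theta, circularity

bar = [[1, 1, 1, 1]]          # hypothetical 1x4 horizontal bar
square = [[1, 1], [1, 1]]     # hypothetical 2x2 square
print(second_moment_features(bar))
print(second_moment_features(square))
```

Because y points down in this sketch, a positive angle is measured clockwise on the screen. Flipping the y-axis changes the sign of the angle but not the centroid or the circularity.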
Exercise 2: NCC (wait until lecture on Feb. 9, 2021)
Exercise 3: Evaluating Binary Classifiers of Videos (wait until lecture on Feb. 11, 2021)
Show that the normalized correlation coefficient r is invariant
to linear brightness changes in the scene s (or template
m). This means you need to prove r(m,s) = r(m,as+b) for
images m and s, and some constants a and b.
We defined the normalized correlation coefficient in class as
r = (1/n) Σ_i [(s_i - mean(s)) (m_i - mean(m)) / (σ_s σ_m)],
where n is the number of pixels, s_i and m_i are the respective
brightness values of the ith pixel, mean(m) and σ_m are mean and
standard deviation of all pixels in the template and mean(s) and σ_s are
mean and standard deviation of all pixels in the sub-image of the scene.
Explain why the linear invariance property of the normalized correlation
coefficient, shown in part (a), could be useful for image analysis
(< 2 sentences).
The range of the NCC is between -1 and 1: -1 <= r <= 1.
Explain why this property could be useful for image analysis (< 2 sentences).
The expected value of the NCC is 0: E[r]=0.
Explain why this property could be useful for image analysis (< 2 sentences).
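To experiment with the definition of r used in Exercise 2, here is a minimal sketch that evaluates it on flat lists of brightness values. It assumes the standard deviations are population standard deviations (dividing by n), which matches the 1/n factor in the formula. The function name `ncc` and the sample values are hypothetical. A numerical check like this can help you sanity-check a derivation, but it does not replace the proof.

```python
import math

def ncc(m, s):
    """Normalized correlation coefficient r of two equally sized pixel lists."""
    if len(m) != len(s) or not m:
        raise ValueError("m and s must be non-empty and of equal length")
    n = len(m)
    mean_m = sum(m) / n
    mean_s = sum(s) / n
    # Population standard deviations (divide by n), an assumption of this sketch.
    sigma_m = math.sqrt(sum((v - mean_m) ** 2 for v in m) / n)
    sigma_s = math.sqrt(sum((v - mean_s) ** 2 for v in s) / n)
    if sigma_m == 0 or sigma_s == 0:
        raise ValueError("r is undefined for a constant image")
    total = sum((si - mean_s) * (mi - mean_m) for si, mi in zip(s, m))
    return total / (n * sigma_s * sigma_m)

template = [1, 2, 3, 4]                      # hypothetical template pixels
scene = [2, 1, 4, 3]                         # hypothetical sub-image pixels
brighter = [2 * v + 10 for v in scene]       # linear change with a = 2, b = 10
print(ncc(template, scene), ncc(template, brighter))
```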
Suppose that we have a test dataset of ultrasound videos of the lungs of 200 babies -- 120 have pneumonia and 80 do not have pneumonia. We use it to test the performance of
a 2-class video classification model that predicts which baby has pneumonia.
Our experiment produces the following confusion matrix.
For those of you interested in research: We are currently developing a
lung US video classifier for babies in Zambia. We are working with
Zambian doctors who are in the process of providing 200 videos for us.
U.S. radiologists in our team provide annotations of regions in the
lungs that show pneumonia ("consolidations"). In Zambia, an
under-resourced country, a clinic is much more likely to have an
ultrasound device (the cheapest go for $2,000) than an X-ray machine
(tens of thousands of dollars). When there is no image analysis
available at a rural clinic (hardware and interpretation expertise),
babies typically receive antibiotics. This has led to antibiotics
becoming more and more ineffective. AI-supported US video analysis
may lead to fewer prescriptions of antibiotics for babies who do not
need them while ensuring that sick babies can be diagnosed and given
- What is X?
- What is Y?
- Compute recall.
- Compute precision.
- Compute accuracy.
- Is this classifier specific for pneumonia? Explain.
- Is this classifier sensitive to pneumonia? Explain.
- Missing a diagnosis for pneumonia could mean that the baby dies. Providing a false alarm of a healthy baby typically means additional imaging is ordered, e.g., an X-ray. Considering this background information, which of the properties of the binary classifier do you propose is most important to improve?
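The measures asked for in the questions above can be sketched from the four cells of a binary confusion matrix. The counts below are made up for illustration and are deliberately not the values from the exercise. The function name `binary_metrics` is hypothetical, and "positive" is assumed to mean "has pneumonia".

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard measures from a 2x2 confusion matrix.

    tp: sick babies predicted sick      fn: sick babies predicted healthy
    fp: healthy babies predicted sick   tn: healthy babies predicted healthy
    """
    total = tp + fn + fp + tn
    return {
        "recall": tp / (tp + fn),        # also called sensitivity
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }

# Hypothetical counts for a 100-video test set.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```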
We meet every Wednesday morning
8:30-9:30 am on Zoom if you want to listen. I can provide the link for
truly interested students.
Part 2: Programming Assignment (you may start -- some tools will be introduced in the lectures on Feb 9 & 11)
In this assignment, you will build upon the ideas you learned in class and in
the labs. Your team will design
and implement algorithms that recognize hand shapes or gestures, and create a
graphical display that responds to the recognition of the hand shapes or
Each team must write a webpage report on the assignment. You may use our
template. We provide this template for guidance on content -- you may develop your own HTML style file.
To ensure that each team member builds his or her own
electronic portfolio, we ask that everybody submits his or her own report to GradeScope.
- Read and display video frames from a webcam
- Learn about tracking by template matching
- Learn about analyzing properties of objects in an image, e.g., object centroid, axis of least inertia, shape (circularity)
- Create interesting and interactive graphical applications
Design and implement algorithms that recognize hand shapes (such as making a
fist, thumbs up, thumbs down, pointing with an index finger, etc.) or gestures
(such as waving with one or both hands, swinging, drawing something in the air,
etc.) and create a graphical display that responds to the recognition of the
hand shapes or gestures. For your system, you are encouraged to try out some of the following
computer vision techniques that were discussed in class. Use at least a couple of them (in particular, binary object shape analysis):
- horizontal and vertical projections to find bounding boxes of "movement blobs" or "skin-color blobs"
- size, position, and orientation of object of interest
- circularity of object of interest
- template matching (e.g., create templates of a closed hand and an open hand)
- background differencing: D(x,y,t) = |I(x,y,t)-I(x,y,0)|
- frame-to-frame differencing: D'(x,y,t) = |I(x,y,t)-I(x,y,t-1)|
- motion energy templates (union of binary difference images over a window of time)
- skin-color detection (e.g., thresholding red and green pixel values)
- tracking the position and orientation of moving objects
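Several of the techniques listed above can be chained together. The sketch below is one possible arrangement, not a required design: it computes the difference images D and D' from the list, thresholds them, forms a motion energy image as the union of binary difference images, and uses row and column projections to find a bounding box. It uses plain Python lists so the arithmetic is visible. All function names, the threshold of 50, and the tiny 4x4 frames are hypothetical.

```python
def abs_diff(frame_a, frame_b):
    """Pixelwise |frame_a - frame_b| for two grayscale frames of equal size."""
    return [[abs(p - q) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

def binarize(image, threshold):
    """1 where the pixel value exceeds the threshold, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]

def union(binary_images):
    """Motion energy image: pixelwise OR over a window of binary images."""
    result = [row[:] for row in binary_images[0]]
    for image in binary_images[1:]:
        for y, row in enumerate(image):
            for x, v in enumerate(row):
                result[y][x] |= v
    return result

def bounding_box(binary):
    """Bounding box (x_min, y_min, x_max, y_max) from projections, or None."""
    column_sums = [sum(col) for col in zip(*binary)]  # projection onto the x-axis
    row_sums = [sum(row) for row in binary]           # projection onto the y-axis
    xs = [x for x, s in enumerate(column_sums) if s > 0]
    ys = [y for y, s in enumerate(row_sums) if s > 0]
    if not xs:
        return None
    return xs[0], ys[0], xs[-1], ys[-1]

# Hypothetical frames: I(x,y,0) is the background, then two later frames.
background = [[10] * 4 for _ in range(4)]
frame1 = [row[:] for row in background]
frame1[1][1] = frame1[1][2] = 200          # bright blob in row 1
frame2 = [row[:] for row in background]
frame2[2][2] = 200                         # blob has moved to row 2

bg_diff1 = binarize(abs_diff(frame1, background), 50)   # D(x,y,1)
bg_diff2 = binarize(abs_diff(frame2, background), 50)   # D(x,y,2)
frame_diff = binarize(abs_diff(frame2, frame1), 50)     # D'(x,y,2)
energy = union([bg_diff1, bg_diff2])
print(bounding_box(bg_diff1), bounding_box(frame_diff), bounding_box(energy))
```

On real webcam frames you would typically use array operations instead of nested loops. The per-pixel logic stays the same.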
You may use OpenCV library functions in your solution. If you do so, you
must understand the OpenCV function in detail -- both the mathematical
formulation and the algorithm. In particular, you must be able to explain the
function on a whiteboard without access to the OpenCV help pages.
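One way to check that you understand a library routine is to write a small version of it by hand. As an illustration, the sketch below does brute-force template matching with a mean-subtracted, normalized correlation score. OpenCV's `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` method computes a closely related score, but this sketch does not claim to reproduce OpenCV's implementation. The function names, the choice to score constant windows as 0, and the tiny image are assumptions made for the example.

```python
import math

def ncc(a, b):
    """Normalized correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mean_a) ** 2 for v in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: treat flat windows as "no match"
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    return cov / math.sqrt(var_a * var_b)

def match_template(image, template):
    """Slide the template over the image; return (best score, (x, y) of top-left)."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    flat_template = [v for row in template for v in row]
    best_score, best_pos = -2.0, None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            window = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            score = ncc(flat_template, window)
            if score > best_score:
                best_score, best_pos = score, (x, y)
    return best_score, best_pos

# Hypothetical data: the image contains a brightened copy (3 * t + 2) of the template.
template = [[1, 9],
            [9, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 0, 5, 29],
         [0, 0, 0, 29, 5],
         [0, 0, 0, 0, 0]]
score, position = match_template(image, template)
print(score, position)
```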
Your algorithm should detect at least four different hand shapes or gestures.
In your report, create a confusion matrix to illustrate how well your system can classify the
hand shapes or gestures.
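A confusion matrix for several gesture classes can be tallied from pairs of (true label, predicted label). The sketch below is one simple way to do this. It assumes rows are true classes and columns are predicted classes, and the class names and trial outcomes are invented for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with rows = true class and columns = predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix

# Hypothetical classes and test trials.
classes = ["fist", "thumbs_up", "thumbs_down", "point"]
true_labels = ["fist", "fist", "thumbs_up", "thumbs_up",
               "thumbs_down", "point", "point", "point"]
predicted_labels = ["fist", "point", "thumbs_up", "thumbs_up",
                    "thumbs_up", "point", "point", "fist"]

matrix = confusion_matrix(true_labels, predicted_labels, classes)
correct = sum(matrix[i][i] for i in range(len(classes)))
accuracy = correct / len(true_labels)
for name, row in zip(classes, matrix):
    print(f"{name:12s} {row}")
print("accuracy:", accuracy)
```

Off-diagonal entries show which gestures the system confuses with each other, which is often more informative in a report than the overall accuracy alone.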
You are also asked to create a graphical display that responds to the movements
of the recognized gestures. The graphics should be tasteful and appropriate to
the gestural movements. Along with the program, submit the following
information about your graphics program:
- An overall description
- How the graphics respond to different hand shapes and/or gestures
- Interesting and fun aspects of the graphics display
The programming assignment (code), along with the webpage report (HTML) and results (video), should be submitted
to Gradescope under "HW2 - Programming". Please submit each file separately (do not ZIP or archive them). Here are some guidelines to follow when submitting:
- Each student in the group can submit the same code
- Each student in the group needs to write their own report
- Each student in the group can submit the same video demo
- In your code AND report, please explicitly mention your team members
- If you are having trouble uploading the video to Gradescope, please upload it to either Google Drive or YouTube (as an unlisted video) and upload a .txt file with a link to the video. It is your responsibility to make sure that the correct permissions are set for graders to be able to view your demo.