Title: Efficient Techniques for Recovering 2D Human Body Poses from Images (PhD Thesis)
Authors: Tai-Peng Tian
Date: February 23, 2011
Abstract:
Human parsing recovers the 2D spatial layout of a human figure in an
image. First, patches in the image that resemble body parts, i.e.,
head, torso and limbs, are identified, then a coherent human figure is
assembled from these candidate positions. The human model is
represented as a graph where each vertex represents a body part and
each edge represents a relationship between parts. If the graph is a
tree, then the optimal solution can be recovered efficiently using the
Min-Sum (MS) algorithm. Tree models often return incorrect solutions
with the left and right legs stacked on top of one another. To
overcome this problem, we add constraints to the tree model, yielding
a graph that contains loops. Finding the optimal solution for a loopy
graph is computationally intensive. We propose a Branch and Bound
search algorithm to recover the optimal solution. Our algorithm
converges quickly in practice due to a novel tree structured lower
bound and a fast way for evaluating these lower bounds. Naively,
evaluating each lower bound requires $O(nh)$ time for a graph with $n$
vertices and $h$ candidate body part locations. We develop an $O(1)$
time method for evaluating the lower bound (in most iterations of the
algorithm) by reusing messages from the MS algorithm and using a Range
Minimum Query data structure. We also propose a human parsing model
that encodes the viewpoint and walking phase of the human figure using
the Common Factor Model (CFM). The main computational bottleneck of
the CFM human parsing algorithm involves message creation for each
iteration of the MS algorithm. The original CFM inference requires
$O(kn)$ messages to be created for $k$ iterations of the MS algorithm
in a graph with $n$ vertices. Our new algorithm reduces this to
$O(n)$ messages created. This speedup is based on the insight that
the messages are shifted from one iteration to the next and,
therefore, messages can be created once and then shifted in subsequent
iterations (shifting is an efficient operation which requires $O(1)$
time). In our experiments, the two proposed algorithms yield an order
of magnitude computational speedup over competing algorithms.