|
Abstract:
Intelligent perceptual tasks such as vision and audition require the
construction of good internal representations. Theoretical and
empirical evidence suggests that the perceptual world is best
represented by a multi-stage hierarchy in which features in successive
stages are increasingly global, invariant, and abstract. An important
challenge for Machine Learning is to devise "deep learning" methods
that can automatically learn good feature hierarchies from labeled and
unlabeled data.
A class of such methods that combines unsupervised sparse coding and
supervised refinement will be described. We demonstrate the use of
deep learning methods to train convolutional networks
(ConvNets). ConvNets are biologically-inspired architectures
consisting of multiple stages of filter banks, non-linear operations,
and spatial pooling operations, analogous to the simple cells and
complex cells in the mammalian visual cortex.
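
The stage structure described above (filter bank, non-linearity, spatial pooling) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in the talk: the function names, the tanh non-linearity, the max pooling, and the random untrained filters are all assumptions chosen for brevity.

```python
import numpy as np

def filter_bank(image, filters):
    """Valid 2-D cross-correlation of one image with each square filter."""
    k = filters.shape[1]
    h, w = image.shape
    out = np.empty((filters.shape[0], h - k + 1, w - k + 1))
    for f, kernel in enumerate(filters):
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def max_pool(maps, size):
    """Non-overlapping max pooling over size x size windows."""
    n, h, w = maps.shape
    h2, w2 = h // size, w // size
    cropped = maps[:, :h2 * size, :w2 * size]  # drop any ragged border
    return cropped.reshape(n, h2, size, w2, size).max(axis=(2, 4))

def convnet_stage(image, filters, pool_size=2):
    """One stage: filter bank -> point-wise non-linearity -> spatial pooling."""
    return max_pool(np.tanh(filter_bank(image, filters)), pool_size)

# Hypothetical input: an 8x8 image and four random (untrained) 3x3 filters.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
filters = rng.standard_normal((4, 3, 3))
features = convnet_stage(image, filters)
print(features.shape)  # (4, 3, 3)
```

In this sketch the filtering and non-linearity loosely play the role of simple cells and the pooling that of complex cells; a multi-stage network would feed the pooled maps of one stage into the filter bank of the next.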
A number of applications will be shown through videos and live demos,
including a category-level object recognition system that can be
trained on the fly, a pedestrian detector, a system that recognizes
human activities in videos, and a trainable vision system for off-road
mobile robot navigation.
A new kind of "dataflow" computer architecture, dubbed NeuFlow, was
designed to run these algorithms (and other vision and recognition
algorithms) in real time on small, embeddable platforms. An FPGA
implementation of NeuFlow running various vision applications will be
demonstrated. An ASIC is being designed in collaboration with e-lab at
Yale, which will be capable of 700 giga-operations per second while
consuming less than 3 watts.
|