Boston University
Distinguished Lecture Series 2011
Learning Feature Hierarchies for Vision
Yann LeCun
Courant Institute and Center for Neural Science
New York University
Time: March 30, 11am
Place: SAR 102
635 Commonwealth Ave

Abstract: Intelligent perceptual tasks such as vision and audition require the construction of good internal representations. Theoretical and empirical evidence suggests that the perceptual world is best represented by a multi-stage hierarchy in which features in successive stages are increasingly global, invariant, and abstract. An important challenge for Machine Learning is to devise "deep learning" methods that can automatically learn good feature hierarchies from labeled and unlabeled data. A class of such methods that combines unsupervised sparse coding with supervised refinement will be described. We demonstrate the use of deep learning methods to train convolutional networks (ConvNets). ConvNets are biologically inspired architectures consisting of multiple stages of filter banks, non-linear operations, and spatial pooling operations, analogous to the simple cells and complex cells in the mammalian visual cortex. A number of applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a pedestrian detector, a system that recognizes human activities in videos, and a trainable vision system for off-road mobile robot navigation. A new kind of "dataflow" computer architecture, dubbed NeuFlow, was designed to run these algorithms (and other vision and recognition algorithms) in real time on small, embeddable platforms. An FPGA implementation of NeuFlow running various vision applications will be demonstrated. An ASIC is being designed in collaboration with the e-lab at Yale, which will be capable of 700 giga-operations per second at less than 3 watts.
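As a rough illustration of the ConvNet stage structure mentioned in the abstract (filter bank, point-wise non-linearity, spatial pooling), here is a minimal sketch in modern PyTorch. It is not the speaker's implementation; the layer sizes, kernel sizes, and input resolution are illustrative assumptions only.

    import torch
    import torch.nn as nn

    # One ConvNet "stage": filter bank -> non-linearity -> spatial pooling.
    # All dimensions below are assumed for illustration.
    stage = nn.Sequential(
        nn.Conv2d(in_channels=3, out_channels=16, kernel_size=7),  # filter bank
        nn.Tanh(),                                                  # non-linear operation
        nn.MaxPool2d(kernel_size=2, stride=2),                      # spatial pooling
    )

    # Stacking several such stages, followed by a classifier, gives a full
    # convolutional network; features become more global and invariant per stage.
    x = torch.randn(1, 3, 96, 96)   # dummy 96x96 RGB input
    print(stage(x).shape)           # -> torch.Size([1, 16, 45, 45])
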

The speaker: Yann LeCun is Silver Professor of Computer Science and Neural Science at the Courant Institute of Mathematical Sciences and at the Center for Neural Science of New York University. Previously, he worked as head of the Image Processing Research Department at AT&T Labs-Research and as a Fellow at the NEC Research Institute in Princeton. His current interests include machine learning, computer vision, pattern recognition, mobile robotics, and computational neuroscience. He has published extensively on these topics as well as on neural networks, handwriting recognition, image processing and compression, and VLSI design. His handwriting recognition technology is used by several banks around the world to read checks. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to distribute and access scanned documents on the Web, and his image recognition technique, called the Convolutional Network, has been deployed by industry leaders as well as several startup companies for document recognition, human-computer interaction, image indexing, and video analytics. He has served on the editorial boards of numerous journals, has been program chair of a number of conferences, and chairs the annual Learning Workshop. He is on the science advisory board of the Institute for Pure and Applied Mathematics, and is the co-founder of MuseAmi, a music technology company.