Courant Institute, NYU
Energy-Based Models: Structured Learning Beyond Likelihoods
3:30 - 5:30pm Monday, December 04, 2006
Regency F
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy with each configuration of the variables. Given a set of observed variables, inference in an EBM consists of finding configurations of the unobserved variables that minimize the energy. Training an EBM consists of designing a loss function whose minimization shapes the energy surface so that desired variable configurations have lower energies than undesired ones. EBM approaches have been applied with considerable success to such problems as natural language processing, biological sequence analysis, computer vision (object detection and recognition), image segmentation, image restoration, unsupervised feature learning, and dimensionality reduction.
The first part of the tutorial will introduce the concepts of energy-based inference, discuss the relationships with non-probabilistic forms of graphical models (un-normalized factor graphs), and give the conditions that the loss function must satisfy so that its minimization causes the model to produce good decisions.
The second part will discuss the relative merits of EBM approaches and probabilistic approaches. Because they require no normalization, EBMs allow more flexibility than probabilistic approaches in the design of the energy function. More importantly, when training complex probabilistic models, one is often faced with the problem of evaluating (or approximating) intractable sums or integrals. EBMs trained with appropriate loss functions sidestep this problem altogether.
The third part will present several popular learning models in the light of the EBM framework. In particular, discriminative learning methods for "structured" outputs will be discussed, including discriminative HMMs, Graph Transformer Networks, Conditional Random Fields, Maximum Margin Markov Networks, and related approaches. A simple interpretation will be given for several approximate maximum likelihood methods such as product-of-experts models, variational bound methods, and Hinton's Contrastive Divergence. Lastly, a number of applications to vision, NLP, and bioinformatics will be discussed.
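As a rough illustration of the ideas in the abstract, the following minimal Python sketch shows energy-based inference over a small discrete label set, a margin-based loss that needs no normalization, and a negative log-likelihood loss that does. The quadratic energy function, the label set, and every name here (`energy`, `infer`, `hinge_loss`, `nll_loss`, `LABELS`) are assumptions made for illustration only; they are not taken from the tutorial itself.

```python
import math

# Hypothetical label set; a real structured-output problem would have far more
# configurations, which is exactly what makes normalization expensive.
LABELS = [-1, 0, 1]


def energy(w, x, y):
    """Illustrative energy (an assumption): low when y is close to w * x."""
    return 0.5 * (y - w * x) ** 2


def infer(w, x, labels=LABELS):
    """Energy-based inference: pick the label with the lowest energy."""
    return min(labels, key=lambda y: energy(w, x, y))


def hinge_loss(w, x, y_true, labels=LABELS, margin=1.0):
    """Margin loss: the correct energy should sit at least `margin` below
    the lowest energy among the incorrect labels. No normalization needed."""
    e_true = energy(w, x, y_true)
    e_wrong = min(energy(w, x, y) for y in labels if y != y_true)
    return max(0.0, margin + e_true - e_wrong)


def nll_loss(w, x, y_true, labels=LABELS, beta=1.0):
    """Negative log-likelihood under a Gibbs distribution over the labels.
    The log-sum over all labels is the term that becomes intractable
    when the set of configurations is very large."""
    log_z = math.log(sum(math.exp(-beta * energy(w, x, y)) for y in labels))
    return energy(w, x, y_true) + log_z / beta
```

With these assumed definitions, `infer(1.0, 0.9)` returns `1`, since that label has the lowest energy. Note that `hinge_loss` only compares two energies, whereas `nll_loss` must sum over every label, which is the cost the abstract refers to as intractable for complex models.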
http://www.cs.nyu.edu/~yann/talks/tutorial-nips-2006.html
Yann LeCun received an Electrical Engineer Diploma from Ecole Supérieure d'Ingénieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ, in 1988, and became head of the Image Processing Research Department at AT&T Labs-Research in 1996. In 2002 he became a Fellow at the NEC Research Institute in Princeton. He has been a professor of computer science at NYU's Courant Institute of Mathematical Sciences since 2003. Yann's research interests include computational and biological models of learning and perception, computer vision, mobile robotics, data compression, digital libraries, and the physical basis of computation. His image compression technology, called DjVu, is used by thousands of libraries and publishers to distribute scanned documents online, and his handwriting recognition technology is used to process a large percentage of bank checks in the US. He has been general chair of the annual Learning at Snowbird workshop since 1997, and was program chair of CVPR 2006.