## Tutorial

### Energy-Based Models: Structured Learning Beyond Likelihoods

#### Yann LeCun

##### Regency F

Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy with each configuration of the variables. Given a set of observed variables, inference in an EBM consists of finding configurations of the unobserved variables that minimize the energy. Training an EBM consists of designing a loss function whose minimization shapes the energy surface so that desired configurations of the variables have lower energies than undesired ones. EBM approaches have been applied with considerable success to problems such as natural language processing, biological sequence analysis, computer vision (object detection and recognition), image segmentation, image restoration, unsupervised feature learning, and dimensionality reduction.
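The idea of inference as energy minimization can be sketched in a few lines. The toy energy function, the variable names, and the use of a linear score here are our own illustrative assumptions, not material from the tutorial; the only point is that prediction means searching the unobserved variable for the lowest-energy configuration.

```python
import numpy as np

def energy(x, y, W):
    """Toy energy: low when the row W[y] for candidate label y aligns with input x."""
    return -float(W[y] @ x)

def infer(x, W):
    """Energy-based inference: return the candidate label with the lowest energy."""
    energies = [energy(x, y, W) for y in range(W.shape[0])]
    return int(np.argmin(energies))

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # one row per candidate value of the unobserved variable
x = np.array([0.2, 0.9])     # observed variables
print(infer(x, W))           # prints 1: label 1 has the lower energy for this x
```

With continuous or structured unobserved variables the argmin over an explicit list is replaced by a search or optimization procedure, but the principle is the same.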

The first part of the tutorial will introduce the concepts of energy-based inference, discuss the relationship with non-probabilistic forms of graphical models (un-normalized factor graphs), and give the conditions that the loss function must satisfy so that its minimization will cause the model to produce good decisions.

The second part will discuss the relative merits of EBM approaches and probabilistic approaches. Because no normalization is required, EBMs allow more flexibility than probabilistic approaches in the design of the energy function. More importantly, when training complex probabilistic models, one is often faced with the problem of evaluating (or approximating) intractable sums or integrals. EBMs trained with appropriate loss functions sidestep this problem altogether.

The third part will present several popular learning models in light of the EBM framework. In particular, discriminative learning methods for "structured" outputs will be discussed, including discriminative HMMs, Graph Transformer Networks, Conditional Random Fields, Maximum Margin Markov Networks, and related approaches. A simple interpretation will be given for several approximate maximum-likelihood methods, such as products-of-experts models, variational bound methods, and Hinton's Contrastive Divergence. Lastly, a number of applications to vision, NLP, and bioinformatics will be discussed.
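To make the loss-function discussion concrete, here is a minimal sketch of one margin-based loss of the kind used in structured discriminative training: it pushes the energy of the desired answer down and the energy of the lowest-energy ("most offending") incorrect answer up until they are separated by a margin. The linear energy, the parameter values, and the helper names are illustrative assumptions, not the tutorial's notation.

```python
import numpy as np

def energy(x, y, W):
    """Same toy linear energy as before: E(x, y) = -W[y].x."""
    return -float(W[y] @ x)

def margin_loss_grad(x, y_true, W, margin=1.0):
    """Hinge-style loss: zero once E(x, y_true) + margin <= E(x, y_bad),
    where y_bad is the most offending incorrect answer."""
    others = [y for y in range(W.shape[0]) if y != y_true]
    y_bad = min(others, key=lambda y: energy(x, y, W))
    loss = max(0.0, margin + energy(x, y_true, W) - energy(x, y_bad, W))
    grad = np.zeros_like(W)
    if loss > 0.0:
        grad[y_true] -= x  # dE/dW[y_true] = -x: lowers the correct answer's energy
        grad[y_bad] += x   # raises the energy of the most offending wrong answer
    return loss, grad

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(2, 2))
x, y_true = np.array([0.2, 0.9]), 1
for _ in range(50):
    loss, grad = margin_loss_grad(x, y_true, W)
    W -= 0.1 * grad  # gradient step shapes the energy surface
```

After a few updates the margin condition holds, the loss and gradient vanish, and inference by energy minimization returns the desired answer; no normalizing sum over configurations is ever computed, which is the point of the contrast with maximum-likelihood training.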