NIPS 2011 Tutorials

Linear Programming Relaxations for Graphical Models

Amir Globerson · Tommi Jaakkola

[ Manuel de Falla ]

Many probabilistic modeling tasks rely on solving challenging inference problems. These combinatorial problems arise, e.g., in predicting likely values for variables as in selecting and orienting residues in protein design, parsing in natural language processing, or when learning the model structure itself. In many cases, the inference problems involve densely connected variables (or higher order dependences) and are provably hard. However, recent research has shown that some of these difficulties can be overcome by a careful choice of approximation schemes and learning algorithms. These have yielded very encouraging results in wide array of fields, from machine vision and natural language processing to computational biology and signal processing. In this tutorial, we will focus on linear programming (LP) relaxations which have been particularly successful in solving inference problems. Intuitively, LP relaxations decompose a complex problem into a set of simpler subproblems that are subsequently encouraged to agree. If the subproblems agree about a solution, then the solution is the optimal one, otherwise it is fractional. Geometrically, the relaxation maintains an outer bound approximation to a polytope whose vertexes correspond to valid solutions. We will introduce and explain key ideas behind recent approaches, discuss when they can and cannot be applied, how …

Flexible, Multivariate Point Process Models for Unlocking the Neural Code

Jonathan W Pillow

[ Andulucia II & III ]

Abstract

A central goal of computational neuroscience is to understand the neural code, the semantic relationship between neural spike trains and the extrinsic (sensory, motor, & cognitive) variables that they represent. One powerful approach to this problem involves "cascade" point process models, which describe the neural encoding process in terms of a cascade of three stages: (1) linear dimensionality-reduction of a high-dimensional stimulus space; (2) a nonlinear transformation from feature space to spike rate; and (3) an inhomogeneous, conditional renewal (e.g., Poisson) spiking process. These models have been shown to provide accurate descriptions of single- and multi-neuron spike responses in a wide variety of brain areas, and have shed light on the fundamental units (rates, spike times, correlations, oscillations) that neurons use to convey information. Recent innovations have focused on extending these models to incorporate richer nonlinear dependencies and dynamics, and to capture more biologically realistic features of neural spike trains.

In this tutorial, I will provide a general introduction to cascade neural encoding models and then discuss some more recent advances, including models for non-Poisson spike trains and correlated neural population responses. Topics will include: Poisson & renewal processes, reverse correlation, spike-triggered average / covariance (STA/STC) analysis, inverse regression, maximally …

Lagrangian Relaxation Algorithms for Inference in Natural Language Processing

Alexander Rush · Michael Collins

[ Andulucia II & III ]

Abstract

here has been a long history in combinatorial optimization of methods that exploit structure in complex problems, using methods such as dual decomposition or Lagrangian relaxation. These methods leverage the observation that complex inference problems can often be decomposed into efficiently solvable sub-problems. Recent work within the machine learning community has explored algorithms for MAP inference in graphical models using these methods, as an alternative for example to max-product loopy belief propagation.

In this tutorial, we give an overview of Lagrangian relaxation for inference problems in natural language processing. The goals of the tutorial will be two-fold:

1) to give an introduction to key inference problems in NLP: for example problems arising in machine translation, sequence modeling, parsing, and information extraction.

2) to give a formal and practical overview of Lagrangrian relaxation as a method for deriving inference algorithms for NLP problems. In general, the algorithms we describe combine combinatorial optimization methods (for example dynamic programming, exact belief propagation, minimum spanning tree, all-pairs shortest path) with subgradient methods from the optimization community. Formal guarantees for the algorithms come from a close relationship to linear programming relaxations.

For many of the problems that we consider, the resulting algorithms produce exact solutions, …

Modern Bayesian Nonparametrics

Peter Orbanz · Yee Whye Teh

[ Manuel de Falla ]

Abstract

A nonparametric model is a model on an infinite dimensional parameter space. The parameter space represents the set of all possible solutions for a given learning problem -- for example, the set of smooth functions in nonlinear regression, or of all probability densities in a density estimation problem. A Bayesian nonparametric model defines a prior distribution on such an infinite dimensional space, where the traditional prior assumptions (e.g. "the parameter is likely to be small") are replaced by structural assumptions ("the function is likely to be smooth"), and learning then requires computation of the posterior distribution given data.

The tutorial will provide a high-level introduction to modern Bayesian nonparametrics. Since first attracting attention at NIPS a decade ago, the field has evolved substantially in the machine learning, statistics and probability communities. We now have a much improved understanding of how novel models can be used effectively in applications, of their theoretical properties, of techniques for posterior computation, and of how they can be combined to fit the requirements of a given problem. In the tutorial, we will survey the current state of the art with a focus on recent developments of interest in machine learning.

Graphical Models for the Internet

Amr Ahmed · Alexander Smola

[ Andulucia II & III ]

Abstract

In this tutorial we give an overview over applications and scalable inference in graphical models for the internet. Structured data analysis has become a key enabling technique to process significant amounts of data, ranging from entity extraction on webpages to sentiment and topic analysis for news articles and comments. Our tutorial covers large scale sampling and optimization methods for Nonparametric Bayesian models such as Latent Dirichlet Allocation, both from a statistics and a systems perspective. Subsequently we give an overview over a range of generative models to elicit sentiment, ideology, time dependence, hierarchical structure, and multilingual similarity from data. We conclude with an overview of recent advances in (semi)supervised information extraction methods based on conditional random fields and related undirected graphical models.

Information Theory in Learning and Control

Naftali Tishby

[ Manuel de Falla ]

Abstract

The concept of information plays a fundamental role in modern theories of cognition, in particular as regards perception, learning and behavior. However, the formal links between information theory and cognitive neuroscience remain elusive, and information theoretic measures are all too frequently misapplied.

In this tutorial I present a principled overview of some recent links between Shannon's theory of communication and statistical learning theory, and then put these in a more general framework of information theory of perception and control. I begin with the well-known links between statistical inference and information, from simple hypothesis testing, parameter estimation and the concept of sufficient statistics. An information theoretic generalization of minimal sufficient statistics leads to a natural optimization problem; i.e., the information bottleneck principle (IB), which is directly connected to classical models of communication with side information. While the IB optimization problem is generally non-convex, it is efficiently solvable for multivariate Gaussian variables. This special case was recently generalized using the Kernel trick to a wide class of dimensionality reduction problems, similar to Kernel-CCA. This version makes the information bottleneck method completely applicable for a wide range of practical problems. I will discuss the advantages of these algorithms over K-CCA and describe the …