### Workshop

## Modelling and inference for dynamics on complex interaction networks: joining up machine learning and statistical physics

### Manfred Opper · Yasser Roudi · Peter Sollich

##### 511 e

Fri 11 Dec, 5:30 a.m. PST

**Invited speakers**

Jose Bento Ayres Pereira, Boston College

Alfredo Braunstein, Politecnico di Torino

Ramon Grima, University of Edinburgh

Jakob Macke, MPI Biological Cybernetics Tuebingen

Andrea Montanari, Stanford University

Graham Taylor, University of Guelph

This workshop is co-sponsored by the European Network "NETADIS" (Statistical Physics Approaches to Networks Across Disciplines). See http://www.netadis.eu for further information and workshop details (NIPS 2015 tab).

**Workshop overview**

Inference and learning on large graphical models, i.e. large systems of simple probabilistic units linked by a complex network of interactions, is a classical topic in machine learning. Such systems are also an active research topic in the field of statistical physics.

The main interaction between statistical physics and machine learning has so far been in the area of analysing data sets without explicit temporal structure. Here methods of equilibrium statistical physics, developed for studying Boltzmann distributions on networks of nodes with e.g. pairwise interactions, are closely related to graphical model inference techniques; accordingly there has been much cross-fertilization leading to both conceptual insights and more efficient algorithms. Models can be learned from recorded experimental or other empirical data, but even when samples come from e.g. a time series this aspect of the data is typically ignored.

More recently, interest has shifted towards **dynamical models**. This shift has occurred for two main reasons:

(a) Most of the interesting systems for which statistical analysis techniques are required (e.g. networks of biological neurons, gene regulatory networks, protein-protein interaction networks, stock markets) exhibit very rich temporal or spatiotemporal dynamics. Ignoring these dynamics by focusing on stationary distributions alone can lose a significant amount of interesting information and may even lead to qualitatively wrong conclusions.

(b) Current technological breakthroughs in collecting data from the complex systems referred to above are yielding ever-increasing temporal resolution. Combined with strong theoretical methods, this allows in-depth analyses of the fundamental temporal aspects of how these systems function. It is widely accepted that these dynamical aspects are crucial for understanding the function of biological and financial systems, warranting the development of techniques for studying them.

In the past, the fields of machine learning and statistical physics have cross-fertilised each other significantly. E.g. the establishment of the relation between loopy belief propagation, message passing algorithms and the Bethe free energy formulation has stimulated a large amount of research in approximation techniques for inference and the corresponding equilibrium analysis of disordered systems in statistical physics.

It is the goal of the proposed workshop to bring together researchers from the fields of machine learning and statistical physics in order to discuss the new challenges originating from dynamical data. Such data are modelled using a variety of approaches such as dynamic belief networks, continuous-time analogues of these (as often used for disordered spin systems in statistical physics), coupled stochastic differential equations for continuous random variables, etc. The workshop will provide a forum for exploring possible synergies between the inference and learning approaches developed for the various models. The experience from joint advances in the equilibrium domain suggests that there is much unexplored scope for progress on dynamical data.

Possible topics to be addressed will be:

**Inference on state dynamics:**

- efficient approximation of dynamics on a given network, filtering, smoothing

- inference with hidden nodes

- existing methods including dynamical belief propagation & expectation propagation, variational approximations, mean-field and Plefka approximations; relations between these, advantages, drawbacks

- alternative approaches

**Learning model/network parameters:**

- with/without hidden nodes

**Learning network structure:**

- going beyond correlation information

**Abstracts of invited talks**

**Jose Bento**: Learning Stochastic Differential Equations – Fundamental limits and efficient algorithms

Models based on stochastic differential equations (SDEs) play a crucial role in several domains of science and technology, ranging from chemistry to finance.
In this talk I consider the problem of learning the drift coefficient of a p-dimensional stochastic differential equation from a sample path of length T. I assume that the drift is parametrized by a high dimensional vector, and study the support recovery problem in the case where p is allowed to grow with T.
In particular, I describe a general lower bound on the sample-complexity T by using a characterization of mutual information as time integral of conditional variance, due to Kadota, Zakai, and Ziv. For linear stochastic differential equations, the drift coefficient is parametrized by a p by p matrix which describes which degrees of freedom interact under the dynamics. In this case, I analyze an L1-regularized least-squares estimator and describe an upper bound on T that nearly matches the lower bound on specific classes of sparse matrices.
I describe how this same algorithm can be used to learn non-linear SDEs and in addition show by means of a numerical experiment why one should expect the sample-complexity to be of the same order as that for linear SDEs.
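
As a rough illustration of the kind of estimator this abstract refers to (not the speaker's own code), the sketch below simulates a linear SDE and fits its drift matrix by L1-regularized least squares on the path increments. Everything specific here is an assumption made for the example: unit diffusion, an Euler-Maruyama discretisation, a plain ISTA (proximal gradient) solver, and the hypothetical function names `simulate_linear_sde` and `l1_least_squares_drift`.

```python
import numpy as np


def simulate_linear_sde(A, T, dt, rng):
    """Euler-Maruyama path of dx = A x dt + dW (unit diffusion is an assumption)."""
    p = A.shape[0]
    n = int(round(T / dt))
    x = np.zeros((n + 1, p))
    for k in range(n):
        noise = np.sqrt(dt) * rng.standard_normal(p)
        x[k + 1] = x[k] + A @ x[k] * dt + noise
    return x


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def l1_least_squares_drift(x, dt, lam, n_iter=500):
    """Estimate A by minimising (1/2n) ||Y - X W||_F^2 + lam ||W||_1 with W = A^T.

    X holds the states x_k and Y the scaled increments (x_{k+1} - x_k) / dt.
    The solver is plain ISTA, chosen for brevity rather than speed.
    """
    X = x[:-1]
    Y = (x[1:] - x[:-1]) / dt
    n = X.shape[0]
    G = X.T @ X / n                          # empirical second-moment matrix
    B = X.T @ Y / n                          # cross-moments with the increments
    step = 1.0 / np.linalg.eigvalsh(G)[-1]   # 1 / Lipschitz constant of the gradient
    W = np.zeros_like(B)
    for _ in range(n_iter):
        W = soft_threshold(W - step * (G @ W - B), step * lam)
    return W.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A_true = -np.eye(4)
    A_true[0, 1] = 0.8                       # a single interaction: node 1 drives node 0
    path = simulate_linear_sde(A_true, T=1000.0, dt=0.05, rng=rng)
    A_hat = l1_least_squares_drift(path, dt=0.05, lam=0.08)
    print(np.round(A_hat, 2))
```

The L1 penalty shrinks all entries towards zero, so the recovered nonzero entries are biased in magnitude. For the support recovery question studied in the talk, what matters is which entries survive the thresholding, and how long a path `T` is needed for that set to be correct.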

**Alfredo Braunstein**: Bayesian inference of cascades on networks

We present a method based on Belief Propagation to study a series of inference problems on discrete dynamical cascade models based on partial and/or noisy observations of the cascades. The problems include the identification of the source, the discovery of undetected infected nodes, prediction of features of the future evolution, and the inference of the supporting network.

**Ramon Grima**: Exact and approximate solutions for spatial stochastic models of chemical systems

Stochastic effects in chemical reaction systems have been mostly studied via the chemical master equation, a non-spatial discrete stochastic formulation of chemical kinetics which assumes well-mixing and point-like interactions between molecules. These assumptions are in direct contrast with what experiments tell us about the nature of the intracellular environment, namely that diffusion plays a fundamental role in intracellular dynamics and that the environment itself is highly non-dilute (or crowded). I will here describe our recent work on obtaining (i) exact expressions for the solution of the reaction-diffusion master equation (RDME) and its crowded counterpart (cRDME) in equilibrium conditions and (ii) approximate expressions for the moments in non-equilibrium conditions. The solutions portray an emerging picture of the combined influence of diffusion and crowding on the stochastic properties of chemical reaction networks.

**Jakob Macke**: Correlations and signatures of criticality in neural population models

Large-scale recording methods make it possible to measure the statistics of neural population activity, and thereby to gain insights into the principles that govern the collective activity of neural ensembles. One hypothesis that has emerged from this approach is that neural populations are poised at a ‘thermodynamic critical point’, and that this has important functional consequences (Tkacik et al. 2014). Support for this hypothesis has come from studies that computed the specific heat, a measure of global population statistics, for groups of neurons subsampled from population recordings. These studies have found two effects which, in physical systems, indicate a critical point. First, the specific heat diverges with population size N. Second, when manipulating population statistics by introducing a ‘temperature’ in analogy to statistical mechanics, the peak of the specific heat moves towards unit temperature for large populations.
What mechanisms can explain these observations? We show that both effects arise in a simple simulation of retinal population activity. They robustly appear across a range of parameters including biologically implausible ones, and can be understood analytically in simple models. The specific heat grows with N whenever the (average) correlation is independent of N, which is always true when uniformly subsampling a large, correlated population. For weakly correlated populations, the rate of divergence of the specific heat is proportional to the correlation strength. Thus, if retinal population codes were optimized to maximize specific heat, then this would predict that they seek to increase correlations. This is incongruent with theories of efficient coding that make the opposite prediction. We find criticality in a simple and parsimonious model of retinal processing, and without the need for fine-tuning or adaptation. This suggests that signatures of criticality might not require an optimized coding strategy, but rather arise as a consequence of sub-sampling a stimulus-driven neural population (Aitchison et al. 2014).
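
For readers unfamiliar with the analogy, the specific heat of a population model is usually constructed as follows. This is the standard statistical-mechanics definition, given here as an assumption about the conventions behind the abstract rather than as a quotation from the talk. Given a model $p(x)$ over binary population patterns $x$ of $N$ neurons, a temperature $T$ is introduced by rescaling the log-probabilities:

$$
p_T(x) = \frac{p(x)^{1/T}}{Z(T)}, \qquad Z(T) = \sum_{x} p(x)^{1/T},
$$

$$
c(T) = \frac{1}{N\,T^{2}}\,\mathrm{Var}_{p_T}\!\left[\log p(x)\right].
$$

At $T = 1$ the model is unchanged and $c(1) = \mathrm{Var}_{p}[\log p(x)]/N$, so the specific heat of the data-fitted model is the variance of the log-probability per neuron.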

**Andrea Montanari**: Information-theoretic bounds on learning network dynamics

How long should we observe the trajectory of a system before being able to characterize its underlying network dynamics? I will present a brief review of information-theoretic tools to establish lower bounds on the required length of observation. I will illustrate the use of these tools with a few examples: linear and nonlinear stochastic differential equations, dynamical Bayesian networks and so on. For each of these examples, I will discuss whether the ultimate information limit has been achieved by practical algorithms or not.

**Graham Taylor**: Learning Multi-scale Temporal Dynamics with Recurrent Neural Networks

The last three years have seen an explosion of activity studying recurrent neural networks (RNNs), a generalization of feedforward neural networks which can map sequences to sequences. Training RNNs using backpropagation through time can be difficult, and was until recently thought to be hopeless because of vanishing and exploding gradients. Recent advances in optimization methods and architectures have led to impressive results in modeling speech, handwriting and language. Applications to other areas are emerging. In this talk, I will review some recent progress on RNNs and discuss our work on extending and improving the Clockwork RNN (Koutník et al.), a simple yet powerful model that partitions its hidden units to model specific temporal scales. Our “Dense clockworks” are a shift-invariant form of the architecture which we show to be more efficient and effective than their predecessor. I will also describe a recent collaboration with Google in which we apply Dense clockworks to authenticating mobile phone users based on the movement of the device as captured by the accelerometer and gyroscope.
