

Session

Oral Session 7

Geoffrey Gordon


Wed 10 Dec. 11:00 - 11:50 PST

Invited Talk (Posner Lecture)
Using the Emergent Dynamics of Attractor Networks for Computation

John J. Hopfield

In higher animals such as mammals, complex collective behaviors emerge from the microscopic properties of large structured ensembles of neurons. I will describe a model example of emergent computational dynamics, based on old-brain cortical properties. This collective (or emergent) description is derivable from the dynamical activity of neurons but has a completely different mathematical structure from the underlying neural network dynamics. The utility of understanding collective dynamics will first be illustrated by showing how it generates a natural solution to the ‘time-warp’ problem that occurs in recognizing time-varying stimulus patterns having a substantial variation in cadence (e.g. spoken words). The model of emergent dynamics will be shown to be capable of producing goal-directed motor behavior, object-based attention, and rudimentary thinking.
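The attractor picture underlying this talk can be illustrated with the classic binary Hopfield network, a deliberately minimal sketch (the model of emergent cortical dynamics in the talk is richer): patterns stored via a Hebbian rule become fixed points of the dynamics, and a corrupted input "flows" to the nearest stored memory. All parameter choices below (network size, number of patterns, noise level) are illustrative.

```python
import numpy as np

# Minimal classic (1982-style) binary Hopfield network -- a sketch of
# attractor dynamics, not the specific model described in the talk.
rng = np.random.default_rng(0)
n = 64                                        # number of +/-1 neurons
patterns = rng.choice([-1, 1], size=(3, n))   # memories to store

# Hebbian outer-product weights, zero self-connections:
# W = (1/n) * sum_p x_p x_p^T
W = patterns.T @ patterns / n
np.fill_diagonal(W, 0.0)

def recall(state, sweeps=5):
    """Asynchronous (sequential) updates; energy decreases monotonically,
    so the state converges to a fixed point -- an attractor."""
    s = state.copy()
    for _ in range(sweeps):
        for i in range(n):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s

# Corrupt a stored pattern and let the dynamics clean it up.
noisy = patterns[0].copy()
flip = rng.choice(n, size=8, replace=False)
noisy[flip] *= -1
restored = recall(noisy)
```

With only 3 patterns in 64 neurons (well under the ~0.14n storage capacity of this rule), the corrupted input is pulled back to the stored memory.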

Wed 10 Dec. 11:50 - 12:10 PST

Oral
"How hard is my MDP?" The distribution-norm to the rescue

Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel $p$. In many problems, a good approximation of $p$ is not needed. For instance, if from one state-action pair $(s,a)$ one can only transition to states with the same value, learning $p(\cdot|s,a)$ accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) that we call the {\em distribution-norm}. The distribution-norm w.r.t.~a measure $\nu$ is defined on zero $\nu$-mean functions $f$ by the standard deviation of $f$ with respect to $\nu$. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the generic but loose $\|\cdot\|_1$ concentration inequalities used in most previous analyses of RL algorithms and to benefit from this new hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.
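The core intuition can be sketched numerically: for a distribution $\nu = p(\cdot|s,a)$ over successor states and a value function $V$, the relevant quantity is the standard deviation of $V$ under $\nu$. If all reachable successors share the same value, that deviation is zero and estimating $p$ precisely buys nothing. The function and variable names below are illustrative, not the paper's notation; the paper's formal definitions are more general.

```python
import numpy as np

def distribution_norm(f, nu):
    """Standard deviation of f under the distribution nu:
    sqrt(E_nu[(f - E_nu[f])^2]). A toy finite-support version of the
    norm-like quantity motivating the paper's hardness measure."""
    f, nu = np.asarray(f, float), np.asarray(nu, float)
    mean = nu @ f
    return float(np.sqrt(nu @ (f - mean) ** 2))

# Transition distribution over three successor states:
nu = np.array([0.3, 0.5, 0.2])

# All successors have the same value -> norm 0: the exact shape of
# nu is irrelevant for value estimation, only its support matters.
V_const = np.array([4.0, 4.0, 4.0])
easy = distribution_norm(V_const, nu)

# Successor values spread out -> the transition must be learned well.
V_varied = np.array([0.0, 4.0, 10.0])
hard = distribution_norm(V_varied, nu)
```

A "hard" state-action pair in this sense is one whose successor values vary a lot under the transition distribution, matching the abstract's point that hardness is finer-grained than state counts or diameter.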

Wed 10 Dec. 12:10 - 12:30 PST

Oral
On Communication Cost of Distributed Statistical Estimation and Dimensionality

Ankit Garg · Tengyu Ma · Huy Nguyen

We explore the connection between dimensionality and communication cost in distributed learning problems. Specifically, we study the problem of estimating the mean $\theta$ of an unknown $d$-dimensional Gaussian distribution in the distributed setting. In this problem, the samples from the unknown distribution are distributed among $m$ different machines. The goal is to estimate the mean $\theta$ at the optimal minimax rate while communicating as few bits as possible. We show that in this setting, the communication cost scales linearly in the number of dimensions, i.e., one needs to deal with different dimensions individually. Applying this result to previous lower bounds for one dimension in the interactive setting \cite{ZDJW13} and to our improved bounds for the simultaneous setting, we prove new lower bounds of $\Omega(md/\log(m))$ and $\Omega(md)$ for the bits of communication needed to achieve the minimax squared loss, in the interactive and simultaneous settings respectively. To complement these, we also demonstrate an interactive protocol achieving the minimax squared loss with $O(md)$ bits of communication, which improves upon the simple simultaneous protocol by a logarithmic factor. Given the strong lower bounds in the general setting, we initiate the study of distributed parameter estimation problems with structured parameters. Specifically, when the parameter is promised to be $s$-sparse, we show a simple thresholding-based protocol that achieves the same squared loss while saving a $d/s$ factor of communication. We conjecture that the tradeoff between communication and squared loss demonstrated by this protocol is essentially optimal up to a logarithmic factor.
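The statistical half of the sparse setting can be sketched as follows: each machine averages its own samples, the center averages the machines' reports, and coordinates below a noise-level threshold are zeroed, so only the $\approx s$ large coordinates survive. This is a hedged toy illustration only; the paper's protocol additionally addresses how to *communicate* these reports cheaply, which the sketch omits, and all numeric choices here (dimensions, threshold constant) are my own.

```python
import numpy as np

# Toy thresholding estimator for an s-sparse Gaussian mean, split
# across m machines (statistical step only; communication compression
# from the paper is not modeled here).
rng = np.random.default_rng(1)
d, s, m, n = 100, 5, 20, 50      # dims, sparsity, machines, samples/machine

theta = np.zeros(d)
theta[:s] = 5.0                  # s-sparse true mean, unit-variance noise

# Each machine holds n samples of N(theta, I) and reports its local mean.
local_means = np.array([
    rng.normal(theta, 1.0, size=(n, d)).mean(axis=0) for _ in range(m)
])

est = local_means.mean(axis=0)            # center averages the m reports
tau = 3.0 * np.sqrt(1.0 / (m * n))        # ~3 noise standard deviations
est[np.abs(est) < tau] = 0.0              # kill the (d - s) noise coordinates
```

After thresholding, almost all of the $d - s$ zero coordinates are exactly zero, which is what makes it plausible that only the $s$ informative coordinates need full-precision communication, saving a $d/s$ factor.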