# NIPS 2017 Events with Videos

## Invited Talks

## Invited Talks (Breiman Lecture)

## Invited Talks (Posner Lecture)

## Orals

- Test of Time Award
- Diffusion Approximations for Online Principal Component Estimation and Global Convergence
- On the Optimization Landscape of Tensor Decompositions
- Positive-Unlabeled Learning with Non-Negative Risk Estimator
- Robust Optimization for Non-Convex Objectives
- Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
- Bayesian Optimization with Gradients
- Streaming Weak Submodularity: Interpreting Neural Networks on the Fly
- Safe and Nested Subgame Solving for Imperfect-Information Games
- A unified approach to interpreting model predictions
- A graph-theoretic approach to multitasking
- Unsupervised learning of object frames by dense equivariant image labelling
- A Linear-Time Kernel Goodness-of-Fit Test
- Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts
- Generalization Properties of Learning with Random Features
- Eigen-Distortions of Hierarchical Representations
- Communication-Efficient Distributed Learning of Discrete Distributions
- On Structured Prediction Theory with Calibrated Convex Surrogate Losses
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks
- Variance-based Regularization with Convex Objectives
- End-to-end Differentiable Proving
- Gradient descent GAN optimization is locally stable
- ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
- The Marginal Value of Adaptive Gradient Methods in Machine Learning
- Imagination-Augmented Agents for Deep Reinforcement Learning
- Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
- Off-policy evaluation for slate recommendation
- Reliable Decision Support using Counterfactual Models
- Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
- Convolutional Gaussian Processes
- Inverse Reward Design
- Counterfactual Fairness
- Toward Goal-Driven Neural Network Models for the Rodent Whisker-Trigeminal System
- Masked Autoregressive Flow for Density Estimation
- Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit
- Deep Sets
- Quantifying how much sensory information in a neural code is relevant for behavior
- From Bayesian Sparsity to Gated Recurrent Nets

## Posters

- Learning Hierarchical Information Flow with Recurrent Neural Modules
- Decoupling "when to update" from "how to update"
- PRUNE: Preserving Proximity and Global Ranking for Network Embedding
- Structured Bayesian Pruning via Log-Normal Multiplicative Noise
- Non-convex Finite-Sum Optimization Via SCSG Methods
- Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs
- Non-monotone Continuous DR-submodular Maximization: Structure and Algorithms
- Large-Scale Quadratically Constrained Quadratic Program via Low-Discrepancy Sequences
- Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems
- A Screening Rule for l1-Regularized Ising Model Estimation
- Concentration of Multilinear Functions of the Ising Model with Applications to Network Data
- Differentially private Bayesian learning on distributed data
- Model-Powered Conditional Independence Test
- Probabilistic Models for Integration Error in the Assessment of Functional Cardiac Models
- Scalable Levy Process Priors for Spectral Kernel Learning
- Hybrid Reward Architecture for Reinforcement Learning
- Task-based End-to-end Model Learning in Stochastic Optimization
- The Expressive Power of Neural Networks: A View from the Width
- Effective Parallelisation for Machine Learning
- Noise-Tolerant Interactive Learning Using Pairwise Comparisons
- Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions
- Learning Identifiable Gaussian Bayesian Networks in Polynomial Time and Sample Complexity
- On the Power of Truncated SVD for General High-rank Matrix Estimation Problems
- AdaGAN: Boosting Generative Models
- Discovering Potential Correlations via Hypercontractivity
- Adaptive Classification for Prediction Under a Budget
- Inferring Generative Model Structure with Static Analysis
- Maximum Margin Interval Trees
- Hierarchical Methods of Moments
- Testing and Learning on Distributions with Symmetric Noise Invariance
- SafetyNets: Verifiable Execution of Deep Neural Networks on an Untrusted Cloud
- The Neural Hawkes Process: A Neurally Self-Modulating Multivariate Point Process
- Label Efficient Learning of Transferable Representations across Domains and Tasks
- Universal Style Transfer via Feature Transforms
- Attend and Predict: Understanding Gene Regulation by Selective Attention on Chromatin
- Visual Reference Resolution using Attention Memory for Visual Dialog
- Matching neural paths: transfer from recognition to correspondence search
- Pose Guided Person Image Generation
- Toward Multimodal Image-to-Image Translation
- Bregman Divergence for Stochastic Variance Reduction: Saddle-Point and Adversarial Prediction
- An inner-loop free solution to inverse problems using deep neural networks
- Structured Embedding Models for Grouped Data
- Hierarchical Attentive Recurrent Tracking
- NeuralFDR: Learning Discovery Thresholds from Hypothesis Features
- Eigen-Distortions of Hierarchical Representations
- Learning Affinity via Spatial Propagation Networks
- Deep Hyperspherical Learning
- Collaborative Deep Learning in Fixed Topology Networks
- Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization
- Streaming Weak Submodularity: Interpreting Neural Networks on the Fly
- Gradient Descent Can Take Exponential Time to Escape Saddle Points
- Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
- Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
- Adaptive SVRG Methods under Error Bound Conditions with Unknown Growth Parameter
- Acceleration and Averaging in Stochastic Descent Dynamics
- Multiscale Semi-Markov Dynamics for Intracortical Brain-Computer Interfaces
- EEG-GRAPH: A Factor-Graph-Based Model for Capturing Spatial, Temporal, and Observational Relationships in Electroencephalograms
- Parallel Streaming Wasserstein Barycenters
- Bayesian Optimization with Gradients
- Scalable Log Determinants for Gaussian Process Kernel Learning
- Linearly constrained Gaussian processes
- Safe Model-based Reinforcement Learning with Stability Guarantees
- Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee
- Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent
- Estimating Mutual Information for Discrete-Continuous Mixtures
- Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos
- Safe and Nested Subgame Solving for Imperfect-Information Games
- Affinity Clustering: Hierarchical Clustering at Scale
- Inhomogeneous Hypergraph Clustering with Applications
- A Unified Approach to Interpreting Model Predictions
- Matrix Norm Estimation from a Few Entries
- Learning Low-Dimensional Metrics
- Consistent Robust Regression
- Gaussian Quadrature for Kernel Features
- Learning Populations of Parameters
- Deep Recurrent Neural Network-Based Identification of Precursor microRNAs
- Learning Spherical Convolution for Fast Features from 360° Imagery
- Deep Mean-Shift Priors for Image Restoration
- MarrNet: 3D Shape Reconstruction via 2.5D Sketches
- Self-Supervised Intrinsic Image Decomposition
- Dynamic Routing Between Capsules
- f-GANs in an Information Geometric Nutshell
- Generalizing GANs: A Turing Perspective
- Bayesian GAN
- Triple Generative Adversarial Nets
- PixelGAN Autoencoders
- Learning to Compose Domain-Specific Transformations for Data Augmentation
- Unsupervised Image-to-Image Translation Networks
- Train longer, generalize better: closing the generalization gap in large batch training of neural networks
- Learning Combinatorial Optimization Algorithms over Graphs
- Toward Goal-Driven Neural Network Models for the Rodent Whisker-Trigeminal System
- Learning to See Physics via Visual De-animation
- Shape and Material from Sound
- Deep Hyperalignment
- Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
- Beyond Worst-case: A Probabilistic Analysis of Affine Policies in Dynamic Optimization
- Clustering with Noisy Queries
- Convergence Analysis of Two-layer Neural Networks with ReLU Activation
- Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
- Straggler Mitigation in Distributed Optimization Through Data Encoding
- ADMM without a Fixed Penalty Parameter: Faster Convergence with New Adaptive Penalization
- Safe Adaptive Importance Sampling
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
- Excess Risk Bounds for the Bayes Risk using Variational Inference in Latent Gaussian Models
- Reliable Decision Support using Counterfactual Models
- Multi-Information Source Optimization
- Multiresolution Kernel Approximation for Gaussian Process Regression
- Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
- Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets
- Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
- Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues
- Learning Mixture of Gaussians with Streaming Data
- Online control of the false discovery rate with decaying memory
- Multi-View Decision Processes: The Helper-AI Problem
- Identifying Outlier Arms in Multi-Armed Bandit
- Adaptive Active Hypothesis Testing under Limited Information
- Near-Optimal Edge Evaluation in Explicit Generalized Binomial Graphs
- Hypothesis Transfer Learning via Transformation Functions
- An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
- Online to Offline Conversions, Universality and Adaptive Minibatch Sizes
- Practical Locally Private Heavy Hitters
- Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation
- Optimized Pre-Processing for Discrimination Prevention
- From Parity to Preference-based Notions of Fairness in Classification
- Beyond Parity: Fairness Objectives for Collaborative Filtering
- Balancing information exposure in social networks
- Scalable Demand-Aware Recommendation
- Style Transfer from Non-Parallel Text by Cross-Alignment
- Gradient descent GAN optimization is locally stable

## Spotlights

- Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
- Gradient Descent Can Take Exponential Time to Escape Saddle Points
- Communication-Efficient Stochastic Gradient Descent, with Applications to Neural Networks
- Near-linear time approximation algorithms for optimal transport via Sinkhorn iteration
- Inhomogeneous Hypergraph Clustering with Applications
- Limitations on Variance-Reduction and Acceleration Schemes for Finite Sums Optimization
- K-Medoids For K-Means Seeding
- Implicit Regularization in Matrix Factorization
- Online Learning with Transductive Regret
- Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls
- Matrix Norm Estimation from a Few Entries
- Acceleration and Averaging in Stochastic Descent Dynamics
- Semisupervised Clustering, AND-Queries and Locally Encodable Source Coding
- When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent
- Differentiable Learning of Submodular Functions
- Information-theoretic analysis of generalization capability of learning algorithms
- Generalized Linear Model Regression under Distance-to-set Penalties
- Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee
- Decomposable Submodular Function Minimization: Discrete and Continuous
- Clustering Billions of Reads for DNA Data Storage
- Unbiased estimates for linear regression via volume sampling
- On the Complexity of Learning Neural Networks
- On Frank-Wolfe and Equilibrium Computation
- Multiplicative Weights Update with Constant Step-Size in Congestion Games: Convergence, Limit Cycles and Chaos
- On Separability of Loss Functions, and Revisiting Discriminative Vs Generative Models
- Estimating Mutual Information for Discrete-Continuous Mixtures
- Towards Accurate Binary Convolutional Neural Network
- Posterior sampling for reinforcement learning: worst-case regret bounds
- Deep Learning for Precipitation Nowcasting: A Benchmark and A New Model
- Regret Analysis for Continuous Dueling Bandit
- Poincaré Embeddings for Learning Hierarchical Representations
- Minimal Exploration in Structured Stochastic Bandits
- Deep Hyperspherical Learning
- Fast Rates for Bandit Optimization with Upper-Confidence Frank-Wolfe
- What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
- Diving into the shallows: a computational perspective on large-scale shallow learning
- One-Sided Unsupervised Domain Mapping
- Monte-Carlo Tree Search by Best Arm Identification
- Deep Mean-Shift Priors for Image Restoration
- A framework for Multi-A(rmed)/B(andit) Testing with Online FDR Control
- Deep Voice 2: Multi-Speaker Neural Text-to-Speech
- Parameter-Free Online Learning via Model Selection
- Graph Matching via Multiplicative Update Algorithm
- Bregman Divergence for Stochastic Variance Reduction: Saddle-Point and Adversarial Prediction
- Dynamic Routing Between Capsules
- Gaussian Quadrature for Kernel Features
- Modulating early visual processing by language
- Online Learning of Linear Dynamical Systems
- Submultiplicative Glivenko-Cantelli and Uniform Convergence of Revenues
- f-GANs in an Information Geometric Nutshell
- Fast Black-box Variational Inference through Stochastic Trust-Region Optimization
- Unsupervised Image-to-Image Translation Networks
- A Universal Analysis of Large-Scale Regularized Least Squares Solutions
- The Numerics of GANs
- A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning
- Dual Discriminator Generative Adversarial Nets
- Accelerated Stochastic Greedy Coordinate Descent by Soft Thresholding Projection onto Simplex
- Bayesian GAN
- Early stopping for kernel boosting algorithms: A general analysis with localized complexities
- Approximation and Convergence Properties of Generative Adversarial Learning
- Spectrally-normalized margin bounds for neural networks
- Dualing GANs
- The Scaling Limit of High-Dimensional Online Independent Component Analysis
- Generalizing GANs: A Turing Perspective
- Dual Path Networks
- Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
- A simple neural network module for relational reasoning
- Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite Sum Structure
- Process-constrained batch Bayesian optimisation
- Attention is All you Need
- Safe Adaptive Importance Sampling
- Learning Combinatorial Optimization Algorithms over Graphs
- Beyond Worst-case: A Probabilistic Analysis of Affine Policies in Dynamic Optimization
- Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
- Straggler Mitigation in Distributed Optimization Through Data Encoding
- Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
- An Empirical Bayes Approach to Optimizing Machine Learning Algorithms
- Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning
- PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
- Repeated Inverse Reinforcement Learning
- Multiresolution Kernel Approximation for Gaussian Process Regression
- Learning multiple visual domains with residual adapters
- Multi-Information Source Optimization
- Natural Value Approximators: Learning when to Trust Past Estimates
- Doubly Stochastic Variational Inference for Deep Gaussian Processes
- EX2: Exploration with Exemplar Models for Deep Reinforcement Learning
- Permutation-based Causal Inference Algorithms with Interventions
- Regret Minimization in MDPs with Options without Prior Knowledge
- Gradients of Generative Models for Improved Discriminative Analysis of Tandem Mass Spectra
- Successor Features for Transfer in Reinforcement Learning
- Style Transfer from Non-parallel Text by Cross-Alignment
- Overcoming Catastrophic Forgetting by Incremental Moment Matching
- Premise Selection for Theorem Proving by Deep Graph Embedding
- Fair Clustering Through Fairlets
- Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks
- Fitting Low-Rank Tensors in Constant Time
- Unsupervised Learning of Disentangled Representations from Video
- Scene Physics Acquisition via Visual De-animation
- Self-Normalizing Neural Networks
- Shape and Material from Sound
- Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
- Deep Hyperalignment
- Nonlinear random matrix theory for deep learning
- Fast amortized inference of neural activity from calcium imaging data with variational autoencoders
- Spherical convolutions and their application in molecular modelling
- Tensor encoding and decomposition of brain connectomes with application to tractography evaluation
- Translation Synchronization via Truncated Least Squares
- Targeting EEG/LFP Synchrony with Neural Nets
- Self-supervised Learning of Motion Capture
- Deep Networks for Decoding Natural Images from Retinal Signals
- Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification

## Symposiums

## Tutorials

- A Primer on Optimal Transport
- Deep Learning: Practice and Trends
- Reinforcement Learning with People
- Fairness in Machine Learning
- Deep Probabilistic Modelling with Gaussian Processes
- Statistical Relational Artificial Intelligence: Logic, Probability and Computation
- Differentially Private Machine Learning: Theory, Algorithms and Applications
- Geometric Deep Learning on Graphs and Manifolds
- Engineering and Reverse-Engineering Intelligence Using Probabilistic Programs, Program Induction, and Deep Learning
