Reinforcement learning (RL) algorithms learn through rewards and a process of trial and error. This approach was strongly inspired by the study of animal behaviour and has led to outstanding achievements in machine learning (e.g. in games, robotics, and science). However, artificial agents still struggle with a number of difficulties, such as sample efficiency, learning in dynamic environments and over multiple timescales, and generalizing and transferring knowledge. Biological agents, on the other hand, excel at these tasks. The brain has evolved to adapt and learn in dynamic environments, while integrating information and learning on different timescales and for different durations. Animals and humans are able to extract information from the environment efficiently by directing their attention and actively choosing what to focus on. They can achieve complicated tasks by solving sub-problems and combining knowledge, as well as by representing the environment efficiently and planning their decisions offline. Neuroscience and cognitive science research has largely focused on elucidating the workings of these mechanisms. Learning more about the neural and cognitive underpinnings of these functions could be key to developing more intelligent and autonomous agents. Similarly, a computational and theoretical framework, together with a normative perspective to refer to, could and does contribute to elucidating the mechanisms animals and humans use to perform these tasks. Building on the connection between biological and artificial reinforcement learning, our workshop will bring together leading and emerging researchers from neuroscience, psychology and machine learning to share: (i) how neural and cognitive mechanisms can provide insights for tackling challenges in RL research, and (ii) how machine learning advances can help further our understanding of the brain and behaviour.
Fri 9:00 a.m. - 9:15 a.m. | Opening Remarks (Talk)
Raymond Chua · Feryal Behbahani · Sara Zannone · Rui Ponte Costa · Claudia Clopath · Doina Precup · Blake Richards
Fri 9:15 a.m. - 9:45 a.m. | Invited Talk #1: From brains to agents and back (Talk)
Jane Wang
Fri 9:45 a.m. - 10:30 a.m. | Coffee Break & Poster Session (Poster Session)
Samia Mohinta · Andrea Agostinelli · Alexandra Moringen · Jee Hang Lee · Yat Long Lo · Wolfgang Maass · Blue Sheffer · Colin Bredenberg · Benjamin Eysenbach · Liyu Xia · Efstratios Markou · Jan Lichtenberg · Pierre Richemond · Tony Zhang · JB Lanier · Baihan Lin · William Fedus · Glen Berseth · Marta Sarrico · Matthew Crosby · Stephen McAleer · Sina Ghiassian · Franz Scherr · Guillaume Bellec · Darjan Salaj · Arinbjörn Kolbeinsson · Matthew Rosenberg · Jaehoon Shin · Sang Wan Lee · Guillermo Cecchi · Irina Rish · Elias Hajek
Fri 10:30 a.m. - 10:45 a.m. | Contributed Talk #1: Humans flexibly transfer options at multiple levels of abstractions (Talk)
Humans are great at using prior knowledge to solve novel tasks, but how they do so is not well understood. Recent work showed that in contextual multi-armed bandit environments, humans create simple one-step policies that they can transfer to new contexts by inferring context clusters. However, the daily tasks humans face are often temporally extended and demand more complex, hierarchically structured skills. The options framework provides a potential solution for representing such transferable skills. Options are abstract multi-step policies, assembled from simple actions or other options, that can represent meaningful reusable skills. We developed a novel two-stage decision-making protocol to test whether humans learn and transfer multi-step options. We found transfer effects at multiple levels of policy complexity that could not be explained by flat reinforcement learning models. We also devised an option model that can qualitatively replicate the transfer effects in human participants. Our results provide evidence that humans create options and use them to explore in novel contexts, consequently transferring past knowledge and speeding up learning.
Liyu Xia
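To make the abstraction concrete, here is a minimal sketch of the options framework the work builds on (an option couples an initiation set, an internal policy, and a termination condition); the `Option` class, the `env_step(state, action) -> (state, reward, done)` interface, and all names are illustrative placeholders, not the authors' experimental code.

```python
# Illustrative sketch only: an option as an abstract multi-step policy.
import random

class Option:
    def __init__(self, init_states, policy, term_prob):
        self.init_states = init_states  # states where the option may be invoked
        self.policy = policy            # dict: state -> primitive action (or sub-option)
        self.term_prob = term_prob      # dict: state -> probability of terminating

    def can_start(self, state):
        return state in self.init_states

    def act(self, state):
        return self.policy[state]

    def terminates(self, state):
        return random.random() < self.term_prob.get(state, 1.0)

def run_option(env_step, state, option):
    """Execute an option to termination, returning the cumulative outcome."""
    total_reward, steps = 0.0, 0
    done = False
    while not done:
        state, reward, done = env_step(state, option.act(state))
        total_reward += reward
        steps += 1
        if option.terminates(state):
            break
    return state, total_reward, steps, done
```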
Fri 10:45 a.m. - 11:00 a.m. | Contributed Talk #2: Slow processes of neurons enable a biologically plausible approximation to policy gradient (Talk)
Recurrent neural networks underlie the astounding information processing capabilities of the brain and play a key role in many state-of-the-art algorithms in deep reinforcement learning. But it has remained an open question how such networks could learn from rewards in a biologically plausible manner, with synaptic plasticity that is both local and online. We describe such an algorithm that approximates actor-critic policy gradient in recurrent neural networks. Building on e-prop, an approximation of backpropagation through time (BPTT), and using the equivalence between the forward and backward views in reinforcement learning (RL), we formulate a novel learning rule for RL that is both online and local, called reward-based e-prop. This learning rule uses neuroscience-inspired slow processes and top-down signals, while still being rigorously derived as an approximation to actor-critic policy gradient. To empirically evaluate this algorithm, we consider a delayed reaching task, where an arm is controlled using a recurrent network of spiking neurons. In this task, we show that reward-based e-prop performs as well as an agent trained with actor-critic policy gradient with biologically implausible BPTT.
Wolfgang Maass
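The core computational idea, stripped of the recurrent spiking machinery, is an actor-critic whose weights are updated online from a locally maintained eligibility trace multiplied by a global TD-error signal, with no backpropagation through time. The sketch below illustrates that forward-view update on a toy chain task; the environment, hyperparameters, and tabular parameterization are illustrative assumptions, not the paper's setup.

```python
# Illustrative sketch only: online, local actor-critic updates driven by
# eligibility traces and a global TD error (no BPTT).
import numpy as np

n_states, n_actions = 5, 2
theta = np.zeros((n_states, n_actions))   # actor parameters (policy logits)
v = np.zeros(n_states)                    # critic parameters (state values)
e_theta = np.zeros_like(theta)            # local eligibility trace for the actor
e_v = np.zeros_like(v)                    # local eligibility trace for the critic
gamma, lam, lr = 0.95, 0.9, 0.1
rng = np.random.default_rng(0)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(200):
    s = 0
    e_theta[:] = 0.0
    e_v[:] = 0.0
    for t in range(50):
        probs = softmax(theta[s])
        a = rng.choice(n_actions, p=probs)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        r = 1.0 if done else 0.0

        # Global, top-down learning signal: the TD error.
        delta = r + (0.0 if done else gamma * v[s_next]) - v[s]

        # Local traces accumulate each parameter's recent contribution.
        e_v *= gamma * lam
        e_v[s] += 1.0
        e_theta *= gamma * lam
        e_theta[s] += np.eye(n_actions)[a] - probs  # grad of log pi(a|s)

        # Online weight updates: local trace times global signal (forward view).
        v += lr * delta * e_v
        theta += lr * delta * e_theta

        s = s_next
        if done:
            break
```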
Fri 11:00 a.m. - 11:30 a.m. | Invited Talk #2: Understanding information demand at different levels of complexity (Talk)
In the 1950s, Daniel Berlyne wrote extensively about the importance of curiosity – our intrinsic desire to know. To understand curiosity, Berlyne argued, we must explain why humans exert so much effort to obtain knowledge, and how they decide which questions to explore, given that exploration is difficult and its long-term benefits are impossible to ascertain. I propose that these questions, although relatively neglected in neuroscience research, are key to understanding cognition and complex decision making of the type that humans routinely engage in and autonomous agents only aspire to. I will describe our investigations of these questions in two types of paradigms. In one paradigm, agents are placed in contexts with different levels of uncertainty and reward probability and can sample information about the eventual outcome. We find that, in humans and monkeys, information sampling is partially sensitive to uncertainty but is also biased by Pavlovian tendencies, which push agents to engage with signals predicting positive outcomes and avoid those predicting negative outcomes in ways that interfere with a reduction of uncertainty. In a second paradigm, agents are given several tasks of different difficulty and can freely organize their exploration in order to learn. In these contexts, uncertainty-based heuristics become ineffective, and optimal strategies are instead based on learning progress – the ability to first engage with and later reduce uncertainty. I will show evidence that humans are motivated to select difficult tasks consistent with learning maximization, but they guide their task selection according to success rates rather than learning progress per se, which risks trapping them in tasks with too high levels of difficulty (e.g., random unlearnable tasks). Together, the results show that information demand has consistent features that can be quantitatively measured at various levels of complexity, and a research agenda exploring these features will greatly expand our understanding of complex decision strategies.
Jacqueline Gottlieb
Fri 11:30 a.m. - 12:00 p.m. | Invited Talk #3: Predictive Cognitive Maps with Multi-scale Successor Representations and Replay (Talk)
Reinforcement learning's principles of temporal difference learning can drive representation learning, even in the absence of rewards. Representation learning is especially important in problems that require a cognitive map (Tolman, 1948), common in mammalian spatial navigation and non-spatial inference, e.g., shortcut and latent learning, policy revaluation, and remapping. Here I focus on models of predictive cognitive maps that learn successor representations (SR) at multiple scales and use replay to update SR maps, similar to Dyna models (SR-Dyna). SR- and SR-Dyna-based representation learning captures biological representation learning reflected in place-, grid-, and distance-to-goal cell firing patterns (Stachenfeld et al. 2017; Momennejad and Howard 2018), the interaction between boundary vector cells and place cells (de Cothi and Barry 2019), subgoal learning (Weinstein and Botvinick 2014), remapping, policy revaluation, and latent learning behavior (Momennejad et al. 2017; Russek, Momennejad et al. 2017). The SR framework makes testable predictions about representation learning in biological systems: e.g., about how predictive features are extracted from visual experience and abstracted into spatial representations that guide navigation. Specifically, the SR is sensitive to the policy the animal has taken during navigation, generating predictions about the representation of goals and how rewarding locations distort the predictive map. Finally, deep RL using the SR has been shown to support option discovery, which is especially useful for empowering agents with intrinsic motivation in environments that have sparse rewards and complex structure. These findings can lead to novel directions of human and animal experimentation. I will summarize behavioral and neural findings from human and rodent studies by us and other groups and discuss the road ahead.
Ida Momennejad
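For readers unfamiliar with the SR, the sketch below shows the basic tabular temporal-difference update for a successor map and how values then factorize into map times reward; the biased random-walk environment and parameters are illustrative only and do not reproduce the multi-scale or replay components discussed in the talk.

```python
# Illustrative sketch only: tabular SR learned by temporal-difference updates.
import numpy as np

n_states = 6
M = np.eye(n_states)                    # successor map: discounted future occupancy
R = np.zeros(n_states); R[-1] = 1.0     # reward only at the terminal state
alpha, gamma = 0.1, 0.95
rng = np.random.default_rng(1)

for episode in range(500):
    s = 0
    while s != n_states - 1:
        s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
        onehot = np.eye(n_states)[s]
        # Reward-free, policy-dependent update of the predictive map.
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        s = s_next

# Values factorize as map times reward, so revaluing R is immediate (no relearning).
V = M @ R
```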
Fri 12:00 p.m. - 2:00 p.m. | Lunch Break & Poster Session
Fri 2:00 p.m. - 2:30 p.m. | Invited Talk #4: Multi-Agent Interaction and Online Optimization in RL (Talk)
AI and robotics have made inspiring progress in recent years on training systems to solve specific, well-defined tasks. But the need to specify tasks bounds the level of complexity that can ultimately be reached by training with such an approach. The sharp distinction between training and deployment stages likewise limits the degree to which these systems can improve and adapt after training. In my talk, I will advocate for multi-agent interaction and online optimization processes as key ingredients towards overcoming these limitations. In the first part, I will show that through multi-agent competition, a simple objective such as a hide-and-seek game, and standard reinforcement learning algorithms at scale, agents can create a self-supervised autocurriculum with multiple distinct rounds of emergent strategy, many of which require sophisticated tool use and coordination. Multi-agent interaction leads to behaviors that center around more human-relevant skills than other self-supervised reinforcement learning methods such as intrinsic motivation, and holds the promise of open-ended growth of complexity. In the second part, I will argue for the usefulness and generality of online optimization processes and show examples of incorporating them in model-based control and generative modeling contexts via energy-based models. I will show intriguing advantages, such as compositionality, robustness to distribution shift, non-stationarity, and adversarial attacks in generative modeling problems, and planned exploration and fast adaptation to changing environments in control problems. This is joint work with many wonderful colleagues and students at OpenAI, MIT, University of Washington, and UC Berkeley.
Igor Mordatch
Fri 2:30 p.m. - 3:00 p.m. | Invited Talk #5: Materials Matter: How biologically inspired alternatives to conventional neural networks improve meta-learning and continual learning (Talk)
I will describe how alternatives to conventional neural networks that are very loosely biologically inspired can improve meta-learning, including continual learning. First, I will summarize differentiable Hebbian learning and differentiable neuromodulated Hebbian learning (aka “backpropamine”). Both are techniques for training deep neural networks with synaptic plasticity, meaning the weights can change during meta-testing/inference. Whereas meta-learned RNNs can only store within-episode information in their activations, such plastic Hebbian networks can store information in their weights in addition to their activations, improving performance on some classes of problems. Second, I will describe a new, unpublished method that improves the state of the art in continual learning. ANML (A Neuromodulated Meta-Learning algorithm) meta-learns a neuromodulatory network that gates the activity of the main prediction network, enabling the learning of up to 600 simple tasks sequentially.
Jeff Clune
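A minimal sketch of the differentiable-plasticity idea summarized above: each connection combines a slow, meta-learned weight with a fast Hebbian trace that changes within an episode. The layer sizes, update rate, and plain Hebbian rule below are illustrative assumptions (the neuromodulated variant would additionally gate the update rate with a learned signal), not the published implementation.

```python
# Illustrative sketch only: a layer whose effective weight is a slow component
# plus a fast Hebbian trace updated within the episode.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 8, 4
w = rng.normal(scale=0.1, size=(n_in, n_out))      # slow weights (meta-learned)
alpha = rng.normal(scale=0.1, size=(n_in, n_out))  # per-connection plasticity gains
eta = 0.1                                          # fast-weight update rate

def plastic_forward(x, hebb):
    """One step through the plastic layer; returns output and updated trace."""
    y = np.tanh(x @ (w + alpha * hebb))               # effective weight = slow + fast
    hebb = (1.0 - eta) * hebb + eta * np.outer(x, y)  # decaying Hebbian update
    return y, hebb

hebb = np.zeros((n_in, n_out))  # fast weights, reset at the start of each episode
for t in range(20):
    x = rng.normal(size=n_in)
    y, hebb = plastic_forward(x, hebb)  # within-episode information lives in hebb
```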
Fri 3:00 p.m. - 3:30 p.m. | Invited Talk #6: Features or Bugs: Synergistic Idiosyncrasies in Human Learning and Decision-Making (Talk)
Combining a multi-armed bandit task and Bayesian computational modeling, we find that humans systematically underestimate reward availability in the environment. This apparent pessimism turns out to be an optimism bias in disguise, and one that compensates for other idiosyncrasies in human learning and decision-making under uncertainty, such as a default tendency to assume non-stationarity in environmental statistics as well as the adoption of a simplistic decision policy. In particular, reward rate underestimation discourages the decision-maker from switching away from a “good” option, thus achieving near-optimal behavior (which never switches away after a win). Furthermore, we demonstrate that the Bayesian model that best predicts human behavior is equivalent to a particular form of Q-learning often used in the brain sciences, thus providing statistical, normative grounding to phenomenological models of human and animal behavior.
Angela Yu
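The kind of Q-learning referred to here is the standard delta-rule update with a fixed learning rate, which exponentially discounts old outcomes much as a learner assuming non-stationary reward rates would. A minimal sketch, with an illustrative two-armed bandit and softmax policy rather than the study's actual task or fitted parameters:

```python
# Illustrative sketch only: delta-rule value tracking on a two-armed bandit.
import numpy as np

p_reward = np.array([0.6, 0.4])   # true (unknown) reward probability per arm
q = np.zeros(2)                   # running reward-rate estimates
alpha, beta = 0.2, 5.0            # learning rate, softmax inverse temperature
rng = np.random.default_rng(2)

for t in range(500):
    logits = beta * q
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = rng.choice(2, p=probs)
    r = float(rng.random() < p_reward[a])
    q[a] += alpha * (r - q[a])    # a fixed alpha exponentially forgets old outcomes
```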
Fri 3:30 p.m. - 4:15 p.m. | Coffee Break & Poster Session (Poster Session)
Fri 4:15 p.m. - 4:30 p.m. | Contributed Talk #3: MEMENTO: Further Progress Through Forgetting (Talk)
Modern reinforcement learning (RL) algorithms, even those with intrinsic reward bonuses, suffer performance plateaus in hard-exploration domains, suggesting these algorithms have reached their ceiling. However, in what we describe as the MEMENTO observation, we find that new agents launched from the position where the previous agent saturated can reliably make further progress. We show that this is not an artifact of limited model capacity or training duration, but rather indicative of interference in learning dynamics between various stages of the domain [Schaul et al., 2019], a signature of multi-task and continual learning. To mitigate interference, we design an end-to-end learning agent which partitions the environment into various segments and models the value function separately in each score context, per Jain et al. [2019]. We demonstrate increased learning performance by this ensemble of agents on Montezuma’s Revenge and further show how this ensemble can be distilled into a single agent with the same model capacity as the original learner. Since the solution is empirically expressible by the original network, this provides evidence of interference, and our approach validates an avenue to circumvent it.
William Fedus
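One way to picture the score-context partitioning described above is an ensemble of value estimators indexed by the agent's current cumulative score, so that learning in one stage of the game cannot overwrite another. The sketch below uses a tabular stand-in with hypothetical bucket boundaries; it illustrates the idea only and is not the paper's deep-RL agent.

```python
# Illustrative sketch only: a value function partitioned by score context.
from collections import defaultdict

class ScorePartitionedValue:
    def __init__(self, score_buckets):
        self.buckets = sorted(score_buckets)                  # e.g. [0, 400, 2500]
        self.values = [defaultdict(float) for _ in self.buckets]

    def _context(self, score):
        # Index of the highest bucket boundary not exceeding the current score.
        idx = 0
        for i, b in enumerate(self.buckets):
            if score >= b:
                idx = i
        return idx

    def get(self, state, score):
        return self.values[self._context(score)][state]

    def update(self, state, score, target, lr=0.1):
        table = self.values[self._context(score)]
        table[state] += lr * (target - table[state])

# Values learned early in the game never overwrite values learned late in the game.
vf = ScorePartitionedValue([0, 400, 2500])
vf.update("room_1", score=0, target=1.0)
vf.update("room_1", score=2500, target=-1.0)
```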
Fri 4:30 p.m. - 5:00 p.m. | Invited Talk #7: Richard Sutton (Talk)
Richard Sutton
Fri 5:00 p.m. - 6:00 p.m. | Panel Discussion led by Grace Lindsay (Discussion Panel)
Grace Lindsay · Blake Richards · Doina Precup · Jacqueline Gottlieb · Jeff Clune · Jane Wang · Richard Sutton · Angela Yu · Ida Momennejad
Author Information
Raymond Chua (McGill University / Mila)
Sara Zannone (ICL)
Feryal Behbahani (DeepMind)
Rui Ponte Costa (University of Bristol)
Claudia Clopath (Imperial College London)
Blake Richards (University of Toronto)
Doina Precup (McGill University / Mila / DeepMind Montreal)