Poster | Tue 17:30 | On the Utility of Learning about Humans for Human-AI Coordination | Micah Carroll · Rohin Shah · Mark Ho · Tom Griffiths · Sanjit Seshia · Pieter Abbeel · Anca Dragan

Poster | Tue 17:30 | Convergent Policy Optimization for Safe Reinforcement Learning | Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang

Poster | Thu 10:45 | RUDDER: Return Decomposition for Delayed Rewards | Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter

Poster | Tue 17:30 | Learning dynamic polynomial proofs | Alhussein Fawzi · Mateusz Malinowski · Hamza Fawzi · Omar Fawzi

Poster | Tue 10:45 | A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment | Felix Leibfried · Sergio Pascual-Díaz · Jordi Grau-Moya

Poster | Wed 10:45 | Variance Reduced Policy Evaluation with Smooth Function Approximation | Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang

Poster | Thu 10:45 | Control What You Can: Intrinsically Motivated Task-Planning Agent | Sebastian Blaes · Marin Vlastelica Pogančić · Jiajie Zhu · Georg Martius

Poster | Thu 10:45 | Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning | Erwan Lecarpentier · Emmanuel Rachelson

Poster | Tue 17:30 | Real-Time Reinforcement Learning | Simon Ramstedt · Chris Pal

Poster | Tue 10:45 | Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples | Tengyu Xu · Shaofeng Zou · Yingbin Liang

Poster | Wed 17:00 | Neural Temporal-Difference Learning Converges to Global Optima | Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang

Poster | Thu 10:45 | Planning in entropy-regularized Markov decision processes and games | Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko