A central problem in online learning and decision making, from bandits to reinforcement learning, is to understand what modeling assumptions lead to sample-efficient learning guarantees. We consider a general adversarial decision making framework that encompasses (structured) bandit problems with adversarial rewards and reinforcement learning problems with adversarial dynamics. Our main result shows, via new upper and lower bounds, that the Decision-Estimation Coefficient, a complexity measure introduced by Foster et al. in the stochastic counterpart to our setting, is necessary and sufficient to obtain low regret for adversarial decision making. However, compared to the stochastic setting, one must apply the Decision-Estimation Coefficient to the convex hull of the class of models (or hypotheses) under consideration. This establishes that the price of accommodating adversarial rewards or dynamics is governed by the behavior of the model class under convexification, and recovers a number of existing results, both positive and negative. En route to obtaining these guarantees, we provide new structural results that connect the Decision-Estimation Coefficient to variants of other well-known complexity measures, including the Information Ratio of Russo and Van Roy and the Exploration-by-Optimization objective of Lattimore and György.
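For readers unfamiliar with the quantity at the center of the abstract, here is a minimal LaTeX sketch of the Decision-Estimation Coefficient as defined in the stochastic setting by Foster et al.; the notation below (f^M, \pi_M, D_H^2) follows that line of work and is an assumption on our part, not text reproduced from this page.

```latex
% Minimal sketch of the Decision-Estimation Coefficient (DEC), following the
% stochastic formulation of Foster et al.; notation assumed, not quoted from
% this page. \mathcal{M} is the model class, \bar{M} a reference model,
% \Pi the decision space, and \gamma > 0 a scale parameter.
\[
  \mathsf{dec}_{\gamma}(\mathcal{M}, \bar{M})
  \;=\;
  \sup_{M \in \mathcal{M}} \;
  \inf_{p \in \Delta(\Pi)} \;
  \mathbb{E}_{\pi \sim p}\!\left[
      f^{M}(\pi_M) - f^{M}(\pi)
      \;-\; \gamma \cdot D_{\mathrm{H}}^{2}\bigl(M(\pi), \bar{M}(\pi)\bigr)
  \right]
\]
% Here f^{M}(\pi) is the mean reward of decision \pi under model M, \pi_M
% maximizes f^{M}, and D_{\mathrm{H}}^{2} is the squared Hellinger distance
% between the observation distributions of M and \bar{M} at \pi.
%
% Per the abstract, the adversarial upper and lower bounds are governed by
% the same quantity evaluated on the convex hull of the class:
% \mathsf{dec}_{\gamma}(\mathrm{co}(\mathcal{M}), \bar{M}).
```

Intuitively (again following Foster et al.), the first two terms measure instantaneous regret under the worst-case model, while the Hellinger term credits the learner for information gained about that model; the abstract's message is that passing to the convex hull co(M) is exactly what accommodating adversarial rewards or dynamics costs.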
Author Information
Dylan J Foster (Microsoft Research)
Alexander Rakhlin (MIT)
Ayush Sekhari (Cornell University)
Karthik Sridharan (Cornell University)
More from the Same Authors
- 2022 : Hybrid RL: Using both offline and online data can make RL efficient »
  Yuda Song · Yifei Zhou · Ayush Sekhari · J. Bagnell · Akshay Krishnamurthy · Wen Sun
- 2022 : Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks »
  Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari
- 2023 Poster: Convergence of Adam under Relaxed Assumptions »
  Haochuan Li · Ali Jadbabaie · Alexander Rakhlin
- 2023 Poster: Contextual Bandits and Imitation Learning with Preference-Based Active Queries »
  Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- 2023 Poster: Selective Sampling and Imitation Learning via Online Regression »
  Ayush Sekhari · Karthik Sridharan · Wen Sun · Runzhe Wu
- 2023 Poster: Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks »
  Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari
- 2023 Poster: Efficient Model-Free Exploration in Low-Rank MDPs »
  Zak Mhammedi · Adam Block · Dylan J Foster · Alexander Rakhlin
- 2023 Poster: Model-Free Reinforcement Learning with the Decision-Estimation Coefficient »
  Dylan J Foster · Noah Golowich · Jian Qian · Alexander Rakhlin · Ayush Sekhari
- 2023 Poster: Beyond Lipschitz Smoothness: A New Approach to Convex and Non-Convex Optimization »
  Haochuan Li · Jian Qian · Yi Tian · Ali Jadbabaie · Alexander Rakhlin
- 2023 Poster: When is Agnostic Reinforcement Learning Statistically Tractable? »
  Zeyu Jia · Gene Li · Alexander Rakhlin · Ayush Sekhari · Nati Srebro
- 2023 Poster: On the Variance, Admissibility, and Stability of Empirical Risk Minimization »
  Gil Kur · Eli Putterman · Alexander Rakhlin
- 2022 Panel: Panel 4A-2: Adaptively Exploiting d-Separators… & On the Complexity… »
  Blair Bilodeau · Ayush Sekhari
- 2022 Poster: Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems »
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2022 Poster: Interaction-Grounded Learning with Action-Inclusive Feedback »
  Tengyang Xie · Akanksha Saran · Dylan J Foster · Lekan Molu · Ida Momennejad · Nan Jiang · Paul Mineiro · John Langford
- 2022 Poster: Understanding the Eluder Dimension »
  Gene Li · Pritish Kamath · Dylan J Foster · Nati Srebro
- 2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent »
  Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan
- 2020 : Keynote 4: Alexander Rakhlin »
  Alexander Rakhlin
- 2020 Poster: Online learning with dynamics: A minimax perspective »
  Kush Bhatia · Karthik Sridharan
- 2020 Poster: Reinforcement Learning with Feedback Graphs »
  Christoph Dann · Yishay Mansour · Mehryar Mohri · Ayush Sekhari · Karthik Sridharan
- 2020 Poster: Learning the Linear Quadratic Regulator from Nonlinear Observations »
  Zakaria Mhammedi · Dylan Foster · Max Simchowitz · Dipendra Misra · Wen Sun · Akshay Krishnamurthy · Alexander Rakhlin · John Langford
- 2019 Poster: Hypothesis Set Stability and Generalization »
  Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan
- 2018 Poster: Uniform Convergence of Gradients for Non-Convex Learning and Optimization »
  Dylan Foster · Ayush Sekhari · Karthik Sridharan
- 2017 Poster: Spectrally-normalized margin bounds for neural networks »
  Peter Bartlett · Dylan J Foster · Matus Telgarsky
- 2017 Spotlight: Spectrally-normalized margin bounds for neural networks »
  Peter Bartlett · Dylan J Foster · Matus Telgarsky
- 2017 Poster: Parameter-Free Online Learning via Model Selection »
  Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan
- 2017 Spotlight: Parameter-Free Online Learning via Model Selection »
  Dylan J Foster · Satyen Kale · Mehryar Mohri · Karthik Sridharan
- 2016 Poster: Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters »
  Zeyuan Allen-Zhu · Yang Yuan · Karthik Sridharan
- 2016 Poster: Learning in Games: Robustness of Fast Convergence »
  Dylan Foster · zhiyuan li · Thodoris Lykouris · Karthik Sridharan · Eva Tardos
- 2015 Poster: Adaptive Online Learning »
  Dylan Foster · Alexander Rakhlin · Karthik Sridharan
- 2015 Spotlight: Adaptive Online Learning »
  Dylan Foster · Alexander Rakhlin · Karthik Sridharan