Timezone: »
We study Reinforcement Learning for partially observable systems using function approximation. We propose a new PO-bilinear framework, that is general enough to include models such as undercomplete tabular Partially Observable Markov Decision Processes (POMDPs), Linear Quadratic Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model Hilbert Space Embeddings of POMDPs. Under this framework, we propose an actor-critic style algorithm that is capable to performing agnostic policy learning. Given a policy class that consists of memory based policies (i.e., policy that looks at a fixed-length window of recent observations), and a value function class that consists of functions taking both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy among the policy class. For certain examples such as undercomplete POMDPs and LQGs, by leveraging their special properties, our algorithm is even capable of competing against the globally optimal policy without paying an exponential dependence on the horizon.
Author Information
Masatoshi Uehara (Cornell University)
Ayush Sekhari (Cornell University)
Jason Lee (University of Southern California)
Nathan Kallus (Cornell University)
Wen Sun (Cornell University)
More from the Same Authors
-
2021 : Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage »
Masatoshi Uehara · Wen Sun -
2022 : Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability »
Alex Damian · Eshaan Nichani · Jason Lee -
2022 : Hybrid RL: Using both offline and online data can make RL efficient »
Yuda Song · Yifei Zhou · Ayush Sekhari · J. Bagnell · Akshay Krishnamurthy · Wen Sun -
2022 : Provable Benefits of Representational Transfer in Reinforcement Learning »
Alekh Agarwal · Yuda Song · Kaiwen Wang · Mengdi Wang · Wen Sun · Xuezhou Zhang -
2022 : Hidden Poison: Machine Unlearning Enables Camouflaged Poisoning Attacks »
Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari -
2022 : Hidden Poison: Machine unlearning enables camouflaged poisoning attacks »
Jimmy Di · Jack Douglas · Jayadev Acharya · Gautam Kamath · Ayush Sekhari -
2022 Panel: Panel 4A-2: Adaptively Exploiting d-Separators… & On the Complexity… »
Blair Bilodeau · Ayush Sekhari -
2022 Panel: Panel 3C-5: Biologically-Plausible Determinant Maximization… & What's the Harm? ... »
Bariscan Bozkurt · Nathan Kallus -
2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
Eshaan Nichani · Yu Bai · Jason Lee -
2022 Poster: The Implicit Delta Method »
Nathan Kallus · James McInerney -
2022 Poster: Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent »
Zhiyuan Li · Tianhao Wang · Jason Lee · Sanjeev Arora -
2022 Poster: What's the Harm? Sharp Bounds on the Fraction Negatively Affected by Treatment »
Nathan Kallus -
2022 Poster: On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias »
Itay Safran · Gal Vardi · Jason Lee -
2022 Poster: From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent »
Christopher De Sa · Satyen Kale · Jason Lee · Ayush Sekhari · Karthik Sridharan -
2022 Poster: On the Complexity of Adversarial Decision Making »
Dylan J Foster · Alexander Rakhlin · Ayush Sekhari · Karthik Sridharan -
2021 : Representation Learning for Online and Offline RL in Low-rank MDPs »
Masatoshi Uehara · Xuezhou Zhang · Wen Sun -
2021 Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice »
Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Xinkun Nie · Masatoshi Uehara · Kelly Zhang -
2021 : Representation Learning for Online and Offline RL in Low-rank MDPs »
Masatoshi Uehara · Xuezhou Zhang · Wen Sun -
2021 : Q/A Session »
Wen Sun · Shixia Liu -
2021 : Speaker Introduction »
Wen Sun -
2021 Workshop: eXplainable AI approaches for debugging and diagnosis »
Roberto Capobianco · Biagio La Rosa · Leilani Gilpin · Wen Sun · Alice Xiang · Alexander Feldman -
2021 Poster: How Fine-Tuning Allows for Effective Meta-Learning »
Kurtland Chua · Qi Lei · Jason Lee -
2021 Poster: Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning »
Aurelien Bibaut · Nathan Kallus · Maria Dimakopoulou · Antoine Chambaz · Mark van der Laan -
2021 Poster: Label Noise SGD Provably Prefers Flat Global Minimizers »
Alex Damian · Tengyu Ma · Jason Lee -
2021 Poster: Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage »
Jonathan Chang · Masatoshi Uehara · Dhruv Sreenivas · Rahul Kidambi · Wen Sun -
2021 Poster: Going Beyond Linear RL: Sample Efficient Neural Function Approximation »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: Control Variates for Slate Off-Policy Evaluation »
Nikos Vlassis · Ashok Chandrashekar · Fernando Amat · Nathan Kallus -
2021 Poster: Predicting What You Already Know Helps: Provable Self-Supervised Learning »
Jason Lee · Qi Lei · Nikunj Saunshi · JIACHENG ZHUO -
2021 Poster: Optimal Gradient-based Algorithms for Non-concave Bandit Optimization »
Baihe Huang · Kaixuan Huang · Sham Kakade · Jason Lee · Qi Lei · Runzhe Wang · Jiaqi Yang -
2021 Poster: Post-Contextual-Bandit Inference »
Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Antoine Chambaz · Mark van der Laan -
2020 Workshop: Consequential Decisions in Dynamic Environments »
Niki Kilbertus · Angela Zhou · Ashia Wilson · John Miller · Lily Hu · Lydia T. Liu · Nathan Kallus · Shira Mitchell -
2020 : Spotlight Talk 4: Fairness, Welfare, and Equity in Personalized Pricing »
Nathan Kallus · Angela Zhou -
2020 Poster: Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning »
Nathan Kallus · Angela Zhou -
2020 Poster: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift »
Masatoshi Uehara · Masahiro Kato · Shota Yasui -
2020 Spotlight: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift »
Masatoshi Uehara · Masahiro Kato · Shota Yasui -
2020 Poster: Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies »
Nathan Kallus · Masatoshi Uehara -
2019 : Coffee Break and Poster Session »
Rameswar Panda · Prasanna Sattigeri · Kush Varshney · Karthikeyan Natesan Ramamurthy · Harvineet Singh · Vishwali Mhasawade · Shalmali Joshi · Laleh Seyyed-Kalantari · Matthew McDermott · Gal Yona · James Atwood · Hansa Srinivasan · Yonatan Halpern · D. Sculley · Behrouz Babaki · Margarida Carvalho · Josie Williams · Narges Razavian · Haoran Zhang · Amy Lu · Irene Y Chen · Xiaojie Mao · Angela Zhou · Nathan Kallus -
2019 : Opening Remarks »
Thorsten Joachims · Nathan Kallus · Michele Santacatterina · Adith Swaminathan · David Sontag · Angela Zhou -
2019 Workshop: “Do the right thing”: machine learning and causal inference for improved decision making »
Michele Santacatterina · Thorsten Joachims · Nathan Kallus · Adith Swaminathan · David Sontag · Angela Zhou -
2019 : Nathan Kallus: Efficiently Breaking the Curse of Horizon with Double Reinforcement Learning »
Nathan Kallus -
2019 Poster: The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the XAUC Metric »
Nathan Kallus · Angela Zhou -
2019 Poster: Assessing Disparate Impact of Personalized Interventions: Identifiability and Bounds »
Nathan Kallus · Angela Zhou -
2019 Poster: Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning »
Nathan Kallus · Masatoshi Uehara -
2019 Poster: Policy Evaluation with Latent Confounders via Optimal Balance »
Andrew Bennett · Nathan Kallus -
2019 Poster: Deep Generalized Method of Moments for Instrumental Variable Analysis »
Andrew Bennett · Nathan Kallus · Tobias Schnabel -
2018 : Contributed Talk 1 »
Jason Lee -
2018 Workshop: Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy »
Manuela Veloso · Nathan Kallus · Sameena Shah · Senthil Kumar · Isabelle Moulinier · Jiahao Chen · John Paisley -
2018 Poster: Causal Inference with Noisy and Missing Covariates via Matrix Factorization »
Nathan Kallus · Xiaojie Mao · Madeleine Udell -
2018 Poster: Implicit Bias of Gradient Descent on Linear Convolutional Networks »
Suriya Gunasekar · Jason Lee · Daniel Soudry · Nati Srebro -
2018 Poster: Removing Hidden Confounding by Experimental Grounding »
Nathan Kallus · Aahlad Puli · Uri Shalit -
2018 Spotlight: Removing Hidden Confounding by Experimental Grounding »
Nathan Kallus · Aahlad Puli · Uri Shalit -
2018 Poster: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced »
Simon Du · Wei Hu · Jason Lee -
2018 Poster: Confounding-Robust Policy Improvement »
Nathan Kallus · Angela Zhou -
2018 Poster: Adding One Neuron Can Eliminate All Bad Local Minima »
SHIYU LIANG · Ruoyu Sun · Jason Lee · R. Srikant -
2018 Poster: Balanced Policy Evaluation and Learning »
Nathan Kallus -
2018 Poster: Provably Correct Automatic Sub-Differentiation for Qualified Programs »
Sham Kakade · Jason Lee -
2018 Poster: On the Convergence and Robustness of Training GANs with Regularized Optimal Transport »
Maziar Sanjabi · Jimmy Ba · Meisam Razaviyayn · Jason Lee -
2017 Workshop: From 'What If?' To 'What Next?' : Causal Inference and Machine Learning for Intelligent Decision Making »
Ricardo Silva · Panagiotis Toulis · John Shawe-Taylor · Alexander Volfovsky · Thorsten Joachims · Lihong Li · Nathan Kallus · Adith Swaminathan -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2016 Oral: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2016 Poster: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2015 Poster: Evaluating the statistical significance of biclusters »
Jason D Lee · Yuekai Sun · Jonathan E Taylor -
2014 Poster: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich -
2014 Spotlight: Scalable Methods for Nonnegative Matrix Factorizations of Near-separable Tall-and-skinny Matrices »
Austin Benson · Jason D Lee · Bartek Rajwa · David F Gleich -
2014 Poster: Exact Post Model Selection Inference for Marginal Screening »
Jason D Lee · Jonathan E Taylor -
2013 Poster: On model selection consistency of penalized M-estimators: a geometric theory »
Jason D Lee · Yuekai Sun · Jonathan E Taylor -
2013 Poster: Using multiple samples to learn mixture models »
Jason D Lee · Ran Gilad-Bachrach · Rich Caruana -
2013 Spotlight: Using multiple samples to learn mixture models »
Jason D Lee · Ran Gilad-Bachrach · Rich Caruana -
2012 Poster: Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form »
Jason D Lee · Yuekai Sun · Michael Saunders -
2010 Poster: Practical Large-Scale Optimization for Max-norm Regularization »
Jason D Lee · Benjamin Recht · Russ Salakhutdinov · Nati Srebro · Joel A Tropp