Timezone: »
Poster
Near-Optimal Time and Sample Complexities for Solving Markov Decision Processes with a Generative Model
Aaron Sidford · Mengdi Wang · Xian Wu · Lin Yang · Yinyu Ye
In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time. Given such a DMDP with states $\states$, actions $\actions$, discount factor $\gamma\in(0,1)$, and rewards in range $[0, 1]$ we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ where {\it both} the run time spent and number of sample taken is upper bounded by
\[
O\left[\frac{|\cS||\cA|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|\cS||\cA|}{(1-\gamma)\delta \epsilon}
\right)
\log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right] ~.
\]
For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in \cite{azar2013minimax} up to logarithmic factors.
We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDP with a generative model and provide a nearly matching sample complexity lower bound.
Author Information
Aaron Sidford (Stanford)
Mengdi Wang (Princeton University)
Mengdi Wang is interested in data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years).
Xian Wu (Stanford University)
Lin Yang (Princeton University)
Yinyu Ye (Standord)
More from the Same Authors
-
2021 Spotlight: On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method »
Junyu Zhang · Chengzhuo Ni · zheng Yu · Csaba Szepesvari · Mengdi Wang -
2022 : DIMENSION-REDUCED ADAPTIVE GRADIENT METHOD »
Jingyang Li · Pan Zhou · Kuangyu Ding · Kim-Chuan Toh · Yinyu Ye -
2022 : Semi-Random Sparse Recovery in Nearly-Linear Time »
Jonathan Kelner · Jerry Li · Allen Liu · Aaron Sidford · Kevin Tian -
2022 : How Small Amount of Data Sharing Benefits Higher-Order Distributed Optimization and Learning »
Mingxi Zhu · Yinyu Ye -
2022 : Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization »
Kaixuan Huang · Yu Wu · Xuezhou Zhang · Shenyinying Tu · Qingyun Wu · Mengdi Wang · Huazheng Wang -
2022 : Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization »
Kaixuan Huang · Yu Wu · Xuezhou Zhang · Shenyinying Tu · Qingyun Wu · Mengdi Wang · Huazheng Wang -
2022 : Pareto-Optimal Diagnostic Policy Learning in Clinical Applications via Semi-Model-Based Deep Reinforcement Learning »
zheng Yu · Yikuan Li · Joseph Kim · Kaixuan Huang · Yuan Luo · Mengdi Wang -
2022 : Pareto-Optimal Diagnostic Policy Learning in Clinical Applications via Semi-Model-Based Deep Reinforcement Learning »
zheng Yu · Yikuan Li · Joseph Kim · Kaixuan Huang · Yuan Luo · Mengdi Wang -
2022 : Provable Benefits of Representational Transfer in Reinforcement Learning »
Alekh Agarwal · Yuda Song · Kaiwen Wang · Mengdi Wang · Wen Sun · Xuezhou Zhang -
2023 Poster: Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective »
Zeyu Zhang · Yi Su · Hui Yuan · Yiran Wu · Rishab Balasubramanian · Qingyun Wu · Huazheng Wang · Mengdi Wang -
2023 Poster: Parallel Submodular Function Minimization »
Deeparnab Chakrabarty · Andrei Graur · Haotian Jiang · Aaron Sidford -
2023 Poster: Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations »
Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang -
2023 Poster: Structured Semidefinite Programming for Recovering Structured Preconditioners »
Arun Jambulapati · Jerry Li · Christopher Musco · Kirankumar Shiragur · Aaron Sidford · Kevin Tian -
2023 Poster: Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation »
Lijing Kuang · Ming Yin · Mengdi Wang · Yu-Xiang Wang · Yian Ma -
2023 Poster: Quantum speedups for stochastic optimization »
Aaron Sidford · Chenyi Zhang -
2023 Poster: Towards Optimal Effective Resistance Estimation »
Rajat Vadiraj Dwaraknath · Ishani Karmarkar · Aaron Sidford -
2023 Poster: Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement »
Hui Yuan · Kaixuan Huang · Chengzhuo Ni · Minshuo Chen · Mengdi Wang -
2023 Workshop: OPT 2023: Optimization for Machine Learning »
Cristóbal Guzmán · Courtney Paquette · Katya Scheinberg · Aaron Sidford · Sebastian Stich -
2022 : Aaron Sidford, Efficiently Minimizing the Maximum Loss »
Aaron Sidford -
2022 Poster: Communication Efficient Distributed Learning for Kernelized Contextual Bandits »
Chuanhao Li · Huazheng Wang · Mengdi Wang · Hongning Wang -
2022 Poster: Optimal and Adaptive Monteiro-Svaiter Acceleration »
Yair Carmon · Danielle Hausler · Arun Jambulapati · Yujia Jin · Aaron Sidford -
2022 Poster: Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks »
Shuoguang Yang · Xuezhou Zhang · Mengdi Wang -
2022 Poster: Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization »
Hui Yuan · Chengzhuo Ni · Huazheng Wang · Xuezhou Zhang · Le Cong · Csaba Szepesvari · Mengdi Wang -
2022 Poster: On the Efficient Implementation of High Accuracy Optimality of Profile Maximum Likelihood »
Moses Charikar · Zhihao Jiang · Kirankumar Shiragur · Aaron Sidford -
2021 : Online Learning via Linear Programming, Yinyu Ye »
Yinyu Ye -
2021 Poster: Stochastic Bias-Reduced Gradient Methods »
Hilal Asi · Yair Carmon · Arun Jambulapati · Yujia Jin · Aaron Sidford -
2021 Poster: On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method »
Junyu Zhang · Chengzhuo Ni · zheng Yu · Csaba Szepesvari · Mengdi Wang -
2020 Poster: Simple and Fast Algorithm for Binary Integer and Online Linear Programming »
Xiaocheng Li · Chunlin Sun · Yinyu Ye -
2020 Poster: Generalized Leverage Score Sampling for Neural Networks »
Jason Lee · Ruoqi Shen · Zhao Song · Mengdi Wang · zheng Yu -
2020 Poster: High-Dimensional Sparse Linear Bandits »
Botao Hao · Tor Lattimore · Mengdi Wang -
2020 Poster: Conic Descent and its Application to Memory-efficient Optimization over Positive Semidefinite Matrices »
John Duchi · Oliver Hinder · Andrew Naber · Yinyu Ye -
2020 Poster: Instance Based Approximations to Profile Maximum Likelihood »
Nima Anari · Moses Charikar · Kirankumar Shiragur · Aaron Sidford -
2020 Poster: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms »
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli -
2020 Poster: Acceleration with a Ball Optimization Oracle »
Yair Carmon · Arun Jambulapati · Qijia Jiang · Yujia Jin · Yin Tat Lee · Aaron Sidford · Kevin Tian -
2020 Poster: Large-Scale Methods for Distributionally Robust Optimization »
Daniel Levy · Yair Carmon · John Duchi · Aaron Sidford -
2020 Spotlight: Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms »
Dheeraj Nagaraj · Xian Wu · Guy Bresler · Prateek Jain · Praneeth Netrapalli -
2020 Oral: Acceleration with a Ball Optimization Oracle »
Yair Carmon · Arun Jambulapati · Qijia Jiang · Yujia Jin · Yin Tat Lee · Aaron Sidford · Kevin Tian -
2020 Poster: Distributionally Robust Local Non-parametric Conditional Estimation »
Viet Anh Nguyen · Fan Zhang · Jose Blanchet · Erick Delage · Yinyu Ye -
2020 Poster: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Spotlight: Variational Policy Gradient Method for Reinforcement Learning with General Utilities »
Junyu Zhang · Alec Koppel · Amrit Singh Bedi · Csaba Szepesvari · Mengdi Wang -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 : Unsupervised State Embedding and Aggregation towards Scalable Reinforcement Learning »
Mengdi Wang -
2019 : Poster Session »
Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis -
2019 Poster: A General Framework for Symmetric Property Estimation »
Moses Charikar · Kirankumar Shiragur · Aaron Sidford -
2019 Poster: Variance Reduction for Matrix Games »
Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian -
2019 Oral: Variance Reduction for Matrix Games »
Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian -
2019 Poster: Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG »
Yujia Jin · Aaron Sidford -
2019 Spotlight: Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG »
Yujia Jin · Aaron Sidford -
2019 Poster: State Aggregation Learning from Markov Transition Data »
Yaqi Duan · Zheng Tracy Ke · Mengdi Wang -
2019 Poster: Interior-Point Methods Strike Back: Solving the Wasserstein Barycenter Problem »
DongDong Ge · Haoyue Wang · Zikai Xiong · Yinyu Ye -
2019 Poster: Complexity of Highly Parallel Non-Smooth Convex Optimization »
Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford -
2019 Spotlight: Complexity of Highly Parallel Non-Smooth Convex Optimization »
Sebastien Bubeck · Qijia Jiang · Yin-Tat Lee · Yuanzhi Li · Aaron Sidford -
2019 Poster: Learning low-dimensional state embeddings and metastable clusters from time series data »
Yifan Sun · Yaqi Duan · Hao Gong · Mengdi Wang -
2019 Poster: A Direct tilde{O}(1/epsilon) Iteration Parallel Algorithm for Optimal Transport »
Arun Jambulapati · Aaron Sidford · Kevin Tian -
2018 Poster: Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization »
Minshuo Chen · Lin Yang · Mengdi Wang · Tuo Zhao -
2018 Poster: Exploiting Numerical Sparsity for Efficient Learning : Faster Eigenvector Computation and Regression »
Neha Gupta · Aaron Sidford -
2018 Poster: The Physical Systems Behind Optimization Algorithms »
Lin Yang · Raman Arora · Vladimir Braverman · Tuo Zhao -
2017 Poster: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2017 Oral: Diffusion Approximations for Online Principal Component Estimation and Global Convergence »
Chris Junchi Li · Mengdi Wang · Tong Zhang -
2016 Poster: Accelerating Stochastic Composition Optimization »
Mengdi Wang · Ji Liu · Ethan Fang