Poster
Preference-based Reinforcement Learning with Finite-Time Guarantees
Yichong Xu · Ruosong Wang · Lin Yang · Aarti Singh · Artur Dubrawski
Preference-based Reinforcement Learning (PbRL) replaces the reward values of traditional reinforcement learning with preferences, to better elicit human opinion on the target objective, especially when numerical reward values are hard to design or interpret.
Despite promising results in applications, the theoretical understanding of PbRL is still in its infancy.
In this paper, we present the first finite-time analysis for general PbRL problems.
We first show that, in PbRL, a unique optimal policy may not exist when preferences over trajectories are deterministic.
If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability. Our method explores the state space by navigating to under-explored states, and solves PbRL using a combination of dueling bandits and policy search.
Experiments show the efficacy of our method when it is applied to real-world problems.
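To make the stochastic-preference setting concrete, here is a minimal sketch and not the authors' algorithm. It assumes a logistic (Bradley-Terry-style) link between hidden rewards and preference probabilities, which the abstract does not specify, and it uses a simple champion-versus-challenger elimination in place of the dueling-bandit subroutine. The names `preference_prob`, `duel`, and `pick_best_by_duels` are hypothetical.

```python
import math
import random


def preference_prob(reward_a: float, reward_b: float) -> float:
    """Probability that A is preferred over B.

    Assumption: a logistic link on the hidden reward gap. The abstract only
    says the preference probability "relates to" the hidden rewards.
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


def duel(reward_a: float, reward_b: float, rng: random.Random) -> bool:
    """Sample one noisy comparison; True means A was preferred over B."""
    return rng.random() < preference_prob(reward_a, reward_b)


def pick_best_by_duels(hidden_rewards, num_duels: int, rng: random.Random) -> int:
    """Return the index of the candidate that survives sequential duels.

    The learner never reads the rewards directly; they are only used to
    simulate the preference oracle. A challenger replaces the current
    champion if it wins a strict majority of `num_duels` comparisons.
    """
    champion = 0
    for challenger in range(1, len(hidden_rewards)):
        wins = sum(
            duel(hidden_rewards[challenger], hidden_rewards[champion], rng)
            for _ in range(num_duels)
        )
        if 2 * wins > num_duels:
            champion = challenger
    return champion


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical hidden returns of three candidate policies.
    print(pick_best_by_duels([0.0, 10.0, -10.0], num_duels=101, rng=rng))
```

With smaller reward gaps the comparisons become noisier, so more duels per pair are needed to identify the better candidate with high probability. This is the kind of trade-off that a finite-time analysis quantifies.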
Author Information
Yichong Xu (Microsoft)
Ruosong Wang (Carnegie Mellon University)
Lin Yang (UCLA)
Aarti Singh (Carnegie Mellon University)
Artur Dubrawski (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
-
2020 Spotlight: Preference-based Reinforcement Learning with Finite-Time Guarantees »
Wed. Dec 9th 04:20 -- 04:30 AM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
-
2021 : Robust Interpretable Rule Learning to Identify Expertise Transfer Opportunities in Healthcare »
Willa Potosnak · Sebastian Caldas Rivera · Gilles Clermont · Kyle Miller · Artur Dubrawski -
2021 : Predicting Sufficiency for Hemorrhage Resuscitation Using Non-invasive Physiological Data without Reference to Personal Baselines »
Xinyu Li · Michael Pinsky · Artur Dubrawski -
2021 : Doubly Pessimistic Algorithms for Strictly Safe Off-Policy Optimization »
Sanae Amani · Lin Yang -
2022 : From Local to Global: Spectral-Inspired Graph Neural Networks »
Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman -
2022 Poster: Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning »
Dingwen Kong · Lin Yang -
2022 Poster: Learning from Distributed Users in Contextual Linear Bandits Without Sharing the Context »
Osama Hanna · Lin Yang · Christina Fragouli -
2022 Poster: Near-Optimal Sample Complexity Bounds for Constrained MDPs »
Sharan Vaswani · Lin Yang · Csaba Szepesvari -
2021 Poster: Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels »
Stefani Karp · Ezra Winston · Yuanzhi Li · Aarti Singh -
2021 Poster: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2021 Poster: Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs »
Han Zhong · Jiayi Huang · Lin Yang · Liwei Wang -
2021 Poster: On the Value of Interaction and Function Approximation in Imitation Learning »
Nived Rajaraman · Yanjun Han · Lin Yang · Jingbo Liu · Jiantao Jiao · Kannan Ramchandran -
2021 Poster: Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning »
Jingfeng Wu · Vladimir Braverman · Lin Yang -
2021 Oral: An Exponential Lower Bound for Linearly Realizable MDP with Constant Suboptimality Gap »
Yuanhao Wang · Ruosong Wang · Sham Kakade -
2020 : Contributed Talk 6: What are the Statistical Limits for Batch RL with Linear Function Approximation? »
Ruosong Wang -
2020 : ML4D Townhall »
Artur Dubrawski -
2020 Session: Orals & Spotlights Track 33: Health/AutoML/(Soft|Hard)ware »
Dustin Tran · Artur Dubrawski -
2020 Poster: Planning with General Objective Functions: Going Beyond Total Rewards »
Ruosong Wang · Peilin Zhong · Simon Du · Russ Salakhutdinov · Lin Yang -
2020 Poster: Is Long Horizon RL More Difficult Than Short Horizon RL? »
Ruosong Wang · Simon Du · Lin Yang · Sham Kakade -
2020 Poster: Toward the Fundamental Limits of Imitation Learning »
Nived Rajaraman · Lin Yang · Jiantao Jiao · Kannan Ramchandran -
2020 Poster: Is Plug-in Solver Sample-Efficient for Feature-based Reinforcement Learning? »
Qiwen Cui · Lin Yang -
2020 Poster: Agnostic $Q$-learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity »
Simon Du · Jason Lee · Gaurav Mahajan · Ruosong Wang -
2020 Poster: On Reward-Free Reinforcement Learning with Linear Function Approximation »
Ruosong Wang · Simon Du · Lin Yang · Russ Salakhutdinov -
2020 Poster: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2020 Poster: Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension »
Ruosong Wang · Russ Salakhutdinov · Lin Yang -
2020 Poster: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity »
Kaiqing Zhang · Sham Kakade · Tamer Basar · Lin Yang -
2020 Spotlight: Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning »
Fei Feng · Ruosong Wang · Wotao Yin · Simon Du · Lin Yang -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster Session »
Rishav Chourasia · Yichong Xu · Corinna Cortes · Chien-Yi Chang · Yoshihiro Nagano · So Yeon Min · Benedikt Boecking · Phi Vu Tran · Kamyar Ghasemipour · Qianggang Ding · Shouvik Mani · Vikram Voleti · Rasool Fakoor · Miao Xu · Kenneth Marino · Lisa Lee · Volker Tresp · Jean-Francois Kagy · Marvin Zhang · Barnabas Poczos · Dinesh Khandelwal · Adrien Bardes · Evan Shelhamer · Jiacheng Zhu · Ziming Li · Xiaoyan Li · Dmitrii Krasheninnikov · Ruohan Wang · Mayoore Jaiswal · Emad Barsoum · Suvansh Sanjeev · Theeraphol Wattanavekin · Qizhe Xie · Sifan Wu · Yuki Yoshida · David Kanaa · Sina Khoshfetrat Pakazad · Mehdi Maasoumy -
2019 Poster: On Testing for Biases in Peer Review »
Ivan Stelmakh · Nihar Shah · Aarti Singh -
2019 Spotlight: On Testing for Biases in Peer Review »
Ivan Stelmakh · Nihar Shah · Aarti Singh -
2019 Poster: Graph Neural Tangent Kernel: Fusing Graph Neural Networks with Graph Kernels »
Simon Du · Kangcheng Hou · Russ Salakhutdinov · Barnabas Poczos · Ruosong Wang · Keyulu Xu -
2019 Poster: Efficient Symmetric Norm Regression via Linear Sketching »
Zhao Song · Ruosong Wang · Lin Yang · Hongyang Zhang · Peilin Zhong -
2019 Poster: Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle »
Simon Du · Yuping Luo · Ruosong Wang · Hanrui Zhang -
2019 Poster: Mutually Regressive Point Processes »
Ifigeneia Apostolopoulou · Scott Linderman · Kyle Miller · Artur Dubrawski -
2019 Poster: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2019 Spotlight: On Exact Computation with an Infinitely Wide Neural Net »
Sanjeev Arora · Simon Du · Wei Hu · Zhiyuan Li · Russ Salakhutdinov · Ruosong Wang -
2018 : Introductory remarks »
Artur Dubrawski -
2018 Poster: How Many Samples are Needed to Estimate a Convolutional Neural Network? »
Simon Du · Yining Wang · Xiyu Zhai · Sivaraman Balakrishnan · Russ Salakhutdinov · Aarti Singh -
2018 Poster: Optimization of Smooth Functions with Noisy Observations: Local Minimax Rates »
Yining Wang · Sivaraman Balakrishnan · Aarti Singh -
2017 : Introductory remarks »
Artur Dubrawski -
2017 Poster: Hypothesis Transfer Learning via Transformation Functions »
Simon Du · Jayanth Koushik · Aarti Singh · Barnabas Poczos -
2017 Poster: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Spotlight: Gradient Descent Can Take Exponential Time to Escape Saddle Points »
Simon Du · Chi Jin · Jason D Lee · Michael Jordan · Aarti Singh · Barnabas Poczos -
2017 Poster: On the Power of Truncated SVD for General High-rank Matrix Estimation Problems »
Simon Du · Yining Wang · Aarti Singh -
2017 Poster: Noise-Tolerant Interactive Learning Using Pairwise Comparisons »
Yichong Xu · Hongyang Zhang · Aarti Singh · Artur Dubrawski · Kyle Miller -
2016 Poster: Data Poisoning Attacks on Factorization-Based Collaborative Filtering »
Bo Li · Yining Wang · Aarti Singh · Yevgeniy Vorobeychik -
2015 : Tsybakov Noise Adaptive Margin-Based Active Learning »
Aarti Singh -
2015 Poster: Differentially private subspace clustering »
Yining Wang · Yu-Xiang Wang · Aarti Singh -
2015 Demonstration: An interactive system for the extraction of meaningful visualizations from high-dimensional data »
Madalina Fiterau · Artur Dubrawski · Donghan Wang -
2013 Poster: Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic »
James L Sharpnack · Akshay Krishnamurthy · Aarti Singh -
2013 Poster: Low-Rank Matrix and Tensor Completion via Adaptive Sampling »
Akshay Krishnamurthy · Aarti Singh -
2013 Poster: Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation »
Martin Azizyan · Aarti Singh · Larry Wasserman -
2013 Poster: Cluster Trees on Manifolds »
Sivaraman Balakrishnan · Srivatsan Narayanan · Alessandro Rinaldo · Aarti Singh · Larry Wasserman -
2012 Workshop: Algebraic Topology and Machine Learning »
Sivaraman Balakrishnan · Alessandro Rinaldo · Donald Sheehy · Aarti Singh · Larry Wasserman -
2012 Poster: Projection Retrieval for Classification »
Madalina Fiterau · Artur Dubrawski -
2011 Poster: Minimax Localization of Structural Information in Large Noisy Matrices »
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh -
2011 Poster: Noise Thresholds for Spectral Clustering »
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh -
2011 Spotlight: Noise Thresholds for Spectral Clustering »
Sivaraman Balakrishnan · Min Xu · Akshay Krishnamurthy · Aarti Singh -
2011 Spotlight: Minimax Localization of Structural Information in Large Noisy Matrices »
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh -
2010 Oral: Identifying graph-structured activation patterns in networks »
James L Sharpnack · Aarti Singh -
2010 Poster: Identifying graph-structured activation patterns in networks »
James L Sharpnack · Aarti Singh -
2008 Poster: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu -
2008 Poster: Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text »
Yi Zhang · Jeff Schneider · Artur Dubrawski -
2008 Oral: Unlabeled data: Now it helps, now it doesn't »
Aarti Singh · Rob Nowak · Jerry Zhu