In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints. For instance, the designer may want to limit the use of unsafe actions, increase the diversity of trajectories to enable exploration, or approximate expert trajectories when rewards are sparse. In this paper, we propose an algorithmic scheme that can handle a wide class of constraints in RL tasks: specifically, any constraints that require expected values of some vector measurements (such as the use of an action) to lie in a convex set. This captures previously studied constraints (such as safety and proximity to an expert), but also enables new classes of constraints (such as diversity). Our approach comes with rigorous theoretical guarantees and only relies on the ability to approximately solve standard RL tasks. As a result, it can be easily adapted to work with any model-free or model-based RL. In our experiments, we show that it matches previous algorithms that enforce safety via constraints, but can also enforce new properties that these algorithms do not incorporate, such as diversity.
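To make the idea of reducing a convex-constraint problem to repeated standard RL calls concrete, here is a minimal sketch. It is not the authors' algorithm. It uses a Frank-Wolfe-style loop instead of whatever scheme the paper uses, and it makes several simplifying assumptions. The expected measurement vector of each candidate policy is assumed to be known exactly. The reward is treated as just one more measured coordinate. The convex set is a box, so Euclidean projection reduces to clipping. The RL solver is replaced by a hypothetical `best_response` function that searches a small, made-up table of policies. The result is a mixture of policies, randomized once per episode, whose expected measurement vector is the weighted average of the components' vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def project_onto_box(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> List[float]:
    """Euclidean projection onto the box {y : lower <= y <= upper} (coordinate-wise clipping)."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def distance_to_box(x: Sequence[float], lower: Sequence[float], upper: Sequence[float]) -> float:
    """Euclidean distance from x to the box."""
    p = project_onto_box(x, lower, upper)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))


def best_response(direction: Sequence[float], measurements: Dict[str, Vector]) -> str:
    """Hypothetical stand-in for an RL solver: returns the policy that maximizes the
    scalar reward -direction . z(pi), i.e. minimizes direction . z(pi)."""
    return min(
        measurements,
        key=lambda name: sum(d * z for d, z in zip(direction, measurements[name])),
    )


def find_feasible_mixture(
    measurements: Dict[str, Vector],
    lower: Sequence[float],
    upper: Sequence[float],
    start: str,
    iterations: int = 2000,
) -> Tuple[Dict[str, float], List[float]]:
    """Frank-Wolfe on f(x) = 0.5 * dist(x, box)^2 over mixtures of the given policies.
    Returns the mixture weights and the mixture's expected measurement vector."""
    weights = {name: 0.0 for name in measurements}
    weights[start] = 1.0
    x = list(measurements[start])
    for t in range(iterations):
        # Gradient of 0.5 * dist^2 is x minus its projection onto the box.
        gradient = [xi - pi for xi, pi in zip(x, project_onto_box(x, lower, upper))]
        if all(g == 0.0 for g in gradient):
            break  # the current mixture already satisfies every constraint
        # One "standard RL" call with a scalarized reward chosen by the gradient.
        chosen = best_response(gradient, measurements)
        step = 2.0 / (t + 2.0)
        for name in weights:
            weights[name] *= 1.0 - step
        weights[chosen] += step
        x = [(1.0 - step) * xi + step * zi for xi, zi in zip(x, measurements[chosen])]
    return weights, x


if __name__ == "__main__":
    # Toy, made-up numbers: each vector is (expected reward, unsafe-action rate).
    policies = {
        "greedy": (1.0, 0.5),    # high reward, too unsafe
        "cautious": (0.2, 0.0),  # safe, too little reward
        "random": (0.4, 0.3),
    }
    lower = (0.5, 0.0)           # require expected reward >= 0.5
    upper = (math.inf, 0.2)      # require unsafe-action rate <= 0.2
    weights, mean = find_feasible_mixture(policies, lower, upper, start="cautious")
    print(weights, mean, distance_to_box(mean, lower, upper))
```

In this toy table neither `greedy` nor `cautious` satisfies both bounds on its own, yet a mixture of the two with between 37.5% and 40% of its weight on `greedy` does, which is the kind of solution the loop approaches. Because each iteration only needs a policy that is good for one scalar reward, the `best_response` stub could in principle be swapped for any model-free or model-based RL solver, and a convex set other than a box would only require a different projection.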
Author Information
Sobhan Miryoosefi (Princeton University)
Kianté Brantley (The University of Maryland College Park)
Hal Daumé III (Microsoft Research & University of Maryland)
Hal Daumé III wields a professor appointment in Computer Science and Language Science at the University of Maryland, and spends time as a principal researcher in the machine learning group and fairness group at Microsoft Research in New York City. He and his wonderful advisees study questions related to how to get machines to become more adept at human language, by developing models and algorithms that allow them to learn from data. The two major questions that really drive their research these days are: (1) how can we get computers to learn language through natural interaction with people/users? and (2) how can we do this in a way that promotes fairness, transparency and explainability in the learned models?
Miro Dudik (Microsoft Research)
Robert Schapire (Microsoft Research)
More from the Same Authors
2021 Spotlight: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Chi Jin · Qinghua Liu · Sobhan Miryoosefi
2021 Spotlight: Bayesian decision-making under misspecified priors with applications to meta-learning
Max Simchowitz · Christopher Tosh · Akshay Krishnamurthy · Daniel Hsu · Thodoris Lykouris · Miro Dudik · Robert Schapire
2021 : Poster: The Many Roles that Causal Reasoning Plays in Reasoning about Fairness in Machine Learning
Irene Y Chen · Hal Daumé III · Solon Barocas
2022 : $\ell$Gym: Natural Language Visual Reasoning with Reinforcement Learning
Anne Wu · Kianté Brantley · Noriyuki Kojima · Yoav Artzi
2022 Workshop: HCAI@NeurIPS 2022, Human Centered AI
Michael Muller · Plamen P Angelov · Hal Daumé III · Shion Guha · Q.Vera Liao · Nuria Oliver · David Piorkowski
2022 Workshop: InterNLP: Workshop on Interactive Learning for Natural Language Processing
Kianté Brantley · Soham Dan · Ji Ung Lee · Khanh Nguyen · Edwin Simpson · Alane Suhr · Yoav Artzi
2022 Poster: Provably sample-efficient RL with side information about latent dynamics
Yao Liu · Dipendra Misra · Miro Dudik · Robert Schapire
2021 : The Many Roles that Causal Reasoning Plays in Reasoning about Fairness in Machine Learning
Irene Y Chen · Hal Daumé III · Solon Barocas
2021 Poster: Multiclass Boosting and the Cost of Weak Learning
Nataly Brukhim · Elad Hazan · Shay Moran · Indraneel Mukherjee · Robert Schapire
2021 Poster: Bellman Eluder Dimension: New Rich Classes of RL Problems, and Sample-Efficient Algorithms
Chi Jin · Qinghua Liu · Sobhan Miryoosefi
2021 Poster: Bayesian decision-making under misspecified priors with applications to meta-learning
Max Simchowitz · Christopher Tosh · Akshay Krishnamurthy · Daniel Hsu · Thodoris Lykouris · Miro Dudik · Robert Schapire
2020 Session: Orals & Spotlights Track 20: Social/Adversarial Learning
Steven Wu · Miro Dudik
2020 Poster: Constrained episodic reinforcement learning in concave-convex and knapsack settings
Kianté Brantley · Miro Dudik · Thodoris Lykouris · Sobhan Miryoosefi · Max Simchowitz · Aleksandrs Slivkins · Wen Sun
2019 Tutorial: Imitation Learning and its Application to Natural Language Generation
Kyunghyun Cho · Hal Daumé III
2018 Workshop: Wordplay: Reinforcement and Language Learning in Text-based Games
Adam Trischler · Angeliki Lazaridou · Yonatan Bisk · Wendy Tay · Nate Kushman · Marc-Alexandre Côté · Alessandro Sordoni · Daniel Ricks · Tom Zahavy · Hal Daumé III
2018 Poster: On Oracle-Efficient PAC RL with Rich Observations
Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
2018 Spotlight: On Oracle-Efficient PAC RL with Rich Observations
Christoph Dann · Nan Jiang · Akshay Krishnamurthy · Alekh Agarwal · John Langford · Robert Schapire
2017 : Competition V: Human-Computer Question Answering
Jordan Boyd-Graber · Hal Daumé III · He He · Mohit Iyyer · Pedro Rodriguez
2017 Poster: Off-policy evaluation for slate recommendation
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni
2017 Poster: A Decomposition of Forecast Error in Prediction Markets
Miro Dudik · Sebastien Lahaie · Ryan Rogers · Jennifer Wortman Vaughan
2017 Oral: Off-policy evaluation for slate recommendation
Adith Swaminathan · Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik · John Langford · Damien Jose · Imed Zitouni
2016 Workshop: Let's Discuss: Learning Methods for Dialogue
Hal Daumé III · Paul Mineiro · Amanda Stent · Jason E Weston
2016 Poster: Contextual semibandits via supervised learning oracles
Akshay Krishnamurthy · Alekh Agarwal · Miro Dudik
2016 Poster: Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits
Vasilis Syrgkanis · Haipeng Luo · Akshay Krishnamurthy · Robert Schapire
2016 Poster: A Credit Assignment Compiler for Joint Prediction
Kai-Wei Chang · He He · Stephane Ross · Hal Daumé III · John Langford
2015 Poster: Efficient and Parsimonious Agnostic Active Learning
Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire
2015 Spotlight: Efficient and Parsimonious Agnostic Active Learning
Tzu-Kuo Huang · Alekh Agarwal · Daniel Hsu · John Langford · Robert Schapire
2015 Poster: Fast Convergence of Regularized Learning in Games
Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire
2015 Oral: Fast Convergence of Regularized Learning in Games
Vasilis Syrgkanis · Alekh Agarwal · Haipeng Luo · Robert Schapire
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant
2014 Workshop: Representation and Learning Methods for Complex Outputs
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto
2014 Workshop: OPT2014: Optimization for Machine Learning
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck
2014 Poster: Learning to Search in Branch and Bound Algorithms
He He · Hal Daumé III · Jason Eisner
2013 Poster: Binary to Bushy: Bayesian Hierarchical Clustering with the Beta Coalescent
Yuening Hu · Jordan Boyd-Graber · Hal Daumé III · Z. Irene Ying
2012 Poster: Imitation Learning by Coaching
He He · Hal Daumé III · Jason Eisner
2012 Poster: Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
Piyush Rai · Abhishek Kumar · Hal Daumé III
2012 Poster: Learned Prioritization for Trading Off Accuracy and Speed
Jiarong Jiang · Adam Teichert · Hal Daumé III · Jason Eisner
2011 Poster: Message-Passing for Approximate MAP Inference with Latent Variables
Jiarong Jiang · Piyush Rai · Hal Daumé III
2011 Poster: Co-regularized Multi-view Spectral Clustering
Abhishek Kumar · Piyush Rai · Hal Daumé III
2010 Poster: Learning Multiple Tasks using Manifold Regularization
Arvind Agarwal · Hal Daumé III · Samuel Gerber
2010 Poster: Co-regularization Based Semi-supervised Domain Adaptation
Hal Daumé III · Abhishek Kumar · Avishek Saha
2009 Poster: Multi-Label Prediction via Sparse Infinite CCA
Piyush Rai · Hal Daumé III
2008 Poster: Nonparametric Bayesian Sparse Hierarchical Factor Modeling and Regression
Piyush Rai · Hal Daumé III
2007 Poster: Bayesian Agglomerative Clustering with Coalescents
Yee Whye Teh · Hal Daumé III · Daniel Roy
2007 Oral: Bayesian Agglomerative Clustering with Coalescents
Yee Whye Teh · Hal Daumé III · Daniel Roy