The classic Reinforcement Learning (RL) formulation concerns the maximization of a scalar reward function. More recently, convex RL has been introduced to extend the RL formulation to all the objectives that are convex functions of the state distribution induced by a policy. Notably, convex RL covers several relevant applications that do not fall into the scalar formulation, including imitation learning, risk-averse RL, and pure exploration. In classic RL, it is common to optimize an infinite trials objective, which accounts for the state distribution instead of the empirical state visitation frequencies, even though the actual number of trajectories is always finite in practice. This is theoretically sound since the infinite trials and finite trials objectives are equivalent and thus lead to the same optimal policy. In this paper, we show that this hidden assumption does not hold in convex RL. In particular, we prove that erroneously optimizing the infinite trials objective in place of the actual finite trials one, as is usually done, can lead to a significant approximation error. Since the finite trials setting is the default in both simulated and real-world RL, we believe shedding light on this issue will lead to better approaches and methodologies for convex RL, impacting relevant research areas such as imitation learning, risk-averse RL, and pure exploration, among others.
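To make the distinction between the two objectives concrete, the following is a minimal numerical sketch (everything in it is an illustrative assumption, not taken from the paper: the two-state chain, the entropy objective, and all names in the code). Writing d^pi for the state distribution induced by a policy, d_n for the empirical state visitation frequencies over n trajectories, and F for the convex RL objective, the infinite trials objective evaluates F(d^pi) while the finite trials objective evaluates E[F(d_n)]; the sketch estimates both for a fixed policy and a nonlinear F, where the two values generally differ, unlike scalar RL where F is linear in d and they coincide.

import numpy as np

rng = np.random.default_rng(0)

# Transition matrix of a two-state Markov chain induced by a fixed policy (assumed example).
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
horizon = 10          # trajectory length T
n_trajectories = 2    # number of trials n in the finite trials objective
n_monte_carlo = 5000  # samples used to approximate the outer expectation

def entropy(d):
    # Nonlinear objective F(d): Shannon entropy of a state distribution d.
    d = d[d > 0]
    return -np.sum(d * np.log(d))

def state_distribution():
    # Average state distribution d^pi over the horizon, starting from state 0.
    d = np.array([1.0, 0.0])
    mix = np.zeros(2)
    for _ in range(horizon):
        mix += d / horizon
        d = d @ P
    return mix

def empirical_distribution():
    # Empirical state visitation frequencies d_n over n_trajectories sampled trajectories.
    counts = np.zeros(2)
    for _ in range(n_trajectories):
        s = 0
        for _ in range(horizon):
            counts[s] += 1
            s = rng.choice(2, p=P[s])
    return counts / counts.sum()

d_pi = state_distribution()
infinite_trials = entropy(d_pi)  # F(d^pi): infinite trials value
finite_trials = np.mean([entropy(empirical_distribution())
                         for _ in range(n_monte_carlo)])  # E[F(d_n)]: finite trials value

print(f"infinite trials F(d^pi):  {infinite_trials:.4f}")
print(f"finite trials  E[F(d_n)]: {finite_trials:.4f}")
# With a nonlinear F the two values generally differ (Jensen's inequality); with a
# linear F, i.e. standard scalar RL, they coincide, which is the equivalence the
# abstract refers to.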
Author Information
Mirco Mutti (Politecnico di Milano, Università di Bologna)
Riccardo De Santi (ETH Zürich)
Piersilvio De Bartolomeis (ETH Zürich)
Marcello Restelli (Politecnico di Milano)
More from the Same Authors
- 2021 Spotlight: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning » Alberto Maria Metelli · Alessio Russo · Marcello Restelli
- 2021: Policy Optimization via Optimal Policy Evaluation » Alberto Maria Metelli · Samuele Meta · Marcello Restelli
- 2022: Multi-Armed Bandit Problem with Temporally-Partitioned Rewards » Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli
- 2022: Provably Efficient Causal Model-Based Reinforcement Learning for Environment-Agnostic Generalization » Mirco Mutti · Riccardo De Santi · Emanuele Rossi · Juan Calderon · Michael Bronstein · Marcello Restelli
- 2022: Certified defences hurt generalisation » Piersilvio De Bartolomeis · Jacob Clarysse · Fanny Yang · Amartya Sanyal
- 2022: Piersilvio De Bartolomeis: Certified defences hurt generalisation » Piersilvio De Bartolomeis
- 2022 Poster: Multi-Fidelity Best-Arm Identification » Riccardo Poiani · Alberto Maria Metelli · Marcello Restelli
- 2022 Poster: Off-Policy Evaluation with Deficient Support Using Side Information » Nicolò Felicioni · Maurizio Ferrari Dacrema · Marcello Restelli · Paolo Cremonesi
- 2021 Poster: Learning in Non-Cooperative Configurable Markov Decision Processes » Giorgia Ramponi · Alberto Maria Metelli · Alessandro Concetti · Marcello Restelli
- 2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection » Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta
- 2021 Poster: Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning » Alberto Maria Metelli · Alessio Russo · Marcello Restelli
- 2020 Poster: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits » Andrea Tirinzoni · Matteo Pirotta · Marcello Restelli · Alessandro Lazaric
- 2020 Poster: Inverse Reinforcement Learning from a Gradient-based Learner » Giorgia Ramponi · Gianluca Drappo · Marcello Restelli
- 2020 Session: Orals & Spotlights Track 31: Reinforcement Learning » Dotan Di Castro · Marcello Restelli
- 2019 Poster: Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters » Alberto Maria Metelli · Amarildo Likmeta · Marcello Restelli
- 2018 Poster: Policy Optimization via Importance Sampling » Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli
- 2018 Poster: Transfer of Value Functions via Variational Methods » Andrea Tirinzoni · Rafael Rodriguez Sanchez · Marcello Restelli
- 2018 Oral: Policy Optimization via Importance Sampling » Alberto Maria Metelli · Matteo Papini · Francesco Faccio · Marcello Restelli
- 2017 Poster: Compatible Reward Inverse Reinforcement Learning » Alberto Maria Metelli · Matteo Pirotta · Marcello Restelli
- 2017 Poster: Adaptive Batch Size for Safe Policy Gradients » Matteo Papini · Matteo Pirotta · Marcello Restelli
- 2014 Poster: Sparse Multi-Task Reinforcement Learning » Daniele Calandriello · Alessandro Lazaric · Marcello Restelli
- 2013 Poster: Adaptive Step-Size for Policy Gradient Methods » Matteo Pirotta · Marcello Restelli · Luca Bascetta
- 2011 Poster: Transfer from Multiple MDPs » Alessandro Lazaric · Marcello Restelli
- 2007 Spotlight: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods » Alessandro Lazaric · Marcello Restelli · Andrea Bonarini
- 2007 Poster: Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods » Alessandro Lazaric · Marcello Restelli · Andrea Bonarini