From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)

Workshop


Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar

Level 5; room 512 c,g



OVERVIEW This workshop aims to gather researchers in the area of sequential decision making to discuss recent findings and new challenges around the concept of model misspecification. A misspecified model is a model that either (1) cannot be solved tractably, (2) when solved, does not produce an acceptable solution for the target problem, or (3) clearly does not describe the available data perfectly. Despite these shortcomings, we are interested in finding a good policy. The question is thus: how can misspecified models be made to lead to good policies?

We consider the following (non-exhaustive) types of misspecification.
1. States and Context. A misspecified state representation relates to research problems such as Hidden Markov Models, Predictive State Representations, Feature Reinforcement Learning, and Partially Observable Markov Decision Processes. The related question of misspecified context in contextual bandits is also relevant.
2. Dynamics. Consider learning a policy for a class of several MDPs rather than a single MDP, or optimizing a risk-averse (as opposed to expected-value) objective. These approaches can be used to derive a reasonable policy for the target MDP even if the model solved to obtain it is misspecified. Robustness, safety, and risk-aversion are thus examples of relevant approaches to this question.
3. Actions. The underlying insight of working with high-level actions built on top of lower-level actions is that, given the right high-level actions, learning and planning would be faster. However, finding an appropriate set of high-level actions can be difficult. One form of model misspecification occurs when the given high-level actions cannot be combined to derive an acceptable policy.

More generally, since misspecification may slow learning or prevent an algorithm from finding any acceptable solution, improving the efficiency of planning and learning methods under misspecification is of primary importance. At another level, all these challenges can benefit greatly from the identification of finer properties of MDPs (local recoverability, etc.) and from better notions of complexity. These questions are deeply rooted both in theory and in recent applications in fields as diverse as air-traffic control, marketing, and robotics. We therefore also encourage presentations of challenges that provide a common thread and an agenda for future research, as well as surveys of current achievements and difficulties. This includes concrete problems such as energy management, smart grids, computational sustainability, and recommender systems.

We welcome contributions on these exciting questions, with the goals of (1) helping close the gap between strong theoretical guarantees and challenging application requirements, (2) identifying promising directions of near-future research, for both applications and theory of sequential decision making, and (3) triggering collaborations among researchers on learning good policies despite being given misspecified models.

MOTIVATION, OBJECTIVES Despite the success of sequential decision making theory at providing solutions to challenging settings, the field faces a limitation. Strong theoretical guarantees often depend on the assumption that a solution to the class of models considered is a good solution to the target problem. A popular example is finite-state MDP learning, in which the state space is assumed to be known. Such an assumption, however, is rarely met in practice. Similarly, in recommender systems and contextual bandits, the context may not accurately summarize the users. Developing a methodology for finding, estimating, and dealing with the limitations of the model is paramount to the success of sequential decision processes. Another example of model misspecification occurs in Hierarchical Reinforcement Learning: in many real-world applications, we could solve the problem easily if we had the right set of high-level actions. Instead, we need to find a way to build them from a cruder set of primitive actions or from existing high-level actions that do not suit the current task.
Yet another practical challenge arises when a process can only be modeled as an MDP evolving within some class of MDPs, rather than as a fixed MDP, which leads to robust reinforcement learning, or when safety or risk-averse guarantees are required.

These problems are important bottlenecks standing in the way of applying sequential decision making to challenging applications, and they motivate the three goals of this workshop.

RELEVANCE TO THE COMMUNITY Misspecification of models (in the senses we consider here) is an important problem that is faced in many, if not all, real-world applications of sequential decision making under uncertainty. While theoretical results have primarily focused on the case where models of the environment are well specified, little work has been done on extending the theory to the misspecified case. Understanding why and when incorrectly specified models lead to good empirical performance, beyond what current theory explains, is also an important goal. We believe that this workshop will be of great interest to both theoreticians and applied researchers in the field.

PAPER SUBMISSIONS The workshop aims to spark vibrant discussion with talks from invited speakers, presentations from authors of accepted papers, and a poster session. We are soliciting two types of contributions:
• Papers (4-6 pages) for oral or interactive poster presentations
• Extended abstracts (2 pages) for interactive poster presentation
We encourage submissions from different fields of sequential decision making (e.g., reinforcement learning, online learning, active learning), as well as from application-domain experts (e.g., in digital marketing, recommender systems, or personalized medicine), addressing the following (non-exhaustive) list of questions and topics:
• Misspecification in model selection.
• State-representations in Reinforcement learning: Hidden Markov Models, Predictive State Representations, Feature Reinforcement Learning, Partially Observable Markov Decision Processes.
• Latent variables in sequential decision making and techniques to handle them.
• Robustness, Safety and Risk-aversion in Reinforcement Learning.
• Curiosity and Autonomous learning (reward misspecification).
• Reinforcement Learning with Options.
• Applications of interest to the Reinforcement Learning community (Computational Sustainability, Smart Cities, Smart Grids, etc.).
• Other topics whose relevance to the workshop is well supported.
Solutions to such challenges will benefit the machine learning community at large, since these challenges also arise in many real-world applications.
