From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)
Odalric-Ambrym Maillard · Timothy A Mann · Shie Mannor · Jeremie Mary · Laurent Orseau · Thomas Dietterich · Ronald Ortner · Peter Grünwald · Joelle Pineau · Raphael Fonteneau · Georgios Theocharous · Esteban D Arcaute · Christos Dimitrakakis · Nan Jiang · Doina Precup · Pierre-Luc Bacon · Marek Petrik · Aviv Tamar

Fri Dec 12 05:30 AM -- 03:30 PM (PST) @ Level 5; room 512 c,g
Event URL: https://sites.google.com/site/badmodelssdmuworkshop2014/

OVERVIEW This workshop aims to gather researchers in the area of sequential decision making to discuss recent findings and new challenges around the concept of model misspecification. A misspecified model is one for which (1) the model cannot be tractably solved, (2) solving the model does not produce an acceptable solution for the target problem, or (3) the model clearly does not describe the available data perfectly. Even though such a model has its issues, we are still interested in finding a good policy. The question is thus: How can misspecified models be made to lead to good policies?

We refer to the following (non-exhaustive) types of misspecification.
1. States and Context. A misspecified state representation relates to research problems such as Hidden Markov Models, Predictive State Representations, Feature Reinforcement Learning, Partially Observable Markov Decision Problems, etc. The related question of misspecified context in contextual bandits is also relevant.
2. Dynamics. Consider learning a policy for a class of several MDPs rather than a single MDP, or optimizing a risk-averse (as opposed to expected) objective. These approaches could be used to derive a reasonable policy for the target MDP even if the model we solved to obtain it is misspecified. Thus, robustness, safety, and risk-aversion are examples of relevant approaches to this question.
3. Actions. The underlying insight of working with high-level actions built on top of lower-level actions is that if we had the right high-level actions, we would have faster learning/planning. However, finding an appropriate set of high-level actions can be difficult. One form of model misspecification occurs when the given high-level actions cannot be combined to derive an acceptable policy.

More generally, since misspecification may slow learning or prevent an algorithm from finding any acceptable solution, improving the efficiency of planning and learning methods under misspecification is of primary importance. At another level, all these challenges can benefit greatly from the identification of finer properties of MDPs (local recoverability, etc.) and better notions of complexity. These questions are deeply rooted in theory and in recent applications in fields as diverse as air-traffic control, marketing, and robotics. We thus also want to encourage presentations of challenges that provide a guiding thread and an agenda for future research, or a survey of current achievements and difficulties. This includes concrete problems such as energy management, smart grids, computational sustainability, and recommender systems.

We welcome contributions on these exciting questions, with the goals of (1) helping close the gap between strong theoretical guarantees and challenging application requirements, (2) identifying promising directions of near-future research, for both applications and theory of sequential decision making, and (3) triggering collaborations amongst researchers on learning good policies despite being given misspecified models.

MOTIVATION, OBJECTIVES Despite the success of sequential decision making theory at providing solutions to challenging settings, the field faces a limitation. Often, strong theoretical guarantees depend on the assumption that a solution to the class of models considered is a good solution to the target problem. A popular example is that of finite-state MDP learning, for which the model of the state space is assumed known. Such an assumption is, however, rarely met in practice. Similarly, in recommender systems and contextual bandits, the context may not capture an accurate summary of the users. Developing a methodology for finding, estimating, and dealing with the limitations of the model is paramount to the success of sequential decision processes. Another example of model misspecification occurs in Hierarchical Reinforcement Learning: in many real-world applications, we could solve the problem easily if we had the right set of high-level actions. Instead, we need to find a way to build those from a cruder set of primitive actions or from existing high-level actions that do not suit the current task.
Yet another practical challenge arises when we face a process that can only be modeled as an MDP evolving in some class of MDPs, instead of a fixed MDP, which leads to robust reinforcement learning, or when we call for safety or risk-averse guarantees.

These problems are important bottlenecks standing in the way of applying sequential decision making to challenging applications, and they motivate the three goals of this workshop.

RELEVANCE TO THE COMMUNITY Misspecification of models (in the senses we consider here) is an important problem that is faced in many, if not all, real-world applications of sequential decision making under uncertainty. While theoretical results have primarily focused on the case when models of the environment are well-specified, little work has been done on extending the theory to the case of misspecification. Understanding why and when incorrectly specified models lead to good empirical performance, beyond what the current theory explains, is also an important goal. We believe that this workshop will be of great interest to both theoreticians and applied researchers in the field.

PAPER SUBMISSIONS The workshop aims to spark vibrant discussion with talks from invited speakers, presentations from authors of accepted papers, and a poster session. We are soliciting two types of contributions:
• Papers (4-6 pages) for oral or interactive poster presentations
• Extended abstracts (2 pages) for interactive poster presentation
We encourage submissions from different fields of sequential decision making (e.g., reinforcement learning, online learning, active learning), as well as from application-domain experts (e.g., in digital marketing, recommender systems, personalized medicine), addressing the following (non-exhaustive) list of questions and topics:
• Misspecification in model selection.
• State representations in Reinforcement Learning: Hidden Markov Models, Predictive State Representations, Feature Reinforcement Learning, Partially Observable Markov Decision Processes.
• Latent variables in sequential decision making and techniques to handle them.
• Robustness, Safety and Risk-aversion in Reinforcement Learning.
• Curiosity and Autonomous learning (reward misspecification).
• Reinforcement Learning with Options.
• Applications for the Reinforcement Learning community (Computational Sustainability, Smart Cities, Smart Grids, etc.).
• Other topics whose relevance to the workshop is well supported.
Solutions to such challenges will benefit the machine learning community at large, since they also appear in many real-world applications.

Author Information

Odalric-Ambrym Maillard (INRIA)
Timothy A Mann (The Technion)
Shie Mannor (Technion)
Jeremie Mary (INRIA / Univ. Lille)
Laurent Orseau (AgroParisTech/INRA)
Thomas Dietterich (Oregon State University)

Tom Dietterich (AB Oberlin College 1977; MS University of Illinois 1979; PhD Stanford University 1984) is Professor and Director of Intelligent Systems Research at Oregon State University. Among his contributions to machine learning research are (a) the formalization of the multiple-instance problem, (b) the development of the error-correcting output coding method for multi-class prediction, (c) methods for ensemble learning, (d) the development of the MAXQ framework for hierarchical reinforcement learning, and (e) the application of gradient tree boosting to problems of structured prediction and latent variable models. Dietterich has pursued application-driven fundamental research in many areas including drug discovery, computer vision, computational sustainability, and intelligent user interfaces. Dietterich has served the machine learning community in a variety of roles including Executive Editor of the Machine Learning journal, co-founder of the Journal of Machine Learning Research, editor of the MIT Press Book Series on Adaptive Computation and Machine Learning, and editor of the Morgan-Claypool Synthesis series on Artificial Intelligence and Machine Learning. He was Program Co-Chair of AAAI-1990, Program Chair of NIPS-2000, and General Chair of NIPS-2001. He was the first President of the International Machine Learning Society (the parent organization of ICML) and served a term on the NIPS Board of Trustees and the Council of AAAI.

Ronald Ortner (Montanuniversitaet Leoben)
Peter Grünwald (CWI and Leiden University)
Joelle Pineau (McGill University)

Joelle Pineau is an Associate Professor and William Dawson Scholar at McGill University where she co-directs the Reasoning and Learning Lab. She also leads the Facebook AI Research lab in Montreal, Canada. She holds a BASc in Engineering from the University of Waterloo, and an MSc and PhD in Robotics from Carnegie Mellon University. Dr. Pineau's research focuses on developing new models and algorithms for planning and learning in complex partially-observable domains. She also works on applying these algorithms to complex problems in robotics, health care, games and conversational agents. She serves on the editorial board of the Journal of Artificial Intelligence Research and the Journal of Machine Learning Research and is currently President of the International Machine Learning Society. She is a recipient of NSERC's E.W.R. Steacie Memorial Fellowship (2018), a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Senior Fellow of the Canadian Institute for Advanced Research (CIFAR) and in 2016 was named a member of the College of New Scholars, Artists and Scientists by the Royal Society of Canada.

Raphael Fonteneau (Université de Liège)
Georgios Theocharous (Adobe Research)
Esteban D Arcaute (@WalmartLabs)
Christos Dimitrakakis (University of Oslo)
Nan Jiang (University of Illinois at Urbana-Champaign)
Doina Precup (McGill University / Mila / DeepMind Montreal)
Pierre-Luc Bacon (McGill University)
Marek Petrik (University of New Hampshire)
Aviv Tamar (Technion)
