Humans acquire vision, language, and decision making abilities through years of experience, arguably corresponding to millions of video frames, audio clips, and interactions with the world. Following this data-driven approach, recent foundation models trained on large and diverse datasets have demonstrated emergent capabilities and fast adaptation to a wide range of downstream vision and language tasks (e.g., BERT, DALL-E, GPT-3, CLIP). Meanwhile, in the decision making and reinforcement learning (RL) literature, foundation models have yet to fundamentally shift the traditional paradigm in which an agent learns from its own or others' collected experience, typically on a single task and with limited prior knowledge. Nevertheless, there has been a growing body of foundation-model-inspired research in decision making that often involves collecting large amounts of interactive data for self-supervised learning at scale. For instance, foundation models such as BERT and GPT-3 have been applied to modeling trajectory sequences of agent experience, and ever-larger datasets have been curated for learning multimodal, multitask, and generalist agents. These works demonstrate the potential benefits of foundation models for a broad set of decision making applications such as autonomous driving, healthcare systems, robotics, goal-oriented dialogue, and recommendation systems.
Despite early signs of success, foundation models for decision making remain largely underexplored and underutilized, and lack solid empirical and theoretical grounding. The challenges faced by existing research are as follows:
1. Many traditional decision making benchmarks are (near-)Markovian (i.e., historyless), and this brings the value of sequence modeling into question. The true power of foundation models may require more complex tasks.
2. Decision making tasks are composed of multi-modal data. At minimum, the states (observations), actions, and rewards of a task are each of different types. Moreover, across different tasks, states and actions can be highly distinct (image vs. text observations, discrete vs. continuous actions).
3. Unlike vision and language, decision making agents can further interact with the environment to collect additional experience in conjunction with learning on existing data. How such an interactive component should be integrated with foundation models is not clear.
4. There is already a large gap between theory and practice in decision making, and hastily applying large models to decision making might widen it further.
Goal of the workshop: This workshop aims to bring together the decision making community and the foundation models community in vision and language to confront the challenges of decision making at scale. The workshop will span high-level discussions on how foundation models can help decision making (if at all) and low-level algorithmic differences between decision making, vision, and language that might present both opportunities and challenges for applying foundation models to decision making. More specific topics will include, but are not limited to:
1. Common or distinct properties of vision, language, and decision making tasks that support or challenge the value of foundation models in decision making.
2. Introductions of or proposals for new benchmarks to facilitate better research on foundation models for decision making.
3. How decision making can benefit from techniques already popular for foundation models, such as autoregressive sequence models, diffusion models, contrastive pretraining, masked autoencoders, prompting, etc.
4. Lessons learned from developing engineering frameworks, datasets and benchmarks, and evaluation protocols for foundation models in vision and language, and how the decision making community can benefit from these lessons.
5. How foundation models relate to the theoretical foundations of sequential decision making.
Sat 6:50 a.m. - 7:00 a.m. | Ofir Nachum: Opening Remarks (In-Person Introduction)
Sat 7:00 a.m. - 7:15 a.m. | Is Conditional Generative Modeling all you need for Decision-Making? (Oral Presentation)
Sat 7:15 a.m. - 7:30 a.m. | Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (Oral Presentation)
Sat 7:30 a.m. - 7:45 a.m. | VIMA: General Robot Manipulation with Multimodal Prompts (Oral Presentation)
Sat 7:45 a.m. - 8:00 a.m. | Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action (Oral Presentation)
Sat 8:00 a.m. - 8:30 a.m. | Gabriel Barth-Maron: Gato: A Generalist Agent (Invited Talk)
Sat 8:30 a.m. - 9:00 a.m. | Jim Fan: Open-Ended Embodied Agents with Internet-Scale Knowledge (Invited Talk)
Sat 9:00 a.m. - 9:30 a.m. | Leslie P. Kaelbling: What does an intelligent robot need to know? (Invited Talk)
Sat 9:30 a.m. - 10:00 a.m. | Dorsa Sadigh: Learning and Leveraging Foundation Models in Robotics (Invited Talk)
Sat 11:00 a.m. - 11:15 a.m. | REACT: Synergizing Reasoning and Acting in Language Models (Oral Presentation)
Sat 11:15 a.m. - 11:30 a.m. | Generative Pretraining for Black-Box Optimization (Oral Presentation)
Sat 11:30 a.m. - 11:45 a.m. | In-context Reinforcement Learning with Algorithm Distillation (Oral Presentation)
Sat 11:45 a.m. - 12:00 p.m. | Large Language Models Are Human-Level Prompt Engineers (Oral Presentation)
Sat 12:00 p.m. - 12:30 p.m. | Thomas Wolf: Unlocking Foundation Models for Embodied Learning – What Tools Will We Need? (Invited Talk)
Sat 12:30 p.m. - 1:00 p.m. | Machel Reid: On using pre-trained language models for reinforcement learning (Invited Talk)
Sat 1:00 p.m. - 1:30 p.m. | Deepak Pathak: Invited Talk
Sat 1:30 p.m. - 2:00 p.m. | Dale Schuurmans: Large Foundation Models and Reinforcement Learning (Invited Talk)
Sat 2:00 p.m. - 2:30 p.m. | Panel Discussion
Revealing the Bias in Large Language Models via Reward Structured Questions (Poster)
The success of large language models has been amply demonstrated in recent years. Using these models and fine-tuning them for the specific task at hand results in highly performant models. However, these models also learn biased representations from the data they have been trained on. In particular, several studies recently showed that language models can learn to be biased towards certain genders. Quite recently, several studies tried to eliminate this bias by incorporating human feedback into fine-tuning. In our study we show that changing the question asked to the language model dramatically changes the log probabilities of the bias measured in the responses. Furthermore, in several cases the language model ends up providing a completely opposite response. Recent language models fine-tuned on prior gender bias datasets do not resolve the actual problem, but rather alleviate it only for the dataset on which the model is fine-tuned. We believe our results might lay the groundwork for further work on alignment and safety problems in large language models.
Ezgi Korkmaz
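To make the probing procedure above concrete, the following sketch scores two candidate answers under two differently phrased questions and compares their log-probabilities; the GPT-2 model and the example prompts are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical probe: how does rephrasing a question shift the log-probability
# a language model assigns to each candidate answer? (Illustrative only.)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sequence_logprob(prompt: str, continuation: str) -> float:
    """Sum of token log-probabilities of `continuation` given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # row t predicts token t+1
    cont_ids = full_ids[0, prompt_len:]
    token_logprobs = log_probs[prompt_len - 1:].gather(1, cont_ids.unsqueeze(1))
    return token_logprobs.sum().item()

questions = [
    "Q: The doctor walked in. Was the doctor a man or a woman? A: The doctor was a",
    "Q: Someone who is a doctor walked in. What was their gender? A: The doctor was a",
]
for q in questions:
    print({a: round(sequence_logprob(q, " " + a), 2) for a in ("man", "woman")})
```

Large swings in these scores across rephrasings are the kind of effect the study reports.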
Intelligent Variable Selection for Branch & Bound Methods (Poster)
Combinatorial optimization is applied to a wide variety of real-world problems like job scheduling, capacity planning and supply-chain management. These problems are usually modelled as Mixed Integer Programming (MIP) problems and solved using the Branch and Bound (B&B) paradigm. The Branch and Bound method partitions the solution space (branching) by creating constrained sub-problems (bounding) and explores those subsets of the solution space which are highly likely to produce optimal solutions. The efficiency of the Branch and Bound method in finding optimal solutions is heavily influenced by the variable and node selection heuristics used for branching. In this paper, we propose a novel deep reinforcement learning based variable selection strategy. The proposed solution shows significant improvement over Strong Branching (SB) strategies, which have been traditionally used for variable selection. The solution also outperforms current state-of-the-art RL-based branching strategies like PPO and DQN. The results of our experiments show that the proposed solution is robust and scalable to different kinds of problems.
Priya Shanmugasundaram · Saurabh Jha · Sailendu Patra

Skill Decision Transformer (Poster)
Recent work has shown that Large Language Models (LLMs) can be incredibly effective for offline reinforcement learning (RL) by representing the traditional RL problem as a sequence modelling problem. However, many of these methods only optimize for high returns and may not extract much information from a diverse dataset of trajectories. Generalized Decision Transformers (GDTs) have shown that utilizing future trajectory information, in the form of information statistics, can help extract more information from offline trajectory data. Building upon this, we propose Skill Decision Transformer (Skill DT). Skill DT draws inspiration from hindsight relabelling and skill discovery methods to discover a diverse set of \emph{primitive behaviors}, or skills. We show that Skill DT can not only perform offline state-marginal matching (SMM), but can also discover descriptive behaviors that can be easily sampled. Furthermore, we show that through purely reward-free optimization, Skill DT is still competitive with supervised offline RL approaches on the D4RL benchmark.
Shyam Sudhakaran · Sebastian Risi

PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pretraining (Poster)
SlidesLive Video » Robotics has long been a field riddled with complex systems architectures whose modules and connections, whether traditional or learning-based, require significant human expertise and prior knowledge. Inspired by large pre-trained language models, this work introduces a paradigm for pre-training a general purpose representation that can serve as a starting point for multiple tasks on a given robot. We present the Perception-Action Causal Transformer (PACT), a generative transformer-based architecture that aims to build representations directly from robot data in a self-supervised fashion. Through autoregressive prediction of states and actions over time, our model implicitly encodes dynamics and behaviors for a particular robot. Our experimental evaluation focuses on the domain of mobile agents, where we show that this robot-specific representation can function as a single starting point to achieve distinct tasks such as safe navigation, localization and mapping. We evaluate two form factors: a wheeled robot that uses a LiDAR sensor as perception input (MuSHR), and a simulated agent that uses first-person RGB images (Habitat). We show that finetuning small task-specific networks on top of the larger pretrained model results in significantly better performance compared to training a single model from scratch for all tasks simultaneously, and comparable performance to training a separate large model for each task independently. By sharing a common good-quality representation across tasks we can lower overall model capacity and speed up the real-time deployment of such systems. |
Rogerio Bonatti · Sai Vemprala · shuang ma · Felipe Vieira Frujeri · Shuhang Chen · Ashish Kapoor

SMART: Self-supervised Multi-task pretrAining with contRol Transformers (Poster)
SlidesLive Video » Self-supervised pretraining has been extensively studied in language and vision domains, where a unified model can be easily adapted to various downstream tasks by pretraining representations without explicit labels. When it comes to sequential decision-making tasks, however, it is difficult to properly design such a pretraining approach that can cope with both high-dimensional perceptual information and the complexity of sequential control over long interaction horizons. The challenge becomes combinatorially more complex if we want to pretrain representations amenable to a large variety of tasks. To tackle this problem, in this work, we formulate a general pretraining-finetuning pipeline for sequential decision making, under which we propose a generic pretraining framework \textit{Self-supervised Multi-task pretrAining with contRol Transformer (SMART)}. By systematically investigating pretraining regimes, we carefully design a Control Transformer (CT) coupled with a novel control-centric pretraining objective in a self-supervised manner. SMART encourages the representation to capture the common essential information relevant to short-term control and long-term control, which is transferrable across tasks. We show by extensive experiments in DeepMind Control Suite that SMART significantly improves the learning efficiency among seen and unseen downstream tasks and domains under different learning scenarios including Imitation Learning (IL) and Reinforcement Learning (RL). Benefiting from the proposed control-centric objective, SMART is resilient to distribution shift between pretraining and finetuning, and even works well with low-quality pretraining datasets that are randomly collected. |
Yanchao Sun · shuang ma · Ratnesh Madaan · Rogerio Bonatti · Furong Huang · Ashish Kapoor

LATTE: LAnguage Trajectory TransformEr (Poster)
SlidesLive Video » Natural language is one of the most intuitive ways to express human intent. However, translating instructions and commands towards robotic motion generation and deployment in the real world is far from being an easy task. The challenge of combining a robot's inherent low-level geometric and kinodynamic constraints with a human's high-level semantic instructions traditionally is solved using task-specific solutions with little generalizability between hardware platforms, often with the use of static sets of target actions and commands. This work instead proposes a flexible language-based framework that allows a user to modify generic robotic trajectories. Our method leverages pre-trained language models (BERT and CLIP) to encode the user's intent and target objects directly from a free-form text input and scene images, fuses geometrical features generated by a transformer encoder network, and finally outputs trajectories using a transformer decoder, without the need of priors related to the task or robot information. We significantly extend the previous work presented in Bucker et al. (2022) by expanding the trajectory parametrization space to 3D and velocity as opposed to just XY movements. In addition, we now train the model to use actual images of the objects in the scene for context (as opposed to textual descriptions), and we evaluate the system in a diverse set of scenarios beyond manipulation, such as aerial and legged robots. Our simulated and real-life experiments demonstrate that our transformer model can successfully follow human intent, modifying the shape and speed of trajectories within multiple environments. |
A Bucker · Luis Figueredo · Sami Haddadin · Ashish Kapoor · shuang ma · Sai Vemprala · Rogerio Bonatti

Build generally reusable agent-environment interaction models (Poster)
This paper tackles the problem of how to pre-train a model and make it a generally reusable backbone for downstream task learning. In pre-training, we propose a method that builds an agent-environment interaction model by learning domain-invariant successor features from the agent's vast experience covering various tasks, then discretizes them into behavior prototypes, which results in an embodied set structure. To make the model generally reusable for downstream task learning, we propose (1) embodied feature projection, which retains previous knowledge by projecting the new task's observation-action pair to the embodied set structure, and (2) projected Bellman updates, which add learning plasticity for the new task setting. We provide preliminary results showing that downstream task learning based on a pre-trained embodied set structure can handle unseen changes in task objectives, environmental dynamics and sensor modalities.
Jun Jin · Hongming Zhang · Jun Luo

Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains (Poster)
Multi-modal foundational models are trained on millions of pairs of natural images and texts, frequently obtained through web-crawling approaches. Although their performance is excellent, these models do not generalize well to other domains, such as medical imaging, especially when these domains do not resemble the centric-like images that can be found on the web. In this study, we assess the ability of the stable diffusion model to generate domain-specific images in the particular case of medical imaging. Based on quantitative and qualitative evaluations of the main components of the stable diffusion pipeline (the variational autoencoder, the U-Net and the text-encoder), we explore several approaches to fine-tune stable diffusion to generate radiological images, which accurately represent the clinical content of conditional text prompts. Our best-performing model improves upon the stable diffusion baseline and can be correctly conditioned to insert an abnormality on a synthetic radiology image.
Pierre Chambon · Christian Bluethgen · Curtis Langlotz · Akshay Chaudhari

What Makes Certain Pre-Trained Visual Representations Better for Robotic Learning? (Poster)
Deep learning for robotics is data-intensive, but collecting high-quality robotics data at scale is prohibitively expensive. One approach to mitigate this is to leverage visual representations pre-trained on relatively abundant non-robotic datasets. So far, existing works have focused on proposing pre-training strategies and assessing them via ablation studies, giving high-level knowledge of how pre-training design choices affect downstream performance. However, the significant gap in data and objective between the two stages motivates a more detailed understanding of what properties of better pre-trained visual representations enable their comparative advantage. In this work, we empirically analyze the representations of robotic manipulation data from several standard benchmarks under a variety of pre-trained models, correlating key metrics of the representations with closed-loop task performance after behavior cloning. We find evidence that suggests our proposed metrics have substantive predictive power for downstream robotic learning. |
Kyle Hsu · Tyler Lum · Ruohan Gao · Shixiang (Shane) Gu · Jiajun Wu · Chelsea Finn

Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change) (Poster)
Recent advances in large language models (LLMs) have transformed the field of natural language processing (NLP). From GPT-3 to PaLM, the state-of-the-art performance on natural language tasks is being pushed forward with every new large language model. Along with natural language abilities, there has been significant interest in understanding whether such models exhibit reasoning capabilities, as measured by reasoning benchmarks. However, even though results are seemingly positive, these benchmarks prove to be simplistic in nature, and the performance of LLMs on them cannot be used as evidence to support the often outlandish claims being made about LLMs' reasoning capabilities. Further, these benchmarks only represent a very limited set of simple reasoning tasks, and we need to look at more sophisticated reasoning problems if we are to measure the true limits of such LLM-based systems. Motivated by this, we propose an extensible assessment framework to test the capabilities of LLMs on reasoning about actions and change, a central aspect of human intelligence. We provide multiple test cases that are more involved than any of the previously established benchmarks, and each test case evaluates a different aspect of reasoning about actions and change. Results on GPT-3 (davinci), Instruct-GPT3 (text-davinci-002) and BLOOM (176B) showcase subpar performance on such reasoning tasks.
Karthik Valmeekam · Alberto Olmo · Sarath Sreedharan · Subbarao Kambhampati

A Control-Centric Benchmark for Video Prediction (Poster)
Video is a promising source of knowledge for embodied agents to learn models of the world's dynamics. Large deep networks have become increasingly effective at modeling complex video data in a self-supervised manner, as evaluated by metrics based on human perceptual similarity or pixel-wise comparison. However, it remains unclear whether current metrics are accurate indicators of performance on downstream tasks. We find empirically that for planning robotic manipulation, existing metrics can be unreliable at predicting execution success. To address this, we propose a benchmark for action-conditioned video prediction in the form of a control benchmark that evaluates a given model for simulated robotic manipulation through sampling-based planning. Our benchmark, Video Prediction for Visual Planning ($VP^2$), includes simulated environments with $11$ task categories and $310$ task instance definitions, a full planning implementation, and training datasets containing scripted interaction trajectories for each task category. A central design goal of our benchmark is to expose a simple interface -- a single forward prediction call -- so it is straightforward to evaluate almost any action-conditioned video prediction model. We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling by analyzing three highly-performant video prediction models, finding that while scale can improve perceptual quality when modelling visually diverse settings, other attributes such as uncertainty awareness can also aid planning performance.
Stephen Tian · Chelsea Finn · Jiajun Wu
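A central design goal stated above is a single forward-prediction call that any sampling-based planner can consume. The sketch below shows one way such an interface could look; the class and function names are assumptions for illustration, not the benchmark's actual API.

```python
# Hypothetical action-conditioned video-prediction interface of the kind the
# benchmark describes: one forward call, usable by any sampling-based planner.
from dataclasses import dataclass
import numpy as np

@dataclass
class PredictionRequest:
    context_frames: np.ndarray   # (B, T_ctx, H, W, C) past observations
    actions: np.ndarray          # (B, T_pred, action_dim) candidate action sequences

class VideoPredictionModel:
    """Wrap any model behind a single forward-prediction call."""

    def __call__(self, request: PredictionRequest) -> np.ndarray:
        # Return predicted future frames, shape (B, T_pred, H, W, C).
        raise NotImplementedError

class RandomModel(VideoPredictionModel):
    """Trivial baseline that predicts noise; useful only to exercise the interface."""

    def __call__(self, request: PredictionRequest) -> np.ndarray:
        b, _, h, w, c = request.context_frames.shape
        return np.random.rand(b, request.actions.shape[1], h, w, c)

def plan_one_step(model, context, action_sampler, cost_fn, n_candidates=64):
    """Sampling-based planning built entirely on the single forward call."""
    actions = action_sampler(n_candidates)                  # (N, T_pred, action_dim)
    ctx = np.repeat(context[None], n_candidates, axis=0)    # (N, T_ctx, H, W, C)
    predictions = model(PredictionRequest(ctx, actions))
    costs = np.array([cost_fn(p) for p in predictions])
    return actions[int(np.argmin(costs))]
```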
CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation (Poster)
We address the important problem of generalizing robotic rearrangement to clutter without any explicit object models. We first generate over 650K cluttered scenes---orders of magnitude more than prior work---in diverse everyday environments, such as cabinets and shelves. We render synthetic partial point clouds from this data and use it to train our CabiNet model architecture. CabiNet is a collision model that accepts object and scene point clouds and predicts collisions for SE$(3)$ object poses in the scene. Our representation has a fast inference speed of 7$\mu$s/query with nearly 20$\%$ higher performance than baseline approaches in challenging environments. We use this collision model in conjunction with a Model Predictive Path Integral (MPPI) planner to generate collision-free trajectories for picking and placing in clutter. CabiNet also predicts waypoints, computed from the scene’s signed distance field (SDF), that allows the robot to navigate tight spaces during rearrangement. This improves rearrangement performance by nearly 35$\%$ compared to baselines. We systematically evaluate our approach, procedurally generate simulated experiments, and demonstrate that our approach directly transfers to the real world, despite training exclusively in simulation. Robot experiments in completely unknown scenes and objects are shown in the supplementary video.
Adithyavairavan Murali · Arsalan Mousavian · Clemens Eppner · Adam Fishman · Dieter Fox

Planning With Large Language Models Via Corrective Re-Prompting (Poster)
Extracting knowledge from Large Language Models (LLMs) offers a path to designing intelligent, embodied agents that take advantage of the common sense knowledge present in large language datasets. Related works have queried LLMs with a wide range of contextual information, such as goals, sensor observations and scene descriptions, to generate high-level action plans for a specific task. In this work, we propose a prompting-based strategy for extracting executable plans from an LLM that leverages a novel and readily-accessible source of information: precondition errors. Our approach assumes that actions are only afforded execution in certain contexts (i.e., implicit preconditions must be met for an action to execute), and that the embodied agent has the ability to determine if the action is not executable in the current context (e.g., a precondition error is present). When an agent is unable to execute an action in a plan, our approach re-prompts the LLM with precondition error information to extract a useful and executable action to achieve the intended goal in the current context. We evaluate our approach in the VirtualHome simulation environment on 88 different tasks and 7 scenes. We evaluate different prompt templates and compare to methods that naively re-sample actions from the LLM. We find that our approach using precondition errors improves the executability and semantic correctness of plans, while also reducing the number of corrective re-prompts for querying actions.
Shreyas Sundara Raman · Vanya Cohen · Eric Rosen · Ifrah Idrees · David Paulius · Stefanie Tellex
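The core loop of the approach, as described in the abstract, can be sketched as follows. The `llm` and `env_execute` callables and the prompt wording are placeholders, not the paper's exact prompts or simulator interface.

```python
# Sketch of a corrective re-prompting loop: when an action fails, feed the
# precondition error back into the prompt and ask for a different action.
from typing import Callable, List, Optional, Tuple

def plan_with_reprompting(
    llm: Callable[[str], str],                                  # prompt -> next action (text)
    env_execute: Callable[[str], Tuple[bool, Optional[str]]],   # action -> (ok, precondition_error)
    task: str,
    max_steps: int = 20,
) -> List[str]:
    executed: List[str] = []
    prompt = f"Task: {task}\nPlan one action at a time.\nNext action:"
    for _ in range(max_steps):
        action = llm(prompt).strip()
        if action.lower() == "done":
            break
        ok, error = env_execute(action)
        if ok:
            executed.append(action)
            prompt += f" {action}\nNext action:"
        else:
            # Re-prompt with the precondition error so the model can correct itself.
            prompt += (
                f" {action}\nThis action failed: {error}\n"
                "Suggest a different executable action.\nNext action:"
            )
    return executed
```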
Decision Making as Language Generation (Poster)
Decision transformers are a recently proposed approach to offline reinforcement learning that leverages transformer-based auto-regressive sequence models. We discuss challenges associated with fine-tuning a given, pre-trained language model on a decision making task. We propose solutions to these challenges and study their viability on a shortest path problem. We also show how a given language model allows us to bring data-centric approaches to bear on improving the model, and how it opens up the possibility of treating the decision transformer objective as one task alongside others for transfer learning.
Roland Memisevic · Sunny P Panchal · Mingu Lee

Multi-step Planning for Automated Hyperparameter Optimization with OptFormer (Poster)
SlidesLive Video » As machine learning permeates more industries and models become more expensive and time consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing. Multi-step planning based approaches to hyperparameter optimization promise improved efficiency over myopic alternatives by more effectively balancing out exploration and exploitation. However, the potential of these approaches has not been fully realized due to their technical complexity and computational intensity. In this work, we leverage recent advances in Transformer-based, natural-language-interfaced hyperparameter optimization to circumvent these barriers. We build on top of the recently proposed OptFormer which casts both hyperparameter suggestion and target function approximation as autoregressive generation thus making planning via rollouts simple and efficient. We conduct extensive exploration of different strategies for performing multi-step planning on top of the OptFormer model to highlight its potential for use in constructing non-myopic HPO strategies. |
Lucio M Dery · Abram Friesen · Nando de Freitas · Marc'Aurelio Ranzato · Yutian Chen

A Mixture-of-Expert Approach to RL-based Dialogue Management (Poster)
Despite recent advancements in language models (LMs), their application to dialogue management (DM) problems and their ability to carry on rich conversations remain a challenge. We use reinforcement learning (RL) to develop a dialogue agent that avoids being short-sighted (outputting generic utterances) and maximizes overall user satisfaction. Most existing RL approaches to DM train the agent at the word level, and thus have to deal with a combinatorially complex action space even for a medium-size vocabulary. As a result, they struggle to produce a successful and engaging dialogue even if they are warm-started with a pre-trained LM. To address this issue, we develop an RL-based DM using a novel mixture-of-expert language model (MoE-LM) that consists of (i) an LM capable of learning diverse semantics for conversation histories, (ii) a number of specialized LMs (or experts) capable of generating utterances corresponding to a particular attribute or personality, and (iii) an RL-based DM that performs dialogue planning with the utterances generated by the experts. Our MoE approach provides greater flexibility to generate sensible utterances with different intents and allows RL to focus on conversational-level DM. We compare it with SOTA baselines on open-domain dialogues and demonstrate its effectiveness both in terms of the diversity and sensibility of the generated utterances and the overall DM performance.
Yinlam Chow · Azamat Tulepbergenov · Ofir Nachum · Dhawal Gupta · Moonkyung Ryu · Mohammad Ghavamzadeh · Craig Boutilier

Foundation Models for Semantic Novelty in Reinforcement Learning (Poster)
Effectively exploring the environment is a key challenge in reinforcement learning (RL). We address this challenge by defining a novel intrinsic reward based on a foundation model, such as contrastive language image pretraining (CLIP), which can encode a wealth of domain-independent semantic visual-language knowledge about the world. Specifically, our intrinsic reward is defined based on pre-trained CLIP embeddings without any fine-tuning or learning on the target RL task. We demonstrate that CLIP-based intrinsic rewards can drive exploration towards semantically meaningful states and outperform state-of-the-art methods in challenging sparse-reward procedurally-generated environments.
Tarun Gupta · Peter Karkus · Tong Che · Danfei Xu · Marco Pavone
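The abstract does not spell out the exact reward definition, but one plausible instantiation of an embedding-based intrinsic reward is a novelty bonus computed in a frozen embedding space, sketched below. The `clip_encode` callable is a placeholder for a pre-trained CLIP image encoder, and the k-nearest-neighbor bonus is an assumption rather than the paper's formula.

```python
# Hypothetical embedding-space novelty bonus built on a frozen encoder.
from typing import Callable, List
import numpy as np

class EmbeddingNoveltyReward:
    def __init__(self, clip_encode: Callable[[np.ndarray], np.ndarray], k: int = 10):
        self.encode = clip_encode        # obs -> 1D embedding (frozen, no training)
        self.memory: List[np.ndarray] = []
        self.k = k

    def __call__(self, obs) -> float:
        z = self.encode(obs)
        z = z / (np.linalg.norm(z) + 1e-8)
        if not self.memory:
            self.memory.append(z)
            return 1.0
        dists = [float(np.linalg.norm(z - m)) for m in self.memory]
        bonus = float(np.mean(sorted(dists)[: self.k]))   # mean distance to k nearest visited states
        self.memory.append(z)
        return bonus

# Usage (beta is an exploration coefficient chosen by the practitioner):
# r_total = r_extrinsic + beta * EmbeddingNoveltyReward(clip_encode)(obs)
```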
Large Language Models Are Human-Level Prompt Engineers (Poster)
SlidesLive Video » By conditioning on natural language instructions, large language models (LLMs) have displayed impressive capabilities as general-purpose computers. However, task performance depends significantly on the quality of the prompt used to steer the model, and most effective prompts have been handcrafted by humans. Inspired by classical program synthesis and the human approach to prompt engineering, we propose Automatic Prompt Engineer (APE) for automatic instruction generation and selection. In our method, we treat the instruction as the "program," optimized by searching over a pool of instruction candidates proposed by an LLM in order to maximize a chosen score function. To evaluate the quality of the selected instruction, we evaluate the zero-shot performance of another LLM following the selected instruction. Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 21/24 tasks. We conduct extensive qualitative and quantitative analyses to explore the performance of APE. We show that APE-engineered prompts can be applied to steer models toward truthfulness and/or informativeness, as well as to improve few-shot learning performance by simply prepending them to standard in-context learning prompts. |
Yongchao Zhou · Andrei Muresanu · Ziwen Han · Silviu Pitis · Harris Chan · Keiran Paster · Jimmy Ba
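The search described above reduces, in sketch form, to proposing candidate instructions with one model and scoring each by another model's zero-shot accuracy on held-out examples. The `llm_generate` and `llm_answer` callables and the meta-prompt wording below are placeholders, not the paper's exact templates.

```python
# Sketch of instruction search: propose candidates from demonstrations, score
# each candidate by downstream zero-shot accuracy, keep the best.
from typing import Callable, List, Tuple

def automatic_prompt_search(
    llm_generate: Callable[[str, int], List[str]],  # meta-prompt, n -> candidate instructions
    llm_answer: Callable[[str], str],               # prompt -> model answer
    demos: List[Tuple[str, str]],                   # (input, target) pairs for the task
    eval_set: List[Tuple[str, str]],                # held-out pairs used for scoring
    n_candidates: int = 16,
) -> str:
    demo_block = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    meta_prompt = (
        "I gave a friend an instruction. Based on these input-output pairs,\n"
        f"{demo_block}\nthe instruction was:"
    )
    candidates = llm_generate(meta_prompt, n_candidates)

    def score(instruction: str) -> float:
        hits = sum(
            llm_answer(f"{instruction}\nInput: {x}\nOutput:").strip() == y
            for x, y in eval_set
        )
        return hits / len(eval_set)

    return max(candidates, key=score)
```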
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks (Poster)
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well-understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning – accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare

Return Augmentation gives Supervised RL Temporal Compositionality (Poster)
SlidesLive Video » Offline Reinforcement Learning (RL) methods that use supervised learning or sequence modeling (e.g., Decision Transformer) work by training a return-conditioned policy. A fundamental limitation of these approaches, as compared to value-based methods, is that they have trouble generalizing to behaviors that have a higher return than what was seen at training. Value-based offline-RL algorithms like CQL use bootstrapping to combine training data from multiple trajectories to learn strong behaviors from sub-optimal data. We set out to endow RL via Supervised Learning (RvS) methods with this form of temporal compositionality. To do this, we introduce SuperB, a dynamic programming algorithm for data augmentation that augments the returns in the offline dataset by combining rewards from intersecting trajectories. We show theoretically that SuperB can improve sample complexity and enable RvS to find optimal policies in cases where it previously fell behind the performance of value-based methods. Empirically, we find that SuperB improves the performance of RvS in several offline RL environments, surpassing the prior state-of-the-art RvS agents in AntMaze by orders of magnitude and offering performance competitive with value-based algorithms on the D4RL-gym tasks. |
Keiran Paster · Silviu Pitis · Sheila McIlraith · Jimmy Ba

Offline Q-learning on Diverse Multi-Task Data Both Scales And Generalizes (Poster)
The potential of offline reinforcement learning (RL) is that high-capacity models trained on large, heterogeneous datasets can lead to agents that generalize broadly, analogously to similar advances in vision and NLP. However, recent works argue that offline RL methods encounter unique challenges to scaling up model capacity. Drawing on the learnings from these works, we re-examine previous design choices and find that with appropriate choices: ResNets, cross-entropy based distributional backups, and feature normalization, offline Q-learning algorithms exhibit strong performance that scales with model capacity. Using multi-task Atari as a testbed for scaling and generalization, we train a single policy on 40 games with near-human performance using up-to 80 million parameter networks, finding that model performance scales favorably with capacity. In contrast to prior work, we extrapolate beyond dataset performance even when trained entirely on a large (400M transitions) but highly suboptimal dataset (51% human-level performance). Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal. Finally, we show that offline Q-learning with a diverse dataset is sufficient to learn powerful representations that facilitate rapid transfer to novel games and fast online learning on new variations of a training game, improving over existing state-of-the-art representation learning approaches. |
Aviral Kumar · Rishabh Agarwal · XINYANG GENG · George Tucker · Sergey Levine

Pre-Training for Robots: Leveraging Diverse Multitask Data via Offline Reinforcement Learning (Poster)
Recent progress in deep learning highlights the tremendous potential of utilizing diverse datasets for achieving effective generalization and makes it enticing to consider leveraging broad datasets for attaining more robust generalization in robotic learning as well. However, in practice we likely will want to learn a new skill in a new environment that is unlikely to be contained in the prior data. Therefore we ask: how can we leverage existing diverse offline datasets in combination with small amounts of task-specific data to solve new tasks, while still enjoying the generalization benefits of training on large amounts of data? In this paper, we demonstrate that end-to-end offline RL can be an effective approach for doing this, without the need for any representation learning or vision-based pre-training. We present pre-training for robots (PTR), a framework based on offline RL that attempts to effectively learn new tasks by combining pre-training on existing robotic datasets with rapid fine-tuning on a new task, with as few as 10 demonstrations. At its core, PTR applies an existing offline RL method such as conservative Q-learning (CQL), but extends it to include several crucial design decisions that enable PTR to actually work and outperform a variety of prior methods. To the best of our knowledge, PTR is the first offline RL method that succeeds at learning new tasks in a new domain on a real WidowX robot with as few as 10 task demonstrations, by effectively leveraging an existing dataset of diverse multi-task robot data collected in a variety of toy kitchens. We present an accompanying overview video at https://www.youtube.com/watch?v=yAWgyLJD5lY&ab_channel=PTRICLR
Aviral Kumar · Anikait Singh · Frederik Ebert · Yanlai Yang · Chelsea Finn · Sergey Levine

Offline Reinforcement Learning from Heteroskedastic Data Via Support Constraints (Poster)
Offline reinforcement learning (RL) learns policies entirely from static datasets, thereby avoiding the challenges associated with online data collection. Practical applications of offline RL will inevitably require learning from datasets where the variability of demonstrated behaviors changes non-uniformly across the state space. For example, at a red light, nearly all human drivers behave similarly by stopping, but when merging onto a highway, some drivers merge quickly, efficiently, and safely, while many hesitate or merge dangerously. We show that existing popular offline RL methods based on distribution constraints fail to learn from data with such non-uniform change in the variability of demonstrated behaviors, often due to the requirement to stay close to the behavior policy to the same extent across the state space. We demonstrate this failure mode both theoretically and experimentally. Ideally, the learned policy should be free to choose per-state how closely to follow the behavior policy to maximize long-term return, as long as the learned policy stays within the support of the behavior policy. To instantiate this principle, we reweight the data distribution in conservative Q-learning and show that support constraints emerge when doing so. The reweighted distribution is a mixture of the current policy and an additional policy trained to mine poor actions that are likely under the behavior policy. Our method CQL (ReDS) is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation. |
Anikait Singh · Aviral Kumar · Quan Vuong · Yevgen Chebotar · Sergey Levine

Planning with Large Language Models for Code Generation (Poster)
Existing large language model-based code generation pipelines typically use beam search or sampling algorithms during the decoding process. Although the programs they generate achieve high token-matching-based scores, they often fail to compile or generate incorrect outputs. The main reason is that conventional Transformer decoding algorithms may not be the best choice for code generation. In this work, we propose a novel Transformer decoding algorithm, Planning-Guided Transformer Decoding (PG-TD), that uses a planning algorithm to do lookahead search and guide the Transformer to generate better programs. Specifically, instead of simply optimizing the likelihood of the generated sequences, the Transformer makes use of a planner that generates complete programs and tests them on public test cases. The Transformer can therefore make more informed decisions and generate tokens that will eventually lead to higher-quality programs. We also design a mechanism that shares information between the Transformer and the planner to make our algorithm computationally efficient. We empirically evaluate our framework with several large language models as backbones on public coding challenge benchmarks, showing that 1) it can generate programs that consistently achieve higher performance compared with competing baseline methods; 2) it enables controllable code generation, such as concise codes and highly-commented codes by optimizing modified objective. |
Shun Zhang · Zhenfang Chen · Yikang Shen · Mingyu Ding · Josh Tenenbaum · Chuang Gan
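A heavily simplified, greedy variant of the idea above is sketched below: each candidate next token is scored by completing the program and running it on public test cases. The actual PG-TD uses a tree-search planner; the callables and the end-of-sequence marker here are placeholders.

```python
# Simplified lookahead decoding in the spirit of planning-guided generation:
# commit to the token whose rollout passes the most public tests.
from typing import Callable, List

def lookahead_decode(
    top_k_tokens: Callable[[str], List[str]],   # partial program -> candidate next tokens
    rollout: Callable[[str], str],              # partial program -> completed program
    run_tests: Callable[[str], float],          # program -> fraction of public tests passed
    prompt: str,
    max_steps: int = 256,
    eos: str = "<|eos|>",
) -> str:
    program = prompt
    for _ in range(max_steps):
        candidates = top_k_tokens(program)
        if not candidates:
            break
        # Score each candidate token by completing the program and testing it.
        best_tok = max(candidates, key=lambda tok: run_tests(rollout(program + tok)))
        if best_tok == eos:
            break
        program += best_tok
    return program
```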
Learning Control by Iterative Inversion (Poster)
We formulate learning for control as an inverse problem - inverting a dynamical system to give the actions which yield desired behavior. The key challenge in this formulation is a distribution shift in the inputs to the function to be inverted - the learning agent can only observe the forward mapping (its actions' consequences) on trajectories that it can execute, yet must learn the inverse mapping for inputs-outputs that correspond to a different, desired behavior. We propose a general recipe for inverse problems with a distribution shift that we term $\textit{iterative inversion}$ - learn the inverse mapping under the current input distribution (policy), then use it on the desired output samples to obtain a new input distribution, and repeat. As we show, iterative inversion can converge to the desired inverse mapping, but under rather strict conditions on the mapping itself. We next apply iterative inversion to learn control. Our input is a set of demonstrations of desired behavior, given as video embeddings of trajectories (without actions), and our method iteratively learns to imitate trajectories generated by the current policy, perturbed by random exploration noise. We find that by constantly adding the demonstrated trajectory embeddings as input to the policy when generating trajectories to imitate, a la iterative inversion, we effectively steer the learning towards the desired trajectory distribution. To the best of our knowledge, this is the first exploration of learning control from the viewpoint of inverse problems, and the main advantage of our approach is simplicity - it does not require rewards, and only employs supervised learning, which can be easily scaled to use state-of-the-art trajectory embedding techniques and policy representations. Indeed, with a VQ-VAE embedding, and a transformer-based policy, we demonstrate non-trivial continuous control on several tasks. Further, we report an improved performance on imitating diverse behaviors compared to reward-based methods.
Gal Leibovich · Guy Jacob · Or Avner · Gal Novik · Aviv Tamar
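The generic iterative-inversion recipe described above can be sketched as a loop; all callables below are placeholders standing in for the system, the fitted inverse model, and the trajectory-embedding machinery.

```python
# Sketch of iterative inversion: fit the inverse map on data from the current
# policy, query it on the desired outputs, and repeat.
import random
from typing import Callable, List, Sequence, Tuple

def iterative_inversion(
    forward: Callable[[Sequence], Sequence],            # actions -> resulting trajectory
    fit_inverse: Callable[[List[Tuple]], Callable],     # (trajectory, actions) pairs -> inverse model
    initial_policy: Callable[[], Sequence],             # samples an action sequence
    desired_trajectories: List,                         # target behavior (e.g. trajectory embeddings)
    n_iters: int = 10,
    rollouts_per_iter: int = 64,
):
    propose = initial_policy
    inverse = None
    for _ in range(n_iters):
        # 1) Observe the forward mapping under the current input distribution.
        data = []
        for _ in range(rollouts_per_iter):
            actions = propose()
            data.append((forward(actions), actions))
        # 2) Fit the inverse mapping on that data.
        inverse = fit_inverse(data)
        # 3) Query the inverse on the desired outputs to get the next input distribution.
        proposals = [inverse(traj) for traj in desired_trajectories]
        propose = lambda proposals=proposals: random.choice(proposals)
    return inverse
```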
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets (Poster)
SlidesLive Video » Using massive datasets to train large-scale models has emerged as a dominant approach for broad generalization in natural language and vision applications. In reinforcement learning, however, a key challenge is that available data of sequential decision making is often not annotated with actions - for example, videos of game-play are much more available than sequences of frames paired with the logged game controls. We propose to circumvent this challenge by combining large but sparsely-annotated datasets from a \emph{target} environment of interest with fully-annotated datasets from various other \emph{source} environments. Our method, Action Limited PreTraining (ALPT), leverages the generalization capabilities of inverse dynamics modelling (IDM) to label missing action data in the target environment. We show that utilizing even one additional environment dataset of labelled data during IDM pretraining gives rise to substantial improvements in generating action labels for unannotated sequences. We evaluate our method on benchmark game-playing environments and show that we can significantly improve game performance and generalization capability compared to other approaches, even when using annotated datasets equivalent to only 12 minutes of gameplay. |
David Venuto · Mengjiao (Sherry) Yang · Pieter Abbeel · Doina Precup · Igor Mordatch · Ofir Nachum

Active Acquisition for Multimodal Temporal Data: A Challenging Decision-Making Task (Poster)
SlidesLive Video » We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT). In many real-world scenarios, input features are not readily available at test time and must instead be acquired at significant cost. With A2MT, we aim to learn agents that actively select which modalities of an input to acquire, trading off acquisition cost and predictive performance. A2MT extends a previous task called active feature acquisition to temporal decision making about high-dimensional inputs. Further, we propose a method based on the Perceiver IO architecture to address A2MT in practice. Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models. Applications of A2MT may be impactful in domains like medicine, robotics, or finance, where modalities differ in acquisition cost and informativeness. |
Jannik Kossen · Cătălina Cangea · Eszter Vértes · Andrew Jaegle · Viorica Patraucean · Ira Ktena · Nenad Tomasev · Danielle Belgrave

Foundation Models for History Compression in Reinforcement Learning (Poster)
Agents interacting under partial observability require access to past observations via a memory mechanism in order to approximate the true state of the environment. Recent work suggests that leveraging language as abstraction provides benefits for creating a representation of past events. History Compression via Language Models (HELM) leverages a pretrained Language Model (LM) for representing the past. It relies on a randomized attention mechanism to translate environment observations to token embeddings. In this work, we show that the representations resulting from this attention mechanism can collapse under certain conditions. This causes blindness of the agent to subtle changes in the environment that may be crucial in solving a certain task. We propose a solution to this problem consisting of two parts. First, we improve upon HELM by substituting the attention mechanism with a feature-wise centering-and-scaling operation. Second, we take a step toward semantic history compression by leveraging foundation models, such as CLIP, to encode observations, which further improves performance. By combining foundation models, our agent is able to solve the challenging MiniGrid-Memory environment. Surprisingly, however, our experiments suggest that this is not due to the semantic enrichment of the representation presented to the LM, but rather due to the discriminative power provided by CLIP.
Fabian Paischer · Thomas Adler · Andreas Radler · Markus Hofmarcher · Sepp Hochreiter

Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks (Poster)
SlidesLive Video » Demonstrations and natural language instructions are two common ways to specify and teach robots novel tasks. However, for many complex tasks, a demonstration or language instruction alone contains ambiguities, preventing tasks from being specified clearly. In such cases, a combination of both a demonstration and an instruction more concisely and effectively conveys the task to the robot than either modality alone. To instantiate this problem setting, we train a single multi-task policy on a few hundred challenging robotic pick-and-place tasks and propose DeL-TaCo (Joint Demo-Language Task Conditioning), a method for conditioning a robotic policy on task embeddings comprised of two components: a visual demonstration and a language instruction. By allowing these two modalities to mutually disambiguate and clarify each other during novel task specification, DeL-TaCo (1) substantially decreases the teacher effort needed to specify a new task and (2) achieves better generalization performance on novel objects and instructions over previous task-conditioning methods. To our knowledge, this is the first work to show that simultaneously conditioning a multi-task robotic manipulation policy on both demonstration and language embeddings improves sample efficiency and generalization over conditioning on either modality alone. |
Albert Yu · Raymond Mooney

How crucial is Transformer in Decision Transformer? (Poster)
SlidesLive Video » Decision Transformer (DT) is a recently proposed architecture for Reinforcement Learning that frames the decision-making process as an auto-regressive sequence modeling problem and uses a Transformer model to predict the next action in a sequence of states, actions, and rewards. In this paper, we analyze how crucial the Transformer model is in the complete DT architecture. Namely, we replace the Transformer by an LSTM model while keeping the other parts unchanged to obtain what we call a Decision LSTM model. We compare it to the Decision Transformer on continuous control tasks, including pendulum swing-up and stabilization tasks in simulation and on physical hardware. Our experiments show that Decision Transformer struggles with stabilization tasks, such as inverted pendulum and Furuta pendulum stabilization. On the other hand, the proposed Decision LSTM is able to achieve expert-level performance on these tasks, in addition to learning a swing-up controller on the real system. These results indicate that the strength of the Decision Transformer may lie in the overall sequential modeling architecture and not in the Transformer per se. Therefore, a further investigation into the effects of employing other sequence models in place of the Transformer is desirable. |
Max Siebenborn · Boris Belousov · Junning Huang · Jan Peters

Pareto-Efficient Decision Agents for Offline Multi-Objective Reinforcement Learning (Poster)
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives. In practice, an agent's preferences over the objectives may not be known apriori, and hence, we require policies that can generalize to arbitrary preferences at test time. In this work, we propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent using only a finite dataset of offline demonstrations of other agents and their preferences. The key contributions of this work are two-fold. First, we introduce D4MORL, (D)atasets for MORL that are specifically designed for offline settings. It contains 1.8 million annotated demonstrations obtained by rolling out reference policies that optimize for randomly sampled preferences on 6 MuJoCo environments with 2-3 objectives each. Second, we propose Pareto-Efficient Decision Agents (PEDA), a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy. Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and provides an excellent approximation of the Pareto-front with appropriate conditioning, as measured by the hypervolume and sparsity metrics. |
Baiting Zhu · Meihua Dang · Aditya Grover

Is Conditional Generative Modeling all you need for Decision-Making? (Poster)
Recent improvements in conditional generative modeling have made it possible to generate high-quality images from language descriptions alone. We investigate whether these methods can directly address the problem of sequential decision-making. We view decision-making not through the lens of reinforcement learning (RL), but rather through conditional generative modeling. To our surprise, we find that our formulation leads to policies that can outperform existing offline RL approaches across standard benchmarks. By modeling a policy as a return-conditional generative model, we avoid the need for dynamic programming and subsequently eliminate many of the complexities that come with traditional offline RL. We further demonstrate the advantages of modeling policies as conditional generative models by considering two other conditioning variables: constraints and skills. Conditioning on a single constraint or skill during training leads to behaviors at test-time that can satisfy several constraints together or demonstrate a composition of skills. Our results illustrate that conditional generative modeling is a powerful tool for decision-making. |
Anurag Ajay · Yilun Du · Abhi Gupta · Josh Tenenbaum · Tommi Jaakkola · Pulkit Agrawal

Wall Street Tree Search: Risk-Aware Planning for Offline Reinforcement Learning (Poster)
Offline reinforcement-learning (RL) algorithms learn to make decisions using a given, fixed training dataset without the possibility of additional online data collection. This problem setting is captivating because it holds the promise of utilizing previously collected datasets without any costly or risky interaction with the environment. However, this promise is also the source of the setting's main drawback. The restricted dataset induces subjective uncertainty because the agent can encounter unfamiliar sequences of states and actions that the training data did not cover. Moreover, inherent system stochasticity further increases uncertainty and aggravates the offline RL problem, preventing the agent from learning an optimal policy. To mitigate the destructive uncertainty effects, we need to balance the aspiration to take reward-maximizing actions with the incurred risk due to incorrect ones. In financial economics, modern portfolio theory (MPT) is a method that risk-averse investors can use to construct diversified portfolios that maximize their returns without unacceptable levels of risk. We integrate MPT into the agent's decision-making process to present a simple-yet-highly-effective risk-aware planning algorithm for offline RL. Our algorithm allows us to systematically account for the estimated quality of specific actions and their estimated risk due to the uncertainty. We show that our approach can be coupled with the Transformer architecture to yield a state-of-the-art planner for offline RL tasks, maximizing the return while significantly reducing the variance.
Dan Elbaz · Gal Novik · Oren Salzman
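One simple way to express the mean-variance trade-off the abstract alludes to is to score each candidate action by the mean of an ensemble of return estimates minus a risk-aversion penalty on their spread. The sketch below is a hypothetical illustration of that principle, not the paper's algorithm.

```python
# Markowitz-style action scoring: prefer actions with high estimated return
# and low disagreement across an ensemble of value estimates.
import numpy as np

def risk_aware_action(q_ensemble, state, actions, risk_aversion: float = 1.0):
    """q_ensemble: list of callables (state, action) -> estimated return."""
    best_action, best_score = None, -np.inf
    for a in actions:
        estimates = np.array([q(state, a) for q in q_ensemble])
        score = estimates.mean() - risk_aversion * estimates.std()
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# Toy usage with a hand-made ensemble of three value estimates:
q_ensemble = [lambda s, a, w=w: w * a for w in (0.9, 1.0, 1.1)]
print(risk_aware_action(q_ensemble, state=None, actions=[0.0, 0.5, 1.0]))
```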
In-Context Policy Iteration (Poster)
This work presents In-Context Policy Iteration, an algorithm for performing Reinforcement Learning (RL), in-context, using foundation models. While the application of foundation models to RL has received considerable attention, most approaches rely on either (1) the curation of expert demonstrations (either through manual design or task-specific pretraining) or (2) adaptation to the task of interest using gradient methods (either fine-tuning or training of adapter layers). Both of these techniques have drawbacks. Collecting demonstrations is labor-intensive, and algorithms that rely on them do not outperform the experts from which the demonstrations were derived. All gradient techniques are inherently slow, sacrificing the “few-shot” quality that made in-context learning attractive to begin with. In this work, we present an algorithm, ICPI, that learns to perform RL tasks without expert demonstrations or gradients. Instead we present a policy-iteration method in which the prompt content is the entire locus of learning. ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment. In order to eliminate the role of in-weights learning (on which approaches like Decision Transformer rely heavily), we demonstrate our algorithm using Codex Chen et al. (2021b), a language model with no prior knowledge of the domains on which we evaluate it. |
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh
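
A heavily simplified sketch of the prompt-as-policy loop described above; `llm` and `env` are hypothetical stand-ins, and the rollout-based value estimate is only one possible reading of how the in-context world model and value estimates are produced.

```python
def icpi_step(llm, env, prompt_transitions, state, actions, horizon=3):
    """One trial-and-error step: the prompt (a list of transition strings) is
    the only thing that gets updated; no gradients are taken anywhere."""
    def estimated_value(action):
        # Ask the LLM to continue a rollout after `action`, conditioned on the
        # transitions currently stored in the prompt.
        prompt = "\n".join(prompt_transitions) + f"\nstate: {state} action: {action} ->"
        imagined_rollout = llm.rollout(prompt, max_steps=horizon)   # hypothetical API
        return sum(step.reward for step in imagined_rollout)

    action = max(actions, key=estimated_value)          # greedy policy improvement
    next_state, reward, done = env.step(action)
    prompt_transitions.append(f"state: {state} action: {action} "
                              f"reward: {reward} next: {next_state}")
    return next_state, done
```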

In-context Reinforcement Learning with Algorithm Distillation (Poster)
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih
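
An illustrative sketch of the data structure that makes this work: training sequences that cross episode boundaries of the source algorithm's learning history, so that next-action prediction has to capture policy improvement rather than a single fixed policy. The toy history, window size, and tuple format are assumptions, not the paper's tokenization.

```python
import numpy as np

def build_ad_sequences(learning_history, context_episodes=4):
    """learning_history: episodes in training order, each a list of
    (obs, action, reward) triples.  Returns (context, target_action) pairs
    whose context spans several consecutive episodes."""
    pairs = []
    for start in range(len(learning_history) - context_episodes):
        window = learning_history[start:start + context_episodes + 1]
        flat = [step for episode in window for step in episode]
        for t in range(1, len(flat)):
            pairs.append((flat[:t], flat[t][1]))   # autoregressive action target
    return pairs

# Hypothetical toy learning history: 6 episodes of 3 steps each.
rng = np.random.default_rng(0)
history = [[(rng.normal(size=2), int(rng.integers(3)), float(rng.random()))
            for _ in range(3)] for _ in range(6)]
pairs = build_ad_sequences(history)
print(len(pairs), "training pairs; longest context:", max(len(c) for c, _ in pairs))
```

A causal transformer trained on such pairs can then be deployed with an empty context and improve its policy purely by appending its own new experience.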

Contextual Transformer for Offline Meta Reinforcement Learning (Poster)
Recently, the pretrain-tuning paradigm in large-scale sequence models has made significant progress in Natural Language Processing and Computer Vision. However, such a paradigm is still hindered by intractable challenges in Reinforcement Learning (RL), including the lack of self-supervised large-scale pretraining methods based on offline data and efficient fine-tuning/prompt-tuning over unseen downstream tasks. In this work, we explore how prompts can help sequence-modeling-based offline Reinforcement Learning (offline RL) algorithms. First, we propose prompt tuning for offline RL, where a context vector sequence is concatenated with the input to guide the conditional generation. As such, we can pretrain a model on the offline dataset with a supervised loss and learn a prompt that guides the policy to take the desired actions. Second, we extend the framework to the Meta-RL setting and propose the Contextual Meta Transformer (CMT), which leverages the context among different tasks as the prompt to improve performance on unseen tasks. We conduct extensive experiments across three different offline RL settings: offline single-agent RL on the D4RL dataset, offline Meta-RL on the MuJoCo benchmark, and offline MARL on the SMAC benchmark; the results validate the strong performance, high computational efficiency, and generality of our methods.
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang

Generative Pretraining for Black-Box Optimization (Poster)
Many problems in science and engineering involve optimizing an expensive black-box function over a high-dimensional space. For such black-box optimization (BBO) problems, we typically assume a small budget for online function evaluations, but also often have access to a fixed, offline dataset for pretraining. Prior approaches seek to utilize the offline data to approximate the function or its inverse but are not sufficiently accurate far from the data distribution. We propose BONET, a generative framework for pretraining a novel black-box optimizer using offline datasets. In BONET, we train an autoregressive model on fixed-length trajectories derived from an offline dataset. We design a sampling strategy to synthesize trajectories from offline data using a simple heuristic of rolling out monotonic transitions from low-fidelity to high-fidelity samples. Empirically, we instantiate BONET using a causally masked transformer and evaluate it on Design-Bench, where we rank best on average, outperforming state-of-the-art baselines.
Siddarth Krishnamoorthy · Satvik Mashkaria · Aditya Grover
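
A minimal sketch of the trajectory-synthesis heuristic as we read it from the abstract: subsample the offline dataset and order each subsample from low to high score, so the autoregressive model learns transitions that "move toward" better designs. The sorting-based construction and all names are a simplification, not the authors' exact sampler.

```python
import numpy as np

def synthesize_trajectories(xs, ys, traj_len=8, n_traj=16, seed=0):
    """xs: (N, d) candidate designs, ys: (N,) black-box scores."""
    rng = np.random.default_rng(seed)
    trajectories = []
    for _ in range(n_traj):
        idx = rng.choice(len(xs), size=traj_len, replace=False)
        idx = idx[np.argsort(ys[idx])]             # low -> high fidelity
        trajectories.append((xs[idx], ys[idx]))    # fed to an autoregressive model
    return trajectories

rng = np.random.default_rng(1)
xs, ys = rng.normal(size=(500, 10)), rng.normal(size=500)
trajs = synthesize_trajectories(xs, ys)
assert all(np.all(np.diff(t_y) >= 0) for _, t_y in trajs)   # scores are monotone
```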

Understanding Hindsight Goal Relabeling Requires Rethinking Divergence Minimization (Poster)
Hindsight goal relabeling has become a foundational technique for multi-goal reinforcement learning (RL). The idea is quite simple: any arbitrary trajectory can be seen as an expert demonstration for reaching the trajectory's end state. Intuitively, this procedure trains a goal-conditioned policy to imitate a sub-optimal expert. However, this connection between imitation and hindsight relabeling is not well understood. Modern imitation learning algorithms are described in the language of divergence minimization, and yet it remains an open problem how to recast hindsight goal relabeling into that framework. In this work, we develop a unified objective for goal-reaching that explains such a connection, from which we can derive goal-conditioned supervised learning (GCSL) and the reward function in hindsight experience replay (HER) from first principles. Experimentally, we find that despite recent advances in goal-conditioned behaviour cloning (BC), multi-goal Q-learning can still outperform BC-like methods; moreover, a vanilla combination of both actually hurts model performance. Under our framework, we study when BC is expected to help, and empirically validate our findings. Our work further bridges goal-reaching and generative modeling, illustrating the nuances and new pathways of extending the success of generative models to RL.
Lunjun Zhang · Bradly Stadie
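
For readers unfamiliar with the relabeling step being analyzed, here is a minimal sketch of standard hindsight relabeling itself (HER/GCSL style), not the paper's unified divergence-minimization objective: any trajectory becomes an expert demonstration for the goal it actually reached.

```python
import numpy as np

def relabel_with_hindsight(trajectory):
    """trajectory: list of (state, action, original_goal).  Returns imitation
    tuples in which the achieved final state replaces the intended goal."""
    achieved_goal = trajectory[-1][0]                     # the end state
    return [(state, achieved_goal, action) for state, action, _ in trajectory]

rng = np.random.default_rng(0)
traj = [(rng.normal(size=3), rng.uniform(-1, 1, size=2), np.ones(3)) for _ in range(5)]
relabeled = relabel_with_hindsight(traj)    # (state, new_goal, "expert" action) tuples
print(len(relabeled), "relabeled transitions; goal =", np.round(relabeled[0][1], 2))
```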

ReAct: Synergizing Reasoning and Acting in Language Models (Poster)
While large language models (LLMs) have demonstrated impressive capabilities across tasks in language understanding and interactive decision making, their abilities for reasoning (e.g. chain-of-thought prompting) and acting (e.g. action plan generation) have primarily been studied as separate topics. In this paper, we explore the use of LLMs to generate both reasoning traces and task-specific actions in an interleaved manner, allowing for greater synergy between the two: reasoning traces help the model induce, track, and update action plans as well as handle exceptions, while actions allow it to interface with external sources, such as knowledge bases or environments, to gather additional information. We apply our approach, named ReAct, to a diverse set of language and decision making tasks and demonstrate its effectiveness over state-of-the-art baselines, as well as improved human interpretability and trustworthiness over methods without reasoning or acting components. Concretely, on question answering (HotpotQA) and fact verification (Fever), ReAct overcomes issues of hallucination and error propagation prevalent in chain-of-thought reasoning by interacting with a simple Wikipedia API, and generates human-like task-solving trajectories that are more interpretable than baselines without reasoning traces. On two interactive decision making benchmarks (ALFWorld and WebShop), ReAct outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples.
Shunyu Yao · Jeffrey Zhao · Dian Yu · Izhak Shafran · Karthik Narasimhan · Yuan Cao
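
A minimal sketch (not the authors' prompts) of the interleaved reasoning-and-acting loop: the transcript accumulates observations, free-form thoughts, and grounded actions, and the same language model generates both the thoughts and the actions. `llm` and `env` are hypothetical stand-ins; the few-shot prefix would hold the one or two worked examples mentioned in the abstract.

```python
def react_episode(llm, env, task, few_shot_prefix, max_steps=10):
    transcript = few_shot_prefix + f"\nTask: {task}\n"
    observation = env.reset()
    for _ in range(max_steps):
        transcript += f"Observation: {observation}\n"
        thought = llm.complete(transcript + "Thought:")    # reasoning trace
        transcript += f"Thought: {thought}\n"
        action = llm.complete(transcript + "Action:")      # grounded action
        transcript += f"Action: {action}\n"
        observation, done = env.step(action)               # e.g. a Wikipedia API call
        if done:
            break
    return transcript
```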

ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning (Poster)
The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions. Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al., 2021; Emmons et al., 2021) have shown that by conditioning on desired future returns, BC can perform competitively with its value-based counterparts, while enjoying much more simplicity and training stability. However, the distribution of returns in the offline dataset can be arbitrarily skewed and suboptimal, which poses a unique challenge for conditioning BC on expert returns at test time. We propose ConserWeightive Behavioral Cloning (CWBC), a simple and effective method for improving the performance of conditional BC for offline RL with two key components: trajectory weighting and conservative regularization. Trajectory weighting addresses the bias-variance tradeoff in conditional BC and provides a principled mechanism to learn from both low-return trajectories (typically plentiful) and high-return trajectories (typically few). Further, we analyze the notion of conservatism in existing BC methods and propose a novel conservative regularizer that explicitly encourages the policy to stay close to the data distribution. The regularizer helps achieve more reliable performance and removes the need for ad-hoc tuning of the conditioning value during evaluation. We instantiate CWBC in the context of Reinforcement Learning via Supervised Learning (RvS; Emmons et al., 2021) and Decision Transformer (DT; Chen et al., 2021), and empirically show that it significantly boosts the performance and stability of prior methods on various offline RL benchmarks.
Tung Nguyen · Qinqing Zheng · Aditya Grover

Skill Acquisition by Instruction Augmentation on Offline Datasets (Poster)
In recent years, much progress has been made in learning robotic manipulation policies that follow natural language instructions. Commonly, such methods learn from corpora of robot-language data that were either collected with specific tasks in mind or expensively re-labelled by humans with rich language descriptions in hindsight. Recently, large-scale pretrained vision-language models like CLIP have been applied to robotics in the form of learning representations and planners. Can these pretrained models also be used to cheaply impart internet-scale knowledge onto offline datasets, providing access to skills that were not reflected in ground-truth labels? To accomplish this, we introduce Data-driven Instruction Augmentation for Language-conditioned control (DIAL): we use semi-supervised language labels, leveraging the semantic understanding of CLIP to propagate knowledge onto large datasets of unlabelled demonstration data, and then train language-conditioned policies on the augmented datasets. This method enables cheaper acquisition of useful language descriptions than expensive human labels, allowing for more efficient label coverage of large-scale datasets. We apply DIAL to a challenging real-world robotic manipulation domain, enabling imitation learning policies to acquire new capabilities and generalize to 80 novel instructions unseen in the original dataset.
Ted Xiao · Harris Chan · Pierre Sermanet · Ayzaan Wahid · Anthony Brohan · Karol Hausman · Sergey Levine · Jonathan Tompson

On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning (Poster)
Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. With proper pretraining and concurrent cross-task online fine-tuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 73% in some instances.
yifan xu · Nicklas Hansen · Zirui Wang · Yung-Chieh Chan · Hao Su · Zhuowen Tu

CLaP: Conditional Latent Planners for Offline Reinforcement Learning (Poster)
Recent work has formulated offline reinforcement learning (RL) as a sequence modeling problem, benefiting from the simplicity and scalability of the Transformer architecture. However, sequence models struggle to model trajectories that are long-horizon or involve complicated environment dynamics. We propose CLaP (Conditional Latent Planners) to learn a simple goal-conditioned latent space from offline agent behavior and incrementally decode good actions from a latent plan. We evaluate our method on continuous control domains from the D4RL benchmark. Compared to non-sequential models and return-conditioned sequential models, CLaP shows competitive if not better performance across continuous control tasks. It does particularly well in environments with complex transition dynamics, with up to a +149.8% performance increase. Our results suggest that decision-making is easier with simplified latent dynamics that model behavior as goal-conditioned.
Harry Shin · Rose Wang

Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action (Poster)
Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We develop a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions.
Dhruv Shah
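
A heavily simplified sketch of how the three frozen models are composed; `gpt3_extract_landmarks`, `clip_score`, and `ving_plan` are hypothetical wrappers around GPT-3, CLIP, and the ViNG navigation model, and the grounding step is reduced to a per-landmark argmax for clarity.

```python
import numpy as np

def lm_nav(instruction, graph_node_images,
           gpt3_extract_landmarks, clip_score, ving_plan):
    # 1. Language model: parse the instruction into an ordered list of landmarks.
    landmarks = gpt3_extract_landmarks(instruction)        # e.g. ["stop sign", "blue house"]

    # 2. Vision-language model: ground each landmark to graph nodes by
    #    image-text similarity.
    grounding = np.array([[clip_score(image, lm) for image in graph_node_images]
                          for lm in landmarks])            # (num_landmarks, num_nodes)
    waypoints = grounding.argmax(axis=1)                   # most likely node per landmark

    # 3. Goal-conditioned navigation model: plan between consecutive waypoints.
    return [ving_plan(src, dst) for src, dst in zip(waypoints[:-1], waypoints[1:])]
```

No component is fine-tuned; all task-specific behavior comes from how the pre-trained models are wired together.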

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning (Poster)
Real-world reinforcement learning tasks often involve some form of partial observability, where the observations give only a partial or noisy view of the true state of the world. Such tasks typically require some form of memory, where the agent has access to multiple past observations, in order to perform well. One popular way to incorporate memory is to use a recurrent neural network to access the agent's history. However, recurrent neural networks in reinforcement learning are often fragile and difficult to train, are susceptible to catastrophic forgetting, and sometimes fail completely as a result. In this work, we propose Deep Transformer Q-Networks (DTQN), a novel architecture utilizing transformers and self-attention to encode an agent's history. DTQN is designed modularly, and we compare results against several modifications to our base model. Our experiments demonstrate that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
Kevin Esslinger · Robert Platt · Christopher Amato
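
A minimal sketch, with illustrative hyperparameters, of the central architectural idea: a causal transformer encoder over the observation history whose output at each timestep is a vector of Q-values. This is an assumption-laden simplification, not the paper's exact model.

```python
import torch
import torch.nn as nn

class TinyDTQN(nn.Module):
    def __init__(self, obs_dim, n_actions, d_model=64, n_layers=2, max_len=50):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, obs_history):                  # (batch, time, obs_dim)
        t = obs_history.shape[1]
        x = self.embed(obs_history) + self.pos(torch.arange(t, device=obs_history.device))
        causal = torch.triu(torch.full((t, t), float("-inf"),
                                       device=obs_history.device), diagonal=1)
        h = self.encoder(x, mask=causal)             # self-attention over the history
        return self.q_head(h)                        # Q-values at every timestep

q_net = TinyDTQN(obs_dim=8, n_actions=4)
print(q_net(torch.randn(2, 10, 8)).shape)            # -> torch.Size([2, 10, 4])
```

Training can then proceed as in DQN over these history-conditioned Q-values.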

Control Graph as Unified IO for Morphology-Task Generalization (Poster)
The rise of generalist large-scale models in natural language and vision has raised the expectation that a massive data-driven approach could achieve broader generalization in other domains, such as continuous control. In this work, we explore a method for learning a single policy that manipulates various forms of agents to solve various tasks by distilling a large amount of proficient behavioral data. In order to align the input-output (IO) interface across multiple tasks and diverse agent morphologies while preserving essential 3D geometric relations, we introduce the control graph, which treats observations, actions, and goals/tasks in a unified graph representation. We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations with a minimal blueprint and a hardware-accelerated simulator. Through efficient representation and architecture selection on MxT-Bench, we find that a control graph representation coupled with a Transformer architecture improves multi-task performance compared to other baselines, including recent discrete tokenization, and provides better prior knowledge for zero-shot transfer or sample efficiency in downstream multi-task imitation learning. Our work suggests that large, diverse offline datasets, a unified IO representation, and policy representation and architecture selection through supervised learning together form a promising approach for studying and advancing morphology-task generalization.
Hiroki Furuta · Yusuke Iwasawa · Yutaka Matsuo · Shixiang (Shane) Gu

Hyper-Decision Transformer for Efficient Online Policy Adaptation (Poster)
Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve this goal, we propose to augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as inputs and initializes the adaptation module accordingly. This initialization enables HDT to efficiently adapt to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rate by a large margin. Demos are available on our project page: https://sites.google.com/view/hdtforiclr2023/home.
Mengdi Xu · Yuchen Lu · Yikang Shen · Shun Zhang · DING ZHAO · Chuang Gan

Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training (Poster)
Reward and representation learning are two long-standing challenges for learning an expanding set of robot manipulation skills from sensory observations. Given the inherent cost and scarcity of in-domain, task-specific robot data, learning from large, diverse, offline human videos has emerged as a promising path towards acquiring a generally useful visual representation for control; however, how these human videos can be used for general-purpose reward learning remains an open question. We introduce Value-Implicit Pre-training (VIP), a self-supervised pre-trained visual representation capable of generating dense and smooth reward functions for unseen robotic tasks. VIP casts representation learning from human videos as an offline goal-conditioned reinforcement learning problem and derives a self-supervised dual goal-conditioned value-function objective that does not depend on actions, enabling pre-training on unlabeled human videos. Theoretically, VIP can be understood as a novel implicit time contrastive objective that generates a temporally smooth embedding, enabling the value function to be implicitly defined via the embedding distance, which can then be used to construct the reward for any goal-image specified downstream task. Trained on large-scale Ego4D human videos and without any fine-tuning on in-domain, task-specific data, VIP's frozen representation can provide dense visual reward for an extensive set of simulated and real-robot tasks, enabling diverse reward-based visual control methods and significantly outperforming all prior pre-trained representations. Notably, VIP can enable simple, few-shot offline RL on a suite of real-world robot tasks with as few as 20 trajectories.
Jason Yecheng Ma · Shagun Sodhani · Dinesh Jayaraman · Osbert Bastani · Vikash Kumar · Amy Zhang
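
An illustrative sketch of how an embedding-distance reward could be read off a frozen VIP-style encoder: the reward for a transition is the reduction in embedding distance to the goal image. The random linear "encoder" and the exact reward shaping are stand-ins, not the paper's formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(3 * 32 * 32, 128))    # stand-in for a frozen encoder

def phi(image):                                        # image: array of shape (3, 32, 32)
    return image.reshape(-1) @ W

def goal_embedding_reward(obs_t, obs_next, goal_image):
    dist = lambda a, b: np.linalg.norm(phi(a) - phi(b))
    # Reward = progress toward the goal in embedding space.
    return dist(obs_t, goal_image) - dist(obs_next, goal_image)

obs_t, obs_next, goal = (rng.normal(size=(3, 32, 32)) for _ in range(3))
print(goal_embedding_reward(obs_t, obs_next, goal))
```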

VIMA: General Robot Manipulation with Multimodal Prompts (Poster)
Prompt-based learning has emerged as a successful paradigm in natural language processing, where a single general-purpose language model can be instructed to perform any task specified by input prompts. Yet task specification in robotics comes in various forms, such as imitating one-shot demonstrations, following language instructions, and reaching visual goals. These are often considered different tasks and tackled by specialized models. This work shows that we can express a wide spectrum of robot manipulation tasks with multimodal prompts, interleaving textual and visual tokens. We design a transformer-based generalist robot agent, VIMA, that processes these prompts and outputs motor actions autoregressively. To train and evaluate VIMA, we develop a new simulation benchmark with thousands of procedurally generated tabletop tasks with multimodal prompts, 600K+ expert trajectories for imitation learning, and four levels of evaluation protocol for systematic generalization. VIMA achieves strong scalability in both model capacity and data size. It outperforms prior SOTA methods in the hardest zero-shot generalization setting by up to 2.9x in task success rate given the same training data. With 10x less training data, VIMA still performs 2.7x better than the top competing approach. Video demos are available at https://iclr3081.github.io/.
Yunfan Jiang · Agrim Gupta · Zichen Zhang · Guanzhi Wang · Yongqiang Dou · Yanjun Chen · Fei-Fei Li · Anima Anandkumar · Yuke Zhu · Linxi Fan

Constrained MDPs can be Solved by Early-Termination with Recurrent Models (Poster)
Safety is one of the crucial concerns for the real-world application of reinforcement learning (RL). Previous works treat the safe exploration problem as a Constrained Markov Decision Process (CMDP), where the policy is optimized under constraints. However, when encountering any potential danger, humans tend to stop immediately and rarely learn to behave safely while in danger. Moreover, the off-policy learning nature of humans guarantees high learning efficiency in risky tasks. Motivated by human learning, we introduce a Minimalist Off-Policy Approach (MOPA) to address the Safe-RL problem. We first define the Early Terminated MDP (ET-MDP) as a special type of MDP that has the same optimal value function as its CMDP counterpart. An off-policy learning algorithm, MOPA, based on recurrent models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP. Experiments on various Safe-RL tasks show a substantial improvement over previous methods that directly solve the CMDP, in terms of higher asymptotic performance and better learning efficiency.
Hao Sun · Ziping Xu · Meng Fang · Zhenghao Peng · Taiyi Wang · Bolei Zhou
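
A minimal sketch of the early-termination idea, written as a gym-style environment wrapper: the episode ends as soon as accumulated cost exceeds the limit, so an unconstrained off-policy learner on the wrapped environment implicitly respects the constraint. The interface and cost bookkeeping are illustrative assumptions, not the authors' code.

```python
class EarlyTerminationWrapper:
    """Turns a constrained MDP into an (approximately equivalent) ET-MDP."""

    def __init__(self, env, cost_fn, cost_limit=0.0):
        self.env, self.cost_fn, self.cost_limit = env, cost_fn, cost_limit

    def reset(self):
        self.episode_cost = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.episode_cost += self.cost_fn(obs, action, info)
        if self.episode_cost > self.cost_limit:     # constraint violated:
            done = True                             # stop immediately, as a human would
            info["early_terminated"] = True
        return obs, reward, done, info
```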

Supervised Q-Learning can be a Strong Baseline for Continuous Control (Poster)
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the learned value function only locally with first-order updates, which results in limited sample efficiency. In this work, we propose an alternative method called Zeroth-Order Supervised Policy Improvement (ZOSPI). ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of PG methods, based on zeroth-order policy optimization. This learning paradigm follows Q-learning but overcomes the difficulty of efficiently performing the argmax in a continuous action space: it finds the max-valued action within a small number of samples. Policy learning in ZOSPI has two steps: first, it samples actions and evaluates them with a learned value estimator; then it learns to perform the highest-valued action through supervised learning. We further demonstrate that such a supervised learning framework can learn multi-modal policies. Experiments show that ZOSPI achieves competitive results on continuous control benchmarks with remarkable sample efficiency.
Hao Sun · Ziping Xu · Taiyi Wang · Meng Fang · Bolei Zhou
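
A minimal sketch of the two-step policy update: sample candidate actions, score them with the learned Q-function, and regress the policy toward the highest-valued sample. The linear Q-function and policy are illustrative stand-ins for the learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, n_candidates = 4, 2, 32

Wq = rng.normal(size=(state_dim + action_dim,))          # stand-in Q(s, a)
q_value = lambda s, a: np.concatenate([s, a]) @ Wq

def zospi_update(Wp, states, lr=0.1):
    """Wp: (state_dim, action_dim) weights of a stand-in deterministic policy."""
    for s in states:
        candidates = rng.uniform(-1, 1, size=(n_candidates, action_dim))
        best = candidates[np.argmax([q_value(s, a) for a in candidates])]
        # Supervised step: move pi(s) toward the highest-valued sampled action.
        Wp = Wp + lr * np.outer(s, best - s @ Wp)
    return Wp

Wp = zospi_update(np.zeros((state_dim, action_dim)), rng.normal(size=(16, state_dim)))
print("policy action at a test state:", rng.normal(size=state_dim) @ Wp)
```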

Solving PDDL Planning Problems with Pretrained Large Language Models (Poster)
We study few-shot prompting of pretrained large language models (LLMs) towards solving PDDL planning problems. We are interested in two questions: (1) To what extent can LLMs solve PDDL planning problems on their own? (2) How and to what extent can LLMs be used to guide AI planners? Recent work by Valmeekam et al. (2022) presents negative evidence for (1) in the classic blocks world domain. We confirm this finding, but expand the inquiry to 18 domains and find more mixed results with a few clear successes. For (2), we propose a simple mechanism for using good-but-imperfect LLM outputs to aid a heuristic-search planner. We also find that the LLM performance is due not only to syntactic pattern matching, but also to its commonsense understanding of English terms that appear in the PDDL.
Tom Silver · Varun Hariprasad · Reece Shuttleworth · Nishanth Kumar · Tomás Lozano-Pérez · Leslie Kaelbling

Collaborating with language models for embodied reasoning (Poster)
Reasoning in a complex and ambiguous embodied environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new, unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
Ishita Dasgupta · Christine Kaeser-Chen · Kenneth Marino · Arun Ahuja · Sheila Babayan · Felix Hill · Rob Fergus

Elicitation Inference Optimization for Multi-Principal-Agent Alignment (Poster)
In multi-principal-agent alignment scenarios spanning governance, markets, diplomacy, and AGI, it is infeasible to elicit every principal's view on all perspectives relevant to agent decisions. Elicitation inference optimization (EIO) aims to minimize the $n$ elicitations needed to approximate $N$ principals' views across $K$ perspectives. In this work, we demonstrate an EIO approach whose data efficiency ($NK/n$) increases with scale. We introduce STUMP: an elicitation inference model which integrates an LLM with a latent factor model to enable learning transfer across samples, contexts, and languages. We then characterize STUMP's performance on a set of elicitation primitives from which scalable elicitation (sampling) protocols can be constructed. Building on these results, we design and demonstrate two scalable elicitation protocols for STUMP where data efficiency grows boundlessly, scaling like $O(n)$ in the number of elicitations $n$. This makes it possible to obtain complex, high-dimensional preference signals spanning principal populations at any scale.
Andrew Konya · Yeping L Qiu · Michael Varga · Aviv Ovadya

LMPriors: Pre-Trained Language Models as Task-Specific Priors (Poster)
Particularly in low-data regimes, an outstanding challenge in machine learning is developing principled techniques for augmenting our models with suitable priors, encouraging them to learn in ways that are compatible with our understanding of the world. In contrast to generic priors such as shrinkage or sparsity, we draw inspiration from the recent successes of large-scale language models (LMs) to construct task-specific priors distilled from the rich knowledge of LMs. Our method, Language Model Priors (LMPriors), incorporates auxiliary natural language metadata about the task, such as variable names and descriptions, to encourage downstream model outputs to be consistent with the LM's common-sense reasoning based on the metadata. Empirically, we demonstrate that LMPriors improve model performance in settings where such natural language descriptions are available, and perform well on several tasks that benefit from such prior knowledge, such as feature selection, causal inference, and safe reinforcement learning.
Kristy Choi · Chris Cundy · Sanjari Srivastava · Stefano Ermon
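
A minimal sketch (not the paper's prompts or experimental setup) of one use case mentioned above, feature selection: the language model acts as a prior over which variables matter, judging relevance from variable names and descriptions alone. `lm_yes_no` is a hypothetical wrapper that returns the model's yes/no answer.

```python
def lm_prior_feature_selection(features, target, task_description, lm_yes_no):
    """features: dict {name: description}.  Returns the names the LM prior endorses."""
    selected = []
    for name, description in features.items():
        prompt = (f"Task: {task_description}\n"
                  f"Target variable: {target}\n"
                  f"Candidate feature: {name} -- {description}\n"
                  f"Question: is this feature likely to be relevant to the target? "
                  f"Answer yes or no.")
        if lm_yes_no(prompt):
            selected.append(name)
    return selected
```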