WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents »
Most existing benchmarks for grounding language in interactive environments either lack realistic linguistic elements, or prove difficult to scale up due to substantial human involvement in the collection of data or feedback signals. We develop WebShop – a simulated e-commerce website environment with 1.18 million real-world products and 12,087 crowd-sourced text instructions. In this environment, an agent needs to navigate multiple types of webpages and issue diverse actions to find, customize, and purchase a product given an instruction. WebShop provides several challenges including understanding compositional instructions, query (re-)formulation, dealing with noisy text in webpages, and performing strategic exploration. We collect over 1,600 human trajectories to first validate the benchmark, then train and evaluate a diverse range of agents using reinforcement learning, imitation learning, and pre-trained image and language models. Our best model achieves a task success rate of 29%, which significantly outperforms rule heuristics but is far lower than expert human performance (59%). We also analyze agent and human trajectories and ablate various model components to provide insights for developing future agents with stronger language understanding and decision making abilities. Finally, we show our agent trained on WebShop exhibits non-trivial sim-to-real transfer when evaluated on amazon.com and ebay.com, indicating the potential value of our benchmark for developing practical web agents that can operate in the wild.
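The abstract describes the interaction pattern at a high level: an agent reads an instruction, issues search and click actions across webpages, and receives a reward upon purchase. The sketch below is a minimal, self-contained mock of that loop for illustration only; the names (MockShopEnv, rule_agent, the fake product ID) are hypothetical and do not reflect the actual WebShop codebase or API.

```python
# Illustrative mock of the search/click interaction loop described above.
# All names here (MockShopEnv, rule_agent, B01ABCD) are hypothetical and
# are NOT the real WebShop API.

class MockShopEnv:
    """A tiny text environment with one product, mimicking the action format."""

    def __init__(self, instruction):
        self.instruction = instruction

    def reset(self):
        # Initial observation: the instruction plus a search page.
        return f"Instruction: {self.instruction}\n[Search box]"

    def step(self, action):
        # Actions follow a search[query] / click[element] text format.
        if action.startswith("search["):
            return "Results: B01ABCD red cotton t-shirt, $9.99", 0.0, False
        if action == "click[B01ABCD]":
            return "Item page: red cotton t-shirt. Options: size S / M / L", 0.0, False
        if action == "click[Buy Now]":
            # A real reward would compare the purchased item against the instruction.
            return "Purchase complete.", 1.0, True
        return "Nothing happens.", 0.0, False


def rule_agent(observation):
    """Trivial rule policy: search the query, click the first result, then buy."""
    if "[Search box]" in observation:
        return "search[red cotton t-shirt under 10 dollars]"
    if observation.startswith("Results:"):
        return "click[B01ABCD]"
    return "click[Buy Now]"


env = MockShopEnv("Buy a red cotton t-shirt under 10 dollars.")
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(rule_agent(obs))
    total_reward += reward
print("episode reward:", total_reward)
```

In WebShop itself, task success is judged by how well the purchased product matches the instruction's attributes, options, and price, rather than by a hard-coded branch as in this mock.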
Author Information
Shunyu Yao (Princeton University)
Howard Chen (Princeton University)
John Yang (Princeton University)

I'm a second-year Master's student studying Natural Language Processing at Princeton University, advised by Professor Karthik Narasimhan.
Karthik Narasimhan (Princeton University)
More from the Same Authors
- 2022 : ReAct: Synergizing Reasoning and Acting in Language Models »
  Shunyu Yao · Jeffrey Zhao · Dian Yu · Izhak Shafran · Karthik Narasimhan · Yuan Cao
- 2022 : Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment »
  John Yang · Howard Chen · Karthik Narasimhan
- 2023 Poster: Reflexion: language agents with verbal reinforcement learning »
  Noah Shinn · Federico Cassano · Ashwin Gopinath · Karthik Narasimhan · Shunyu Yao
- 2023 Poster: Tree of Thoughts: Deliberate Problem Solving with Large Language Models »
  Shunyu Yao · Dian Yu · Jeffrey Zhao · Izhak Shafran · Tom Griffiths · Yuan Cao · Karthik Narasimhan
- 2023 Poster: InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback »
  John Yang · Akshara Prabhakar · Karthik Narasimhan · Shunyu Yao
- 2023 Oral: Tree of Thoughts: Deliberate Problem Solving with Large Language Models »
  Shunyu Yao · Dian Yu · Jeffrey Zhao · Izhak Shafran · Tom Griffiths · Yuan Cao · Karthik Narasimhan
- 2022 : WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents »
  Shunyu Yao
- 2022 : Karthik Narasimhan: Semantic Supervision for few-shot generalization and personalization »
  Karthik Narasimhan
- 2022 Poster: Using natural language and program abstractions to instill human inductive biases in machines »
  Sreejan Kumar · Carlos G. Correa · Ishita Dasgupta · Raja Marjieh · Michael Y Hu · Robert Hawkins · Jonathan D Cohen · Nathaniel Daw · Karthik Narasimhan · Tom Griffiths
- 2022 Poster: Learning Physics Constrained Dynamics Using Autoencoders »
  Tsung-Yen Yang · Justinian Rosca · Karthik Narasimhan · Peter J. Ramadge
- 2022 Poster: DataMUX: Data Multiplexing for Neural Networks »
  Vishvak Murahari · Carlos Jimenez · Runzhe Yang · Karthik Narasimhan
- 2019 Poster: Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations »
  Kevin Smith · Lingjie Mei · Shunyu Yao · Jiajun Wu · Elizabeth Spelke · Josh Tenenbaum · Tomer Ullman
- 2018 Poster: 3D-Aware Scene Manipulation via Inverse Graphics »
  Shunyu Yao · Tzu Ming Hsu · Jun-Yan Zhu · Jiajun Wu · Antonio Torralba · Bill Freeman · Josh Tenenbaum
- 2016 Poster: Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation »
  Tejas Kulkarni · Karthik Narasimhan · Ardavan Saeedi · Josh Tenenbaum