Skip to yearly menu bar Skip to main content

Workshop: Instruction Tuning and Instruction Following

Exploring and Improving the Spatial Reasoning Abilities of Large Language Models

Manasi Sharma

Keywords: [ Large language models ] [ Spatial Reasoning ] [ 3D Data ] [ prompting ] [ Robotics ]


Large Language Models (LLMs) represent formidable tools for sequence modeling, boasting an innate capacity for general pattern recognition. Nevertheless, their broader spatial reasoning capabilities remain insufficiently explored. In this paper, we investigate the zero-shot performance of LLMs when confronted with a limited dataset comprising 3D robotic trajectory data and associated tasks, such as directional and motion labeling. Additionally, we introduce a novel prefix-based prompting mechanism, which yields a 30\% improvement on the 3D trajectory data and an increase of up to 16\% on SpartQA tasks when contrasted with the conventional vanilla prompt baseline (with gains over Chain-of-Thought prompting as well). The experimentation with 3D trajectory data offers an intriguing glimpse into the manner in which LLMs engage with numerical and spatial information, thus laying a solid foundation for the identification of target areas for future enhancements.

Chat is not available.