Reasoning in Superposition
Zheyang Xiong · Shivam Garg · Vaishnavi Shrivastava · Haoyu Zhao · Anastasios Kyrillidis · Dimitris Papailiopoulos
Abstract
While Large Language Models (LLMs) usually reason in language space with discrete tokens, recent studies have found that LLMs can reason in more expressive spaces, such as continuous latent space. However, training LLMs in continuous latent space is challenging due to the lack of sufficient training signals. In this work, we propose a method that teaches LLMs to reason over superpositions of discrete tokens. Our model takes in a superposition of token embeddings and outputs multiple tokens using a Multi-token Prediction (MTP) module. Our empirical results show that with superposition reasoning, the model uses $\sim$40\% fewer reasoning tokens than the baseline, with a small accuracy gap.
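To make the two components the abstract names concrete, the following is a minimal sketch (not the authors' implementation): a superposed input formed by mixing several token embeddings into one vector, and a multi-token prediction (MTP) head that emits logits for several tokens from one hidden state. All module names, dimensions, and the uniform mixing weights are illustrative assumptions.

```python
# Hypothetical sketch of superposed token inputs + an MTP-style output head.
# Not the paper's code; names, shapes, and mixing weights are assumptions.
import torch
import torch.nn as nn


def superpose_embeddings(embedding: nn.Embedding,
                         token_ids: torch.Tensor,
                         weights: torch.Tensor | None = None) -> torch.Tensor:
    """Mix the embeddings of k candidate tokens into one input vector.

    token_ids: (batch, k) ids of the tokens to superpose at this position.
    weights:   (batch, k) mixing weights; defaults to a uniform 1/k mixture.
    Returns:   (batch, d_model) superposed embedding.
    """
    embs = embedding(token_ids)                      # (batch, k, d_model)
    if weights is None:
        weights = torch.full(token_ids.shape, 1.0 / token_ids.shape[-1])
    return (weights.unsqueeze(-1) * embs).sum(dim=1)


class MultiTokenPredictionHead(nn.Module):
    """Predict logits for the next `n_future` tokens from one hidden state."""

    def __init__(self, d_model: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(d_model, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, d_model) -> logits: (batch, n_future, vocab_size)
        return torch.stack([head(hidden) for head in self.heads], dim=1)


if __name__ == "__main__":
    vocab, d_model = 1000, 64
    emb = nn.Embedding(vocab, d_model)
    ids = torch.randint(0, vocab, (4, 2))            # two candidate tokens each
    x = superpose_embeddings(emb, ids)               # (4, 64)
    mtp = MultiTokenPredictionHead(d_model, vocab, n_future=2)
    print(mtp(x).shape)                              # torch.Size([4, 2, 1000])
```

The intuition behind this sketch is that a single superposed input can carry information about several discrete reasoning tokens at once, while the MTP head decodes multiple tokens per step, which is one way a model could use fewer explicit reasoning tokens overall.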