NeurIPS How Long Can Context Length of Open-Source LLMs truly Promise?

Poster in Workshop: Instruction Tuning and Instruction Following

How Long Can Context Length of Open-Source LLMs truly Promise?

Dacheng Li · Rulin Shao · Anze Xie · Ying Sheng · Lianmin Zheng · Joseph Gonzalez · Ion Stoica · Xuezhe Ma · Hao Zhang

Keywords: [ long-context instruction following chatbot ] [ Long-context evaluation ]

Abstract:

Large language models (LLMs) with long-context instruction-following ability have unlocked new potential, such as supporting long interactive chat sessions. In this paper, we introduce a test suite, LongEval, which enables us to evaluate the long-range retrieval ability of LLMs at various context lengths. We use LongEval to evaluate open-source LLMs, and surprisingly, we find that many of them fail to achieve their promised context length. In addition, we present a recipe to fine-tune a long-context chatbot based on LLaMA models, and introduce LongChat models that support conversations of up to 16,384 tokens. We have released our code at https://github.com/DachengLi1/LongChat.
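
To make the idea of measuring long-range retrieval at various context lengths concrete, the following is a minimal sketch of a synthetic retrieval test. It is not the LongEval implementation: the function names `build_retrieval_prompt` and `score`, the prompt wording, and the substring-based scoring are all assumptions chosen for illustration, and `model_fn` stands in for any callable that maps a prompt string to a response string.

```python
import random


def build_retrieval_prompt(num_lines, seed=0):
    """Build a synthetic prompt that hides one value to retrieve.

    Hypothetical format, not taken from the paper: each line stores a
    random number under a unique key, and the question asks for one of them.
    """
    rng = random.Random(seed)
    values = {f"line-{i}": rng.randint(1, 50000) for i in range(num_lines)}
    target = rng.choice(list(values))
    context = "\n".join(
        f"{key}: REGISTER_CONTENT is <{value}>" for key, value in values.items()
    )
    question = f"What is the REGISTER_CONTENT in {target}?"
    return context + "\n" + question, values[target]


def score(model_fn, lengths, trials=5):
    """Return retrieval accuracy for each context size in `lengths`.

    `model_fn` is an assumed interface: a callable taking a prompt string
    and returning the model's response string.
    """
    results = {}
    for num_lines in lengths:
        correct = 0
        for trial in range(trials):
            prompt, answer = build_retrieval_prompt(num_lines, seed=trial)
            # Lenient check: count the trial as correct if the expected
            # number appears anywhere in the response.
            if str(answer) in model_fn(prompt):
                correct += 1
        results[num_lines] = correct / trials
    return results
```

Growing `num_lines` lengthens the prompt, so plotting the returned accuracies against prompt length in tokens would show where retrieval starts to degrade relative to a model's advertised context window. The substring check is deliberately lenient; a stricter evaluation would parse the number from the response.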
