Workshop: Attributing Model Behavior at Scale (ATTRIB)

Self-Select: Optimizing Instruction Selection for Large Language Models

Alexander Kyimpopkin · Keshav Ramji

Abstract: The same question can often be phrased in different ways, depending on the audience and the intent with which it is posed. To determine whether large language models (LLMs) prefer one phrasing over another regardless of semantic content, we introduce Self-Select, a method for selecting a preferred instruction template and generating high-quality synthetic data samples. The algorithm uses a meta-prompt to choose an instruction template, given a task and a set of candidate templates, and then generates $n$ new samples using the chosen template. We evaluate Self-Select on numerical reasoning and sentiment classification tasks, using a variety of instruction-tuned and base models, providing insights into their abilities and biases. We find that permuting the order of the instruction templates in the prompt leads to vastly different choice distributions, suggesting that the selection of a specific template can be attributed to inductive biases rather than semantic understanding, even after instruction-tuning.
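
As a rough illustration of the procedure the abstract describes, the Python sketch below chooses a template through a meta-prompt, generates n samples with it, and then repeats the choice over every ordering of the candidates to obtain a choice distribution. It is not the authors' implementation: the prompt wording, the function names (build_meta_prompt, select_template, self_select, choice_distribution) and the toy first_option_model are all assumptions made for this example, and a real model's reply would need more robust parsing.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface (an assumption): prompt text in, reply text out.
Model = Callable[[str], str]


def build_meta_prompt(task: str, templates: Sequence[str]) -> str:
    """Format the task and the numbered candidate templates as one prompt."""
    options = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(templates))
    return (
        f"Task: {task}\n"
        f"Candidate instruction templates:\n{options}\n"
        "Reply with the number of the template you prefer."
    )


def select_template(model: Model, task: str, templates: Sequence[str]) -> str:
    """Ask the model to pick a template and return the chosen template text."""
    reply = model(build_meta_prompt(task, templates))
    index = int(reply.strip()) - 1  # assumes the reply is a bare 1-based number
    return templates[index]


def self_select(
    model: Model, task: str, templates: Sequence[str], n: int
) -> Tuple[str, List[str]]:
    """Choose a template via the meta-prompt, then generate n samples with it."""
    chosen = select_template(model, task, templates)
    prompt = f"{chosen}\nWrite one new example for the task: {task}"
    return chosen, [model(prompt) for _ in range(n)]


def choice_distribution(
    model: Model, task: str, templates: Sequence[str]
) -> Counter:
    """Count how often each template is chosen across all candidate orderings."""
    counts: Counter = Counter()
    for ordering in permutations(templates):
        counts[select_template(model, task, ordering)] += 1
    return counts


def first_option_model(prompt: str) -> str:
    """Toy stand-in with a pure position bias: always picks option 1."""
    if "Reply with the number" in prompt:
        return "1"
    return "synthetic sample"
```

With the toy model, choice_distribution(first_option_model, "sentiment classification", ["A", "B", "C"]) gives every template the same count, because the choice follows list position and ignores the template text. A model that selected on content would instead concentrate its choices on one template no matter how the candidates were ordered.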
