Aligning to Thousands of Varying Preferences via System Message Generalization
Abstract
Current large language model (LLM) alignment methods often assume that aligning LLMs with the general public's preferences is optimal, overlooking the diversity of individual values. A major challenge in adopting a more individualized approach to LLM alignment is its lack of scalability, as it requires re-training a new model for each new value or user. We propose a new paradigm in which users specify their values within the system message, steering LLM behavior to align with individual intentions. However, LLMs are typically trained on a generic system message (e.g., "You are a helpful assistant"). To improve generalization to diverse system messages, we create a system message dataset with 197k value combinations across 66k user instructions. We train a 7B LLM, Janus, and test it on 921 prompts from 5 benchmarks, adding various unseen system messages that reflect user preferences. Janus achieves high tie+win rates against leading models, including GPT-4. Unexpectedly, Janus also outperforms LLaMA 3 8B Instruct on general helpfulness benchmarks, suggesting that training with diverse system messages enhances alignment with both individual and general preferences. Code, dataset, benchmark, and models are available at https://anonymous.4open.science/r/janus.
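To make the system-message-as-preference-specification idea concrete, the following is a minimal sketch, assuming an OpenAI-style chat-messages format; the value dimensions and wording shown are illustrative assumptions, not entries from the paper's dataset.

```python
# Minimal sketch of specifying user values in the system message.
# The value phrases below are hypothetical examples, not items from the
# paper's 197k value combinations; any chat model that accepts an
# OpenAI-style "messages" list (system/user roles) could consume prompts
# built this way.

def build_prompt(user_values: list[str], instruction: str) -> list[dict]:
    """Compose a chat prompt whose system message encodes user-specific values."""
    system_message = (
        "You are a helpful assistant. When responding, prioritize the "
        "following user preferences: " + "; ".join(user_values) + "."
    )
    return [
        {"role": "system", "content": system_message},
        {"role": "user", "content": instruction},
    ]

if __name__ == "__main__":
    instruction = "Explain how vaccines work."
    # Two users with different (hypothetical) value combinations receive
    # differently steered prompts for the same instruction.
    concise_user = ["concise answers", "plain language", "everyday examples"]
    detailed_user = ["step-by-step depth", "mechanistic detail", "technical vocabulary"]

    for values in (concise_user, detailed_user):
        for msg in build_prompt(values, instruction):
            print(f"[{msg['role']}] {msg['content']}")
        print("-" * 40)
```

Under this scheme, adapting the model to a new user reduces to writing a new system message rather than re-training, which is the scalability gain the abstract highlights.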