NeurIPS Reward Model Aggregation

Poster in Workshop: Instruction Tuning and Instruction Following

Reward Model Aggregation

Zihao Wang · Chirag Nagpal · Alexander D'Amour · Victor Veitch · Sanmi Koyejo

Keywords: [ reward aggregation ] [ LLM alignment ]

Abstract:

Aligning language models requires steering their outputs toward desired properties using reward models. This paper tackles the challenge of combining multiple reward models that target different objectives. We introduce methods for aggregating these rewards using logical operations. Experiments show that our methods outperform traditional aggregation techniques and underscore the importance of choosing appropriate reference values.
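To make the idea of logical aggregation with reference values concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the method from the paper: the abstract does not say how the logical operations are defined. The sketch assumes each reward is turned into a probability-like score of exceeding a reference value via a sigmoid, and that a soft AND and a soft OR are built from those scores. All function names here are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def aggregate_sum(rewards):
    """Baseline: plain sum of raw rewards (no reference values)."""
    return sum(rewards)


def aggregate_and(rewards, references):
    """Soft AND (assumed form): sum of log-probabilities that each reward
    exceeds its reference. High only when every reward clears its reference."""
    return sum(log_sigmoid(r - ref) for r, ref in zip(rewards, references))


def aggregate_or(rewards, references):
    """Soft OR (assumed form): log-probability that at least one reward
    exceeds its reference, i.e. log(1 - prod(1 - p_i))."""
    # 1 - sigmoid(x) == sigmoid(-x), so accumulate log(1 - p_i) stably.
    log_p_none = sum(log_sigmoid(-(r - ref)) for r, ref in zip(rewards, references))
    return math.log1p(-math.exp(log_p_none))


# Two candidate outputs scored by two hypothetical reward models.
balanced = [0.0, 0.0]    # mediocre on both objectives
lopsided = [3.0, -3.0]   # great on one, poor on the other
refs = [0.0, 0.0]        # assumed reference values

print(aggregate_sum(balanced), aggregate_sum(lopsided))
print(aggregate_and(balanced, refs), aggregate_and(lopsided, refs))
print(aggregate_or(balanced, refs), aggregate_or(lopsided, refs))
```

In this sketch the plain sum cannot tell the two outputs apart, the soft AND prefers the balanced output, and the soft OR prefers the lopsided one. Shifting the values in `refs` changes which outputs count as "good" on each objective, which is one way to see why the choice of reference values could matter.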
