Poster
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta · Kuang-Huei Lee · Shixiang (Shane) Gu · Yutaka Matsuo · Aleksandra Faust · Heiga Zen · Izzeddin Gur
East Exhibit Hall A-C #3300
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, preferences can vary across individuals, and should therefore be treated as distributional to reflect the subtle relationships between responses. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihoods in the loss function. This adjusts the scale of the learning loss based on the soft labels, so that the loss for equally preferred responses is close to zero. This simple modification can be easily applied to any member of the DPO family and helps the models escape the objective mismatch that prior works suffer from. In our experiments, we simulate soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe significant improvements on data where modestly confident labels form the majority.
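The modification described above can be sketched as follows. This is a minimal, hypothetical illustration (function and variable names are assumptions, not the authors' code): taking a weighted geometric average of the two response likelihoods, with the soft label p̂ as the weight, amounts in log space to scaling the usual DPO log-ratio margin by (2·p̂ − 1). With p̂ = 1 this recovers standard DPO, and with p̂ = 0.5 (equally preferred responses) the margin, and hence the gradient, vanishes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def soft_label_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                        p_hat, beta=0.1):
    """Sketch of a DPO-style loss with a soft preference label.

    logp_w / logp_l: policy log-likelihoods of the preferred and
    dispreferred responses; ref_logp_*: the same under the reference
    model; p_hat in [0.5, 1]: soft preference for the 'winning' response.
    """
    # Standard DPO implicit-reward margin.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # Weighted geometric averaging of the two likelihoods scales the
    # margin by (2*p_hat - 1) in log space: p_hat = 1 recovers DPO,
    # p_hat = 0.5 zeroes the margin (and its gradient).
    return -math.log(sigmoid(beta * (2.0 * p_hat - 1.0) * margin))
```

For example, a pair labeled with p̂ = 0.6 contributes only a fifth of the margin that a fully confident p̂ = 1.0 pair does, which is how the loss scale adapts to modestly confident labels.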