Poster
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel Bakker · Martin Chadwick · Hannah Sheahan · Michael Tessler · Lucy Campbell-Gillingham · Jan Balaguer · Nat McAleese · Amelia Glaese · John Aslanides · Matt Botvinick · Christopher Summerfield
Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences and consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70-billion-parameter LLM to generate statements that maximize the expected approval of a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when consensus statements are silently constructed from only a subset of group members, the excluded members are more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential of using LLMs to help groups of humans align their values with one another.
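As a rough illustration of the final ranking step described in the abstract, the sketch below scores each candidate consensus statement with a per-member reward model and aggregates the scores with a social welfare function. The names used here (`reward_model`, `rank_candidates`, and the two example welfare functions) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch (not the authors' released code): rank candidate consensus
# statements by aggregating per-member predicted approval with a social welfare
# function, mirroring the ranking step described in the abstract.
from typing import Callable, List


def rank_candidates(
    candidates: List[str],
    member_opinions: List[str],
    reward_model: Callable[[str, str], float],  # hypothetical: (member opinion, statement) -> predicted approval
    welfare: Callable[[List[float]], float],
) -> List[str]:
    """Return candidate statements sorted by group welfare, best first."""
    scored = []
    for statement in candidates:
        # Predicted approval of each group member for this candidate statement.
        member_rewards = [reward_model(opinion, statement) for opinion in member_opinions]
        scored.append((welfare(member_rewards), statement))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [statement for _, statement in scored]


# Two example aggregation (social welfare) functions:
def utilitarian(rewards: List[float]) -> float:
    return sum(rewards) / len(rewards)  # mean predicted approval


def rawlsian(rewards: List[float]) -> float:
    return min(rewards)  # approval of the least-satisfied member
```

Different welfare functions trade off average approval against protecting the least-satisfied member, which is the design axis the abstract refers to as ranking "according to different aggregation (social welfare) functions."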
Author Information
Michiel Bakker (DeepMind)
Martin Chadwick (DeepMind)
Hannah Sheahan (DeepMind)
Michael Tessler (DeepMind)
Lucy Campbell-Gillingham (DeepMind Technologies Ltd)
Jan Balaguer (DeepMind)
Nat McAleese (DeepMind)
Amelia Glaese (DeepMind)
John Aslanides (DeepMind)
Matt Botvinick (Google DeepMind / University College London)
Christopher Summerfield (Google DeepMind)
More from the Same Authors
- 2021 : Alchemy: A benchmark and analysis toolkit for meta-reinforcement learning agents »
  Jane Wang · Michael King · Nicolas Porcel · Zeb Kurth-Nelson · Tina Zhu · Charles Deck · Peter Choy · Mary Cassin · Malcolm Reynolds · Francis Song · Gavin Buttimore · David Reichert · Neil Rabinowitz · Loic Matthey · Demis Hassabis · Alexander Lerchner · Matt Botvinick
- 2021 Spotlight: Collaborating with Humans without Human Data »
  DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett
- 2021 : Learning to solve complex tasks by growing knowledge culturally across generations »
  Michael Tessler · Jason Madeano · Pedro Tsividis · Noah Goodman · Josh Tenenbaum
- 2022 Poster: Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models »
  Maribeth Rauh · John Mellor · Jonathan Uesato · Po-Sen Huang · Johannes Welbl · Laura Weidinger · Sumanth Dathathri · Amelia Glaese · Geoffrey Irving · Iason Gabriel · William Isaac · Lisa Anne Hendricks
- 2022 Poster: Communicating Natural Programs to Humans and Machines »
  Sam Acquaviva · Yewen Pu · Marta Kryven · Theodoros Sechopoulos · Catherine Wong · Gabrielle Ecanow · Maxwell Nye · Michael Tessler · Josh Tenenbaum
- 2021 : Spotlight Talk: Learning to solve complex tasks by growing knowledge culturally across generations »
  Noah Goodman · Josh Tenenbaum · Michael Tessler · Jason Madeano
- 2021 Poster: Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning »
  Maxwell Nye · Michael Tessler · Josh Tenenbaum · Brenden Lake
- 2021 Poster: Collaborating with Humans without Human Data »
  DJ Strouse · Kevin McKee · Matt Botvinick · Edward Hughes · Richard Everett
- 2021 Poster: Attention over Learned Object Embeddings Enables Complex Visual Reasoning »
  David Ding · Felix Hill · Adam Santoro · Malcolm Reynolds · Matt Botvinick
- 2021 Poster: SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition »
  Rishabh Kabra · Daniel Zoran · Goker Erdogan · Loic Matthey · Antonia Creswell · Matt Botvinick · Alexander Lerchner · Chris Burgess
- 2021 Oral: Attention over Learned Object Embeddings Enables Complex Visual Reasoning »
  David Ding · Felix Hill · Adam Santoro · Malcolm Reynolds · Matt Botvinick
- 2020 : Panel discussion »
  Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel
- 2020 : Invited talk: Matt Botvinick "Alchemy: A Benchmark Task Distribution for Meta-Reinforcement Learning Research" »
  Matt Botvinick
- 2019 : Poster session »
  Jindong Gu · Alice Xiang · Atoosa Kasirzadeh · Zhiwei Han · Omar U. Florez · Frederik Harder · An-phi Nguyen · Amir Hossein Akhavan Rahnama · Michele Donini · Dylan Slack · Junaid Ali · Paramita Koley · Michiel Bakker · Anna Hilgard · Hailey James · Gonzalo Ramos · Jialin Lu · Jingying Yang · Margarita Boyarskaya · Martin Pawelczyk · Kacper Sokol · Mimansa Jaiswal · Umang Bhatt · David Alvarez-Melis · Aditya Grover · Charles Marx · Mengjiao (Sherry) Yang · Jingyan Wang · Gökhan Çapan · Hanchen Wang · Steffen Grünewälder · Moein Khajehnejad · Gourab Patro · Russell Kunes · Samuel Deng · Yuanting Liu · Luca Oneto · Mengze Li · Thomas Weber · Stefan Matthes · Duy Patrick Tu
- 2019 Poster: When to use parametric models in reinforcement learning? »
  Hado van Hasselt · Matteo Hessel · John Aslanides
- 2018 Poster: Randomized Prior Functions for Deep Reinforcement Learning »
  Ian Osband · John Aslanides · Albin Cassirer
- 2018 Poster: Learning to Share and Hide Intentions using Information Regularization »
  DJ Strouse · Max Kleiman-Weiner · Josh Tenenbaum · Matt Botvinick · David Schwab
- 2018 Spotlight: Randomized Prior Functions for Deep Reinforcement Learning »
  Ian Osband · John Aslanides · Albin Cassirer
- 2017 : Panel Discussion »
  Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox
- 2017 : Applying variational information bottleneck in hierarchical domains (Matt Botvinick) »
  Matt Botvinick
- 2017 : Meta-reinforcement learning in brains and machines »
  Matt Botvinick