Anca Dragan: Learning human preferences from language

In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight). However, language also conveys information about a user's underlying reward function (e.g., a general preference for JetBlue), which can allow a model to carry out desirable actions in new contexts. In this talk, I'll share a model that infers rewards from language pragmatically: reasoning about how speakers choose utterances not only to elicit desired actions, but also to reveal information about their preferences.
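The contrast between mapping language to an action and inferring a reward can be illustrated with a toy sketch. This is not the model from the talk: it assumes a softmax-rational speaker who only tries to elicit a good action, a literal listener that picks uniformly among flights matching the named feature, and a small hand-written set of candidate rewards. All names here (`FEATURES`, `REWARDS`, `beta`, the flight tuples) are hypothetical and chosen for illustration.

```python
import math

# Hypothetical binary flight features: (is_jetblue, is_cheap).
FEATURES = ("jetblue", "cheap")

# Hypothetical candidate reward functions, as weights over the features.
REWARDS = {
    "likes_jetblue": (1.0, 0.0),
    "likes_cheap": (0.0, 1.0),
    "likes_both": (1.0, 1.0),
}


def reward(weights, flight):
    """Linear reward: dot product of weights and flight features."""
    return sum(w * f for w, f in zip(weights, flight))


def literal_listener(utterance, context):
    """Choose uniformly among flights that have the named feature."""
    idx = FEATURES.index(utterance)
    matches = [f for f in context if f[idx] == 1]
    chosen = matches or context  # fall back to all flights if none match
    return {f: 1.0 / len(chosen) for f in chosen}


def speaker(utterance, weights, context, beta=3.0):
    """P(utterance | reward, context): softmax over the expected reward
    of the action the literal listener would take (an assumed speaker model)."""
    def utility(u):
        dist = literal_listener(u, context)
        return sum(p * reward(weights, f) for f, p in dist.items())

    scores = {u: math.exp(beta * utility(u)) for u in FEATURES}
    return scores[utterance] / sum(scores.values())


def infer_reward(utterance, context):
    """Bayesian posterior over candidate rewards, with a uniform prior."""
    prior = 1.0 / len(REWARDS)
    unnorm = {name: prior * speaker(utterance, w, context)
              for name, w in REWARDS.items()}
    z = sum(unnorm.values())
    return {name: p / z for name, p in unnorm.items()}


def best_action(posterior, context):
    """Pick the flight with the highest posterior-expected reward."""
    def expected(f):
        return sum(posterior[n] * reward(REWARDS[n], f) for n in REWARDS)
    return max(context, key=expected)


# The user says "jetblue" when choosing between an expensive JetBlue
# flight and a cheap flight on another airline.
posterior = infer_reward("jetblue", [(1, 0), (0, 1)])

# In a new context, with no new utterance, act on the inferred reward.
choice = best_action(posterior, [(0, 1), (1, 0), (0, 0)])
```

In this sketch the posterior puts most of its mass on `likes_jetblue`, so the JetBlue flight is selected in the new context even though no instruction was given there. Modeling a speaker who also chooses utterances to reveal preferences, as the abstract describes, would require an additional informativeness term in the speaker's utility, which is omitted here.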

Author Information

Anca Dragan (UC Berkeley)
