Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

Chenliang Li ⋅ Siliang Zeng ⋅ Zeyi Liao ⋅ Jiaxiang Li ⋅ Dongyeop Kang ⋅ Alfredo Garcia ⋅ Mingyi Hong

Abstract
