

Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

Chenliang Li · Siliang Zeng · Zeyi Liao · Jiaxiang Li · Dongyeop Kang · Alfredo Garcia · Mingyi Hong

Abstract
