

Poster

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Jiaxiang Li ⋅ Siliang Zeng ⋅ Hoi-To Wai ⋅ Chenliang Li ⋅ Alfredo Garcia ⋅ Mingyi Hong
2024 Poster

Abstract

