Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback

Marcus Williams ⋅ Micah Carroll ⋅ Constantin Weisser ⋅ Adhyyan Narang ⋅ Brendan Murphy ⋅ Anca Dragan

Abstract
