

Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback

Marcus Williams ⋅ Micah Carroll ⋅ Constantin Weisser ⋅ Brendan Murphy ⋅ Adhyyan Narang ⋅ Anca Dragan

Abstract
