

Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback

Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan

Abstract
