RepIt: Steering Language Models with Concept-Specific Refusal Vectors

Vincent Siu · Nathan Henry · Nicholas Crispino · Yang Liu · Dawn Song · Chenguang Wang

Abstract
