

RepIt: Steering Language Models with Concept-Specific Refusal Vectors

Vincent Siu ⋅ Nathan Henry ⋅ Nicholas Crispino ⋅ Yang Liu ⋅ Dawn Song ⋅ Chenguang Wang

Abstract
