

Paraphrasing Away Malicious Tokens: Improving LLM Finetuning Safety by Filtering Spurious Correlation

Marcel Mateos Salles ⋅ Praney Goyal ⋅ Pradyut Sekhsaria ⋅ Hai Huang ⋅ Randall Balestriero

Abstract
