Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs

Alexander von Recum ⋅ Christoph Schnabl ⋅ Gabor Hollbeck ⋅ Marvin von Hagen ⋅ Silas Alberti ⋅ Philip Blinde

Abstract
