

Poster

The Art of Saying No: Contextual Noncompliance in Language Models

Faeze Brahman · Sachin Kumar · Vidhisha Balachandran · Pradeep Dasigi · Valentina Pyatkin · Abhilasha Ravichander · Sarah Wiegreffe · Nouha Dziri · Khyathi Chandu · Jack Hessel · Yulia Tsvetkov · Noah Smith · Yejin Choi · Hannaneh Hajishirzi

East Exhibit Hall A-C #4307
Thu 12 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories, including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test the noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories, with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we explore different training strategies using a synthetically generated training set of requests and expected noncompliant responses. Our experiments demonstrate that while direct finetuning of instruction-tuned models can lead to both over-refusal and a decline in general capabilities, using parameter-efficient methods like low-rank adapters helps strike a good balance between appropriate noncompliance and other capabilities.
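The abstract measures per-category compliance rates over the 1000-prompt evaluation suite. As a minimal sketch of what that aggregation could look like, the snippet below computes the fraction of requests a model incorrectly complied with in each taxonomy category; the record schema and field names are assumptions for illustration, not the paper's actual data format or judging procedure.

```python
from collections import defaultdict

# Hypothetical records: each prompt from the evaluation suite carries its
# taxonomy category and a binary judgment of whether the model complied.
judgments = [
    {"category": "incomplete", "complied": True},
    {"category": "unsupported", "complied": False},
    {"category": "humanizing", "complied": True},
]

def compliance_rate_by_category(judgments):
    """Fraction of requests the model (incorrectly) complied with, per category."""
    totals, complied = defaultdict(int), defaultdict(int)
    for j in judgments:
        totals[j["category"]] += 1
        complied[j["category"]] += int(j["complied"])
    return {cat: complied[cat] / totals[cat] for cat in totals}

print(compliance_rate_by_category(judgments))
```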
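The abstract names low-rank adapters as the parameter-efficient method that balances appropriate noncompliance against general capabilities. Below is a minimal sketch of how such a setup could look with the Hugging Face PEFT library; the base checkpoint, target modules, and hyperparameters are assumptions for illustration, not the authors' configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Assumption: any instruction-tuned causal LM; this checkpoint is illustrative.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# LoRA freezes the base weights and trains small low-rank update matrices,
# which is what lets finetuning on noncompliance data avoid degrading the
# model's general capabilities. Rank and target modules are assumed values.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable

# The wrapped model can then be finetuned on (request, expected noncompliant
# response) pairs with any standard supervised finetuning loop.
```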
