NeurIPS Poster Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing

Poster

Towards Safe Concept Transfer of Multi-Modal Diffusion via Causal Representation Editing

Peiran Dong · Bingjie WANG · Song Guo · Junxiao Wang · Jie ZHANG · Zicong Hong

East Exhibit Hall A-C #2802

[ Abstract ]

[ Paper] [ Slides] [ Poster] [ OpenReview]

Wed 11 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Recent advancements in vision-language-to-image (VL2I) diffusion generation have made significant progress. While generating images from broad vision-language inputs holds promise, it also raises concerns about potential misuse, such as copying artistic styles without permission, which could have legal and social consequences. Therefore, it's crucial to establish governance frameworks to ensure ethical and copyright integrity, especially with widely used diffusion models. To address these issues, researchers have explored various approaches, such as dataset filtering, adversarial perturbations, machine unlearning, and inference-time refusals. However, these methods often lack either scalability or effectiveness. In response, we propose a new framework called causal representation editing (CRE), which extends representation editing from large language models (LLMs) to diffusion-based models. CRE enhances the efficiency and flexibility of safe content generation by intervening at diffusion timesteps causally linked to unsafe concepts. This allows for precise removal of harmful content while preserving acceptable content quality, demonstrating superior effectiveness, precision and scalability compared to existing methods. CRE can handle complex scenarios, including incomplete or blurred representations of unsafe concepts, offering a promising solution to challenges in managing harmful content generation in diffusion-based models.

Chat is not available.