Workshop
Foundation Model Interventions
Pau Rodriguez · Arno Blaas · Desi R Ivanova · Sahra Ghalebikesabi · Yuki M Asano · Katherine Metcalf · Xavier Suau
West Meeting Room 121, 122
Sun 15 Dec, 8:45 a.m. PST
The increasing capabilities of foundation models have raised concerns about their potential to generate undesirable content, perpetuate biases, and promote harmful behaviors. To address these issues, we propose a workshop focused on understanding the inner workings of foundation models and identifying actionable mechanisms involved in generation. Recent studies have shown promise in directly intervening on model activations, or on a low-rank subset of the weights, to gain fine-grained control over generation and mitigate harmful and toxic outputs. This workshop aims to bring together researchers to explore methods for improving the controllability of foundation models and to better understand their behavior and potential misuse.