

Search All 2024 Events

5 Results

Workshop
Extracting Unlearned Information from LLMs with Activation Steering
Atakan Seyitoğlu · Aleksei Kuvshinov · Leo Schwinn · Stephan Günnemann
Workshop
Overcoming Limitations of Steering Vectors with Low-Rank Representation Steering
Dmitrii Krasheninnikov · David Krueger
Workshop
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus · Steven Abreu
Workshop
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
Workshop
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley · Joe Kwon · David Krueger · Dmitrii Krasheninnikov · Usman Anwar