5 Results
Workshop | Extracting Unlearned Information from LLMs with Activation Steering | Atakan Seyitoğlu · Aleksei Kuvshinov · Leo Schwinn · Stephan Günnemann
Workshop | Overcoming Limitations of Steering Vectors with Low-Rank Representation Steering | Dmitrii Krasheninnikov · David Krueger
Workshop | Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering | Joris Postmus · Steven Abreu
Workshop | Towards Reliable Evaluation of Behavior Steering Interventions in LLMs | Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger
Workshop | Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks | Madeline Brumley · Joe Kwon · David Krueger · Dmitrii Krasheninnikov · Usman Anwar