2 Results
Workshop | Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering | Joris Postmus · Steven Abreu
Workshop | Towards Reliable Evaluation of Behavior Steering Interventions in LLMs | Itamar Pres · Laura Ruis · Ekdeep S Lubana · David Krueger