6 Results
Type | Time | Title | Authors
Workshop | | Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering | Joris Postmus · Steven Abreu
Workshop | | Attention Shift: Steering AI Away from Unsafe Content | Shivank Garg · Manyana Tiwari
Workshop | | Unveiling and Manipulating Concepts in Time Series Foundation Models | Michal Wilinski · Mononito Goswami · Nina Żukowska · Willa Potosnak · Artur Dubrawski
Workshop | | Towards Inference-time Category-wise Safety Steering for Large Language Models | Amrita Bhattacharjee · Shaona Ghosh · Traian Rebedea · Christopher Parisien
Workshop | | Steering Without Side Effects: Improving Post-Deployment Control of Language Models | Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman
Poster | Thu 16:30 | Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Yuanpu Cao · Tianrong Zhang · Bochuan Cao · Ziyi Yin · Lu Lin · Fenglong Ma · Jinghui Chen