Search All 2024 Events
 

6 Results

Workshop
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus · Steven Abreu
Workshop
Attention Shift: Steering AI Away from Unsafe Content
Shivank Garg · Manyana Tiwari
Workshop
Unveiling and Manipulating Concepts in Time Series Foundation Models
Michal Wilinski · Mononito Goswami · Nina Żukowska · Willa Potosnak · Artur Dubrawski
Workshop
Towards Inference-time Category-wise Safety Steering for Large Language Models
Amrita Bhattacharjee · Shaona Ghosh · Traian Rebedea · Christopher Parisien
Workshop
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman
Poster
Thu 16:30
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao · Tianrong Zhang · Bochuan Cao · Ziyi Yin · Lu Lin · Fenglong Ma · Jinghui Chen