Search All 2024 Events
 

73 Results

Workshop
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding
Haneul Yoo · Yongjin Yang · Hwaran Lee
Workshop
Sat 15:45 SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner
Workshop
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu · Rishi Shah · Jing Yu Koh · Ruslan Salakhutdinov · Daniel Fried · Aditi Raghunathan
Workshop
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization
Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang
Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Workshop
Sun 11:20 The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Ezra Edelman · Nikolaos Tsilivis · Surbhi Goel · Benjamin Edelman · Eran Malach
Workshop
Dynamic Vocabulary Pruning in Early-Exit LLMs
Karim Abdel Sadek · Joan Velja · Matteo Nulli · Metod Jazbec
Workshop
Infecting LLM Agents via Generalizable Adversarial Attack
Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Workshop
Sun 16:30 Boundaries of stable regions in activation space of LLMs become sharper with more compute
Jett Janiak · Jacek Karwowski · Chatrik Mangat · Giorgi Giglemiani · Nora Petrova · Stefan Heimersheim
Workshop
Steering Without Side Effects: Improving Post-Deployment Control of Language Models
Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman