73 Results
Workshop
|
Code-Switching Red-Teaming: LLM Evaluation for Safety and Multilingual Understanding Haneul Yoo · Yongjin Yang · Hwaran Lee |
||
Workshop
|
Sat 15:45 |
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models Carter Teplica · Yixin Liu · Arman Cohan · Tim G. J. Rudner |
|
Workshop
|
Dissecting Adversarial Robustness of Multimodal LM Agents Chen Wu · Rishi Shah · Jing Yu Koh · Ruslan Salakhutdinov · Daniel Fried · Aditi Raghunathan |
||
Workshop
|
SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization Hanxi Guo · Siyuan Cheng · Guanhong Tao · Guangyu Shen · Zhuo Zhang · Shengwei An · Kaiyuan Zhang · Xiangyu Zhang |
||
Workshop
|
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien |
||
Workshop
|
Sun 11:20 |
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains Ezra Edelman · Nikolaos Tsilivis · Surbhi Goel · Benjamin Edelman · Eran Malach |
|
Workshop
|
Dynamic Vocabulary Pruning in Early-Exit LLMs · Karim Abdel Sadek · Joan Velja · Matteo Nulli · Metod Jazbec |
||
Workshop
|
Infecting LLM Agents via Generalizable Adversarial Attack Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson |
||
Workshop
|
Sun 16:30 |
Boundaries of stable regions in activation space of LLMs become sharper with more compute Jett Janiak · Jacek Karwowski · Chatrik Mangat · Giorgi Giglemiani · Nora Petrova · Stefan Heimersheim |
|
Workshop
|
Steering Without Side Effects: Improving Post-Deployment Control of Language Models Asa Cooper Stickland · Aleksandr Lyzhov · Jacob Pfau · Salsabila Mahdi · Samuel Bowman |