Search All 2024 Events

14 Results (Page 1 of 2)
Poster
Thu 11:00 Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu · Fan Liu · Hao Liu
Poster
Wed 11:00 Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen
Poster
Thu 11:00 Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control
Yuxin Xiao · Wan Chaoqun · Yonggang Zhang · Wenxiao Wang · Binbin Lin · Xiaofei He · Xu Shen · Jieping Ye
Poster
Thu 11:00 Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Anay Mehrotra · Manolis Zampetakis · Paul Kassianik · Blaine Nelson · Hyrum Anderson · Yaron Singer · Amin Karbasi
Poster
Fri 11:00 Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities
Alexander Nikitin · Jannik Kossen · Yarin Gal · Pekka Marttinen
Workshop
Sat 12:00 Weak-to-Strong Confidence Prediction
Yukai Yang · Tracy Zhu · Marco Morucci · Tim G. J. Rudner
Poster
Wed 16:30 Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
Kaifeng Lyu · Haoyu Zhao · Xinran Gu · Dingli Yu · Anirudh Goyal · Sanjeev Arora
Poster
Thu 16:30 PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li · Renjun Hu · Kunzhe Huang · Yan Zhuang · Qi Liu · Mengxiao Zhu · Xing Shi · Wei Lin
Poster
Fri 11:00 MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability
Yanrui Du · Sendong Zhao · Danyang Zhao · Ming Ma · Yuhan Chen · Liangyu Huo · Qing Yang · Dongliang Xu · Bing Qin
Poster
Thu 11:00 Protecting Your LLMs with Information Bottleneck
Zichuan Liu · Zefan Wang · Linjie Xu · Jinyu Wang · Lei Song · Tianchun Wang · Chunlin Chen · Wei Cheng · Jiang Bian
Poster
Wed 16:30 On scalable oversight with weak LLMs judging strong LLMs
Zachary Kenton · Noah Siegel · Janos Kramar · Jonah Brown-Cohen · Samuel Albanie · Jannis Bulian · Rishabh Agarwal · David Lindner · Yunhao Tang · Noah Goodman · Rohin Shah
Poster
Fri 11:00 ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng · Han Wang · An Zhang · Nguyen Duy Tai · Jun Sun · Tat-Seng Chua