14 Results
Poster | Thu 11:00 | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu · Fan LIU · Hao Liu
Poster | Wed 11:00 | Improved Generation of Adversarial Examples Against Safety-aligned LLMs | Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen
Poster | Thu 11:00 | Enhancing Multiple Dimensions of Trustworthiness in LLMs via Sparse Activation Control | Yuxin Xiao · Wan Chaoqun · Yonggang Zhang · Wenxiao Wang · Binbin Lin · Xiaofei He · Xu Shen · Jieping Ye
Poster | Thu 11:00 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically | Anay Mehrotra · Manolis Zampetakis · Paul Kassianik · Blaine Nelson · Hyrum Anderson · Yaron Singer · Amin Karbasi
Poster | Fri 11:00 | Kernel Language Entropy: Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities | Alexander Nikitin · Jannik Kossen · Yarin Gal · Pekka Marttinen
Workshop | Sat 12:00 | Weak-to-Strong Confidence Prediction | Yukai Yang · Tracy Zhu · Marco Morucci · Tim G. J. Rudner
Poster | Wed 16:30 | Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates | Kaifeng Lyu · Haoyu Zhao · Xinran Gu · Dingli Yu · Anirudh Goyal · Sanjeev Arora
Poster | Thu 16:30 | PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations | Jiatong Li · Renjun Hu · Kunzhe Huang · Yan Zhuang · Qi Liu · Mengxiao Zhu · Xing Shi · Wei Lin
Poster | Fri 11:00 | MoGU: A Framework for Enhancing Safety of LLMs While Preserving Their Usability | YANRUI DU · Sendong Zhao · Danyang Zhao · Ming Ma · Yuhan Chen · Liangyu Huo · Qing Yang · Dongliang Xu · Bing Qin
Poster | Thu 11:00 | Protecting Your LLMs with Information Bottleneck | Zichuan Liu · Zefan Wang · Linjie Xu · Jinyu Wang · Lei Song · Tianchun Wang · Chunlin Chen · Wei Cheng · Jiang Bian
Poster | Wed 16:30 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton · Noah Siegel · Janos Kramar · Jonah Brown-Cohen · Samuel Albanie · Jannis Bulian · Rishabh Agarwal · David Lindner · Yunhao Tang · Noah Goodman · Rohin Shah
Poster | Fri 11:00 | ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation | jingnan zheng · Han Wang · An Zhang · Nguyen Duy Tai · Jun Sun · Tat-Seng Chua