340 Results
Poster | Thu 11:00
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs
Zhao Xu · Fan LIU · Hao Liu

Poster | Thu 11:00
When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models
Yinghui Li · Qingyu Zhou · Yuanzhen Luo · Shirong Ma · Yangning Li · Hai-Tao Zheng · Xuming Hu · Philip S Yu

Poster | Thu 16:30
PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations
Jiatong Li · Renjun Hu · Kunzhe Huang · Yan Zhuang · Qi Liu · Mengxiao Zhu · Xing Shi · Wei Lin

Poster | Thu 16:30
NYU CTF Bench: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
Minghao Shao · Sofija Jancheska · Meet Udeshi · Brendan Dolan-Gavitt · haoran xi · Kimberly Milner · Boyuan Chen · Max Yin · Siddharth Garg · Prashanth Krishnamurthy · Farshad Khorrami · Ramesh Karri · Muhammad Shafique

Poster | Wed 11:00
Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation
Kehan Guo · Bozhao Nan · Yujun Zhou · Taicheng Guo · Zhichun Guo · Mihir Surve · Zhenwen Liang · Nitesh Chawla · Olaf Wiest · Xiangliang Zhang

Poster | Wed 16:30
Benchmarking LLMs via Uncertainty Quantification
Fanghua Ye · Mingming Yang · Jianhui Pang · Longyue Wang · Derek Wong · Emine Yilmaz · Shuming Shi · Zhaopeng Tu

Poster | Fri 16:30
Trace is the Next AutoDiff: Generative Optimization with Rich Feedback, Execution Traces, and LLMs
Ching-An Cheng · Allen Nie · Adith Swaminathan

Poster | Fri 11:00
UniTox: Leveraging LLMs to Curate a Unified Dataset of Drug-Induced Toxicity from FDA Labels
Jacob Silberg · Kyle Swanson · Elana Simon · Angela Zhang · Zaniar Ghazizadeh · Scott Ogden · Hisham Hamadeh · James Zou

Poster | Fri 11:00
StackEval: Benchmarking LLMs in Coding Assistance
Nidhish Shah · Zulkuf Genc · Dogu Araci

Poster | Wed 16:30
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
Joao Monteiro · Pierre-André Noël · Étienne Marcotte · Sai Rajeswar Mudumba · Valentina Zantedeschi · David Vazquez · Nicolas Chapados · Chris Pal · Perouz Taslakian

Poster | Fri 16:30
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs
Rudolf Laine · Bilal Chughtai · Jan Betley · Kaivalya Hariharan · Mikita Balesni · Jérémy Scheurer · Marius Hobbhahn · Alexander Meinke · Owain Evans

Poster | Fri 16:30
QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
Zhuo Chen · Rumen Dangovski · Charlotte Loh · Owen Dugan · Di Luo · Marin Soljacic