Skip to yearly menu bar Skip to main content


Search All 2024 Events
 

340 Results

<<   <   Page 28 of 29   >   >>
Workshop
Jailbreaking Large Language Models with Symbolic Mathematics
Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · peyman najafirad
Workshop
iART - Imitation guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar
Workshop
Imitation Guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Avisek Naug · Sahand Ghorbanpour · Soumyendu Sarkar
Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang
Workshop
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
Workshop
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompt
Yusu Qian · Haotian Zhang · Yinfei Yang · Zhe Gan
Workshop
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion
Workshop
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion
Workshop
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang · Zhangchen Xu · Luyao Niu · Bill Yuchen Lin · Radha Poovendran
Workshop
Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI
Ramneet Kaur · Colin Samplawski · Adam Cobb · Anirban Roy · Brian Matejek · Manoj Acharya · Daniel Elenius · Alexander Berenbeim · John Pavlik · Nathaniel Bastian · Susmit Jha
Workshop
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao · Lei Li · Yu-Xiang Wang
Workshop
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based methods
Teodora Baluta · Gintare Karolina Dziugaite · Pascal Lamblin · Fabian Pedregosa · Danny Tarlow