340 Results
Workshop
Jailbreaking Large Language Models with Symbolic Mathematics
Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad

Workshop
iART - Imitation guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Avisek Naug · Sahand Ghorbanpour · Ricardo Luna Gutierrez · Antonio Guillen-Perez · Paolo Faraboschi · Soumyendu Sarkar

Workshop
Imitation Guided Automated Red Teaming
Sajad Mousavi · Desik Rengarajan · Ashwin Ramesh Babu · Vineet Gundecha · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Avisek Naug · Sahand Ghorbanpour · Soumyendu Sarkar

Workshop
Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
Blair Yang · Fuyang Cui · Keiran Paster · Jimmy Ba · Pashootan Vaezipoor · Silviu Pitis · Michael Zhang

Workshop
Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback
Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan

Workshop
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompt
Yusu Qian · Haotian Zhang · Yinfei Yang · Zhe Gan

Workshop
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko · Nicolas Flammarion

Workshop
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates
Fengqing Jiang · Zhangchen Xu · Luyao Niu · Bill Yuchen Lin · Radha Poovendran

Workshop
Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI
Ramneet Kaur · Colin Samplawski · Adam Cobb · Anirban Roy · Brian Matejek · Manoj Acharya · Daniel Elenius · Alexander Berenbeim · John Pavlik · Nathaniel Bastian · Susmit Jha

Workshop
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao · Lei Li · Yu-Xiang Wang

Workshop
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based methods
Teodora Baluta · Gintare Karolina Dziugaite · Pascal Lamblin · Fabian Pedregosa · Danny Tarlow