17 Results
Workshop | | TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations | Nathalie Kirch · Konstantin Hebenstreit · Matthias Samwald
Workshop | | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
Workshop | | Decoding Biases: An Analysis of Automated Methods and Metrics for Gender Bias Detection in Language Models | Shachi H. Kumar · Saurav Sahay · Sahisnu Mazumder · Eda Okur · Ramesh Manuvinakurike · Nicole Beckage · Hsuan Su · Hung-yi Lee · Lama Nachman
Workshop | | Lexically-constrained automated prompt augmentation: A case study using adversarial T2I data | Jessica Quaye · Alicia Parrish · Oana Inel · Minsuk Kahng · Charvi Rastogi · Hannah Rose Kirk · Jess Tsang · Nathan Clement · Rafael Mosquera-Gomez · Juan Ciro · Vijay Janapa Reddi · Lora Aroyo
Poster | Wed 11:00 | Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | Mikayel Samvelyan · Sharath Chandra Raparthy · Andrei Lupu · Eric Hambro · Aram Markosyan · Manish Bhatt · Yuning Mao · Minqi Jiang · Jack Parker-Holder · Jakob Foerster · Tim Rocktäschel · Roberta Raileanu