10 Results
Poster
|
Fri 16:30 |
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters Haibo Jin · Andy Zhou · Joe Menke · Haohan Wang |
|
Workshop
|
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney |
||
Workshop
|
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney |
||
Workshop
|
GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding James O' Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan |
||
Workshop
|
GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding James O' Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan |
||
Workshop
|
Sat 13:15 |
Keynote 3: Risk assessment, safety alignment, and guardrails for multimodal foundation models Bo Li |
|
Workshop
|
Sun 14:20 |
GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding |
|
Workshop
|
Sun 14:40 |
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs |
|
Workshop
|
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien |
||
Workshop
|
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI Ambrish Rawat · Stefan Schoepf · Giulio Zizzo · Giandomenico Cornacchia · Muhammad Zaid Hameed · Kieran Fraser · Erik Miehling · Beat Buesser · Elizabeth Daly · Mark Purcell · Prasanna Sattigeri · Pin-Yu Chen · Kush Varshney |