

Search All 2024 Events
 

10 Results

Poster
Fri 16:30 Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters
Haibo Jin · Andy Zhou · Joe Menke · Haohan Wang
Workshop
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
Workshop
Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
Workshop
GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding
James O'Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan
Workshop
GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding
James O'Neill · Santhosh Subramanian · Eric Lin · Abishek Satish · Vaikkunth Mugunthan
Workshop
Sat 13:15 Keynote 3: Risk assessment, safety alignment, and guardrails for multimodal foundation models
Bo Li
Workshop
Sun 14:20 GuardFormer: Guardrail Instruction Pretraining for Efficient SafeGuarding
Workshop
Sun 14:40 Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Workshop
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat · Stefan Schoepf · Giulio Zizzo · Giandomenico Cornacchia · Muhammad Zaid Hameed · Kieran Fraser · Erik Miehling · Beat Buesser · Elizabeth Daly · Mark Purcell · Prasanna Sattigeri · Pin-Yu Chen · Kush Varshney