14 Results
Poster | Thu 16:30 | WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | Seungju Han · Kavel Rao · Allyson Ettinger · Liwei Jiang · Bill Yuchen Lin · Nathan Lambert · Yejin Choi · Nouha Dziri
Poster | Fri 11:00 | Efficient multi-prompt evaluation of LLMs | Felipe Maia Polo · Ronald Xu · Lucas Weber · Mírian Silva · Onkar Bhardwaj · Leshem Choshen · Allysson de Oliveira · Yuekai Sun · Mikhail Yurochkin