Red Teaming GenAI: What Can We Learn from Adversaries?
Valeriia Cherepanova · Bo Li · Niv Cohen · Yifei Wang · Yisen Wang · Avital Shafran · Nil-Jana Akpinar · James Zou
Abstract
The development and proliferation of modern generative AI models have introduced valuable capabilities, but these models and their applications also pose risks to human safety. How do we identify risks in new systems before they cause harm during deployment? This workshop focuses on red teaming, an emergent adversarial approach to probing model behaviors, and its applications to making modern generative AI safe for humans.
Schedule
Timezone: America/Los_Angeles
Session start times: 9:00 AM, 9:30 AM, 10:10 AM, 10:55 AM, 11:20 AM, 12:00 PM, 1:50 PM, 2:15 PM, 3:00 PM, 4:30 PM, 5:20 PM