Search All 2022 Events
 

16 Results

Page 1 of 2
Workshop
Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety
Allen Schmaltz · Danielle Rasooly
Workshop
Fri 12:30 Sam Bowman: What's the deal with AI safety?
Samuel Bowman
Poster
Tue 14:00 Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu · Chenyan Jia · Ge Zhang · Ziyu Zhuang · Tony Liu · Soroush Vosoughi
Workshop
Fri 7:00 Workshop on Machine Learning Safety
Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
Workshop
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando · Daniel Paleka · David Lindner · Lennart Heim · Florian Tramer
Poster
Wed 9:00 Capturing Failures of Large Language Models via Human Cognitive Biases
Erik Jones · Jacob Steinhardt
Poster
Wed 9:00 On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
Dennis Wei · Rahul Nair · Amit Dhurandhar · Kush Varshney · Elizabeth Daly · Moninder Singh
Competition
Thu 13:00 The Trojan Detection Challenge
Mantas Mazeika · Dan Hendrycks · Huichen Li · Xiaojun Xu · Andy Zou · Sidney Hough · Arezoo Rajabi · Dawn Song · Radha Poovendran · Bo Li · David Forsyth
Poster
Thu 9:00 Defining and Characterizing Reward Gaming
Joar Skalse · Nikolaus Howe · Dmitrii Krasheninnikov · David Krueger
Workshop
Fri 7:25 Beyond Safety: Toward a Value-Sensitive Approach to the Design of AI Systems
Alexander J. Fiannaca · Cynthia L. Bennett · Shaun Kane · Meredith Ringel Morris
Workshop
Indexing AI Risks with Incidents, Issues, and Variants
Sean McGregor · Kevin Paeth · Khoa Lam
Poster
MExMI: Pool-based Active Model Extraction Crossover Membership Inference
Yaxin Xiao · Qingqing Ye · Haibo Hu · Huadi Zheng · Chengfang Fang · Jie Shi