Search All 2022 Events

16 Results

Page 1 of 2
Introspection, Updatability, and Uncertainty Quantification with Transformers: Concrete Methods for AI Safety
Allen Schmaltz · Danielle Rasooly
Fri 12:30 Sam Bowman: What's the deal with AI safety?
Samuel Bowman
Tue 14:00 Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu · Chenyan Jia · Ge Zhang · Ziyu Zhuang · Tony Liu · Soroush Vosoughi
Fri 7:00 Workshop on Machine Learning Safety
Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
Red-Teaming the Stable Diffusion Safety Filter
Javier Rando · Daniel Paleka · David Lindner · Lennart Heim · Florian Tramer
Wed 9:00 Capturing Failures of Large Language Models via Human Cognitive Biases
Erik Jones · Jacob Steinhardt
Wed 9:00 On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach
Dennis Wei · Rahul Nair · Amit Dhurandhar · Kush Varshney · Elizabeth Daly · Moninder Singh
Thu 13:00 The Trojan Detection Challenge
Mantas Mazeika · Dan Hendrycks · Huichen Li · Xiaojun Xu · Andy Zou · Sidney Hough · Arezoo Rajabi · Dawn Song · Radha Poovendran · Bo Li · David Forsyth
Fri 7:25 Beyond Safety: Toward a Value-Sensitive Approach to the Design of AI Systems
Alexander J. Fiannaca · Cynthia L. Bennett · Shaun Kane · Meredith Ringel Morris
Indexing AI Risks with Incidents, Issues, and Variants
Sean McGregor · Kevin Paeth · Khoa Lam
Thu 9:00 Defining and Characterizing Reward Gaming
Joar Skalse · Nikolaus Howe · Dmitrii Krasheninnikov · David Krueger
MExMI: Pool-based Active Model Extraction Crossover Membership Inference
Yaxin Xiao · Qingqing Ye · Haibo Hu · Huadi Zheng · Chengfang Fang · Jie Shi