Timezone: »
Opening Remarks
Jacob Steinhardt
Author Information
Jacob Steinhardt (UC Berkeley)
More from the Same Authors
-
2021 Spotlight: Learning Equilibria in Matching Markets from Bandit Feedback »
Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt -
2021 : Measuring Coding Challenge Competence With APPS »
Dan Hendrycks · Steven Basart · Saurav Kadavath · Mantas Mazeika · Akul Arora · Ethan Guo · Collin Burns · Samir Puranik · Horace He · Dawn Song · Jacob Steinhardt -
2021 : PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures »
Dan Hendrycks · Andy Zou · Mantas Mazeika · Leonard Tang · Dawn Song · Jacob Steinhardt -
2021 : Effect of Model Size on Worst-group Generalization »
Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt -
2021 : The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models »
Alexander Pan · Kush Bhatia · Jacob Steinhardt -
2021 : What Would Jiminy Cricket Do? Towards Agents That Behave Morally »
Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt -
2021 : Measuring Mathematical Problem Solving With the MATH Dataset »
Dan Hendrycks · Collin Burns · Saurav Kadavath · Akul Arora · Steven Basart · Eric Tang · Dawn Song · Jacob Steinhardt -
2022 : Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations »
Yongyi Yang · Jacob Steinhardt · Wei Hu -
2022 : Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small »
Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt -
2023 Poster: Jailbroken: How Does LLM Safety Training Fail? »
Alexander Wei · Nika Haghtalab · Jacob Steinhardt -
2023 Poster: Supply-Side Equilibria in Recommender Systems »
Meena Jagadeesan · Nikhil Garg · Jacob Steinhardt -
2023 Poster: Mass-Producing Failures of Multimodal Models »
Shengbang Tong · Erik Jones · Jacob Steinhardt -
2023 Poster: Goal Driven Discovery of Distributional Differences via Language Descriptions »
Ruiqi Zhong · Peter Zhang · Steve Li · Jinwoo Ahn · Dan Klein · Jacob Steinhardt -
2023 Poster: Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition »
Meena Jagadeesan · Michael Jordan · Jacob Steinhardt · Nika Haghtalab -
2023 Oral: Jailbroken: How Does LLM Safety Training Fail? »
Alexander Wei · Nika Haghtalab · Jacob Steinhardt -
2022 Workshop: Workshop on Machine Learning Safety »
Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini -
2022 Poster: How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios »
Mantas Mazeika · Eric Tang · Andy Zou · Steven Basart · Jun Shern Chan · Dawn Song · David Forsyth · Jacob Steinhardt · Dan Hendrycks -
2022 Poster: Capturing Failures of Large Language Models via Human Cognitive Biases »
Erik Jones · Jacob Steinhardt -
2022 Poster: Forecasting Future World Events With Neural Networks »
Andy Zou · Tristan Xiao · Ryan Jia · Joe Kwon · Mantas Mazeika · Richard Li · Dawn Song · Jacob Steinhardt · Owain Evans · Dan Hendrycks -
2021 Poster: Grounding Representation Similarity Through Statistical Testing »
Frances Ding · Jean-Stanislas Denain · Jacob Steinhardt -
2021 Poster: Learning Equilibria in Matching Markets from Bandit Feedback »
Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt -
2018 Workshop: Workshop on Security in Machine Learning »
Nicolas Papernot · Jacob Steinhardt · Matt Fredrikson · Kamalika Chaudhuri · Florian Tramer -
2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples »
Aditi Raghunathan · Jacob Steinhardt · Percy Liang -
2017 Workshop: Aligned Artificial Intelligence »
Dylan Hadfield-Menell · Jacob Steinhardt · David Duvenaud · David Krueger · Anca Dragan -
2017 Workshop: Machine Learning and Computer Security »
Jacob Steinhardt · Nicolas Papernot · Bo Li · Chang Liu · Percy Liang · Dawn Song -
2017 Poster: Certified Defenses for Data Poisoning Attacks »
Jacob Steinhardt · Pang Wei Koh · Percy Liang -
2016 Workshop: Reliable Machine Learning in the Wild »
Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang -
2015 Poster: Learning with Relaxed Supervision »
Jacob Steinhardt · Percy Liang