Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving we will likely need new algorithmic advancements from the broader research community.
Author Information
Dan Hendrycks (UC Berkeley)
Collin Burns (University of California Berkeley)
Saurav Kadavath (UC Berkeley)
Akul Arora (University of California, Berkeley)
Steven Basart (University of Chicago)
Eric Tang (University of California Berkeley)
Dawn Song (UC Berkeley)
Jacob Steinhardt (UC Berkeley)
More from the Same Authors
- 2021: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
  Dan Hendrycks · Collin Burns · Anya Chen · Spencer Ball
- 2021 Spotlight: Learning Equilibria in Matching Markets from Bandit Feedback
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2021: Measuring Coding Challenge Competence With APPS
  Dan Hendrycks · Steven Basart · Saurav Kadavath · Mantas Mazeika · Akul Arora · Ethan Guo · Collin Burns · Samir Puranik · Horace He · Dawn Song · Jacob Steinhardt
- 2021: PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures
  Dan Hendrycks · Andy Zou · Mantas Mazeika · Leonard Tang · Dawn Song · Jacob Steinhardt
- 2021: Effect of Model Size on Worst-group Generalization
  Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt
- 2021: The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
  Alexander Pan · Kush Bhatia · Jacob Steinhardt
- 2021: What Would Jiminy Cricket Do? Towards Agents That Behave Morally
  Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt
- 2022: Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations
  Yongyi Yang · Jacob Steinhardt · Wei Hu
- 2022: Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
  Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
- 2023 Poster: Jailbroken: How Does LLM Safety Training Fail?
  Alexander Wei · Nika Haghtalab · Jacob Steinhardt
- 2023 Poster: Supply-Side Equilibria in Recommender Systems
  Meena Jagadeesan · Nikhil Garg · Jacob Steinhardt
- 2023 Poster: Mass-Producing Failures of Multimodal Models
  Shengbang Tong · Erik Jones · Jacob Steinhardt
- 2023 Poster: Goal Driven Discovery of Distributional Differences via Language Descriptions
  Ruiqi Zhong · Peter Zhang · Steve Li · Jinwoo Ahn · Dan Klein · Jacob Steinhardt
- 2023 Poster: Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition
  Meena Jagadeesan · Michael Jordan · Jacob Steinhardt · Nika Haghtalab
- 2023 Poster: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
  Boxin Wang · Weixin Chen · Hengzhi Pei · Chulin Xie · Mintong Kang · Chenhui Zhang · Chejian Xu · Zidi Xiong · Ritik Dutta · Rylan Schaeffer · Sang Truong · Simran Arora · Mantas Mazeika · Dan Hendrycks · Zinan Lin · Yu Cheng · Sanmi Koyejo · Dawn Song · Bo Li
- 2023 Oral: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
  Boxin Wang · Weixin Chen · Hengzhi Pei · Chulin Xie · Mintong Kang · Chenhui Zhang · Chejian Xu · Zidi Xiong · Ritik Dutta · Rylan Schaeffer · Sang Truong · Simran Arora · Mantas Mazeika · Dan Hendrycks · Zinan Lin · Yu Cheng · Sanmi Koyejo · Dawn Song · Bo Li
- 2023 Oral: Jailbroken: How Does LLM Safety Training Fail?
  Alexander Wei · Nika Haghtalab · Jacob Steinhardt
- 2023 Competition: TDC 2023 (LLM Edition): The Trojan Detection Challenge
  Mantas Mazeika · Andy Zou · Norman Mu · Long Phan · Zifan Wang · Chunru Yu · Adam Khoja · Fengqing Jiang · Aidan O'Gara · Zhen Xiang · Arezoo Rajabi · Dan Hendrycks · Radha Poovendran · Bo Li · David Forsyth
- 2022 Workshop: Workshop on Machine Learning Safety
  Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
- 2022 Competition: The Trojan Detection Challenge
  Mantas Mazeika · Dan Hendrycks · Huichen Li · Xiaojun Xu · Andy Zou · Sidney Hough · Arezoo Rajabi · Dawn Song · Radha Poovendran · Bo Li · David Forsyth
- 2022: Dawn Song - Invited Talk
  Dawn Song
- 2022 Workshop: Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications
  Jian Lou · Zhiguang Wang · Chejian Xu · Bo Li · Dawn Song
- 2022 Poster: How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios
  Mantas Mazeika · Eric Tang · Andy Zou · Steven Basart · Jun Shern Chan · Dawn Song · David Forsyth · Jacob Steinhardt · Dan Hendrycks
- 2022 Poster: Capturing Failures of Large Language Models via Human Cognitive Biases
  Erik Jones · Jacob Steinhardt
- 2022 Poster: Forecasting Future World Events With Neural Networks
  Andy Zou · Tristan Xiao · Ryan Jia · Joe Kwon · Mantas Mazeika · Richard Li · Dawn Song · Jacob Steinhardt · Owain Evans · Dan Hendrycks
- 2022 Poster: OpenOOD: Benchmarking Generalized Out-of-Distribution Detection
  Jingkang Yang · Pengyun Wang · Dejian Zou · Zitang Zhou · Kunyuan Ding · WENXUAN PENG · Haoqi Wang · Guangyao Chen · Bo Li · Yiyou Sun · Xuefeng Du · Kaiyang Zhou · Wayne Zhang · Dan Hendrycks · Yixuan Li · Ziwei Liu
- 2021: Live panel: Perspectives on ImageNet.
  Dawn Song · Ross Wightman · Dan Hendrycks
- 2021: Using ImageNet to Measure Robustness and Uncertainty
  Dawn Song · Dan Hendrycks
- 2021 Poster: Grounding Representation Similarity Through Statistical Testing
  Frances Ding · Jean-Stanislas Denain · Jacob Steinhardt
- 2021 Poster: Latent Execution for Neural Program Synthesis Beyond Domain-Specific Languages
  Xinyun Chen · Dawn Song · Yuandong Tian
- 2021: VisDA21: Visual Domain Adaptation + Q&A
  Kate Saenko · Kuniaki Saito · Donghyun Kim · Samarth Mishra · Ben Usman · Piotr Teterwak · Dina Bashkirova · Dan Hendrycks
- 2021 Poster: Learning Equilibria in Matching Markets from Bandit Feedback
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2021 Poster: Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams
  Chawin Sitawarin · Evgenios Kornaropoulos · Dawn Song · David Wagner
- 2020 Poster: Synthesize, Execute and Debug: Learning to Repair for Neural Program Synthesis
  Kavi Gupta · Peter Ebert Christensen · Xinyun Chen · Dawn Song
- 2020 Poster: Compositional Generalization via Neural-Symbolic Stack Machines
  Xinyun Chen · Chen Liang · Adams Wei Yu · Dawn Song · Denny Zhou
- 2019: TBD
  Dawn Song
- 2019 Poster: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty
  Dan Hendrycks · Mantas Mazeika · Saurav Kadavath · Dawn Song
- 2018 Workshop: Workshop on Security in Machine Learning
  Nicolas Papernot · Jacob Steinhardt · Matt Fredrikson · Kamalika Chaudhuri · Florian Tramer
- 2018 Poster: Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
  Dan Hendrycks · Mantas Mazeika · Duncan Wilson · Kevin Gimpel
- 2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples
  Aditi Raghunathan · Jacob Steinhardt · Percy Liang
- 2018 Poster: Tree-to-tree Neural Networks for Program Translation
  Xinyun Chen · Chang Liu · Dawn Song
- 2017 Workshop: Aligned Artificial Intelligence
  Dylan Hadfield-Menell · Jacob Steinhardt · David Duvenaud · David Krueger · Anca Dragan
- 2017: Panel
  Garth Gibson · Joseph Gonzalez · John Langford · Dawn Song
- 2017 Workshop: Machine Learning and Computer Security
  Jacob Steinhardt · Nicolas Papernot · Bo Li · Chang Liu · Percy Liang · Dawn Song
- 2017 Poster: Certified Defenses for Data Poisoning Attacks
  Jacob Steinhardt · Pang Wei Koh · Percy Liang
- 2016: Opening Remarks
  Jacob Steinhardt
- 2016 Workshop: Reliable Machine Learning in the Wild
  Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang
- 2015 Poster: Learning with Relaxed Supervision
  Jacob Steinhardt · Percy Liang
- 2009 Poster: Tracking Dynamic Sources of Malicious Activity at Internet Scale
  Shobha Venkataraman · Avrim Blum · Dawn Song · Subhabrata Sen · Oliver Spatscheck
- 2009 Spotlight: Tracking Dynamic Sources of Malicious Activity at Internet Scale
  Shobha Venkataraman · Avrim Blum · Dawn Song · Subhabrata Sen · Oliver Spatscheck