Forecasting future world events is a challenging but valuable task. Forecasts of climate, geopolitical conflict, pandemics, and economic indicators help shape policy and decision making. In these domains, the judgment of expert humans contributes to the best forecasts. Given advances in language modeling, can these forecasts be automated? To this end, we introduce Autocast, a dataset containing thousands of forecasting questions and an accompanying news corpus. Questions are taken from forecasting tournaments, ensuring high quality, real-world importance, and diversity. The news corpus is organized by date, allowing us to precisely simulate the conditions under which humans made past forecasts (avoiding leakage from the future). Motivated by the difficulty of forecasting numbers across orders of magnitude (e.g., global cases of COVID-19 in 2022), we also curate IntervalQA, a dataset of numerical questions and metrics for calibration. We test language models on our forecasting task and find that performance is far below a human expert baseline. However, performance improves with increased model size and incorporation of relevant information from the news corpus. In sum, Autocast poses a novel challenge for large language models, and improved performance could bring large practical benefits.
Author Information
Andy Zou (Carnegie Mellon University)
Tristan Xiao (University of California, Berkeley)
Ryan Jia (Apple (current) / UC Berkeley (until May 2022))
Joe Kwon (Massachusetts Institute of Technology)
Mantas Mazeika (University of Illinois Urbana-Champaign)
Richard Li (University of California, Berkeley)
Dawn Song (UC Berkeley)
Jacob Steinhardt (UC Berkeley)
Owain Evans (University of Oxford)
Dan Hendrycks (Center for AI Safety)
More from the Same Authors
- 2021 : CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review »
  Dan Hendrycks · Collin Burns · Anya Chen · Spencer Ball
- 2021 Spotlight: Learning Equilibria in Matching Markets from Bandit Feedback »
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2021 : Measuring Coding Challenge Competence With APPS »
  Dan Hendrycks · Steven Basart · Saurav Kadavath · Mantas Mazeika · Akul Arora · Ethan Guo · Collin Burns · Samir Puranik · Horace He · Dawn Song · Jacob Steinhardt
- 2021 : PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures »
  Dan Hendrycks · Andy Zou · Mantas Mazeika · Leonard Tang · Dawn Song · Jacob Steinhardt
- 2021 : Effect of Model Size on Worst-group Generalization »
  Alan Pham · Eunice Chan · Vikranth Srivatsa · Dhruba Ghosh · Yaoqing Yang · Yaodong Yu · Ruiqi Zhong · Joseph Gonzalez · Jacob Steinhardt
- 2021 : The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models »
  Alexander Pan · Kush Bhatia · Jacob Steinhardt
- 2021 : What Would Jiminy Cricket Do? Towards Agents That Behave Morally »
  Dan Hendrycks · Mantas Mazeika · Andy Zou · Sahil Patel · Christine Zhu · Jesus Navarro · Dawn Song · Bo Li · Jacob Steinhardt
- 2021 : Measuring Mathematical Problem Solving With the MATH Dataset »
  Dan Hendrycks · Collin Burns · Saurav Kadavath · Akul Arora · Steven Basart · Eric Tang · Dawn Song · Jacob Steinhardt
- 2022 : Values Shape Optimizers Shape Values »
  Joe Kwon
- 2022 : DensePure: Understanding Diffusion Models towards Adversarial Robustness »
  Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song
- 2022 : Are Neurons Actually Collapsed? On the Fine-Grained Structure in Neural Representations »
  Yongyi Yang · Jacob Steinhardt · Wei Hu
- 2022 : Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small »
  Kevin Wang · Alexandre Variengien · Arthur Conmy · Buck Shlegeris · Jacob Steinhardt
- 2023 Competition: TDC 2023 (LLM Edition): The Trojan Detection Challenge »
  Mantas Mazeika · Andy Zou · Norman Mu · Long Phan · Zifan Wang · Chunru Yu · Adam Khoja · Fengqing Jiang · Aidan O'Gara · Zhen Xiang · Arezoo Rajabi · Dan Hendrycks · Radha Poovendran · Bo Li · David Forsyth
- 2022 : Contributed Talk: DensePure: Understanding Diffusion Models towards Adversarial Robustness »
  Zhongzhu Chen · Kun Jin · Jiongxiao Wang · Weili Nie · Mingyan Liu · Anima Anandkumar · Bo Li · Dawn Song
- 2022 Workshop: Workshop on Machine Learning Safety »
  Dan Hendrycks · Victoria Krakovna · Dawn Song · Jacob Steinhardt · Nicholas Carlini
- 2022 Competition: The Trojan Detection Challenge »
  Mantas Mazeika · Dan Hendrycks · Huichen Li · Xiaojun Xu · Andy Zou · Sidney Hough · Arezoo Rajabi · Dawn Song · Radha Poovendran · Bo Li · David Forsyth
- 2022 Panel: Panel 4C-1: How Would The… & SCAMPS: Synthetics for… »
  Mantas Mazeika · Daniel McDuff
- 2022 Poster: How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios »
  Mantas Mazeika · Eric Tang · Andy Zou · Steven Basart · Jun Shern Chan · Dawn Song · David Forsyth · Jacob Steinhardt · Dan Hendrycks
- 2022 Poster: Capturing Failures of Large Language Models via Human Cognitive Biases »
  Erik Jones · Jacob Steinhardt
- 2022 Poster: OpenOOD: Benchmarking Generalized Out-of-Distribution Detection »
  Jingkang Yang · Pengyun Wang · Dejian Zou · Zitang Zhou · Kunyuan Ding · WENXUAN PENG · Haoqi Wang · Guangyao Chen · Bo Li · Yiyou Sun · Xuefeng Du · Kaiyang Zhou · Wayne Zhang · Dan Hendrycks · Yixuan Li · Ziwei Liu
- 2021 : Live panel: Perspectives on ImageNet. »
  Dawn Song · Ross Wightman · Dan Hendrycks
- 2021 : Using ImageNet to Measure Robustness and Uncertainty »
  Dawn Song · Dan Hendrycks
- 2021 Poster: Grounding Representation Similarity Through Statistical Testing »
  Frances Ding · Jean-Stanislas Denain · Jacob Steinhardt
- 2021 : VisDA21: Visual Domain Adaptation + Q&A »
  Kate Saenko · Kuniaki Saito · Donghyun Kim · Samarth Mishra · Ben Usman · Piotr Teterwak · Dina Bashkirova · Dan Hendrycks
- 2021 Poster: Learning Equilibria in Matching Markets from Bandit Feedback »
  Meena Jagadeesan · Alexander Wei · Yixin Wang · Michael Jordan · Jacob Steinhardt
- 2019 Poster: Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty »
  Dan Hendrycks · Mantas Mazeika · Saurav Kadavath · Dawn Song
- 2018 Workshop: Workshop on Security in Machine Learning »
  Nicolas Papernot · Jacob Steinhardt · Matt Fredrikson · Kamalika Chaudhuri · Florian Tramer
- 2018 Poster: Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise »
  Dan Hendrycks · Mantas Mazeika · Duncan Wilson · Kevin Gimpel
- 2018 Poster: Semidefinite relaxations for certifying robustness to adversarial examples »
  Aditi Raghunathan · Jacob Steinhardt · Percy Liang
- 2017 : Machine Learning for Human Deliberative Judgment »
  Owain Evans
- 2017 Workshop: Aligned Artificial Intelligence »
  Dylan Hadfield-Menell · Jacob Steinhardt · David Duvenaud · David Krueger · Anca Dragan
- 2017 Workshop: Machine Learning and Computer Security »
  Jacob Steinhardt · Nicolas Papernot · Bo Li · Chang Liu · Percy Liang · Dawn Song
- 2017 Poster: Certified Defenses for Data Poisoning Attacks »
  Jacob Steinhardt · Pang Wei Koh · Percy Liang
- 2016 : Opening Remarks »
  Jacob Steinhardt
- 2016 Workshop: Reliable Machine Learning in the Wild »
  Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang
- 2016 Poster: Latent Attention For If-Then Program Synthesis »
  Chang Liu · Xinyun Chen · Richard Shin · Mingcheng Chen · Dawn Song
- 2015 Poster: Learning with Relaxed Supervision »
  Jacob Steinhardt · Percy Liang