Time is an important dimension in our physical world, and many facts evolve with respect to time. For example, the U.S. President might change every four years. It is therefore important to consider the time dimension and empower existing QA models to reason over time. However, existing QA datasets contain rather few time-sensitive questions, making them unsuitable for diagnosing or benchmarking a model's temporal reasoning capability. To promote research in this direction, we propose to construct a time-sensitive QA dataset. The dataset is constructed by 1) mining time-evolving facts from WikiData and aligning them to their corresponding Wikipedia pages, 2) employing crowd workers to verify and calibrate these noisy facts, and 3) generating question-answer pairs based on the annotated time-sensitive facts. Our dataset poses challenges in both temporal understanding and temporal reasoning. We evaluate state-of-the-art long-document QA systems such as BigBird and FiD on our dataset. The best-performing model, FiD, achieves only 46% accuracy, still far behind the human performance of 87%. We demonstrate that these models still lack the ability to perform consistent temporal reasoning. We therefore believe that our dataset can serve as a benchmark for developing NLP models that are more sensitive to temporal shifts.
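The third construction step, turning annotated time-sensitive facts into question-answer pairs, can be illustrated with a minimal sketch. The fact schema and question template below are hypothetical and not the dataset's actual implementation; they only mirror the general shape of WikiData facts that carry start/end time qualifiers.

```python
# Minimal sketch: generating time-sensitive QA pairs from a time-qualified fact.
# The fact dict and the question template are illustrative assumptions, not the
# dataset's real schema or templates.

fact = {
    "subject": "the United States",
    "relation": "head of state",
    "values": [
        {"object": "Barack Obama", "start": 2009, "end": 2017},
        {"object": "Donald Trump", "start": 2017, "end": 2021},
    ],
}

def make_qa_pairs(fact):
    """Emit one (question, answer) pair for each year a value of the fact held."""
    pairs = []
    for v in fact["values"]:
        # Each year inside the validity interval yields a distinct question,
        # so the correct answer shifts as the queried year crosses a boundary.
        for year in range(v["start"], v["end"]):
            question = (f"Who was the {fact['relation']} of "
                        f"{fact['subject']} in {year}?")
            pairs.append((question, v["object"]))
    return pairs

pairs = make_qa_pairs(fact)
print(pairs[0])
# → ('Who was the head of state of the United States in 2009?', 'Barack Obama')
```

Answering such questions requires both locating the relevant passage and reasoning about which validity interval contains the queried year, which is exactly the temporal-reasoning challenge the dataset is designed to probe.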
Author Information
Wenhu Chen (University of California, Santa Barbara)
Xinyi Wang (University of California, Santa Barbara)
William Yang Wang (University of California, Santa Barbara)
William Wang is the Co-Director of UC Santa Barbara's Natural Language Processing group and Center for Responsible Machine Learning. He is the Duncan and Suzanne Mellichamp Chair in Artificial Intelligence and Designs, and an Associate Professor in the Department of Computer Science at the University of California, Santa Barbara. He received his PhD from the School of Computer Science, Carnegie Mellon University. He has broad interests in Artificial Intelligence, including statistical relational learning, information extraction, computational social science, dialog & generation, and vision. He has published more than 100 papers at leading NLP/AI/ML conferences and journals, and has received best paper awards (or nominations) at ASRU 2013, CIKM 2013, EMNLP 2015, and CVPR 2019, a DARPA Young Faculty Award (Class of 2018), an IEEE AI's 10 to Watch Award (Class of 2020), an NSF CAREER Award (2021), two Google Faculty Research Awards (2018, 2019), three IBM Faculty Awards (2017-2019), two Facebook Research Awards (2018, 2019), an Amazon AWS Machine Learning Research Award, a JP Morgan Chase Faculty Research Award, an Adobe Research Award in 2018, and the Richard King Mellon Presidential Fellowship in 2011. He frequently serves as an Area Chair or Senior Area Chair for NAACL, ACL, EMNLP, and AAAI. He is an elected member of the IEEE Speech and Language Processing Technical Committee (2021-2023) and a member of the ACM Future of Computing Academy. In addition to research, William enjoys writing scientific articles that reach the broader online community. His work and opinions have appeared in major tech media outlets such as Wired, VICE, Scientific American, Fortune, Fast Company, NASDAQ, The Next Web, Law.com, and Mental Floss.
More from the Same Authors
- 2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation »
  Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu
- 2022 : LAD: Language Augmented Diffusion for Reinforcement Learning »
  Edwin Zhang · Yujie Lu · William Yang Wang · Amy Zhang
- 2022 : Offline Reinforcement Learning with Closed-Form Policy Improvement Operators »
  Jiachen Li · Edwin Zhang · Ming Yin · Qinxun Bai · Yu-Xiang Wang · William Yang Wang
- 2022 : Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction »
  Jiachen Li · Shuo Cheng · Zhenyu Liao · Huayan Wang · William Yang Wang · Qinxun Bai
- 2023 Poster: Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning »
  Zih-Yun Chiu · Yi-Lin Tuan · William Yang Wang · Michael Yip
- 2023 Poster: LayoutGPT: Compositional Visual Planning and Generation with Large Language Models »
  Weixi Feng · Wanrong Zhu · Tsu-Jui Fu · Varun Jampani · Arjun Akula · Xuehai He · S Basu · Xin Wang · William Yang Wang
- 2023 Poster: LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation »
  Yujie Lu · Xianjun Yang · Xiujun Li · Xin Wang · William Yang Wang
- 2023 Poster: ALGO: Synthesizing Algorithmic Programs with Generated Oracle Verifiers »
  Kexun Zhang · Danqing Wang · Jingtao Xia · William Yang Wang · Lei Li
- 2023 Poster: Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data »
  Alon Albalak · Colin Raffel · William Yang Wang
- 2023 Poster: Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning »
  Xinyi Wang · Wanrong Zhu · Michael Saxon · Mark Steyvers · William Yang Wang
- 2023 Poster: Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text »
  Wanrong Zhu · Jack Hessel · Anas Awadalla · Samir Yitzhak Gadre · Jesse Dodge · Alex Fang · Youngjae Yu · Ludwig Schmidt · William Yang Wang · Yejin Choi
- 2021 Poster: Local Explanation of Dialogue Response Generation »
  Yi-Lin Tuan · Connor Pryor · Wenhu Chen · Lise Getoor · William Yang Wang
- 2021 Poster: Counterfactual Maximum Likelihood Estimation for Training Deep Networks »
  Xinyi Wang · Wenhu Chen · Michael Saxon · William Yang Wang
- 2019 : Contributed Talk: TabFact: A Large-scale Dataset for Table-based Fact Verification »
  Wenhu Chen