We present Native Chinese Reader (NCR), a new machine reading comprehension (MRC) dataset with particularly long articles in both modern and classical Chinese. NCR is collected from the exam questions for the Chinese course in China’s high schools, which are designed to evaluate the language proficiency of native Chinese youth. Existing Chinese MRC datasets are either domain-specific or focus on short contexts of a few hundred characters in modern Chinese only. By contrast, NCR contains 8390 documents with an average length of 1024 characters, covering a wide range of Chinese writing styles, including modern articles, classical literature and classical poetry. The 20477 questions on these documents require strong reasoning abilities and common sense to answer correctly. We implemented multiple baseline models using popular Chinese pre-trained models and additionally launched an online competition using our dataset to examine the limit of current methods. The best model achieves 59% test accuracy, while human evaluation shows an average accuracy of 79%, which indicates a significant performance gap between current MRC models and native Chinese speakers.
Author Information
Shusheng Xu (IIIS, Tsinghua University)
Yichen Liu (New York University)
Xiaoyu Yi (Department of Electronics and Communications Engineering, Shenzhen University)
Siyuan Zhou (Peking University)
Huizi Li (no)
Yi Wu (OpenAI)
More from the Same Authors
- 2021: Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
  Zihan Zhou · Wei Fu · Bingliang Zhang · Yi Wu
- 2021: Learning Efficient Multi-Agent Cooperative Visual Exploration
  Chao Yu · Jiaxuan Gao · Huazhong Yang · Yu Wang · Yi Wu
- 2021: Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination
  Rui Zhao · Jinming Song · Hu Haifeng · Yang Gao · Yi Wu · Zhongqian Sun · Wei Yang
- 2022 Poster: Grounded Reinforcement Learning: Learning to Win the Game under Human Commands
  Shusheng Xu · Huaijie Wang · Yi Wu
- 2021 Poster: Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
  Jiayu Chen · Yuanxin Zhang · Yuanfan Xu · Huimin Ma · Huazhong Yang · Jiaming Song · Yu Wang · Yi Wu
- 2021 Poster: NovelD: A Simple yet Effective Exploration Criterion
  Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian