CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu · Daya Guo · Shuo Ren · Junjie Huang · Alexey Svyatkovskiy · Ambrosio Blanco · Colin Clement · Dawn Drain · Daxin Jiang · Duyu Tang · Ge Li · Lidong Zhou · Linjun Shou · Long Zhou · Michele Tufano · MING GONG · Ming Zhou · Nan Duan · Neel Sundaresan · Shao Kun Deng · Shengyu Fu · Shujie LIU
Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.
Author Information
Shuai Lu (Microsoft Research China)
Daya Guo (Sun Yat-Sen University)
Shuo Ren (Beihang University)
Junjie Huang (Beihang University)
Alexey Svyatkovskiy (Microsoft)
Ambrosio Blanco
Colin Clement (Microsoft)
Dawn Drain (Microsoft)
Daxin Jiang (Microsoft)
Duyu Tang (Microsoft Research)
Ge Li (Peking University)
Lidong Zhou
Linjun Shou (Microsoft)
Long Zhou (Microsoft Research Asia)
Michele Tufano (Microsoft)
MING GONG (Microsoft)
Ming Zhou (Microsoft Research)
Nan Duan (Microsoft Research Asia)
Neel Sundaresan (Microsoft)
Shao Kun Deng (Microsoft)
Shengyu Fu
Shujie LIU (Microsoft)
More from the Same Authors
- 2022 Poster: Less-forgetting Multi-lingual Fine-tuning
  Yuren Mao · Yaobo Liang · Nan Duan · Haobo Wang · Kai Wang · Lu Chen · Yunjun Gao
- 2023 Poster: ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
  Chenyang Le · Yao Qian · Long Zhou · Shujie LIU · Yanmin Qian · Michael Zeng · Xuedong Huang
- 2023 Poster: AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
  Tong Wu · Zhihao Fan · Xiao Liu · Yeyun Gong · yelong shen · Jian Jiao · Hai-Tao Zheng · Juntao Li · zhongyu wei · Jian Guo · Nan Duan · Weizhu Chen
- 2022 Spotlight: Lightning Talks 6A-4
  Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
- 2022 Spotlight: Two-Stream Network for Sign Language Recognition and Translation
  Yutong Chen · Ronglai Zuo · Fangyun Wei · Yu Wu · Shujie LIU · Brian Mak
- 2022 Poster: Two-Stream Network for Sign Language Recognition and Translation
  Yutong Chen · Ronglai Zuo · Fangyun Wei · Yu Wu · Shujie LIU · Brian Mak
- 2022 Poster: NUWA-Infinity: Autoregressive over Autoregressive Generation for Infinite Visual Synthesis
  Jian Liang · Chenfei Wu · Xiaowei Hu · Zhe Gan · Jianfeng Wang · Lijuan Wang · Zicheng Liu · Yuejian Fang · Nan Duan
- 2022 Poster: Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
  Haojie Zhang · Ge Li · Jia Li · Zhongjin Zhang · YUQI ZHU · Zhi Jin
- 2022 Poster: LogiGAN: Learning Logical Reasoning via Adversarial Pre-training
  Xinyu Pi · Wanjun Zhong · Yan Gao · Nan Duan · Jian-Guang Lou
- 2021 Poster: Integrating Tree Path in Transformer for Code Representation
  Han Peng · Ge Li · Wenhan Wang · YunFei Zhao · Zhi Jin
- 2021 Poster: Neural Rule-Execution Tracking Machine For Transformer-Based Text Generation
  Yufei Wang · Can Xu · Huang Hu · Chongyang Tao · Stephen Wan · Mark Dras · Mark Johnson · Daxin Jiang
- 2021 Poster: Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering
  Weijiang Yu · Haoteng Zheng · Mengfei Li · Lei Ji · Lijun Wu · Nong Xiao · Nan Duan
- 2020 Poster: MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  Wenhui Wang · Furu Wei · Li Dong · Hangbo Bao · Nan Yang · Ming Zhou
- 2019 Poster: Code Generation as a Dual Task of Code Summarization
  Bolin Wei · Ge Li · Xin Xia · Zhiyi Fu · Zhi Jin
- 2019 Poster: A Tensorized Transformer for Language Modeling
  Xindian Ma · Peng Zhang · Shuai Zhang · Nan Duan · Yuexian Hou · Ming Zhou · Dawei Song
- 2019 Poster: PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph
  Yikang LI · Tao Ma · Yeqi Bai · Nan Duan · Sining Wei · Xiaogang Wang
- 2018 Poster: Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base
  Daya Guo · Duyu Tang · Nan Duan · Ming Zhou · Jian Yin