The unbiased learning to rank (ULTR) problem has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on existing benchmark datasets may not generalize to practical scenarios due to several limitations of those datasets. First, their semantic feature extraction is outdated, and state-of-the-art large-scale pre-trained language models such as BERT cannot be utilized because the original text is unavailable. Second, their display features are incomplete, ruling out in-depth ULTR studies such as analyzing click necessity bias using the displayed abstract. Third, most existing datasets rely on synthetic user feedback, while real-world user feedback is largely missing. To overcome these disadvantages, we introduce the Baidu-ULTR dataset. It contains 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries (397,572 query-document pairs). Baidu-ULTR is the first billion-level dataset for ULTR. In particular, it offers: (1) the original semantic features and pre-trained language models of different sizes; (2) sufficient display information, such as position, displayed height, and displayed abstract, enabling the comprehensive study of multiple display biases; and (3) rich user feedback on search engine result pages (SERPs), such as dwell time, allowing for user engagement optimization and promoting the exploration of multi-task learning in ULTR. Furthermore, we present the design principles of Baidu-ULTR and the performance of representative ULTR algorithms on it. The Baidu-ULTR dataset and corresponding baseline implementations are available at https://github.com/ChuXiaokai/baiduultrdataset. The dataset homepage is available at https://searchscience.baidu.com/dataset.html.
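As a minimal illustration of the kind of analysis the display features enable, the sketch below computes click-through rate per rank position on a few synthetic log rows, a typical first step when examining position bias. The column names (`position`, `displayed_height`, `dwell_time`, `click`) are hypothetical stand-ins and need not match the actual Baidu-ULTR schema.

```python
# Hypothetical sketch: column names and values are assumptions,
# not the actual Baidu-ULTR schema.
import pandas as pd

# Synthetic rows standing in for session-level display/click logs.
logs = pd.DataFrame({
    "position": [1, 1, 2, 2, 3, 3],          # rank on the SERP
    "displayed_height": [120, 80, 120, 80, 120, 80],  # pixels
    "dwell_time": [30.0, 12.5, 8.0, 0.0, 0.0, 0.0],   # seconds
    "click": [1, 1, 1, 0, 0, 0],
})

# Click-through rate per rank position; a monotone decay with
# rank is the classic signature of position bias.
ctr_by_position = logs.groupby("position")["click"].mean()
print(ctr_by_position.to_dict())  # → {1: 1.0, 2: 0.5, 3: 0.0}
```

The same grouping pattern extends to other display features (e.g. averaging `dwell_time` by `displayed_height`) once the real dataset is loaded.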
Author Information
Lixin Zou (School of Cyber Science and Engineering, Wuhan University)
Haitao Mao (Michigan State University)
Xiaokai Chu (Tencent Inc.)
I am a Senior Algorithm Engineer at Tencent Inc. Before that, I received a B.Sc. degree from the University of Science and Technology of China (USTC) in June 2016. I obtained my Ph.D. degree in June 2022 under the supervision of Prof. Jingping Bi at the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS).
Jiliang Tang (Michigan State University)
Wenwen Ye
Shuaiqiang Wang (Baidu Inc.)
Dawei Yin (Baidu Inc.)