A Large Scale Search Dataset for Unbiased Learning to Rank
Lixin Zou · Haitao Mao · Xiaokai Chu · Jiliang Tang · Wenwen Ye · Shuaiqiang Wang · Dawei Yin

Wed Nov 30 09:00 AM -- 11:00 AM (PST) @ Hall J #1013

Unbiased learning to rank (ULTR) has been greatly advanced by recent deep learning techniques and well-designed debiasing algorithms. However, promising results on existing benchmark datasets may not carry over to practical scenarios because of limitations in those datasets. First, their semantic feature extraction is outdated, and state-of-the-art large-scale pre-trained language models such as BERT cannot be applied because the original text is missing. Second, display features are incomplete, which rules out in-depth study of ULTR; for example, without the displayed abstract, the click necessary bias cannot be analyzed. Third, most existing datasets rely on synthetic user feedback, and real-world user feedback is largely missing. To overcome these disadvantages, we introduce the Baidu-ULTR dataset. It comprises 1.2 billion randomly sampled search sessions and 7,008 expert-annotated queries (397,572 query-document pairs). Baidu-ULTR is the first billion-level dataset for ULTR. In particular, it offers: (1) the original semantic features and pre-trained language models of different sizes; (2) sufficient display information, such as position, displayed height, and displayed abstract, enabling comprehensive study of multiple display biases; and (3) rich user feedback on search result pages (SERPs), such as dwell time, allowing for user engagement optimization and promoting the exploration of multi-task learning in ULTR. Furthermore, we present the design principle of Baidu-ULTR and the performance of representative ULTR algorithms on Baidu-ULTR. The Baidu-ULTR dataset and corresponding baseline implementations are available at https://github.com/ChuXiaokai/baiduultrdataset. The dataset homepage is available at https://searchscience.baidu.com/dataset.html.

Author Information

Lixin Zou (School of Cyber Science and Engineering, Wuhan University)
Haitao Mao (Michigan State University)
Xiaokai Chu (Tencent Inc.)

I am a Senior Algorithm Engineer at Tencent Inc. Before that, I received my B.Sc. degree from the University of Science and Technology of China (USTC) in June 2016. I obtained my Ph.D. degree in June 2022 under the supervision of Prof. Jingping Bi at the Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS).

Jiliang Tang (Michigan State University)
Wenwen Ye
Shuaiqiang Wang (Baidu Inc.)
Dawei Yin (jd)

More from the Same Authors