Skip to yearly menu bar Skip to main content


OGB-LSC 2022: A Large-Scale Challenge for ML on Graphs

Weihua Hu · Matthias Fey · Hongyu Ren · Maho Nakata · Yuxiao Dong · Jure Leskovec



Enabling effective and efficient machine learning (ML) over large-scale graph data (e.g., graphs with billions of edges) can have a huge impact on both industrial and scientific applications. At KDD Cup 2021, we organized the OGB Large-Scale Challenge (OGB-LSC), where we provided large and realistic graph ML tasks. Our KDD Cup attracted huge attention from graph ML community (more than 500 team registrations across the globe), facilitating innovative methods being developed to yield significant performance breakthrough. However, the problem of machine learning over large graphs is not solved yet and it is important for the community to engage in a focused multi-year effort in this area (like ImageNet and MS-COCO). Here we propose an annual ML challenge around large-scale graph datasets, which will drive forward method development and allow for tracking progress. We propose the 2nd OGB-LSC (referred to as OGB-LSC 2022) around the OGB-LSC datasets. Our proposed challenge consists of three tracks, covering core graph ML tasks of node-level prediction (academic paper classification with 240 million nodes), link-level prediction (knowledge graph completion with 90 million entities), and graph-level prediction (molecular property prediction with 4 million graphs). Importantly, we have updated two out of the three datasets based on the lessons learned from our KDD Cup, so that the resulting datasets are more challenging and realistic. Our datasets are extensively validated through our baseline analyses and last year’s KDD Cup. We also provide the baseline code as well as Python package to easily load the datasets and evaluate the model performance.