Poster in Workshop: AI for Science: Mind the Gaps

GraphGT: Machine Learning Datasets for Graph Generation and Transformation

Yuanqi Du · Shiyu Wang · Xiaojie Guo · Hengning Cao · Shujie Hu · Junji Jiang · Aishwarya Varala · Abhinav Angirekula · Liang Zhao


Abstract:

Graph generation, which learns from known graphs to discover novel ones, has great potential in research areas such as drug design and mobility synthesis, and it has recently become one of the fastest-growing domains because of its promise for discovering new knowledge. Although many benchmark datasets have emerged for graph representation learning, real-world datasets for the graph generation problem are far fewer and limited to a small number of areas, such as molecules and citation networks. To fill this gap, we introduce GraphGT, a large dataset collection for the graph generation problem in machine learning, which contains 36 datasets from 9 domains across 6 subjects. To help researchers explore the datasets, we provide a systematic review and classification of the datasets from several views, including research tasks, graph types, and application domains. In addition, GraphGT provides an easy-to-use graph generation pipeline that simplifies graph data loading, experimental setup, and model evaluation. The community can query and access datasets of interest by domain, task, or graph type. GraphGT will be regularly updated and welcomes input from the community. GraphGT is publicly available at https://graphgt.github.io/ and can also be accessed via an open Python library.