With the rapid emergence of graph representation learning, the construction of new large-scale datasets is necessary to distinguish model capabilities and accurately assess the strengths and weaknesses of each technique. By carefully analyzing existing graph databases, we identify three components critical to advancing the field of graph representation learning: (1) large graphs, (2) many graphs, and (3) class diversity. To date, no single graph database offers all of these desired properties. We introduce MalNet, the largest public graph database ever constructed, representing a large-scale ontology of malicious software function call graphs. MalNet contains over 1.2 million graphs, averaging over 15k nodes and 35k edges per graph, across a hierarchy of 47 types and 696 families. Compared to the popular REDDIT-12K database, MalNet offers 105x more graphs, 44x larger graphs on average, and 63x more classes. We provide a detailed analysis of MalNet, discussing its properties and provenance, along with an evaluation of state-of-the-art machine learning and graph neural network techniques. The unprecedented scale and diversity of MalNet offer exciting opportunities to advance the frontiers of graph representation learning, enabling new discoveries and research into imbalanced classification, explainability, and the impact of class hardness. The database is publicly available at www.mal-net.org.
Author Information
Scott Freitas (Georgia Institute of Technology)
Yuxiao Dong (Tsinghua University)
Joshua Neil
Duen Horng Chau (Georgia Tech)
More from the Same Authors
- 2021 : Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning
  Qinkai Zheng · Xu Zou · Yuxiao Dong · Yukuo Cen · Da Yin · Jiarong Xu · Yang Yang · Jie Tang
- 2021 : OGB-LSC: A Large-Scale Challenge for Machine Learning on Graphs
  Weihua Hu · Matthias Fey · Hongyu Ren · Maho Nakata · Yuxiao Dong · Jure Leskovec
- 2021 : GAM Changer: Editing Generalized Additive Models with Interactive Visualization
  Zijie Jay Wang · Harsha Nori · Duen Horng Chau · Jennifer Wortman Vaughan · Rich Caruana
- 2021 : A Search Engine for Discovery of Scientific Challenges and Directions
  Dan Lahav · Jon Saad-Falcon · Duen Horng Chau · Diyi Yang · Eric Horvitz · Daniel Weld · Tom Hope
- 2022 : A Universal Abstraction for Hierarchical Hopfield Networks
  Benjamin Hoover · Duen Horng Chau · Hendrik Strobelt · Dmitry Krotov
- 2021 Poster: Adaptive Diffusion in Graph Neural Networks
  Jialin Zhao · Yuxiao Dong · Ming Ding · Evgeny Kharlamov · Jie Tang
- 2021 : A Large-Scale Database for Graph Representation Learning
  Scott Freitas · Yuxiao Dong · Joshua Neil · Duen Horng Chau
- 2020 Poster: Open Graph Benchmark: Datasets for Machine Learning on Graphs
  Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
- 2020 Spotlight: Open Graph Benchmark: Datasets for Machine Learning on Graphs
  Weihua Hu · Matthias Fey · Marinka Zitnik · Yuxiao Dong · Hongyu Ren · Bowen Liu · Michele Catasta · Jure Leskovec
- 2020 Poster: Graph Random Neural Networks for Semi-Supervised Learning on Graphs
  Wenzheng Feng · Jie Zhang · Yuxiao Dong · Yu Han · Huanbo Luan · Qian Xu · Qiang Yang · Evgeny Kharlamov · Jie Tang
- 2020 Oral: Graph Random Neural Networks for Semi-Supervised Learning on Graphs
  Wenzheng Feng · Jie Zhang · Yuxiao Dong · Yu Han · Huanbo Luan · Qian Xu · Qiang Yang · Evgeny Kharlamov · Jie Tang