Workshop
New Frontiers in Graph Learning
Jiaxuan You · Marinka Zitnik · Rex Ying · Yizhou Sun · Hanjun Dai · Stefanie Jegelka
Theater A
Background. In recent years, graph learning has quickly grown into an established subfield of machine learning. Researchers have been focusing on developing novel model architectures, theoretical understandings, scalable algorithms and systems, and successful applications across industry and science regarding graph learning. In fact, more than 5000 research papers related to graph learning have been published over the past year alone.
Challenges. Despite the success, existing graph learning paradigms have not captured the full spectrum of relationships in the physical and the virtual worlds. For example, in terms of applicability of graph learning algorithms, current graph learning paradigms are often restricted to datasets with explicit graph representations, whereas recent works have shown promise of graph learning methods for applications without explicit graph representations. In terms of usability, while popular graph learning libraries greatly facilitate the implementation of graph learning techniques, finding the right graph representation and model architecture for a given use case still requires heavy expert knowledge. Furthermore, in terms of generalizability, unlike domains such as computer vision and natural language processing where largescale pretrained models generalize across downstream applications with little to no finetuning and demonstrate impressive performance, such a paradigm has yet to succeed in the graph learning domain.
Goal. The primary goal of this workshop is to expand the impact of graph learning beyond the current boundaries. We believe that graph, or relation data, is a universal language that can be used to describe the complex world. Ultimately, we hope graph learning will become a generic tool for learning and understanding any type of (structured) data. We aim to present and discuss the new frontiers in graph learning with researchers and practitioners within and outside the graph learning community. New understandings of the current challenges, new perspectives regarding the future directions, and new solutions and applications as proof of concepts are highly welcomed.
Scope and Topics. We welcome submissions regarding the new frontiers of graph learning, including but not limited to:
 Graphs in the wild: Graph learning for datasets and applications without explicit relational structure (e.g., images, text, audios, code). Novel ways of modeling structured/unstructured data as graphs are highly welcomed.
 Graphs in ML: Graph representations in general machine learning problems (e.g., neural architectures as graphs, relations among input data and learning tasks, graphs in large language models, etc.)
 New oasis: Graph learning methods that are significantly different from the current paradigms (e.g., largescale pretrained models, multitask models, super scalable algorithms, etc.)
 New capabilities: Graph representation for knowledge discovery, optimization, causal inference, explainable ML, ML fairness, etc.
 Novel applications: Novel applications of graph learning in realworld industry and scientific domains. (e.g., graph learning for missing data imputation, program synthesis, etc.)
Call for papers
Submission deadline: Thursday, Sept 22, 2022 (16:59 PDT)
Submission site (OpenReview): NeurIPS 2022 GLFrontiers Workshop
Author notification: Thursday, Oct 6, 2022
Camera ready deadline: Thursday, Oct 27, 2022 (16:59 PDT)
Workshop (in person): Friday, Dec 2, 2022
The workshop will be held fully in person at the New Orleans Convention Center, as part of the NeurIPS 2022 conference. We also plan to offer livestream for the event, and more details will come soon.
We welcome both short research papers of up to 4 pages (excluding references and supplementary materials), and fulllength research papers of up to 8 pages (excluding references and supplementary materials). All accepted papers will be presented as posters. We plan to select around 6 papers for oral presentations and 2 papers for the outstanding paper awards with potential cash incentives.
All submissions must use the NeurIPS template. We do not require the authors to include the checklist in the template. Submissions should be in .pdf format, and the review process is doubleblind—therefore the papers should be appropriately anonymized. Previously published work (or underreview) is acceptable.
Should you have any questions, please reach out to us via email:
glfrontiers@googlegroups.com
Schedule
Fri 6:40 a.m.  7:00 a.m.

Jiaxuan You
(
Opening remarks
)
SlidesLive Video 
Jiaxuan You 🔗 
Fri 7:00 a.m.  7:30 a.m.

Invited talk: Kelsey Allen
(
Invited talk
)
SlidesLive Video 
Kelsey Allen 🔗 
Fri 7:30 a.m.  8:00 a.m.

Invited talk: Azalia Mirhoseini
(
Invited talk
)
SlidesLive Video 
Azalia Mirhoseini 🔗 
Fri 8:30 a.m.  9:00 a.m.

Oral presenters
(
Contributed talks: Part 1
)
SlidesLive Video 
🔗 
Fri 9:00 a.m.  10:00 a.m.

All participants
(
Poster sessions: Morning
)

🔗 
Fri 10:00 a.m.  11:00 a.m.

All participants
(
Lunch break
)

🔗 
Fri 11:00 a.m.  11:30 a.m.

Invited talk: Francesco Di Giovanni
(
Invited talk
)
SlidesLive Video 
Francesco Di Giovanni · Francesco Di Giovanni 🔗 
Fri 11:30 a.m.  12:00 p.m.

Invited talk: Matej Balog
(
Invited talk
)
SlidesLive Video 
Matej Balog · Matej Balog 🔗 
Fri 12:00 p.m.  12:30 p.m.

Invited talk: Xavier Bresson
(
Invited talk
)
SlidesLive Video 
Xavier Bresson 🔗 
Fri 12:30 p.m.  1:00 p.m.

Contributed talks: Part 2
(
Oral presenters
)
SlidesLive Video 
🔗 
Fri 1:00 p.m.  1:30 p.m.

All participants
(
Coffee break
)

🔗 
Fri 1:30 p.m.  2:15 p.m.

Panelists
(
Panel Discussions: The Future of Graph Learning
)
SlidesLive Video 
🔗 
Fri 2:15 p.m.  3:00 p.m.

All participants
(
Poster sessions: Afternoon
)

🔗 


Spectrum Guided Topology Augmentation for Graph Contrastive Learning
(
Poster
)
link
Graph contrastive learning (GCL) is a major selfsupervised graph learning technique that aims to capture invariant properties of graphs via instance discrimination. Its performance heavily relies on the construction of multiple graph views yet it still remains unclear about what makes effective topology augmentations. Recent studies mainly perform topology augmentations in a uniformly random manner without considering graph properties. In this work, we aim to find principled ways for topology augmentations by exploring the invariance of graphs from the graph spectral perspective. Specifically, we propose a novel topology augmentation method guided by spectral change. Extensive experiments on both graph and node classification tasks demonstrate the effectiveness of our method in capturing the structural essence of graphs for selfsupervised learning. The proposed method also brings promising performance in transfer learning and adversarial attack settings. We envision this work to provide a principled way for graph augmentation. 
Lu Lin · Jinghui Chen · Hongning Wang 🔗 


SketchGNN: Scalable Graph Neural Networks with Sublinear Training Complexity
(
Poster
)
link
Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or minibatch sample the graph (which results in exponentially growing computational complexities with respect to the number of GNN layers). Various samplingbased and historicalembeddingbased methods are proposed to avoid this exponential growth of complexities. However, none of these solutions eliminates the linear dependence on graph size. This paper proposes a sketchbased algorithm whose training time and memory grow sublinearly with respect to graph size by training GNNs atop a few compact sketches of graph adjacency and node embeddings. Based on polynomial tensorsketch (PTS) theory, our framework provides a novel protocol for sketching nonlinear activations and graph convolution matrices in GNNs, as opposed to existing methods that sketch linear weights or gradients in neural networks. In addition, we develop a locality sensitive hashing (LSH) technique that can be trained to improve the quality of sketches. Experiments on largegraph benchmarks demonstrate the scalability and competitive performance of our SketchGNNs versus their fullsize GNN counterparts. 
Mucong Ding · Tahseen Rabbani · Bang An · Evan Wang · Furong Huang 🔗 


Graph Neural Networks for Selection of Preconditioners and Krylov Solvers
(
Poster
)
link
Solving large sparse linear systems is ubiquitous in science and engineering, generally requiring iterative solvers and preconditioners, as many problems cannot be solved efficiently by using direct solvers. However, the practical performance of solvers and preconditioners is sometimes beyond theoretical analysis and an optimal choice calls for intuitions from domain experts, knowledge of the hardware, as well as trial and error. In this work, we propose a new method for optimal solverpreconditioner selection using Graph Neural Networks (GNNs), as a complementary solution to laborious expert efforts. The method is based upon the graph representation of the problem and the idea of integrating node features with graph features via graph convolutions. We show that our models perform favorably well compared with traditional machine learning models investigated by the prior literature (with an improvement of 25\% in selected evaluation metrics). Implementation details and possible limitations and improvements will be discussed. 
Ziyuan Tang · Hong Zhang · Jie Chen 🔗 


Faster Hyperparameter Search on Graphs via Calibrated Dataset Condensation
(
Poster
)
link
Dataset condensation aims to reduce the computational cost of training multiple models on a large dataset by condensing the training set into a small synthetic one. Stateoftheart approaches rely on matching the gradients between the real and synthetic data and are recently applied to condense largescale graphs for node classification tasks. Although dataset condensation may be efficient when we need to train multiple models for hyperparameter optimization, there is no theoretical guarantee on the generalizability of the condensed data, and it can generalize poorly across hyperparameters/architectures in practice; while on graphs, we find and prove this overfitting is much more severe. This paper considers a different condensation objective specifically for hyperparameter search. We aim to generate the synthetic dataset so that the validationperformance ranking of different models under different hyperparameters on the condensed and original datasets are comparable. We propose a novel hyperparametercalibrated dataset condensation (HCDC) algorithm, which learns the synthetic validation data by matching the hyperparameter gradients computed by implicit differentiation and efficient inverse Hessian approximation. HCDC employs a supernet with differentiable hyperparameters, making it suitable for modeling GNNs with widely different convolution filters. Experiments demonstrate that the proposed framework effectively maintains the validationperformance rankings of GNNs and speeds up hyperparameter/architecture search on graphs. 
Mucong Ding · Xiaoyu Liu · Tahseen Rabbani · Furong Huang 🔗 


SequenceGraph Duality: Unifying User Modeling with SelfAttention for Sequential Recommendation
(
Poster
)
link
User modeling is of great importance in personalization services. Many existing methods treat users as sequences of interacted items to encode item transition patterns. Another line of research models users as graphs in which interacted items are modelled as nodes, and itemitem relations are modelled as edges. The graphbased user modeling provides more flexibility to encode complex item relationships of different types (e.g. cooccurrence, similarity) but usually overlooks sequential patterns. Here, we introduce a novel user representation, Heterogeneous User Graph (HUG), which unifies sequence and graphbased user modeling to take advantage of both methods. An HUG is associated with two types of edges: sequential edges that preserve the sequential information and collaborative edges that store different itemitem relationship. To learn users' latent representations from their HUGs, we propose a multihead attention based architecture called Heterogeneous User Graph Transformer (HUGT). HUGT is developed on the basis of SASRec and can concurrently capture the sequential transition pattern and complex graph topology. We conduct experiments on four real world datasets from three different application domains. Experimental results show that (1) jointly modeling users as sequences and graphs with HUG provides better recommendation performance over sequenceonly and graphonly user modeling; (2) HUGT is effective in learning user latent representations from HUGs; (3) HUGT outperforms the baselines by up to 10\% on datasets with long sequences and aligns with the stateoftheart performance on datasets with short sequences. 
Zeren Shui · Ge Liu · Anoop Deoras · George Karypis 🔗 


How Powerful is Implicit Denoising in Graph Neural Networks
(
Poster
)
link
Graph Neural Networks (GNNs), which aggregate features from neighbors, are widely used for processing graphstructured data due to their powerful representation learning capabilities. It is generally believed that GNNs can implicitly remove feature noises and thus still obtain generalizable models. This point of view motivates some existing works to derive GNN models from the graph signal denoising (GSD) problem. However, few works have rigorously analyzed the implicit denoising effect in graph neural networks. In this work, we conduct a comprehensive theoretical study and analyze when and why implicit denoising happens in GNNs. Our theoretical analysis suggests that the implicit denoising largely depends on the connectivity and size of the graph, as well as the GNN architectures. Moreover, extensive empirical evaluations verify our theoretical analyses and the effectiveness of GNNs in eliminating noise in the feature matrix compared with MultiLayer Perceptron (MLP). Moreover, motivated by adversarial machine learning in improving the robustness of neural networks and the correlation of node features and graph structure in GSD, we propose the adversarial graph signal denoising (AGSD) problem. By solving such a problem, we derive a robust graph convolution, where the smoothness of the node representations and the implicit denoising effect can be enhanced. 
Songtao Liu · Rex Ying · Hanze Dong · Lu Lin · Jinghui Chen · Dinghao Wu 🔗 


PolarMOT: How far can geometric relations take us in 3D multiobject tracking?
(
Poster
)
link
Most (3D) multiobject tracking methods rely on objectlevel information, e.g. appearance, for data association. By contrast, we investigate how far we can get by considering only geometric relationships and interactions between objects over time. We represent each tracking sequence as a multiplex graph where 3D object detections are nodes, and spatial and temporal pairwise relations among them are encoded via two types of edges. This structure allows our graph neural network to consider all types of interactions and distinguish temporal, contextual and motion cues to obtain final scene interpretation by posing tracking as edge classification. The model outputs classification results after multiple rounds of neural message passing, during which it is able to reason about longterm object trajectories, influences and motion based solely on initial pairwise relationships. To enable our method for online (streaming) scenarios, we introduce a technique to continuously evolve our graph over long tracking sequences to achieve good performance while maintaining sparsity with linear complexity for the number of edges. We establish a new stateoftheart on the nuScenes dataset. 
Aleksandr Kim · Guillem Braso · Aljosa Osep · Laura LealTaixé 🔗 


Agentbased Graph Neural Networks
(
Poster
)
link
We present a novel graph neural network we call AgentNet, which is designed specifically for graphlevel tasks. AgentNet is inspired by sublinear algorithms, featuring a computational complexity that is independent of the graph size. The architecture of AgentNet differs fundamentally from the architectures of traditional graph neural networks. In AgentNet, some trained \textit{neural agents} intelligently walk the graph, and then collectively decide on the output. We provide an extensive theoretical analysis of AgentNet: We show that the agents can learn to systematically explore their neighborhood and that AgentNet can distinguish some structures that are even indistinguishable by 3WL. Moreover, AgentNet is able to separate any two graphs which are sufficiently different in terms of subgraphs. We confirm these theoretical results with synthetic experiments on hardtodistinguish graphs and realworld graph classification tasks. In both cases, we compare favorably not only to standard GNNs but also to computationally more expensive GNN extensions. 
Karolis Martinkus · Pál András Papp · Benedikt Schesch · Roger Wattenhofer 🔗 


Dissimilar Nodes Improve Graph Active Learning
(
Poster
)
link
Training labels for graph embedding algorithms could be costly to obtain in many practical scenarios. Information scorebased active learning (AL) algorithms are frameworks that could obtain the most useful labels for training while keeping the total label queries under a certain budget. The existing Active Graph Embedding framework has been shown to be capable of bringing some improvement to the node classification tasks of Graph Convolutional Networks, however, when evaluating the importance of unlabeled nodes, they fail to consider the influence of existing labeled nodes on the value of unlabeled nodes. With the aim of addressing this limitation, in this work, we introduce 3 dissimilaritybased information scores for active learning: feature dissimilarity score (FDS), structure dissimilarity score (SDS), and embedding dissimilarity score (EDS). According to experiments, our newly proposed scores boost the classification accuracy by 2.1$\%$ on average and are capable of generalizing to different Graph Neural Network architectures.

Zhicheng Ren · Yifu Yuan · Yuxin Wu · Xiaxuan Gao · Yewen Wang · Yizhou Sun 🔗 


Convolutional Neural Networks on Manifolds: From Graphs and Back
(
Poster
)
link
In recent years, geometric deep learning has gained attraction due to both the need for machine learning on structured data (e.g., graphs) and the increasing availability of this type of data. Extensions of deep convolutional architectures to nonEuclidean domains in particular are a powerful technique in sensor network applications  which can be seen as graphs  and 3D model analysis  which can be seen as manifolds. While recent works have provided a better theoretical understanding of why convolutional neural network architectures work well on graphs of moderate size, in the largescale regime that is the setting of most problems of interest, their behavior is not as well understood. In this paper, we bridge this gap by modeling large graphs as samples from manifolds and studying manifold neural networks (MNNs). Our main contribution is to define a manifold convolution operation which, when ``discretized'' in both the space and time domains, is consistent with the practical implementation of a graph convolution. We then show that graph neural networks (GNNs) can be particularized from MNNs, which in turn are the limits of these GNNs. We conclude with numerical experiments showcasing an application of the MNN to pointcloud classification. 
Zhiyang Wang · Luana Ruiz · Alejandro Ribeiro 🔗 


Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement
(
Poster
)
link
Object rearrangement is a challenge for embodied agents because solving these tasks requires generalizing across a combinatorially large set of underlying entities that take the value of object states. Worse, these entities are often unknown and must be inferred from sensory percepts. We present a hierarchical abstraction approach to uncover these underlying entities and achieve combinatorial generalization from unstructured inputs. By constructing a factorized transition graph over clusters of object representations inferred from pixels, we show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment. We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects, which outperforms current offline deep RL methods when evaluated on a set of simulated rearrangement and stacking tasks. 
Michael Chang · Alyssa L Dayan · Franziska Meier · Tom Griffiths · Sergey Levine · Amy Zhang 🔗 


Diffusion Models for Graphs Benefit From Discrete State Spaces
(
Poster
)
link
Denoising diffusion probabilistic models and score matching models have proven to be very powerful for generative tasks. While these approaches have also been applied to the generation of discrete graphs, they have, so far, relied on continuous Gaussian perturbations. Instead, in this work, we suggest using discrete noise for the forward Markov process. This ensures that in every intermediate step the graph remains discrete. Compared to the previous approach, our experimental results on four datasets and multiple architectures show that using a discrete noising process results in higher quality generated samples indicated with an average MMDs reduced by a factor of 1.5. Furthermore, the number of denoising steps is reduced from 1000 to 32 steps leading to a 30 times faster sampling procedure. 
Kilian Haefeli · Karolis Martinkus · Nathanael Perraudin · Roger Wattenhofer 🔗 


Multimodal Video Understanding using Graph Convolutional Network
(
Poster
)
link
Majority of existing semantic video understanding methods process every video independently without considering the underlying intervideo relationships. However, videos uploaded by individuals on social media platforms like YouTube, Instagram etc. exhibit intervideo relationship which are a reflection of individual’s interest, geography, culture etc. In this work, we explicitly attempt to model this intervideo relationship, originating from the creators of these videos using Graph Neural Networks (GNN) in a multimodal setup. We perform video classification by leveraging the creators of the videos and semantic similarity between for creating edges between videos and observe improvements of 4% in accuracy 
Ayush Singh · Vikram Gupta 🔗 


Condensing Graphs via OneStep Gradient Matching
(
Poster
)
link
As training deep learning models on large dataset takes a lot of time and resources, it is desired to construct a small synthetic dataset with which we can train deep learning models sufficiently. There are recent works that have explored solutions on condensing image datasets through complex bilevel optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. largereal data and smallsynthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have their inherent limitations: (1) they are not directly applicable to graphs where the data is discrete; and (2) the condensation process is computationally expensive due to the involved nested optimization. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets where we model the discrete graph structure as a probabilistic model. We further propose a onestep gradient matching scheme, which performs gradient matching for only one single step without training the network weights. Our theoretical analysis shows this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by $90$\% while approximating up to $98$\% of the original performance and our method is significantly faster than multistep gradient matching (e.g. $15$× in CIFAR10 for synthesizing $500$ graphs).

Wei Jin · Xianfeng Tang · Haoming Jiang · Zheng Li · Danqing Zhang · Jiliang Tang · Bing Yin 🔗 


From Local to Global: SpectralInspired Graph Neural Networks
(
Poster
)
link
Graph Neural Networks (GNNs) are powerful deep learning methods for NonEuclidean data. Popular GNNs are messagepassing algorithms (MPNNs) that aggregate and combine signals in a local graph neighborhood. However, shallow MPNNs tend to miss longrange signals and perform poorly on some heterophilous graphs, while deep MPNNs can suffer from issues like oversmoothing or oversquashing. To mitigate such issues, existing works typically borrow normalization techniques from training neural networks on Euclidean data or modify the graph structures. Yet these approaches are not wellunderstood theoretically and could increase the overall computational complexity. In this work, we draw inspirations from spectral graph embedding and propose \texttt{PowerEmbed}  a simple layerwise normalization technique to boost MPNNs. We show \texttt{PowerEmbed} can provably express the top$k$ leading eigenvectors of the graph operator, which prevents oversmoothing and is agnostic to the graph topology; meanwhile, it produces a list of representations ranging from local features to global signals, which avoids oversquashing. We apply \texttt{PowerEmbed} in a wide range of simulated and real graphs and demonstrate its competitive performance, particularly for heterophilous graphs.

Ningyuan Huang · Soledad Villar · Carey E Priebe · Da Zheng · Chengyue Huang · Lin Yang · Vladimir Braverman 🔗 


Provably expressive temporal graph networks
(
Poster
)
link
Temporal graph networks (TGNs) have gained prominence as models for embedding dynamic interactions, but little is known about their theoretical underpinnings. We establish fundamental results about the representational power and limits of the two main categories of TGNs: those that aggregate temporal walks (WATGNs), and those that augment local message passing with recurrent memory modules (MPTGNs). Specifically, novel constructions reveal the inadequacy of MPTGNs and WATGNs, proving that neither category subsumes the other. We extend the 1WL (WeisfeilerLeman) test to temporal graphs, and show that the most powerful MPTGNs should use injective updates, as in this case they become as expressive as the temporal WL. Also, we show that sufficiently deep MPTGNs cannot benefit from memory, and MP/WATGNs fail to compute graph properties such as girth. These theoretical insights lead us to introduce PINT  a novel architecture that leverages injective temporal message passing and relative positional features. Importantly, PINT is provably more expressive than both MPTGNs and WATGNs.Our experiments demonstrate that PINT significantly outperforms existing TGNs on several realworld benchmarks. 
Amauri Souza · Diego Mesquita · Samuel Kaski · Vikas Garg 🔗 


pyGSL: A Graph Structure Learning Toolkit
(
Poster
)
link
We introduce pyGSL, a Python library that provides efficient implementations of stateoftheart graph structure learning models along with diverse datasets to evaluate them on. The implementations are written in GPUfriendly ways, allowing one to scale to much larger network tasks. A common interface is introduced for algorithm unrolling methods, unifying implementations of recent stateoftheart techniques and allowing new methods to be quickly developed by avoiding the need to rebuild the underlying unrolling infrastructure. Implementations of differentiable graph structure learning models are written in PyTorch, allowing us to leverage the rich software ecosystem that exists e.g., around logging, hyperparameter search, and GPUcommunication. This also makes it easy to incorporate these models as components in larger gradient based learning systems where differentiable estimates of graph structure may be useful, e.g. in latent graph learning. Diverse datasets and performance metrics allow consistent comparisons across models in this fast growing field. The full code repository on GitHub will be included upon acceptance. 
Max Wasserman · Gonzalo Mateos 🔗 


Improving Graph Neural Networks at Scale: Combining Approximate PageRank and CoreRank
(
Poster
)
link
Graph Neural Networks (GNNs) have achieved great successes in many learning tasks performed on graph structures. Nonetheless, to propagate information GNNs rely on a message passing scheme which can become prohibitively expensive when working with industrialscale graphs. Inspired by the PPRGo model, we propose the CorePPR model, a scalable solution that utilises a learnable convex combination of the approximate personalised PageRank and the CoreRank to diffuse multihop neighbourhood information in GNNs. Additionally, we incorporate a dynamic mechanism to select the most influential neighbours for a particular node which reduces training time while preserving the performance of the model. Overall, we demonstrate that CorePPR outperforms PPRGo, particularly on large graphs where selecting the most influential nodes is particularly relevant for scalability. 
Ariel Ramos Vela · Johannes Lutzeyer · Anastasios Giovanidis · Michalis Vazirgiannis 🔗 


GraphWorld: Fake Graphs BringReal Insights for GNNs
(
Poster
)
link
Despite advances in the field of Graph Neural Networks (GNNs), only a small number (~5) of datasets are currently used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners who have datasets which are very different from academic benchmarks.In this work we introduce GraphWorld, a novel methodology and system for benchmarking GNN models on an arbitrarilylarge population of synthetic graphs for any conceivable GNN task. GraphWorld allows a user to efficiently generate a \emph{world} with millions of statistically diverse datasets. It is accessible, scalable, and easy to use. GraphWorld can be run on a single machine without specialized hardware, or it can be easily scaled up to run on arbitrary clusters or cloud frameworks. Using GraphWorld, a user has finegrained control over graph generator parameters, and can benchmark arbitrary GNN models.We present insights from GraphWorld experiments on the performance of thirteen GNN models and baselines over millions of benchmark datasets. We show that GraphWorld efficiently explores regions of benchmark dataset space uncovered by standard benchmarks, revealing comparisons between models that have not been historically obtainable. Using GraphWorld, we also are able to study indetail the relationship between graph properties and task performance metrics, which is nearly impossible with the classic collection of realworld benchmarks. 
John Palowitch · Anton Tsitsulin · Bryan Perozzi · Brandon Mayer 🔗 


GLINKX: A Unified Framework for Largescale Homophilous and Heterophilous Graphs
(
Poster
)
link
In graph learning, there have been two main inductive biases regarding graphinspired architectures: On the one hand, higherorder interactions and message passing work well on homophilous graphs and are leveraged by GCNs and GATs. Such architectures, however, cannot easily scale to large realworld graphs. On the other hand, shallow (or nodelevel) models using ego features and adjacency embeddings work well in heterophilous graphs. In this work, we propose a novel scalable shallow method  GLINKX  that can work both on homophilous and heterophilous graphs. GLINKX leverages (i) novel monophilous label propagations (ii) ego/node features, (iii) knowledge graph embeddings as positional embeddings, (iv) nodelevel training, and (v) lowdimensional message passing, to achieve scaling in large graphs. We show the effectiveness of GLINKX on several homophilous and heterophilous datasets. 
Marios Papachristou · Rishab Goel · Frank Portman · Matthew Miller · Rong Jin 🔗 


Modular Flows: Differential Molecular Generation
(
Poster
)
link
Generating new molecules is fundamental to advancing critical applications such as drug discovery and material synthesis. Flows can generate molecules effectively by inverting the encoding process, however, existing flow models either require artifactual dequantization or specific node/edge orderings, lack desiderata such as permutation invariance or induce discrepancy between encoding and decoding steps that necessitates post hoc validity correction. We circumvent these issues with novel continuous normalizing E(3)equivariant flows, based on a system of node ODEs coupled as a graph PDE, that repeatedly reconcile locally toward globally aligned densities. Our models can be cast as message passing temporal networks, and result in superlative performance on the tasks of density estimation and molecular generation. In particular, our generated samples achieve state of art on both the standard QM9 and ZINC250K benchmarks. 
Yogesh Verma · Samuel Kaski · Markus Heinonen · Vikas Garg 🔗 


GIST: Distributed Training for LargeScale Graph Convolutional Networks
(
Poster
)
link
The graph convolutional network (GCN) is a goto solution for machine learning on graphs, but its training is notoriously difficult to scale both in terms of graph size and the number of model parameters. Although some work has explored training on largescale graphs, we pioneer efficient training of largescale GCN models with the proposal of a novel, distributed training framework, called \texttt{GIST}. \texttt{GIST} disjointly partitions the parameters of a GCN model into several, smaller subGCNs that are trained independently and in parallel. Compatible with all GCN architectures and existing sampling techniques, \texttt{GIST} $i)$ improves model performance, $ii)$ scales to training on arbitrarily large graphs, $iii)$ decreases wallclock training time, and $iv)$ enables the training of markedly overparameterized GCN models. Remarkably, with \texttt{GIST}, we train an astonishglywide $32,\!768$dimensional GraphSAGE model, which exceeds the capacity of a single GPU by a factor of $8\times$, to SOTA performance on the Amazon2M dataset.

Cameron Wolfe · Jingkang Yang · Fangshuo Liao · Arindam Chowdhury · Chen Dun · Artun Bayer · Santiago Segarra · Anastasios Kyrillidis 🔗 


Neural Coarsening Process for Multilevel Graph Combinatorial Optimization
(
Poster
)
link
Combinatorial optimization (CO) is applicable to various industrial fields, but solving CO problems is usually NPhard. Thus, previous studies have focused on designing heuristics to solve CO within a reasonable time. Recent advances in deep learning show the potential to automate the designing process of CO solvers by leveraging the powerful representation capability of deep neural networks. Practically, solving CO is often cast as a multilevel process; the lowerlevel CO problems are solved repeatedly so as to solve the upperlevel CO problem. In this case, the number of iterations within the lowerlevel process can dramatically impact the overall process. This paper proposes a new graph learning method, Neural Coarsening Process (NCP), to reduce the number of graph neural network inferences for lowerlevel CO problems. Experimental results show that NCP effectively reduces the number of inferences as compared to fully sequential decisionmaking. Furthermore, NCP outperforms competitive heuristics on CVRPCapacityCut, a subproblem of the cutting plane method for the capacitated vehicle routing problem (CVRP). 
Hyeonah Kim · Minsu Kim · Changhyun Kwon · Jinkyoo Park 🔗 


Graph Contrastive Learning with Crossview Reconstruction
(
Poster
)
link
Although different graph selfsupervised learning strategies have been proposed to tackle the supervision shortage issue in graph learning tasks, Graph contrastive learning (GCL) has been the most prevalent approach to this problem. Despite the remarkable performances those GCL methods have achieved, existing GCL methods that heavily depend on various manually designed augmentation techniques still struggle to improve model robustness without risking losing taskrelevant information. Consequently, the learned representation is either brittle or unilluminating. In light of this, we introduce the GraphCV, which follows the information bottleneck principle to learn minimal yet sufficient representations from graph data. Specifically, our proposed model elicits the predictive (useful for downstream instance discrimination) and other nonpredictive features separately. Except for the conventional contrastive loss which guarantees the consistency and sufficiency of the representations across different augmentation views, we introduce a crossview reconstruction mechanism to pursue the disentanglement of the two learned representations. Besides, an adversarial global view is added as the third view of contrastive loss to avoid the learned representation from being drafted too far away from the original distribution. We empirically demonstrate that our proposed model outperforms the stateoftheart on graph classification task over multiple benchmark datasets. 
Qianlong Wen · Zhongyu Ouyang · Chunhui Zhang · Yiyue Qian · Yanfang Ye · Chuxu Zhang 🔗 


Skeleton Clustering: GraphBased Approach for DimensionFree DensityAided Clustering
(
Poster
)
link
Densitybased clustering can identify clusters with irregular shapes and has intuitive interpretations, but struggles with largedimensional data due to the curse of dimensionality.We introduce a graphbased clustering framework called \textit{Skeleton Clustering} to adopt densitybased clustering idea to multivariate and even highdimensional data. The proposed framework constructs a graph representation of the data as a first step and combines prototype methods, densitybased clustering, and hierarchical clustering.We propose surrogate density measures based on the skeleton graph that are less dependent on the dimension and have meaningful geometric interpretations. We show by empirical studies that the proposed skeleton clustering method leads to reliable clusters in multivariate and even highdimensional data with irregular shapes. 
Zeyu Wei · YenChi Chen 🔗 


AutoTransfer: AutoML with Knowledge Transfer  An Application to Graph Neural Networks
(
Poster
)
link
AutoML has demonstrated remarkable success in finding an effective neural architecture for a given machine learning task defined by a specific dataset and an evaluation metric. However, most present AutoML techniques consider each task independently from scratch, which leads to many explored architectures and high computational cost. Here we propose AutoTransfer, an AutoML solution that improves search efficiency by transferring the known architectural design knowledge to the novel task of interest. Our key insights are a taskmodel bank that captures the training performance over a diverse set of GNN architectures and tasks, and a computationally efficient task embedding that can accurately measure the similarity between different tasks. Based on the taskmodel bank and the task embeddings, we estimate the design priors of desirable models of the novel task, by aggregating a similarityweighted sum of the topK design distributions on tasks that are similar to the task of interest. The computed design priors can be used with any AutoML search algorithm. We evaluate AutoTransfer on six datasets in the graph machine learning domain. Experiments demonstrate that (i) our proposed task embedding can be computed efficiently, and that tasks with similar embeddings have similar bestperforming architectures; (ii) AutoTransfer significantly improves search efficiency with the transferred design priors, reducing the number of explored architectures by an order of magnitude. Finally, we release GNNBank101, a largescale dataset of detailed GNN training information of 120,000 taskmodel combinations to facilitate and inspire future research. 
Kaidi Cao · Jiaxuan You · Jiaju Liu · Jure Leskovec 🔗 


Individual Fairness in Dynamic Financial Networks
(
Poster
)
link
In the financial world, a transaction graph is commonly used for modeling the everchanging payeepayor relationships. Every online transaction corresponds to a directed edge from the paying party to the receiving party in this graph. Even though the superior learning capability of Graph Neural Networks (GNNs) has led to several successful financial applications like fraud detection and antimoney laundering, most of these existing works do not have fairness considerations. Apparently, the lack of fairness guarantees during the GNNbased decisionmaking process would cause increasingly serious societal concerns from both buyers and sellers. Furthermore, the timevarying property of the financial networks makes the fairness requirements more challenging, since current fairness measures on graph learning tasks and fairnessaware GNN models are all designed for static graphs only. In this work, we present a new generic definition of individual fairness for dynamic graphs and propose a regularizationbased method to debias the GNN model in the temporal setting. We perform some preliminary experimental evaluations on two realworld datasets and demonstrate the potential efficacy of the proposed methods. 
Zixing Song · Yueen Ma · Irwin King 🔗 


Complete the Missing Half: Augmenting Aggregation Filtering with Diversification for Graph Convolutional Networks
(
Poster
)
link
The core operation of current Graph Neural Networks (GNNs) is the aggregation enabled by the graph Laplacian or message passing, which filters the neighborhood node information. Though effective for various tasks, in this paper, we show that they are potentially a problematic factor underlying all GNN methods for learning on certain datasets, as they force the node representations similar, making the nodes gradually lose their identity and become indistinguishable. Hence, we augment the aggregation operations with their dual, i.e. diversification operators that make the node more distinct and preserve the identity. Such augmentation replaces the aggregation with a twochannel filtering process that, in theory, is beneficial for enriching the node representations. In practice, the proposed twochannel filters can be easily patched on existing GNN methods with diverse training strategies, including spectral and spatial (message passing) methods. In the experiments, we observe desired characteristics of the models and significant performance boost upon the baselines on $9$ node classification tasks.

Sitao Luan · Mingde Zhao · Chenqing Hua · XiaoWen Chang · Doina Precup 🔗 


SPGP: Structure Prototype Guided Graph Pooling
(
Poster
)
link
While graph neural networks (GNNs) have been successful for node classification tasks and link prediction tasks in graph, learning graphlevel representations still remains a challenge. For the graphlevel representation, it is important to learn both representation of neighboring nodes, i.e., aggregation, and graph structural information. A number of graph pooling methods have been developed for this goal. However, most of the existing pooling methods utilize khop neighborhood without considering explicit structural information in a graph. In this paper, we propose Structure Prototype Guided Pooling (SPGP) that utilizes prior graph structures to overcome the limitation. SPGP formulates graph structures as learnable prototype vectors and computes the affinity between nodes and prototype vectors. This leads to a novel node scoring scheme that prioritizes informative nodes while encapsulating the useful structures of the graph. Our experimental results show that SPGP outperforms stateoftheart graph pooling methods on graph classification benchmark datasets in both accuracy and scalability. 
Sangseon Lee · Dohoon Lee · Yinhua Piao · Sun Kim 🔗 


ACMP: AllenCahn Message Passing with Attractive and Repulsive Forces for Graph Neural Networks
(
Poster
)
link
Neural message passing is a basic feature extraction unit for graphstructured data considering neighboring node features in network propagation from one layer to the next. We model such process by an interacting particle system with attractive and repulsive forces and AllenCahn force arising in the modeling of phase transition. The dynamics of the system is a reactiondiffusion process which can separate particles without blowing up. This induces an AllenCahn message passing (ACMP) for graph neural networks where the numerical iteration for the particle system solution constitutes the message passing propagation. ACMP which has a simple implementation with a neural ODE solver can propel the network depth up to one hundred of layers with theoretically proven strictly positive lower bound of the Dirichlet energy. It thus provides a deep model of GNNs circumventing the common GNN problem of oversmoothing. GNNs with ACMP achieve state of the art performance for realworld node classification tasks on both homophilic and heterophilic datasets. 
Yuelin Wang · Kai Yi · Xinliang Liu · Yuguang Wang · Shi Jin 🔗 


Expander Graph Propagation
(
Poster
)
link
Deploying graph neural networks (GNNs) on wholegraph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We provide an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks. To the best of our knowledge, this is a previously unstudied result in the context of graph representation learning, and we believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs. 
Andreea Deac · Marc Lackenby · Petar Veličković 🔗 


Modeling Hierarchical Topological Structure in Scientific Images with Graph Neural Networks
(
Poster
)
link
Topological analysis reveals meaningful structure in data from a variety of domains. Tasks such as image segmentation can be effectively performed on the network structure of an image's topological complex using graph neural networks (GNNs). We propose two methods for using GNNs to learn from the hierarchical information captured by complexes at multiple levels of topological persistence: one modifies the training procedure of an existing GNN, while one extends the message passing across all levels of the complex. Experiments on realworld data from three different domains shows the performance benefits to GNNs from using hierarchical topological structure. 
Samuel Leventhal · Attila Gyulassy · Valerio Pascucci · Mark Heimann 🔗 


Variational Graph AutoEncoders for Heterogeneous Information Network
(
Poster
)
link
Heterogeneous networks, where nodes and their attributes denote realworld entities and links encode relationships between entities, are ubiquitous in many applications, including social media, recommender systems, biological networks, molecular structures, and brain networks. Such applications typically involve node classification or link prediction based on the attributes of the nodes and their connections in the graph. Graph neural networks naturally model the topological structures of \emph{homogeneous} graphs by learning lowdimensional embeddings of nodes. However, heterogeneous graphs with various types of nodes and links pose significant challenges in learning efficient lowdimensional embeddings. Existing methods for learning such representation rely on computationally expensive sampling of metapaths and heuristic adaptation of models developed for homogeneous networks. Therefore, the computational inefficiency and the generalization performance of such methods are frequently criticized and scrutinized, preventing their adoption in solving realworld problems. To mitigate such issues, we present three variants of graph variational autoencoder models for heterogeneous networks that avoid the computationally expensive sampling of metapaths and maintain uncertainty estimates of node embeddings that help with better generalization. We report the results of experiments on link prediction using three different realworld benchmark heterogeneous network data sets that show that the proposed methods significantly outperform stateoftheart baselines. 
Abhishek Dalvi · Ayan Acharya · Jing Gao · Vasant Honavar 🔗 


Contrastive Graph FewShot Learning
(
Poster
)
link
Prevailing deep graph learning models often suffer from label sparsity issue. Although many graph fewshot learning (GFL) methods have been developed to avoid the performance degradation in face of limited annotated data, they excessively rely on labeled data, where the distribution shift in the test phase might result in impaired generalization ability. Additionally, they lack a general purpose as their designs are coupled with task or dataspecific characteristics. To this end, we propose a general and effective ContrastiveGraph Fewshot Learning framework (CGFL). CGFL leverages a selfdistilled contrastive learning procedure to boost GFL. Specifically, our model firstly pretrains a graph encoder with contrastive learning using unlabeled data. Later, the trained encoder is frozen as a teacher model to distill a student model with a contrastive loss. The distilled model is finally fed to GFL. CGFL learns data representation in a selfsupervised manner, thus mitigating the distribution shift impact for better generalization and making model task and dataindependent for a general graph mining purpose. Furthermore, we introduce an informationbased method to quantitatively measure the capability of CGFL. Comprehensive experiments demonstrate that CGFL outperforms stateoftheart baselines on several graph mining tasks in the fewshot scenario. We also provide quantitative measurement of CGFL's success. 
Chunhui Zhang · Hongfu Liu · Jundong Li · Yanfang Ye · Chuxu Zhang 🔗 


ShiftRobust Node Classification via Graph Clustering Cotraining
(
Poster
)
link
It is widely known that machine learning models only achieve suboptimal performance when testing data exhibit distribution shift against training \ie, $\Pr_\text{train}(X,Y) \neq \Pr_\text{test}(X,Y)$. Although Graph Neural Networks (GNNs) have become de facto models for semisupervised learning tasks, they suffer even more from distribution shift because multiple types of shifts origin from not only node features but graph structures. Existing domain adaptation methods only work for specific type of shifts. In response, we propose ShiftRobust Node Classification (SRNC)  a unified domain adaptation framework for different kinds of distribution shifts on graph. Specifically, we cotrain an unsupervised cluster GNN, which captures the data distribution by graph homophily on target graph. Then a shiftrobust classifier is optimized on training graph and pseudo samples from target graph, which are provided by cluster GNN. Compared to the existing domain adaptation algorithms on graph, our approach works for both openset and closeset shifts with convergence guarantees.In our experiments, the classification accuracy is improved at least $3\%$ against the secondbest baseline under openset shifts. On timeevolving graph with closeset shift, existing domain adaption algorithms can barely improve the generalization if not worse. SRNC is still able to mitigate the negative effect ($>2\%$ absolute improvements) of the shift across different testingtimes.

Qi Zhu · Chao Zhang · Chanyoung Park · Carl Yang · Jiawei Han 🔗 


Diving into Unified DataModel Sparsity for ClassImbalanced Graph Representation Learning
(
Poster
)
link
Even pruned by the stateoftheart network compression methods, recent research shows that deep learning model training still suffers from the demand of massive data usage. In particular, Graph Neural Networks (GNNs) training upon such nonEuclidean graph data often encounters relatively higher time costs, due to its irregular and nasty density properties, compared with data in the regular Euclidean space (e.g., image or text). Another natural property concomitantly with graph is classimbalance which cannot be alleviated by the massive graph data while hindering GNNs' generalization. To fully tackle these unpleasant properties, (i) theoretically, we introduce a hypothesis about what extent a subset of the training data can approximate the full dataset's learning effectiveness. The effectiveness is further guaranteed and proved by the gradients' distance between the subset and the full set; (ii) empirically, we discover that during the learning process of a GNN, some samples in the training dataset are informative for providing gradients to update model parameters. Moreover, the informative subset is not fixed during training process. Samples that are informative in the current training epoch may not be so in the next one. We refer to this observation as dynamic data sparsity. We also notice that sparse subnets pruned from a welltrained GNN sometimes forget the information provided by the informative subset, reflected in their poor performances upon the subset. Based on these findings, we develop a unified datamodel dynamic sparsity framework named Graph Decantation (GraphDec) to address challenges brought by training upon a massive classimbalanced graph data. The key idea of GraphDec is to identify the informative subset dynamically during the training process by adopting sparse graph contrastive learning. Extensive experiments on multiple benchmark datasets demonstrate that GraphDec outperforms stateoftheart baselines for classimbalanced graph classification and classimbalanced node classification tasks, with respect to classification accuracy and data usage efficiency. 
Chunhui Zhang · Chao Huang · Yijun Tian · Qianlong Wen · Zhongyu Ouyang · Youhuan Li · Yanfang Ye · Chuxu Zhang 🔗 


Invertible Neural Networks for Graph Prediction
(
Poster
)
link
In this work, we address conditional generation using deep invertible neural networks. This is a type of problem where one aims to infer the most probable inputs $X$ given outcomes $Y$. We call our method \textit{invertible graph neural network} (iGNN) due to the primary focus on generating node features on graph data. A notable feature of our proposed methods is that during network training, we revise the typicallyused loss objective in normalizing flow and consider Wasserstein2 regularization to facilitate the training process. Algorithmicwise, we adopt an endtoend training approach since our objective is to address prediction and generation in the forward and backward processes at once through a single model. Theoretically, we study the expressiveness of iGNN in learning the mapping through utilizing the FokkerPlanck equation of an OrnsteinUhlenbeck process. Experimentally, we verify the performance of iGNN on both simulated and realdata datasets. We demonstrate through extensive numerical experiments that iGNN shows clear improvement over competing conditional generation benchmarks on highdimensional and/or nonconvex data.

Chen Xu · Xiuyuan Cheng · Yao Xie 🔗 


Efficient Automatic Machine Learning via Design Graphs
(
Poster
)
link
Despite the success of automated machine learning (AutoML), which aims to find the best design, including the architecture of deep networks and hyperparameters, conventional AutoML methods are computationally expensive and hardly provide insights into the effect of various model design choices. To tackle the challenges, we propose FALCON, an efficient samplebased method to predict the performance of a model and search for the optimal model design. Our key insight is to model the design space of possible model designs as a design graph, where the nodes represent design choices, and the edges denote design similarities. FALCON features 1) a taskagnostic module, which performs message passing on the design graph via a Graph Neural Network (GNN), and 2) a taskspecific module, which conducts label propagation of the known model performance information on the design graph. Both modules are combined to predict the performances of each design in the design space. We conduct extensive experiments on 27 tasks on graphs, including node and graph classifications in various application domains, and an image classification task on the CIFAR10 dataset. We empirically show that FALCON can efficiently obtain the wellperforming designs for each task using only 30 explored nodes. Specifically, FALCON has a comparable time cost with the oneshot approaches while achieving an average improvement of 3.3% on the graph classification datasets. 
Shirley Wu · Jiaxuan You · Jure Leskovec · Rex Ying 🔗 


Equivariant Graph Hierarchybased Neural Networks
(
Poster
)
link
Equivariant Graph neural Networks (EGNs) are powerful in characterizing the dynamics of multibody physical systems. Existing EGNs conduct flat message passing, which, yet, is unable to capture the spatial/dynamical hierarchy for complex systems particularly, limiting substructure discovery and global information fusion. In this paper, we propose Equivariant Hierarchybased Graph Networks (EGHNs) which consist of the three key components: generalized Equivariant Matrix Message Passing (EMMP), EPool, and EUnPool. In particular, EMMP is able to improve the expressivity of conventional equivariant message passing, EPool assigns the quantities of the lowlevel nodes into highlevel clusters, while EUnPool leverages the highlevel information to update the dynamics of the lowlevel nodes. As their names imply, both EPool and EUnPool are guaranteed to be equivariant to meet physic symmetry. Considerable experimental evaluations verify the effectiveness of our EGHN on several applications including multiobject dynamics simulation, motion capture, and protein dynamics modeling. 
Jiaqi Han · Yu Rong · Tingyang Xu · Wenbing Huang 🔗 


NOSMOG: Learning Noiserobust and Structureaware MLPs on Graphs
(
Poster
)
link
While Graph Neural Networks (GNNs) have demonstrated their efficacy in dealing with nonEuclidean structural data, they are difficult to be deployed in real applications due to the scalability constraint imposed by multihop data dependency. Existing methods attempt to address this scalability issue by training multilayer perceptrons (MLPs) exclusively on node content features using labels derived from trained GNNs. Even though the performance of MLPs can be significantly improved, two issues prevent MLPs from outperforming GNNs and being used in practice: the ignorance of graph structural information and the sensitivity to node feature noises. In this paper, we propose to learn NOiserobust Structureaware MLPs On Graphs (NOSMOG) to overcome the challenges. Specifically, we first complement node content with position features to help MLPs capture graph structural information. We then design a novel representational similarity distillation strategy to inject structural node similarities into MLPs. Finally, we introduce the adversarial feature augmentation to ensure stable learning against feature noises and further improve performance. Extensive experiments demonstrate that NOSMOG outperforms GNNs and the stateoftheart method in both transductive and inductive settings across 7 datasets, while maintaining a competitive inference efficiency. 
Yijun Tian · Chuxu Zhang · Zhichun Guo · Xiangliang Zhang · Nitesh Chawla 🔗 


A New Graph Node Classification Benchmark: Learning Structure from Histology Cell Graphs
(
Poster
)
link
We introduce a new benchmark dataset, Placenta, for node classification in an underexplored domain: predicting microanatomical tissue structures from cell graphs in placenta histology whole slide images. This problem is uniquely challenging for graph learning for a few reasons. Cell graphs are large (>1 million nodes per image), node features are varied (64dimensions of 11 types of cells), class labels are imbalanced (9 classes ranging from 0.21% of the data to 40.0%), and cellular communities cluster into heterogeneously distributed tissues of widely varying sizes (from 11 nodes to 44,671 nodes for a single structure). Here, we release a dataset consisting of two cell graphs from two placenta histology images totalling 2,395,747 nodes, 799,745 of which have ground truth labels. We present inductive benchmark results for 7 scalable models and show how the unique qualities of cell graphs can help drive the development of novel graph neural network architectures. 
Claudia Vanea · Jonathan Campbell · Omri Dodi · Liis Salumäe · Karen Meir · Drorith Hochner · Hagit Hochner · Triin Laisk · Linda Ernst · Cecilia Lindgren · Christoffer Nellaker



GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks
(
Poster
)
link
As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so does their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on few inadequate synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more missioncritical applications, we are in dire need for a common evaluation protocol of explainability methods of GNNs. In this paper, we propose, to our best knowledge, the first systematic evaluation framework for GNN explainability, considering explainability on three different "user needs". We propose a unique metric that combines the fidelity measures and classifies explanations based on their quality of being sufficient or necessary. We scope ourselves to node classification tasks and compare the most representative techniques in the field of inputlevel explainability for GNNs. For the inadequate but widely used synthetic benchmarks, surprisingly shallow techniques such as personalized PageRank have the best performance for a minimum computation time. But when the graph structure is more complex and nodes have meaningful features, gradientbased methods are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions and there is always a tradeoff. We further apply our evaluation protocol in a case study for frauds explanation on eBay transaction graphs to reflect the production environment. 
Kenza Amara · Rex Ying · Zitao Zhang · Zhihao Han · Yinan Shan · Ulrik Brandes · Sebastian Schemm 🔗 


Learning Heterogeneous Interaction Strengths by Trajectory Prediction with Graph Neural Network
(
Poster
)
link
Dynamical systems with interacting agents are universal in nature, commonly modeled by a graph of relationships between their constituents. Recently, various works have been presented to tackle the problem of inferring those relationships from the system trajectories via deep neural networks, but most of the studies assume binary or discrete types of interactions for simplicity. In the real world, the interaction kernels often involve continuous interaction strengths, which cannot be accurately approximated by discrete relations. In this work, we propose the relational attentive inference network (RAIN) to infer continuously weighted interaction graphs without any groundtruth interaction strengths. Our model employs a novel pairwise attention (PA) mechanism to refine the trajectory representations and a graph transformer to extract heterogeneous interaction weights for each pair of agents. We show that our RAIN model with the PA mechanism accurately infers continuous interaction strengths for simulated physical systems in an unsupervised manner. Further, RAIN with PA successfully predicts trajectories from motion capture data with an interpretable interaction graph, demonstrating the virtue of modeling unknown dynamics with continuous weights. 
Seungwoong Ha · Hawoong Jeong 🔗 


A Graph Is More Than Its Nodes: Towards Structured UncertaintyAware Learning on Graphs
(
Poster
)
link
Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting. 
Hans HaoHsun Hsu · Yuesong Shen · Daniel Cremers 🔗 


Expectation Complete Graph Representations using Graph Homomorphisms
(
Poster
)
link
We propose and study a practical graph embedding that in expectation is able to distinguish all nonisomorphic graphs and can be computed in polynomial time. The embedding is based on Lovász' characterization of graph isomorphism through an infinite dimensional vector of homomorphism counts. Recent work has studied the expressiveness of graph embeddings by comparing their ability to distinguish graphs to that of the WeisfeilerLeman hierarchy. While previous methods have either limited expressiveness or are computationally impractical, we devise efficient samplingbased alternatives that are maximally expressive in expectation. We empirically evaluate our proposed embeddings and show competitive results on several benchmark graph learning tasks. 
Maximilian Thiessen · Pascal Welke · Thomas Gärtner 🔗 


antGLasso: An Efficient Tensor Graphical Lasso Algorithm
(
Poster
)
link
The class of bigraphical lasso algorithms (and, more broadly, 'tensor'graphical lasso algorithms) has been used to estimate dependency structures within matrix and tensor data. However, all current methods to do so take prohibitively long on modestly sized datasets. We present a novel tensorgraphical lasso algorithm that directly estimates the dependency structure, unlike its iterative predecessors. This provides a speedup of multiple orders of magnitude, allowing this class of algorithms to be used on large, realworld datasets. 
Bailey Andrew · David Westhead · Luisa Cutillo 🔗 


Certified Graph Unlearning
(
Poster
)
link
Graphstructured data is ubiquitous in practice and often processed using graph neural networks (GNNs). With the adoption of recent laws ensuring the ``right to be forgotten'', the problem of graph data removal has become of significant importance. To address the problem, we introduce the first known framework for \emph{certified graph unlearning} of GNNs. In contrast to standard machine unlearning, new analytical and heuristic unlearning challenges arise when dealing with complex graph data. First, three different types of unlearning requests need to be considered, including node feature, edge and node unlearning. Second, to establish provable performance guarantees, one needs to address challenges associated with feature mixing during propagation. The underlying analysis is illustrated on the example of simple graph convolutions (SGC) and their generalized PageRank (GPR) extensions, thereby laying the theoretical foundation for certified unlearning of GNNs. Our empirical studies on six benchmark datasets demonstrate excellent performancecomplexity tradeoffs when compared to complete retraining methods and approaches that do not leverage graph information. For example, when unlearning $20\%$ of the nodes on the Cora dataset, our approach suffers only a $0.1\%$ loss in test accuracy while offering a $4$fold speedup compared to complete retraining. Our scheme also outperforms unlearning methods that do not leverage graph information with a $12\%$ increase in test accuracy for comparable time complexity.

Eli Chien · Chao Pan · Olgica Milenkovic 🔗 


New Frontiers in Graph Autoencoders: Joint Community Detection and Link Prediction
(
Poster
)
link
Graph autoencoders (GAE) and variational graph autoencoders (VGAE) emerged as powerful methods for link prediction (LP). Their performances are less impressive on community detection (CD), where they are often outperformed by simpler alternatives such as the Louvain method. It is still unclear to what extent one can improve CD with GAE and VGAE, especially in the absence of node features. It is moreover uncertain whether one could do so while simultaneously preserving good performances on LP in a multitask setting. In this workshop paper, summarizing results from our journal publication, we show that jointly addressing these two tasks with high accuracy is possible. For this purpose, we introduce a communitypreserving message passing scheme, doping our GAE and VGAE encoders by considering both the initial graph and Louvainbased prior communities when computing embedding spaces. Inspired by modularitybased clustering, we further propose novel training and optimization strategies specifically designed for joint LP and CD. We demonstrate the empirical effectiveness of our approach, referred to as ModularityAware GAE and VGAE, on various real world graphs. 
Guillaume SALHA · Johannes Lutzeyer · George Dasoulas · Romain Hennequin · Michalis Vazirgiannis 🔗 


A Simple Hypergraph Kernel Convolution based on Discounted Markov Diffusion Process
(
Poster
)
link
Kernels on discrete structures evaluate pairwise similarities between objects which capture semantics and inherent topology information. Existing kernels on discrete structures are only developed by topology information(such as adjacency matrix of graphs), without considering original attributes of objects. This paper proposes a twophase paradigm to aggregate comprehensive information on discrete structures leading to a Discount Markov Diffusion Learnable Kernel (DMDLK). Specifically, based on the underlying projection of DMDLK, we design a Simple Hypergraph Kernel Convolution (SHKC) for hidden representation of vertices. SHKC can adjust diffusion steps rather than stacking convolution layers to aggregate information from longrange neighborhoods which prevents oversmoothing issues of existing hypergraph convolutions. Moreover, we utilize the uniform stability bound theorem in transductive learning to analyze critical factors for the effectiveness and generalization ability of SHKC from a theoretical perspective. The experimental results on several benchmark datasets for node classification tasks verified the superior performance of SHKC over stateoftheart methods. 
Fuyang Li · Jiying Zhang · Xi Xiao · bin zhang · Dijun Luo 🔗 


GraphCG: Unsupervised Discovery of Steerable Factors in Graphs
(
Poster
)
link
Deep generative models have been extensively explored recently, especially for the graph data such as molecular graphs and point clouds. Yet, much less investigation has been carried out on understanding the learned latent space of deep graph generative models. Such understandings can open up a unified perspective and provide guidelines for essential tasks like controllable generation. In this paper, we first examine the representation space of the recent deep generative model trained for graph data, observing that the learned representation space is not perfectly disentangled. Based on this observation, we then propose an unsupervised method called GraphCG, which is modelagnostic and taskagnostic for discovering steerable factors in graph data. Specifically, GraphCG learns the semanticrich directions via maximizing the corresponding mutual information, where the edited graph along the same direction will possess certain steerable factors. We conduct experiments on two types of graph data, molecular graphs and point clouds. Both the quantitative and qualitative results show the effectiveness of GraphCG for discovering steerable factors. The code will be public in the near future. 
Shengchao Liu · Chengpeng Wang · Weili Nie · Hanchen Wang · Jiarui Lu · Bolei Zhou · Jian Tang 🔗 


Differentially Private Graph Learning via SensitivityBounded Personalized PageRank
(
Poster
)
link
Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data.In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to nonprivate algorithms when the input graph has large degrees. Our sensitivitybounded PPR directly implies private algorithms for several tools of graph learning, such as, differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performances of our algorithms. 
Alessandro Epasto · Vahab Mirrokni · Bryan Perozzi · Anton Tsitsulin · Peilin Zhong 🔗 


A deep learning approach to recover conditional independence graphs
(
Poster
)
link
Probabilistic Graphical Models are generative models of complex systems. They rely on conditional independence assumptions between variables to learn sparse representations which can be visualized in a form of a graph. Such models are used for domain exploration and structure discovery in poorly understood domains. This work introduces a novel technique to perform sparse graph recovery by optimizing deep unrolled networks. Assuming that the input data $X\in\mathbb{R}^{M\times D}$ comes from an underlying multivariate Gaussian distribution, we apply a deep model on $X$ that outputs the precision matrix $\Theta$. Then, the partial correlation matrix `$\rho$' is calculated which can also be interpreted as the conditional independence graph. Our model, \texttt{uGLAD}, builds upon and extends the stateoftheart model \texttt{GLAD} to the unsupervised setting. The key benefits of our model are (1) \texttt{uGLAD} automatically optimizes sparsityrelated regularization parameters leading to better performance than existing algorithms. (2) We introduce multitask learning based `consensus' strategy for robust handling of missing data in an unsupervised setting. We evaluate performance on synthetic Gaussian, nonGaussian data generated from Gene Regulatory Networks, and present a case study in anaerobic digestion.

Harsh Shrivastava · Urszula Chajewska · Robin Abraham · Xinshi Chen 🔗 


On the Unreasonable Effectiveness of Feature Propagation in Learning on Graphs with Missing Node Features
(
Poster
)
link
While Graph Neural Networks (GNNs) have recently become the de facto standard for modeling relational data, they impose a strong assumption on the availability of the node or edge features of the graph. In many realworld applications, however, features are only partially available; for example, in social networks, age and gender are available only for a small subset of users. We present a general approach for handling missing features in graph machine learning applications that is based on minimization of the Dirichlet energy and leads to a diffusiontype differential equation on the graph. The discretization of this equation produces a simple, fast and scalable algorithm which we call Feature Propagation. We experimentally show that the proposed approach outperforms previous methods on seven common nodeclassification benchmarks and can withstand surprisingly high rates of missing features: on average we observe only around 4% relative accuracy drop when 99% of the features are missing. Moreover, it takes only 10 seconds to run on a graph with ~2.5M nodes and ~23M edges on a single GPU. 
Emanuele Rossi · Henry Kenlay · Maria Gorinova · Benjamin Chamberlain · Xiaowen Dong · Michael Bronstein 🔗 


Empowering Language Models with Knowledge Graph Reasoning for Question Answering
(
Poster
)
link
Answering opendomain questions requires world knowledge about incontext entities. As pretrained Language Models (LMs) lack the power to store required knowledge, external knowledge sources, such as knowledge graphs, are often used to augment LMs. In this work, we propose knOwledge REasOning empowered Language Model (OREOLM), which consists of a novel Knowledge Interaction Layer that can be flexibly plugged into existing Transformerbased LMs to interact with a differentiable Knowledge Graph Reasoning module collaboratively. In this way, LM guides KG to walk towards the desired answer, while the retrieved knowledge improves LM.By adopting OREOLM to RoBERTa and T5, we show significant performance gain, achieving stateofart results in the ClosedBook setting. The performance enhancement is mainly from the KG reasoning's capacity to infer missing relational facts. In addition, OREOLM provides reasoning paths as rationales to interpret the model's decision. 
Ziniu Hu · Yichong Xu · Wenhao Yu · Shuohang Wang · Ziyi Yang · Chenguang Zhu · KaiWei Chang · Yizhou Sun 🔗 


COIN: CoCluster Infomax for Bipartite Graphs
(
Poster
)
link
Graph selfsupervised learning has attracted plenty of attention in recent years. However, most existing methods are designed for homogeneous graphs yet not tailored for bipartite graphs, and their objectives could induce clusterlevel errors since they only consider instancewise topological information. In this paper, we introduce a novel cocluster infomax (COIN) framework to capture the clusterlevel information by maximizing the mutual information of coclusters. Different from previous infomax methods which estimate mutual information, COIN directly calculates mutual information. Besides, COIN is an endtoend method which can be trained jointly with other objectives. Furthermore, we theoretically prove that COIN could effectively increase the mutual information of node embeddings and it is upperbounded by the prior distributions of nodes. Experimental results show that COIN outperforms stateoftheart methods on various downstream tasks. 
Baoyu Jing · Yuchen Yan · Yada Zhu · Hanghang Tong 🔗 