

Poster in Workshop: New Frontiers in Graph Learning (GLFrontiers)

Low-Width Approximations and Sparsification for Scaling Graph Transformers

Hamed Shirzad · Balaji Venkatachalam · Ameya Velingker · Danica J. Sutherland · David Woodruff

Keywords: [ GNN ] [ graphs ] [ graph transformers ] [ Geometric Deep Learning ] [ Sparse Transformers ] [ sparsification ] [ self-attention ]


Abstract: Graph Transformers have shown excellent results on a diverse set of datasets. However, memory limitations prohibit these models from scaling to larger graphs. With standard single-GPU setups, even training on medium-sized graphs is impossible for most Graph Transformers. While the $\mathcal{O}(n^2d^2)$ complexity of each layer can be reduced to $\mathcal{O}((n+m)d^2)$ using sparse attention models such as Exphormer, for graphs with $n$ nodes and $m$ edges, these models are still infeasible to train on small-memory devices even for medium-sized datasets. Here, we propose to sparsify the Exphormer model even further, by using a small ``pilot'' network to estimate attention scores along the graph edges, then training a larger model using only the $\mathcal{O}(n)$ edges deemed important by the small network. We show empirically that attention scores from smaller networks provide a good estimate of the attention scores in larger networks, and that this process can yield a large-width sparse model nearly as good as the large-width non-sparse model.
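A minimal sketch of the pilot-network sparsification idea described in the abstract: a small, low-width network scores every graph edge, and only a per-node top-$k$ subset of edges (roughly $\mathcal{O}(n)$ in total) is kept for training the larger model. All names here (`pilot_model`, `large_model`, the budget `k`) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of pilot-based edge sparsification; not the paper's implementation.
import torch

def estimate_edge_scores(pilot_model, x, edge_index):
    """Score each graph edge with a small, low-width 'pilot' network.

    x:          [n, d] node features
    edge_index: [2, m] COO edge list
    Returns a [m] tensor of attention scores, one per edge.
    """
    with torch.no_grad():                       # the pilot is only used for scoring
        scores = pilot_model(x, edge_index)     # assumed to return one score per edge
    return scores

def sparsify_edges(edge_index, scores, k):
    """Keep roughly O(n) edges: the top-k highest-scoring edges into each node."""
    n = int(edge_index.max()) + 1
    keep = torch.zeros(scores.numel(), dtype=torch.bool)
    for v in range(n):                          # simple loop for clarity; a scatter-based
        mask = edge_index[1] == v               # top-k would be used at scale
        if mask.any():
            idx = mask.nonzero(as_tuple=True)[0]
            top = idx[scores[idx].topk(min(k, idx.numel())).indices]
            keep[top] = True
    return edge_index[:, keep]

# Usage (hypothetical models):
#   scores       = estimate_edge_scores(pilot_model, x, edge_index)
#   sparse_edges = sparsify_edges(edge_index, scores, k=4)
#   large_model is then trained on (x, sparse_edges) only, i.e. O(n) edges.
```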
