Generalizable Insights for Graph Transformers in Theory and Practice
Abstract
Graph transformers (GTs) have demonstrated strong empirical performance; however, current architectures exhibit significant variation in their use of attention mechanisms, positional embeddings (PEs), and expressivity. Existing expressivity results are often tied to specific design choices and lack comprehensive empirical validation on large-scale data. This leaves a gap between theory and practice, preventing the generation of generalizable insights that extend beyond particular application domains. Here, we propose the Generalized-Distance Transformer (GDT), a GT architecture that incorporates many recent advancements for GTs, and we develop a fine-grained understanding of the GDT's representation power in terms of attention and PEs. Through extensive experiments, we identify design choices that consistently perform well across various applications, tasks, and model scales, demonstrating strong performance in a few-shot transfer setting without the need for fine-tuning. We distill our theoretical and practical findings into several generalizable insights about effective GT design, training, and inference.