

Poster in Workshop: New Frontiers of AI for Drug Discovery and Development

TopoPool: An Adaptive Graph Pooling Layer for Extracting Molecular and Protein Substructures

Mattson Thieme · Majdi Hassan · Chetan Rupakheti · Kedar Thiagarajan · Abhishek Pandey · Han Liu

Keywords: [ molecular representation learning ] [ protein representation learning ] [ Graph Pooling ]


Abstract:

Within molecules and proteins, discrete substructures affect high-level properties and behavior in distinct ways. As such, explicitly locating and accounting for these substructures is a central problem when learning molecular or protein representations. Because molecules and proteins are typically represented as graphs, this task falls under the umbrella of graph pooling, or segmentation. Given the highly variable size, number, and topology of these substructures, an ideal pooling algorithm would adapt on a graph-by-graph basis and use local context to locate optimal pools. However, this adaptability is difficult to reconcile with differentiability, and each of the learnable graph pooling methods proposed to date must make strong a priori assumptions about the number or size of the learned pools. As such, demand remains for a graph pooling algorithm that can maintain differentiability while retaining adaptability in the size and number of learned pools. To meet this demand, we introduce the Topographical Pooling Layer (TopoPool): a differentiable, hierarchical graph pooling layer that learns an arbitrary number of pools of varying size without making any a priori assumptions about their number or size. Additionally, it naturally uncovers only connected substructures, increasing the interpretability of the learned pools and obviating the need for exogenous regularizers to enforce connectedness. We evaluate TopoPool on diverse molecular and protein property prediction tasks, where it achieves competitive performance against existing methods. Taken together, TopoPool represents a novel addition to the graph pooling toolbox and is particularly relevant to areas like drug design, where locating and optimizing discrete, connected molecular substructures is of central importance.
