Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks
Arian Jamasb · Ramon Viñas Torné · Eric Ma · Yuanqi Du · Charles Harris · Kexin Huang · Dominic Hall · Pietro Lió · Tom Blundell

Tue Nov 29 02:00 PM -- 04:00 PM (PST) @ Hall J #738

Geometric deep learning has broad applications in biology, a domain where relational structure in data is often intrinsic to modelling the underlying phenomena. Currently, efforts in both geometric deep learning and, more broadly, deep learning applied to biomolecular tasks have been hampered by a scarcity of appropriate datasets accessible to domain specialists and machine learning researchers alike. To address this, we introduce Graphein as a turn-key tool for transforming raw data from widely used bioinformatics databases into machine learning-ready datasets in a high-throughput and flexible manner. Graphein is a Python library for constructing graph and surface-mesh representations of biomolecular structures, such as proteins, nucleic acids and small molecules, and of biological interaction networks for computational analysis and machine learning. Graphein provides utilities for retrieving data from widely used bioinformatics databases: structural data from the Protein Data Bank and the AlphaFold Structure Database, chemical data from ZINC and ChEMBL, and biomolecular interaction networks from STRINGdb, BioGrid, TRRUST and RegNetwork. The library interfaces with popular geometric deep learning libraries (DGL, Jraph, PyTorch Geometric and PyTorch3D), though it remains framework-agnostic because it is built on top of the PyData ecosystem, enabling interoperability with scientific computing tools and libraries. Graphein is designed to be highly flexible, allowing the user to specify each step of data preparation; it is scalable, facilitating work with large protein complexes and interaction graphs; and it contains pre-processing tools for preparing experimental files. Graphein supports network-based, graph-theoretic and topological analyses of structural and interaction datasets in a high-throughput manner. We envision that Graphein will facilitate developments in computational biology, graph representation learning and drug discovery.
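To make the idea of turning a biomolecular structure into a graph concrete, the following is a minimal, framework-agnostic sketch of one common edge rule: connecting residues whose C-alpha atoms lie within a distance cutoff. This is not Graphein's API. The residue identifiers, coordinates, the 5 Å threshold and the `contact_graph` helper are all hypothetical and chosen only for illustration, and the sketch uses only the Python standard library.

```python
import math
from itertools import combinations

# Hypothetical toy input: residue identifiers mapped to C-alpha coordinates (Å).
# In a real pipeline these would be parsed from a PDB or AlphaFold structure file.
residues = {
    "A:MET:1": (0.0, 0.0, 0.0),
    "A:LYS:2": (3.8, 0.0, 0.0),
    "A:ILE:3": (7.6, 0.0, 0.0),
    "A:GLU:4": (7.6, 3.8, 0.0),
}


def contact_graph(coords, threshold=5.0):
    """Return an undirected adjacency dict linking residues whose C-alpha
    atoms lie within `threshold` Å of each other (a distance-cutoff edge rule)."""
    adjacency = {node: set() for node in coords}
    # Compare every unordered pair of residues exactly once.
    for (u, pu), (v, pv) in combinations(coords.items(), 2):
        if math.dist(pu, pv) <= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return adjacency


graph = contact_graph(residues, threshold=5.0)
for node, neighbours in sorted(graph.items()):
    print(node, sorted(neighbours))
```

Swapping the cutoff for another criterion (for example, k-nearest neighbours) only changes the inner test; making each such step user-specifiable is the kind of flexibility the abstract describes.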
Availability and implementation: Graphein is written in Python. Source code, example usage and tutorials, datasets, and documentation are made freely available under the MIT License at the following URL: https://anonymous.4open.science/r/graphein-3472/README.md

Author Information

Arian Jamasb (University of Cambridge)
Ramon Viñas Torné (University of Cambridge)
Eric Ma (PyMC Labs)
Yuanqi Du (Cornell University)
Charles Harris (University of Cambridge)
Kexin Huang (Stanford University)
Dominic Hall (University of Cambridge)
Pietro Lió (University of Cambridge)
Tom Blundell (University of Cambridge)
Tom Blundell

Professor Sir Tom Blundell, FRS, FMedSci, is Director of Research in Biochemistry at the University of Cambridge. He worked with Dorothy Hodgkin in Oxford in the 1960s on the structure of insulin and then in Sussex on glucagon in the 1970s. Recently he has focused on DNA repair, using cryo-EM to define the structures of multicomponent complexes of the 4,100-amino-acid DNA-PKcs. Over the past 30 years he has produced software for homology modelling, notably Modeller (cited 13,000 times), and for predicting the impacts of mutations in cancer and drug resistance using AI/ML methods, contributing to ~700 research papers. In the 1970s Tom developed structure-guided drug discovery, and in 1999 he pioneered fragment-based drug discovery, co-founding Astex, which has two oncology drugs on the market. In academia he develops antibiotics targeting mycobacteria in leprosy and cystic fibrosis. In 1970, as a councillor and Chair of Oxford City Planning, he stopped a motorway planned to go through the city centre and instead pedestrianized the area. He chaired the UK Royal Commission on Environmental Pollution from 1998 to 2005.