Geometric deep learning has broad applications in biology, a domain where relational structure in data is often intrinsic to modelling the underlying phenomena. Currently, efforts in both geometric deep learning and, more broadly, deep learning applied to biomolecular tasks have been hampered by a scarcity of appropriate datasets accessible to domain specialists and machine learning researchers alike. To address this, we introduce Graphein as a turn-key tool for transforming raw data from widely-used bioinformatics databases into machine learning-ready datasets in a high-throughput and flexible manner. Graphein is a Python library for constructing graph and surface-mesh representations of biomolecular structures, such as proteins, nucleic acids and small molecules, and of biological interaction networks, for computational analysis and machine learning. Graphein provides utilities for retrieving structural data from the Protein Data Bank and the AlphaFold Structure Database, chemical data from ZINC and ChEMBL, and biomolecular interaction networks from STRINGdb, BioGrid, TRRUST and RegNetwork. The library interfaces with popular geometric deep learning libraries (DGL, Jraph, PyTorch Geometric and PyTorch3D), though it remains framework-agnostic, as it is built on top of the PyData ecosystem to enable inter-operability with scientific computing tools and libraries. Graphein is designed to be highly flexible, allowing the user to specify each step of the data preparation; scalable, to facilitate working with large protein complexes and interaction graphs; and it contains useful pre-processing tools for preparing experimental files. Graphein facilitates network-based, graph-theoretic and topological analyses of structural and interaction datasets in a high-throughput manner. We envision that Graphein will facilitate developments in computational biology, graph representation learning and drug discovery.
Availability and implementation: Graphein is written in Python. Source code, example usage and tutorials, datasets, and documentation are made freely available under the MIT License at the following URL: https://anonymous.4open.science/r/graphein-3472/README.md
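To make the idea of a graph representation of a protein structure concrete, the following is a minimal, standard-library-only sketch. It does not use Graphein and should not be read as its API: the function name `build_residue_graph`, the toy coordinates in `RESIDUES` and the 6 Å cutoff are all assumptions chosen for illustration. Each residue becomes a node, consecutive residues in a chain are joined by a backbone edge, and any other pair of residues whose alpha-carbons lie within the cutoff is joined by a distance edge.

```python
import math
from itertools import combinations

# Hypothetical toy input (not real structural data):
# (chain ID, residue name, residue number, C-alpha coordinates in angstroms)
RESIDUES = [
    ("A", "MET", 1, (0.0, 0.0, 0.0)),
    ("A", "ALA", 2, (3.8, 0.0, 0.0)),
    ("A", "GLY", 3, (7.6, 0.0, 0.0)),
    ("A", "LYS", 4, (7.6, 3.8, 0.0)),
    ("A", "GLU", 5, (3.8, 3.8, 0.0)),
]


def build_residue_graph(residues, cutoff=6.0):
    """Build a residue-level graph as (nodes, edges) dictionaries.

    Illustrative sketch only; the cutoff value is an assumption.
    """
    # One node per residue, keyed by "chain:name:number".
    nodes = {}
    for chain, name, number, coords in residues:
        node_id = f"{chain}:{name}:{number}"
        nodes[node_id] = {"chain": chain, "number": number, "coords": coords}

    edges = {}
    ids = list(nodes)

    # Backbone edges: consecutive residue numbers within the same chain.
    for a, b in zip(ids, ids[1:]):
        same_chain = nodes[a]["chain"] == nodes[b]["chain"]
        consecutive = nodes[b]["number"] - nodes[a]["number"] == 1
        if same_chain and consecutive:
            edges[(a, b)] = {"kind": "peptide_bond"}

    # Distance edges: any remaining pair with C-alpha atoms within the cutoff.
    for a, b in combinations(ids, 2):
        if (a, b) in edges:
            continue
        distance = math.dist(nodes[a]["coords"], nodes[b]["coords"])
        if distance <= cutoff:
            edges[(a, b)] = {"kind": "distance", "distance": distance}

    return nodes, edges


nodes, edges = build_residue_graph(RESIDUES)
for (a, b), attrs in edges.items():
    print(a, "--", b, attrs["kind"])
```

Plain dictionaries are used here only to keep the sketch dependency-free; in practice a graph library from the PyData ecosystem would hold the same node and edge attributes.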
Author Information
Arian Jamasb (University of Cambridge)
Ramon Viñas Torné (University of Cambridge)
Eric Ma (PyMC Labs)
Yuanqi Du (Cornell University)
Charles Harris (University of Cambridge)
Kexin Huang (Stanford University)
Dominic Hall (University of Cambridge)
Pietro Lió (University of Cambridge)
Tom Blundell (University of Cambridge)
Professor Sir Tom Blundell, FRS, FMedSci, is a Director of Research in Biochemistry at Cambridge. He worked with Dorothy Hodgkin in Oxford in the 1960s on the structure of insulin, and then in Sussex in the 1970s on glucagon. Recently he has focused on DNA repair, defining structures of multicomponent complexes of the 4100 amino acid DNA-PKcs using cryo-EM. Over the past 30 years he has produced software for homology modelling (Modeller, cited 13,000 times) and for predicting the impacts of mutations in cancer and drug resistance using AI/ML methods, contributing to ~700 research papers. In the 1970s Tom developed structure-guided drug discovery, and in 1999 he pioneered fragment-based drug discovery, co-founding Astex, which has two oncology drugs on the market. In academia he develops antibiotics targeting mycobacteria in leprosy and cystic fibrosis. In 1970, as a councillor and Chair of Oxford City Planning, he stopped a motorway planned to go through the city centre and instead pedestrianized the area. He chaired the UK Royal Commission on Environment from 1998 to 2005.
More from the Same Authors
-
2021 : Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development »
Kexin Huang · Tianfan Fu · Wenhao Gao · Yue Zhao · Yusuf Roohani · Jure Leskovec · Connor Coley · Cao Xiao · Jimeng Sun · Marinka Zitnik -
2021 : GraphGT: Machine Learning Datasets for Graph Generation and Transformation »
Yuanqi Du · Shiyu Wang · Xiaojie Guo · Hengning Cao · Shujie Hu · Junji Jiang · Aishwarya Varala · Abhinav Angirekula · Liang Zhao -
2021 : Interpretable Data Analysis for Bench-to-Bedside Research »
Zohreh Shams · Botty Dimanov · Nikola Simidjievski · Helena Andres-Terre · Paul Scherer · Urška Matjašec · Mateja Jamnik · Pietro Lió -
2021 : Learning Disentangled Representation for Spatiotemporal Graph Generation »
Yuanqi Du · Xiaojie Guo · Hengning Cao · Yanfang Ye · Liang Zhao -
2021 : Structure-aware generation of drug-like molecules »
Pavol Drotar · Arian Jamasb · Ben Day · Catalina Cangea · Pietro Lió -
2021 : 3D Pre-training improves GNNs for Molecular Property Prediction »
Hannes Stärk · Dominique Beaini · Gabriele Corso · Prudencio Tossou · Christian Dallago · Stephan Günnemann · Pietro Lió -
2021 : Physics-Augmented Learning: A New Paradigm Beyond Physics-Informed Learning »
Ziming Liu · Yuanqi Du · Yunyue Chen · Max Tegmark -
2021 : Adaptive Pseudo-labeling for Quantum Calculations »
Kexin Huang · Vishnu Sresht · Brajesh Rai -
2021 : 3D Pre-training improves GNNs for Molecular Property Prediction »
Hannes Stärk · Gabriele Corso · Christian Dallago · Stephan Günnemann · Pietro Lió -
2021 : Approximate Latent Force Model Inference »
Jacob Moss · Felix Opolka · Pietro Lió -
2022 Poster: Audio-Driven Co-Speech Gesture Video Generation »
Xian Liu · Qianyi Wu · Hang Zhou · Yuanqi Du · Wayne Wu · Dahua Lin · Ziwei Liu -
2022 : Learning Feynman Diagrams using Graph Neural Networks »
Alexander Norcliffe · Harrison Mitchell · Pietro Lió -
2022 : GAUCHE: A Library for Gaussian Processes in Chemistry »
Ryan-Rhys Griffiths · Leo Klarner · Henry Moss · Aditya Ravuri · Sang Truong · Bojana Rankovic · Yuanqi Du · Arian Jamasb · Julius Schwartz · Austin Tripp · Gregory Kell · Anthony Bourached · Alex Chan · Jacob Moss · Chengzhi Guo · Alpha Lee · Philippe Schwaller · Jian Tang -
2022 : PIPS: Path Integral Stochastic Optimal Control for Path Sampling in Molecular Dynamics »
Lars Holdijk · Yuanqi Du · Ferry Hooft · Priyank Jaini · Berend Ensing · Max Welling -
2022 : A physics-informed search for metric solutions to Ricci flow, their embeddings, and visualisation »
Aarjav Jain · Challenger Mishra · Pietro Lió -
2022 : Structural Causal Model for Molecular Dynamics Simulation »
Qi Liu · Yuanqi Du · Fan Feng · Qiwei Ye · Jie Fu -
2022 : Tabular deep learning when $d \gg n$ by using an auxiliary knowledge graph »
Camilo Ruiz · Hongyu Ren · Kexin Huang · Jure Leskovec -
2022 : Improving Classification and Data Imputation for Single-Cell Transcriptomics with Graph Neural Networks »
Han-Bo Li · Ramon Viñas Torné · Pietro Lió -
2022 : Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction »
Junwen Bai · Yuanqi Du · Yingheng Wang · Shufeng Kong · John Gregoire · Carla Gomes -
2022 : ChemSpacE: Interpretable and Interactive Chemical Space Exploration »
Yuanqi Du · Xian Liu · Nilay Shah · Shengchao Liu · Jieyu Zhang · Bolei Zhou -
2022 : Structure-based Drug Design with Equivariant Diffusion Models »
Arne Schneuing · Yuanqi Du · Charles Harris · Arian Jamasb · Ilia Igashov · weitao Du · Tom Blundell · Pietro Lió · Carla Gomes · Max Welling · Michael Bronstein · Bruno Correia -
2022 : A Federated Learning benchmark for Drug-Target Interaction »
Filip Svoboda · Gianluca Mittone · Nicholas Lane · Pietro Lió -
2022 : Improving Molecular Pretraining with Complementary Featurizations »
Yanqiao Zhu · Dingshuo Chen · Yuanqi Du · Yingze Wang · Qiang Liu · Shu Wu -
2022 : Benchmarking Graph Neural Network-based Imputation Methods on Single-Cell Transcriptomics Data »
Han-Bo Li · Ramon Viñas Torné · Pietro Lió -
2022 : Sheaf Attention Networks »
Federico Barbero · Cristian Bodnar · Haitz Sáez de Ocáriz Borde · Pietro Lió -
2022 : Human Interventions in Concept Graph Networks »
Lucie Charlotte Magister · Pietro Barbiero · Dmitry Kazhdan · Federico Siciliano · Gabriele Ciravegna · Fabrizio Silvestri · Mateja Jamnik · Pietro Lió -
2022 Spotlight: Lightning Talks 4B-4 »
Ziyue Jiang · Zeeshan Khan · Yuxiang Yang · Chenze Shao · Yichong Leng · Zehao Yu · Wenguan Wang · Xian Liu · Zehua Chen · Yang Feng · Qianyi Wu · James Liang · C.V. Jawahar · Junjie Yang · Zhe Su · Songyou Peng · Yufei Xu · Junliang Guo · Michael Niemeyer · Hang Zhou · Zhou Zhao · Makarand Tapaswi · Dongfang Liu · Qian Yang · Torsten Sattler · Yuanqi Du · Haohe Liu · Jing Zhang · Andreas Geiger · Yi Ren · Long Lan · Jiawei Chen · Wayne Wu · Dahua Lin · Dacheng Tao · Xu Tan · Jinglin Liu · Ziwei Liu · 振辉 叶 · Danilo Mandic · Lei He · Xiangyang Li · Tao Qin · sheng zhao · Tie-Yan Liu -
2022 Spotlight: Audio-Driven Co-Speech Gesture Video Generation »
Xian Liu · Qianyi Wu · Hang Zhou · Yuanqi Du · Wayne Wu · Dahua Lin · Ziwei Liu -
2022 : Dynamic outcomes-based clustering of disease progression in mechanically ventilated patients »
Emma Rocheteau · Ioana Bica · Pietro Lió · Ari Ercole -
2022 Workshop: AI for Science: Progress and Promises »
Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik -
2022 Poster: Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off »
Mateo Espinosa Zarlenga · Pietro Barbiero · Gabriele Ciravegna · Giuseppe Marra · Francesco Giannini · Michelangelo Diligenti · Zohreh Shams · Frederic Precioso · Stefano Melacci · Adrian Weller · Pietro Lió · Mateja Jamnik -
2022 Poster: Neural Sheaf Diffusion: A Topological Perspective on Heterophily and Oversmoothing in GNNs »
Cristian Bodnar · Francesco Di Giovanni · Benjamin Chamberlain · Pietro Lió · Michael Bronstein -
2022 Poster: Multi-objective Deep Data Generation with Correlated Property Control »
Shiyu Wang · Xiaojie Guo · Xuanyang Lin · Bo Pan · Yuanqi Du · Yinkai Wang · Yanfang Ye · Ashley Petersen · Austin Leitgeb · Saleh Alkhalifa · Kevin Minbiole · William M. Wuest · Amarda Shehu · Liang Zhao -
2022 Poster: Composite Feature Selection Using Deep Ensembles »
Fergus Imrie · Alexander Norcliffe · Pietro Lió · Mihaela van der Schaar -
2022 Poster: SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks »
Davide Buffelli · Pietro Lió · Fabio Vandin -
2021 : Neural ODE Processes: A Short Summary »
Alexander Norcliffe · Cristian Bodnar · Ben Day · Jacob Moss · Pietro Lió -
2021 : On Second Order Behaviour in Augmented Neural ODEs: A Short Summary »
Alexander Norcliffe · Cristian Bodnar · Ben Day · Nikola Simidjievski · Pietro Lió -
2021 Workshop: AI for Science: Mind the Gaps »
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik -
2020 Poster: Constraining Variational Inference with Geometric Jensen-Shannon Divergence »
Jacob Deasy · Nikola Simidjievski · Pietro Lió -
2020 Poster: On Second Order Behaviour in Augmented Neural ODEs »
Alexander Norcliffe · Cristian Bodnar · Ben Day · Nikola Simidjievski · Pietro Lió