Machine learning could enable an unprecedented level of control in protein engineering for therapeutic and industrial applications. To be useful for designing proteins with desired properties, machine learning models must capture the protein sequence-function relationship, often termed the fitness landscape. Existing benchmarks such as CASP and CAFA assess protein structure and function prediction, respectively, yet they do not target metrics relevant for protein engineering. In this work, we introduce Fitness Landscape Inference for Proteins (FLIP), a benchmark for function prediction to encourage rapid scoring of representation learning for protein engineering. Our curated splits, baselines, and metrics probe model generalization in settings relevant for protein engineering, e.g. low-resource and extrapolative. Currently, FLIP encompasses experimental data across adeno-associated virus stability for gene therapy, protein domain B1 stability and immunoglobulin binding, and thermostability from multiple protein families. To enable ease of use and future expansion to new splits, all data are presented in a standard format. FLIP scripts and data are freely accessible at https://benchmark.protein.properties.
Author Information
Christian Dallago (Technical University of Munich)
Jody Mou (Massachusetts Institute of Technology)
Kadina Johnston (California Institute of Technology)
Bruce Wittmann
Nicholas Bhattacharya (UC Berkeley)
Samuel Goldman (MIT)
MIT PhD Student in Computational and Systems Biology
Ali Madani (Salesforce Research)
Kevin Yang (Microsoft)
More from the Same Authors
-
2021 : End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman »
Samantha Petti · Nicholas Bhattacharya · Roshan Rao · Justas Dauparas · Neil Thomas · Juannan Zhou · Alexander Rush · Peter Koo · Sergey Ovchinnikov -
2021 : 3D Pre-training improves GNNs for Molecular Property Prediction »
Hannes Stärk · Dominique Beaini · Gabriele Corso · Prudencio Tossou · Christian Dallago · Stephan Günnemann · Pietro Lió -
2021 : 3D Pre-training improves GNNs for Molecular Property Prediction »
Hannes Stärk · Gabriele Corso · Christian Dallago · Stephan Günnemann · Pietro Lió -
2022 : Masked inverse folding with sequence transfer for protein representation learning »
Kevin Yang · Niccoló Zanichelli · Hugh Yeh -
2022 : Protein structure generation via folding diffusion »
Kevin Wu · Kevin Yang · Rianne van den Berg · James Zou · Alex X Lu · Ava Soleimany -
2022 : Pretrained protein language model transfer learning: is the final layer representation what we want? »
Francesca-Zhoufan Li · Ava Soleimany · Kevin Yang · Alex X Lu -
2022 : Learning from physics-based features improves protein property prediction »
Amy Wang · Ava Soleimany · Alex X Lu · Kevin Yang -
2022 : Ligand-aware protein sequence design using protein self contacts »
Jody Mou · Benjamin Fry · Chun-Chen Yao · Nicholas Polizzi -
2022 : Designing active and thermostable enzymes with sequence-only predictive models »
Clara Fannjiang · Micah Olivas · Eric Greene · Craig Markin · Bram Wallace · Ben Krause · Margaux Pinney · James Fraser · Polly Fordyce · Ali Madani · Nikhil Naik -
2022 : Standards, tooling and benchmarks to probe representation learning on proteins »
Joaquin Gomez Sanchez · Sebastian Franz · Michael Heinzinger · Burkhard Rost · Christian Dallago -
2021 Poster: Deep Extrapolation for Attribute-Enhanced Generation »
Alvin Chan · Ali Madani · Ben Krause · Nikhil Naik -
2020 : Profile Prediction: An Alignment-Based Pre-Training Task for Protein Sequence Models »
Jesse Vig · Ali Madani -
2020 : Contributed Talk - ProGen: Language Modeling for Protein Generation »
Ali Madani · Bryan McCann · Nikhil Naik · Possu Huang · Richard Socher -
2020 Social: Computational Biology Speed Networking »
Kevin Yang -
2019 Poster: Evaluating Protein Transfer Learning with TAPE »
Roshan Rao · Nicholas Bhattacharya · Neil Thomas · Yan Duan · Peter Chen · John Canny · Pieter Abbeel · Yun Song -
2019 Spotlight: Evaluating Protein Transfer Learning with TAPE »
Roshan Rao · Nicholas Bhattacharya · Neil Thomas · Yan Duan · Peter Chen · John Canny · Pieter Abbeel · Yun Song