Successful approaches to modeling the fitness landscape of protein sequences have typically relied on family-specific sets of homologous sequences called multiple sequence alignments (Hopf et al. 2017; Riesselman et al. 2018; Frazer et al. 2021). They are, however, limited by the fact that many proteins are difficult to align or have shallow alignments. Newer models that do not rely on alignments, such as transformers, have shown promise (Madani et al. 2020; Rives et al. 2021; Notin et al. 2022; Hesslow et al. 2022) and are progressively closing the gap with their alignment-based counterparts. In this work, we introduce TranceptEVE, a hybrid between family-specific and family-agnostic models that seeks to build on the relative strengths of each approach to achieve state-of-the-art performance on the fitness prediction task. We demonstrate that it outperforms all other baselines on the recently released ProteinGym benchmarks (Notin et al. 2022), a curated set of 94 deep mutational scanning assays for assessing the effects of substitution and indel mutations. We also quantify its ability to predict the pathogenicity of genetic mutations in humans based on annotations from ClinVar.
Author Information
Pascal Notin (Department of Computer Science, University of Oxford)
Lodevicus van Niekerk (University of Oxford)
Research Assistant at Marks Lab (Harvard Medical School) and OATML (Oxford).
Aaron Kollasch (Harvard University)
Daniel Ritter (Harvard Medical School)
Yarin Gal (University of Oxford)
Debora Marks (Harvard University)
Debora is a mathematician and computational biologist with a track record of using novel algorithms and statistics to successfully address unsolved biological problems. She has a passion for interpreting genetic variation in a way that impacts biomedical applications. During her PhD, she quantified the pan-genomic scope of microRNA targeting (the combinatorial regulation of protein expression) and co-discovered the first microRNA in a virus. As a postdoc, she made a breakthrough in the classic, unsolved problem of ab initio 3D structure prediction of proteins, using undirected graphical probability models for evolutionary sequences. She has developed this approach to determine functional interactions and biomolecular structures, including the 3D structure of RNA and RNA-protein complexes and the conformational ensembles of apparently disordered proteins. Her new lab at Harvard is interested in developing deep learning methods to address a wide range of biological challenges, including designing drug affinity libraries for large numbers of human genes, predicting epistasis in antibiotic resistance, predicting the effects of genetic variation on human disease etiology and drug response, and designing sequences for biosynthetic applications.
More from the Same Authors
-
2020 : Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks »
Panagiotis Tigas · Yarin Gal -
2022 : Discovering Long-period Exoplanets using Deep Learning with Citizen Science Labels »
Shreshth A Malik · Nora Eisner · Chris Lintott · Yarin Gal -
2022 : How can we use natural evolution and genetic experiments to design protein functions? »
Ada Shaw · June Shin · Debora Marks -
2022 : Kernelized Stein Discrepancies for Biological Sequences »
Alan Amin · Eli Weinstein · Debora Marks -
2022 : scPerturb: Information Resource for Harmonized Single-Cell Perturbation Data »
Tessa Green · Stefan Peidli · Ciyue Shen · Torsten Gross · Joseph Min · Samuele Garda · Jake Taylor-King · Debora Marks · Augustin Luna · Nils Blüthgen · Chris Sander -
2022 : Designing and Evolving Neuron-Specific Proteases »
Han Spinner · Colin Hemez · Julia McCreary · David Liu · Debora Marks -
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? »
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal -
2022 : What 'Out-of-distribution' Is and Is Not »
Sebastian Farquhar · Yarin Gal -
2022 : Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation »
Lorenz Kuhn · Yarin Gal · Sebastian Farquhar -
2023 Poster: ProteinNPT: Improving protein property prediction and design with non-parametric transformers »
Pascal Notin · Ruben Weitzman · Debora Marks · Yarin Gal -
2023 Poster: ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design »
Pascal Notin · Aaron Kollasch · Daniel Ritter · Lodevicus van Niekerk · Nathan Rollins · Steffanie Paul · Ada Shaw · Ruben Weitzman · Jonathan Frazer · Mafalda Dias · Dinko Franceschi · Rose Orenbuch · Han Spinner · Yarin Gal · Debora Marks -
2022 Workshop: Learning Meaningful Representations of Life »
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Alex X Lu · Anshul Kundaje · Chang Liu · Debora Marks · Ed Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Rebecca Boiarsky · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang · Stephen Ra -
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks »
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal -
2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions »
Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit -
2022 Poster: Non-identifiability and the Blessings of Misspecification in Models of Molecular Fitness »
Eli Weinstein · Alan Amin · Jonathan Frazer · Debora Marks -
2022 Poster: Interventions, Where and How? Experimental Design for Causal Models at Scale »
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer -
2022 Poster: Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation »
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Thomas Rainforth -
2021 Workshop: Learning Meaningful Representations of Life (LMRL) »
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Anshul Kundaje · Barbara Engelhardt · Chang Liu · David Van Valen · Debora Marks · Edward Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang -
2019 : Synthetic Systems »
Pamela Silver · Debora Marks · Chang Liu · Possu Huang -
2019 Workshop: Learning Meaningful Representations of Life »
Elizabeth Wood · Yakir Reshef · Jonathan Bloom · Jasper Snoek · Barbara Engelhardt · Scott Linderman · Suchi Saria · Alexander Wiltschko · Casey Greene · Chang Liu · Kresten Lindorff-Larsen · Debora Marks -
2018 : Invited Talk Session 2 »
Debora Marks · Olexandr Isayev · Tess Smidt · Nathaniel Thomas -
2018 : TBC 4 »
Debora Marks -
2018 Poster: BRUNO: A Deep Recurrent Model for Exchangeable Data »
Iryna Korshunova · Jonas Degrave · Ferenc Huszar · Yarin Gal · Arthur Gretton · Joni Dambre