Timezone: »

ProteinNPT: Improving protein property prediction and design with non-parametric transformers
Pascal Notin · Ruben Weitzman · Debora Marks · Yarin Gal

Tue Dec 12 03:15 PM -- 05:15 PM (PST) @ Great Hall & Hall B1+B2 #308

Protein design holds immense potential for optimizing naturally occurring sequences, with broad applications in drug discovery, material design, and sustainability. However, computational methods for protein engineering are confronted with significant challenges, including an expansive design space, sparse functional regions, and scarcity of available labels. Furthermore, real-life design scenarios often necessitate the simultaneous optimization of multiple properties, exacerbating label sparsity issues. In this paper, we present ProteinNPT, a non-parametric transformer variant tailored for protein sequences and particularly suited to label-scarce and multi-task optimization settings. We first expand the ProteinGym benchmark to evaluate models in supervised settings and develop several cross-validation schemes for robust assessment. Subsequently, we reimplement existing top-performing baselines, introduce several extensions of these baselines by integrating diverse branches of protein engineering literature, and demonstrate that ProteinNPT consistently outperforms all of them across a diverse set of protein property prediction tasks. Finally, we demonstrate the value of our approach for iterative protein design in several in silico Bayesian optimization experiments.

Author Information

Pascal Notin (Department of Computer Science, University of Oxford)
Ruben Weitzman (University of Oxford)
Debora Marks (Harvard University)

Debora is a mathematician and computational biologist with a track record of using novel algorithms and statistics to successfully address unsolved biological problems. She has a passion for interpreting genetic variation in a way that impacts biomedical applications. During her PhD, she quantified the pan-genomic scope of microRNA targeting - the combinatorial regulation of protein expression and co-discovered the first microRNA in a virus.  As a postdoc she made a breakthrough in the classic, unsolved problem of ab initio 3D structure prediction of proteins using undirected graphical probability models for evolutionary sequences. She has developed this approach to determine functional interactions, biomolecular structures, including the 3D structure of RNA and RNA-protein complexes and the conformational ensembles of apparently disordered proteins. Her new lab at Harvard is interested in developing methods in deep learning to address a wide range of biological challenges including designing drug affinity libraries for large numbers of human genes, predicting epistasis in antibiotic resistance, the effects of genetic variation on human disease etiology and drug response and sequence design for biosynthetic applications.

Yarin Gal (University of Oxford)

More from the Same Authors