Protein design is challenging because it requires searching a vast combinatorial space in which functional sequences are sparse. Self-supervised learning approaches offer the potential to navigate this space more effectively and thereby accelerate protein engineering. We introduce a sequence denoising autoencoder (DAE) that learns the manifold of protein sequences from a large number of potentially unlabeled proteins. This DAE is combined with a function predictor that guides sampling towards sequences with higher levels of desired functions. We train the sequence DAE on more than 20M unlabeled protein sequences spanning many evolutionarily diverse protein families, and train the function predictor on approximately 0.5M sequences with known function labels. At test time, we sample from the model by iteratively denoising a sequence while exploiting the gradients from the function predictor. We present a few preliminary case studies of protein design that demonstrate the effectiveness of this approach, which we refer to as “deep manifold sampling”, including metal binding site addition, function-preserving diversification, and global fold change.
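The sampling loop described above (corrupt, denoise, nudge toward higher predicted function, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the substitution noise model, the consensus-based stand-in for a trained DAE, the linear function predictor, and the rule of adding a scaled predictor gradient to the denoiser logits are all assumptions chosen for illustration, and every function and parameter name here (`corrupt`, `toy_denoiser_logits`, `predictor_grad`, `guided_sample`, `guidance`) is hypothetical.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def corrupt(seq, rate, rng):
    """Assumed noise model: replace each residue with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def toy_denoiser_logits(noisy, consensus):
    """Hypothetical stand-in for a trained sequence DAE.

    Returns per-position logits over ALPHABET that strongly favour a fixed
    'family consensus' (a crude proxy for the learned manifold) and, more
    weakly, the residue currently observed in the noisy input.
    """
    return [
        [3.0 * (aa == c) + 1.0 * (aa == o) for aa in ALPHABET]
        for o, c in zip(noisy, consensus)
    ]


def predictor_grad(weights, length):
    """Gradient of a hypothetical linear predictor f(x) = sum_i sum_a w[i][a] * x[i][a]
    over one-hot inputs x. For a linear model, df/dx[i][a] = w[i][a]."""
    return [[weights.get((i, aa), 0.0) for aa in ALPHABET] for i in range(length)]


def pick(logits, rng, greedy):
    """Choose a residue: argmax if greedy, otherwise sample from softmax(logits)."""
    if greedy:
        return ALPHABET[max(range(len(logits)), key=logits.__getitem__)]
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]  # subtract max for numerical stability
    return rng.choices(ALPHABET, weights=probs, k=1)[0]


def guided_sample(seed_seq, consensus, weights, steps=5, rate=0.2,
                  guidance=5.0, greedy=False, rng=None):
    """Iteratively corrupt and denoise, shifting the denoiser logits along the
    predictor gradient (a first-order guidance step) before choosing residues."""
    if rng is None:
        rng = random.Random(0)
    seq = seed_seq
    grad = predictor_grad(weights, len(seq))
    for _ in range(steps):
        noisy = corrupt(seq, rate, rng)
        logits = toy_denoiser_logits(noisy, consensus)
        seq = "".join(
            pick([l + guidance * g for l, g in zip(lrow, grow)], rng, greedy)
            for lrow, grow in zip(logits, grad)
        )
    return seq
```

With `greedy=True`, the toy denoiser pulls every position back to the consensus except where the guidance term dominates: `guided_sample("MKTAYIAK", "MKTAYIAK", {(2, "H"): 1.0}, greedy=True)` returns `"MKHAYIAK"`. Because the predictor here is linear over one-hot inputs, its gradient is constant and adding `guidance * grad` to the logits is exactly the first-order change in predicted function; a neural predictor would instead need automatic differentiation at each step.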
Author Information
Vladimir Gligorijevic (Prescient Design | Genentech)
Stephen Ra (Prescient Design | Genentech)
Dan Berenberg (New York University)
Richard Bonneau (Genentech)
Richard Bonneau is on leave from NYU and currently at Prescient Design.
Kyunghyun Cho (Genentech | New York University)
Related Events (a corresponding poster, oral, or spotlight)
2021 : Function-guided protein design by deep manifold sampling
Mon. Dec 13th 07:40 -- 07:50 PM
More from the Same Authors
2021 : NaturalProofs: Mathematical Theorem Proving in Natural Language
Sean Welleck · Jiacheng Liu · Ronan Le Bras · Hanna Hajishirzi · Yejin Choi · Kyunghyun Cho
2021 : KLUE: Korean Language Understanding Evaluation
Sungjoon Park · Jihyung Moon · Sungdong Kim · Won Ik Cho · Ji Yoon Han · Jangwon Park · Chisung Song · Junseong Kim · Youngsook Song · Taehwan Oh · Joohong Lee · Juhyun Oh · Sungwon Lyu · Younghoon Jeong · Inkwon Lee · Sangwoo Seo · Dongjun Lee · Hyunwoo Kim · Myeonghwa Lee · Seongbo Jang · Seungwon Do · Sunkyoung Kim · Kyungtae Lim · Jongwon Lee · Kyumin Park · Jamin Shin · Seonghyun Kim · Lucy Park · Alice Oh · Jung-Woo Ha · Kyunghyun Cho
2022 : Automated Protein Function Description for Novel Class Discovery
Meet Barot · Vladimir Gligorijevic · Richard Bonneau · Kyunghyun Cho
2022 : A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Nataša Tagasovska · Nathan Frey · Andreas Loukas · Isidro Hotzel · Julien Lafrance-Vanasse · Ryan Kelly · Yan Wu · Arvind Rajpal · Richard Bonneau · Kyunghyun Cho · Stephen Ra · Vladimir Gligorijevic
2022 : PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
Ji Won Park · Samuel Stanton · Saeed Saremi · Andrew Watkins · Stephen Ra · Vladimir Gligorijevic · Kyunghyun Cho · Richard Bonneau
2022 : EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation
Jae Hyeon Lee · Payman Yadollahpour · Andrew Watkins · Nathan Frey · Andrew Leaver-Fay · Stephen Ra · Vladimir Gligorijevic · Kyunghyun Cho · Aviv Regev · Richard Bonneau
2022 : Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
Romain Lopez · Nataša Tagasovska · Stephen Ra · Kyunghyun Cho · Jonathan Pritchard · Aviv Regev
2022 Workshop: Learning Meaningful Representations of Life
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Alex X Lu · Anshul Kundaje · Chang Liu · Debora Marks · Ed Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Rebecca Boiarsky · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang · Stephen Ra
2021 Poster: True Few-Shot Learning with Language Models
Ethan Perez · Douwe Kiela · Kyunghyun Cho
2019 : Cell
Anne Carpenter · Jian Zhou · Maria Chikina · Alexander Tong · Ben Lengerich · Aly Abdelkareem · Gokcen Eraslan · Stephen Ra · Daniel Burkhardt · Frederick A Matsen IV · Alan Moses · Zhenghao Chen · Marzieh Haghighi · Alex Lu · Geoffrey Schau · Jeff Nivala · Miriam Shiffman · Hannes Harbrecht · Levi Masengo Wa Umba · Joshua Weinstein