Seq2MSA: A Language Model for Protein Sequence Diversification
Pascal Sturmfels · Roshan Rao · Robert Verkuil · Zeming Lin · Tom Sercu · Adam Lerer · Alex Rives

Sat Dec 03 12:20 PM -- 12:35 PM (PST)

Diversification libraries of protein sequences that contain a similar set of structures over a variety of sequences can help protein design pipelines by introducing flexibility into the starting structures and providing a range of starting points for directed evolution. However, exploring the sequence space is computationally challenging: the vast majority of sequence space is non-viable, and even of those sequences that do fold to well-formed protein structures, it is challenging to find the fraction that maintain a similar fold class to a given protein. In this work, we propose to use an encoder-decoder language model, trained on a novel Seq2MSA task, that can create diversification libraries of any input protein. In particular, using our model, we are able to generate sequences that maintain structural similarity to a target sequence while pushing below 40% sequence identity to any protein in UniRef. Our diversification pipeline has the potential to aid in computational protein design by providing a diverse set of starting points in sequence space for a given functional or structural target.
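The 40% sequence identity threshold mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how identity is computed or how UniRef is searched, so everything here is an assumption for illustration: the function names `sequence_identity` and `filter_by_identity` are hypothetical, sequences are assumed to be already aligned to equal length with `-` as the gap character, identity is taken over columns where both sequences have a residue, and a short in-memory list stands in for a real database search.

```python
from typing import Iterable, List

GAP = "-"  # assumed gap character in the pre-aligned sequences


def sequence_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap.

    This is one common definition of identity; other denominators
    (for example, the full alignment length) are also used in practice.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == GAP or y == GAP:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def filter_by_identity(
    candidates: Iterable[str],
    references: Iterable[str],
    max_identity: float = 0.4,
) -> List[str]:
    """Keep candidates whose identity to every reference is below max_identity."""
    refs = list(references)
    return [
        cand
        for cand in candidates
        if all(sequence_identity(cand, ref) < max_identity for ref in refs)
    ]


if __name__ == "__main__":
    # Toy data only; these are not real proteins or model outputs.
    references = ["ACDEFGHIKL"]
    candidates = ["ACDEFGHIKL", "ACDWWWWWWW", "MMMMMMMMMM"]
    for seq in filter_by_identity(candidates, references):
        print(seq)
```

A filter like this covers only the sequence-identity side of the claim. Checking that the retained sequences keep a similar structure would need a separate structure prediction and comparison step, which is not sketched here.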

Author Information

Pascal Sturmfels (University of Washington)
Pascal Sturmfels

PhD student at the University of Washington interested in deep learning for protein design and protein language modeling.

Roshan Rao (Meta)
Robert Verkuil (Facebook)
Zeming Lin (New York University)
Tom Sercu (FAIR)
Adam Lerer (Facebook AI Research)
Alex Rives (FAIR)

More from the Same Authors

  • 2021 : Deep generative models create new and diverse protein structures »
    Zeming Lin · Tom Sercu · Yann LeCun · Alex Rives
  • 2021 : End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman »
    Samantha Petti · Nicholas Bhattacharya · Roshan Rao · Justas Dauparas · Neil Thomas · Juannan Zhou · Alexander Rush · Peter Koo · Sergey Ovchinnikov
  • 2022 : Human-AI Coordination via Human-Regularized Search and Learning »
    Hengyuan Hu · David Wu · Adam Lerer · Jakob Foerster · Noam Brown
  • 2022 : Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning »
    Anton Bakhtin · David Wu · Adam Lerer · Jonathan Gray · Athul Jacob · Gabriele Farina · Alexander Miller · Noam Brown
  • 2022 : Invited Speaker »
    Alex Rives
  • 2022 Workshop: Machine Learning in Structural Biology Workshop »
    Roshan Rao · Jonas Adler · Namrata Anand · John Ingraham · Sergey Ovchinnikov · Ellen Zhong
  • 2021 Workshop: Machine Learning in Structural Biology »
    Ellen Zhong · Raphael Townshend · Stephan Eismann · Namrata Anand · Roshan Rao · John Ingraham · Wouter Boomsma · Sergey Ovchinnikov · Bonnie Berger
  • 2021 Poster: No-Press Diplomacy from Scratch »
    Anton Bakhtin · David Wu · Adam Lerer · Noam Brown
  • 2021 Poster: Language models enable zero-shot prediction of the effects of mutations on protein function »
    Joshua Meier · Roshan Rao · Robert Verkuil · Jason Liu · Tom Sercu · Alex Rives
  • 2020 : Exploring generative atomic models in cryo-EM reconstruction »
    Ellen Zhong · Adam Lerer · · Bonnie Berger
  • 2020 : Afternoon Poster Session »
    Roshan Rao
  • 2020 : Contributed talks intro »
    Roshan Rao
  • 2020 : Po-Ssu Huang intro »
    Roshan Rao
  • 2020 Workshop: Machine Learning for Structural Biology »
    Raphael Townshend · Stephan Eismann · Ron Dror · Ellen Zhong · Namrata Anand · John Ingraham · Wouter Boomsma · Sergey Ovchinnikov · Roshan Rao · Per Greisen · Rachel Kolodny · Bonnie Berger
  • 2020 Poster: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian »
    Jack Parker-Holder · Luke Metz · Cinjon Resnick · Hengyuan Hu · Adam Lerer · Alistair Letcher · Alexander Peysakhovich · Aldo Pacchiano · Jakob Foerster
  • 2020 Poster: Combining Deep Reinforcement Learning and Search for Imperfect-Information Games »
    Noam Brown · Anton Bakhtin · Adam Lerer · Qucheng Gong
  • 2019 : Contributed Talk - 3 »
    Adam Lerer
  • 2019 : Poster Session »
    Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn
  • 2019 : Extended Poster Session »
    Travis LaCroix · Marie Ossenkopf · Mina Lee · Nicole Fitzgerald · Daniela Mihai · Jonathon Hare · Ali Zaidi · Alexander Cowen-Rivers · Alana Marzoev · Eugene Kharitonov · Luyao Yuan · Tomasz Korbak · Paul Pu Liang · Yi Ren · Roberto Dessì · Peter Potash · Shangmin Guo · Tatsunori Hashimoto · Percy Liang · Julian Zubek · Zipeng Fu · Song-Chun Zhu · Adam Lerer
  • 2019 Poster: PyTorch: An Imperative Style, High-Performance Deep Learning Library »
    Adam Paszke · Sam Gross · Francisco Massa · Adam Lerer · James Bradbury · Gregory Chanan · Trevor Killeen · Zeming Lin · Natalia Gimelshein · Luca Antiga · Alban Desmaison · Andreas Kopf · Edward Yang · Zachary DeVito · Martin Raison · Alykhan Tejani · Sasank Chilamkurthy · Benoit Steiner · Lu Fang · Junjie Bai · Soumith Chintala
  • 2019 Poster: Evaluating Protein Transfer Learning with TAPE »
    Roshan Rao · Nicholas Bhattacharya · Neil Thomas · Yan Duan · Peter Chen · John Canny · Pieter Abbeel · Yun Song
  • 2019 Spotlight: Evaluating Protein Transfer Learning with TAPE »
    Roshan Rao · Nicholas Bhattacharya · Neil Thomas · Yan Duan · Peter Chen · John Canny · Pieter Abbeel · Yun Song
  • 2019 Poster: Robust Multi-agent Counterfactual Prediction »
    Alexander Peysakhovich · Christian Kroer · Adam Lerer
  • 2018 : Poster Session 1 + Coffee »
    Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang
  • 2018 Poster: Forward Modeling for Partial Observation Strategy Games - A StarCraft Defogger »
    Gabriel Synnaeve · Zeming Lin · Jonas Gehring · Dan Gant · Vegard Mella · Vasil Khalidov · Nicolas Carion · Nicolas Usunier
  • 2016 Workshop: Intuitive Physics »
    Adam Lerer · Jiajun Wu · Josh Tenenbaum · Emmanuel Dupoux · Rob Fergus