Tuned Quadratic Landscapes for Benchmarking Model-Guided Protein Design
Neil Thomas · Atish Agarwala · David Belanger · Yun Song · Lucy Colwell
Event URL: https://openreview.net/forum?id=9NwpfynDXF2

Advancements in DNA synthesis and sequencing technologies have enabled a novel paradigm of protein design where machine learning models trained on experimental data are used to guide exploration of a protein sequence landscape. ML-guided directed evolution (MLDE) has the potential not only to build upon the successes of directed evolution, but also to unlock new strategies that make more efficient use of experimental data and trade off between multiple optimization objectives. Building an MLDE pipeline involves many design choices, ranging from data collection strategies to modeling decisions, each of which has a large impact on the downstream effectiveness of designed sequences. The cost of collecting experimental data makes benchmarking these pipelines on real data prohibitively difficult, necessitating the development of synthetic landscapes where MLDE strategies can be tested. In this work, we develop a framework called SLIP (“Synthetic Landscape Inference for Proteins”) for constructing synthetic landscapes with tunable difficulty based on Potts models. SLIP is open-source.
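To make the idea of a Potts-model landscape concrete, the following is a minimal sketch of how such a quadratic landscape can score a sequence: a sum of per-site terms plus a sum of pairwise terms over all position pairs. It is not SLIP's API. The function names, the random Gaussian parameters, the `coupling_scale` knob, and the sign convention (higher score is better) are all assumptions made for illustration only.

```python
import random


def random_potts(length, alphabet_size, coupling_scale, seed=0):
    """Draw hypothetical Potts parameters (illustrative, not fitted to data)."""
    rng = random.Random(seed)
    # fields[i][a]: contribution of residue a at position i.
    fields = [[rng.gauss(0, 1) for _ in range(alphabet_size)]
              for _ in range(length)]
    # couplings[(i, j)][a][b]: contribution of residues a, b at positions i < j.
    couplings = {
        (i, j): [[coupling_scale * rng.gauss(0, 1)
                  for _ in range(alphabet_size)]
                 for _ in range(alphabet_size)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return fields, couplings


def potts_fitness(seq, fields, couplings):
    """Score an integer-encoded sequence: site terms plus pairwise terms."""
    site = sum(fields[i][a] for i, a in enumerate(seq))
    pair = sum(table[seq[i]][seq[j]] for (i, j), table in couplings.items())
    return site + pair


if __name__ == "__main__":
    fields, couplings = random_potts(length=5, alphabet_size=4,
                                     coupling_scale=0.5, seed=2)
    print(potts_fitness([0, 1, 2, 3, 0], fields, couplings))
```

In this sketch, setting `coupling_scale` to zero gives a purely additive landscape, and increasing it strengthens the pairwise interactions between positions. That is one simple way to vary how hard a landscape is to model, offered here only as an illustration of the general idea.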

Author Information

Neil Thomas (UC Berkeley)
Atish Agarwala (Google Research)
David Belanger (Google)
Yun Song (UC Berkeley)
Lucy Colwell (Cambridge University)