Designing and Evolving Neuron-Specific Proteases
Han Spinner · Colin Hemez · Julia McCreary · David Liu · Debora Marks
Event URL: https://openreview.net/forum?id=lU0LHB9tk4T

Directed evolution has substantially advanced protein engineering. However, these experiments are typically seeded with a single sequence, and they can explore only a limited region of sequence space. Here, we aim to develop a machine learning method that learns from the natural distribution of sequences to design diverse seed sequences. We use Botulinum Neurotoxin X (BoNT/X) as a proof of concept because data from a directed evolution campaign on this protease have been published and because neuron-specific proteases have many therapeutic applications. BoNT/X is also especially promising for this approach because related BoNT proteases are highly substrate-specific, which limits the utility of simply drawing seed sequences from nature. We hypothesize that our machine learning model can learn the ‘essence’ of the protein family and generate diverse substrate-binding domains. We built an alignment of 452 sequences around BoNT/X and show that models trained on these data can separate known beneficial mutations from known deleterious ones. Next, we will use these models to generate sequences and perform new evolution experiments. Finally, we will evaluate the impact of starting with a diverse set of seed sequences versus a single seed sequence. This work will not only create new proteases that can be used for therapeutic indications but also put forth a new approach to machine-learning-guided evolution experiments.

Author Information

Han Spinner (Harvard Medical School, Harvard University)
Colin Hemez
Julia McCreary
David Liu
Debora Marks (Harvard University)

Debora is a mathematician and computational biologist with a track record of using novel algorithms and statistics to address unsolved biological problems. She has a passion for interpreting genetic variation in a way that impacts biomedical applications. During her PhD, she quantified the pan-genomic scope of microRNA targeting (the combinatorial regulation of protein expression) and co-discovered the first microRNA in a virus. As a postdoc, she made a breakthrough in the classic, unsolved problem of ab initio 3D structure prediction of proteins using undirected probabilistic graphical models of evolutionary sequences. She has extended this approach to determine functional interactions and biomolecular structures, including the 3D structures of RNA and RNA-protein complexes, and the conformational ensembles of apparently disordered proteins. Her new lab at Harvard is interested in developing deep learning methods to address a wide range of biological challenges, including designing drug affinity libraries for large numbers of human genes, predicting epistasis in antibiotic resistance, predicting the effects of genetic variation on human disease etiology and drug response, and designing sequences for biosynthetic applications.