Timezone: »
Generative models of biological sequences are a powerful tool for learning from complex sequence data, predicting the effects of mutations, and designing novel biomolecules with desired properties. The problem of measuring differences between high-dimensional distributions is central to the successful construction and use of generative probabilistic models. In this paper we propose the KSD-B, a novel divergence measure for distributions over biological sequences that is based on the kernelized Stein discrepancy (KSD). As for all KSDs, the KSD-B between a model and dataset can be evaluated even when the normalizing constant of the model is unknown; unlike any previous KSD, the KSD-B can be applied to arbitrary distributions over variable-length discrete sequences, and can take into account biological notions of mutational distance. Our theoretical results rigorously establish that the KSD-B is not only a valid divergence measure, but also that it detects non-convergence in distribution. We outline the wide variety of possible applications of the KSD-B, including (a) goodness-of-fit tests, which enable generative sequence models to be evaluated on an absolute instead of relative scale; (b) measurement of posterior sample quality, which enables accurate semi-supervised sequence design and ancestral sequence reconstruction; and (c) selection of a set of representative points, which enables the design of libraries of sequences that are representative of a given generative model for efficient experimental testing.
Author Information
Alan Amin (Harvard University)
Eli Weinstein (Columbia University)
Debora Marks (Harvard University)
Debora is a mathematician and computational biologist with a track record of using novel algorithms and statistics to successfully address unsolved biological problems. She has a passion for interpreting genetic variation in a way that impacts biomedical applications. During her PhD, she quantified the pan-genomic scope of microRNA targeting - the combinatorial regulation of protein expression and co-discovered the first microRNA in a virus. As a postdoc she made a breakthrough in the classic, unsolved problem of ab initio 3D structure prediction of proteins using undirected graphical probability models for evolutionary sequences. She has developed this approach to determine functional interactions, biomolecular structures, including the 3D structure of RNA and RNA-protein complexes and the conformational ensembles of apparently disordered proteins. Her new lab at Harvard is interested in developing methods in deep learning to address a wide range of biological challenges including designing drug affinity libraries for large numbers of human genes, predicting epistasis in antibiotic resistance, the effects of genetic variation on human disease etiology and drug response and sequence design for biosynthetic applications.
More from the Same Authors
-
2022 : How can we use natural evolution and genetic experiments to design protein functions? »
Ada Shaw · June Shin · Debora Marks -
2022 : TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction »
Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks -
2022 : scPerturb: Information Resource for Harmonized Single-Cell Perturbation Data »
Tessa Green · Stefan Peidli · Ciyue Shen · Torsten Gross · Joseph Min · Samuele Garda · Jake Taylor-King · Debora Marks · Augustin Luna · Nils Blüthgen · Chris Sander -
2022 : Designing and Evolving Neuron-Specific Proteases »
Han Spinner · Colin Hemez · Julia McCreary · David Liu · Debora Marks -
2022 Workshop: Learning Meaningful Representations of Life »
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Alex X Lu · Anshul Kundaje · Chang Liu · Debora Marks · Ed Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Rebecca Boiarsky · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang · Stephen Ra -
2022 Panel: Panel 2A-2: VaiPhy: a Variational… & Non-identifiability and the… »
Oskar Kviman · Eli Weinstein -
2022 Poster: Non-identifiability and the Blessings of Misspecification in Models of Molecular Fitness »
Eli Weinstein · Alan Amin · Jonathan Frazer · Debora Marks -
2021 Workshop: Learning Meaningful Representations of Life (LMRL) »
Elizabeth Wood · Adji Bousso Dieng · Aleksandrina Goeva · Anshul Kundaje · Barbara Engelhardt · Chang Liu · David Van Valen · Debora Marks · Edward Boyden · Eli N Weinstein · Lorin Crawford · Mor Nitzan · Romain Lopez · Tamara Broderick · Ray Jones · Wouter Boomsma · Yixin Wang -
2021 Poster: A generative nonparametric Bayesian model for whole genomes »
Alan Amin · Eli N Weinstein · Debora Marks -
2019 : Synthetic Systems »
Pamela Silver · Debora Marks · Chang Liu · Possu Huang -
2019 Workshop: Learning Meaningful Representations of Life »
Elizabeth Wood · Yakir Reshef · Jonathan Bloom · Jasper Snoek · Barbara Engelhardt · Scott Linderman · Suchi Saria · Alexander Wiltschko · Casey Greene · Chang Liu · Kresten Lindorff-Larsen · Debora Marks -
2018 : Invited Talk Session 2 »
Debora Marks · Olexandr Isayev · Tess Smidt · Nathaniel Thomas -
2018 : TBC 4 »
Debora Marks