Virtual, make-on-demand chemical libraries have transformed early-stage drug discovery by unlocking vast, synthetically accessible regions of chemical space. In recent years these libraries have grown rapidly from millions to trillions of compounds, and they hide undiscovered, potent hits for a variety of therapeutic targets. However, they are quickly approaching a size that no longer permits explicit enumeration, which presents new challenges for virtual screening. To overcome these challenges, we propose the Combinatorial Synthesis Library Variational Auto-Encoder (CSLVAE). The proposed generative model represents such libraries as a differentiable, hierarchically organized database. Given a compound from the library, the molecular encoder constructs a query for retrieval, which the molecular decoder uses to reconstruct the compound by first decoding its chemical reaction and then decoding its reactants. Our design minimizes autoregression in the decoder, facilitating the generation of large, valid molecular graphs. Our method performs fast, parallel batch inference for ultra-large synthesis libraries, enabling a number of important applications in early-stage drug discovery. Compounds proposed by our method are guaranteed to be in the library and are thus synthetically and cost-effectively accessible. Importantly, CSLVAE can also encode out-of-library compounds and search for in-library analogues. In experiments, we demonstrate the capabilities of the proposed method in navigating massive combinatorial synthesis libraries.
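To make the hierarchical retrieval idea concrete, the following is a minimal sketch in Python. It is not the CSLVAE implementation. The library layout, the reaction and reactant names, the random embeddings, and the use of one shared query vector with dot-product scoring are all assumptions chosen for illustration. It shows two points from the abstract: the library size can be counted without enumerating products, and a compound can be decoded by first choosing a reaction and then choosing one reactant per slot, so the result is always a library member.

```python
import math
import random


def make_toy_library(seed=0, dim=4):
    """Build a hypothetical two-reaction library with random embeddings."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    return {
        "amide_coupling": {
            "embedding": vec(),
            "slots": [
                {f"acid_{i}": vec() for i in range(3)},
                {f"amine_{i}": vec() for i in range(4)},
            ],
        },
        "suzuki_coupling": {
            "embedding": vec(),
            "slots": [
                {f"boronic_acid_{i}": vec() for i in range(5)},
                {f"aryl_halide_{i}": vec() for i in range(2)},
            ],
        },
    }


def library_size(library):
    """Count products without enumerating them.

    Each reaction contributes the product of its slot sizes.
    """
    return sum(
        math.prod(len(slot) for slot in rxn["slots"])
        for rxn in library.values()
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def best_match(query, candidates):
    """Return the key whose embedding has the highest dot product with query."""
    return max(candidates, key=lambda name: dot(query, candidates[name]))


def decode(query, library):
    """Decode a query into (reaction, reactants) by hierarchical retrieval.

    Assumption: the same query scores reactions and reactants. A real
    model would likely derive separate queries for each level.
    """
    # Step 1: retrieve the reaction.
    reactions = {name: rxn["embedding"] for name, rxn in library.items()}
    rxn_name = best_match(query, reactions)
    # Step 2: retrieve one reactant per slot. Slots are scored independently,
    # so this step involves no autoregression across reactants.
    reactants = [best_match(query, slot) for slot in library[rxn_name]["slots"]]
    return rxn_name, reactants


if __name__ == "__main__":
    lib = make_toy_library()
    print("library size:", library_size(lib))
    print("decoded:", decode([1.0, 0.0, 0.0, 0.0], lib))
```

In this toy layout the cost of decoding scales with the number of reactions plus the number of reactants in the chosen reaction's slots, not with the number of products, which is why retrieval of this kind remains tractable when the product count is too large to enumerate.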
Aryan Pedawi (Atomwise, Inc)
Pawel Gniewek (Atomwise, Inc)
Brandon Anderson (Atomic AI)
Henry van den Bedem (Atomwise, Inc and Dept of Bioengineering & Therapeutic Sciences, UCSF)
Henry van den Bedem is Vice President of Machine Learning Research & Cheminformatics at Atomwise, Inc. Henry concurrently holds an Associate Adjunct Professor appointment with the Department of Bioengineering & Therapeutic Sciences at the University of California at San Francisco and was a 2018-2020 Mercator Fellow with the Deutsche Forschungsgemeinschaft (DFG), the German equivalent of the US National Science Foundation (NSF). Before joining Atomwise, Henry led a research group in computational structural protein dynamics at the SLAC National Accelerator Laboratory at Stanford University. His group was funded by the National Institutes of Health, the US Department of Energy, and pharmaceutical companies. Henry holds a PhD from the University of Alabama at Birmingham and an MS from the Delft University of Technology, both in mathematics. He is the author of more than 150 peer-reviewed scientific publications and numerous open-source scientific software packages. He is a multiple Ironman World Championship qualifier and an All-American Athlete.