CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models
Vijil Chenthamarakshan · Payel Das · Samuel Hoffman · Hendrik Strobelt · Inkit Padhi · Kar Wai Lim · Benjamin Hoover · Matteo Manica · Jannis Born · Teodoro Laino · Aleksandra Mojsilovic

Tue Dec 08 09:00 PM -- 11:00 PM (PST) @ Poster Session 2 #719
The novel nature of SARS-CoV-2 calls for the development of efficient de novo drug design approaches. In this study, we propose an end-to-end framework, named CogMol (Controlled Generation of Molecules), for designing new drug-like small molecules targeting novel viral proteins with high affinity and off-target selectivity. CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme that uses guidance from attribute predictors trained on latent features. To generate novel and optimal drug-like molecules for unseen viral targets, CogMol leverages a protein-molecule binding affinity predictor that is trained using SMILES VAE embeddings and protein sequence embeddings learned unsupervised from a large corpus. We applied the CogMol framework to three SARS-CoV-2 target proteins: main protease, receptor-binding domain of the spike protein, and non-structural protein 9 replicase. The generated candidates are novel at both the molecular and chemical scaffold levels when compared to the training data. CogMol also includes in silico screening for assessing toxicity of parent molecules and their metabolites with a multi-task toxicity classifier, synthetic feasibility with a chemical retrosynthesis predictor, and target structure binding with docking simulations. Docking reveals favorable binding of generated molecules to the target protein structure, where 87-95% of high-affinity molecules showed docking free energy < -6 kcal/mol. When compared to approved drugs, the majority of designed compounds show low predicted parent molecule and metabolite toxicity and high predicted synthetic feasibility. In summary, CogMol can handle multi-constraint design of synthesizable, low-toxicity, drug-like molecules with high target specificity and selectivity, even to novel protein target sequences, and does not need target-dependent fine-tuning of the framework or target structure information.
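The attribute-guided sampling idea mentioned in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: it assumes a standard Gaussian prior over a 4-dimensional latent space, uses two hypothetical logistic predictors (named `affinity` and `selectivity` here purely for illustration) in place of trained attribute predictors, and swaps in plain rejection sampling as the control mechanism. No VAE decoder is included, so the output is latent vectors rather than SMILES strings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_linear_predictor(weights, bias):
    """Hypothetical attribute predictor: logistic regression on latent features."""
    def predict(z):
        return sigmoid(sum(w * zi for w, zi in zip(weights, z)) + bias)
    return predict


def controlled_sample(predictors, latent_dim, n_samples, rng, max_tries=100_000):
    """Rejection sampling: draw z ~ N(0, I) and accept it with probability
    equal to the product of the predicted attribute probabilities."""
    accepted = []
    tries = 0
    while len(accepted) < n_samples and tries < max_tries:
        tries += 1
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        accept_prob = 1.0
        for predict in predictors:
            accept_prob *= predict(z)
        if rng.random() < accept_prob:
            accepted.append(z)
    return accepted


rng = random.Random(0)
# Toy predictors (assumptions): one favors large z[0], the other small z[1].
affinity = make_linear_predictor([2.0, 0.0, 0.0, 0.0], 0.0)
selectivity = make_linear_predictor([0.0, -2.0, 0.0, 0.0], 0.0)

samples = controlled_sample([affinity, selectivity], latent_dim=4,
                            n_samples=200, rng=rng)
mean_z0 = sum(z[0] for z in samples) / len(samples)
mean_z1 = sum(z[1] for z in samples) / len(samples)
```

In this toy setting the accepted latent vectors shift toward the region both predictors favor (`mean_z0` above zero, `mean_z1` below zero), which is the sense in which the predictors guide sampling. In a full pipeline each accepted vector would then be passed to a decoder to produce a candidate molecule.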

Author Information

Vijil Chenthamarakshan (IBM Research)
Payel Das (IBM Research)
Samuel Hoffman (IBM Research)
Hendrik Strobelt (IBM Research)
Inkit Padhi (IBM Research)
Kar Wai Lim (IBM Singapore)
Benjamin Hoover (IBM Research)
Matteo Manica (IBM Research Zürich)

Matteo is a Pre-Doc in the Cognitive Health Care and Life Sciences Department at IBM Zürich Research Laboratory. He is enrolled in a joint PhD programme with the Institute of Molecular Systems Biology, ETH Zürich. His research is focused on the development of predictive computational technologies and learning frameworks to exploit and integrate multiple molecular and clinical data in the context of cancer medicine, in order to improve patient stratification and inform clinicians with personalized therapeutic interventions. He is currently working on the application of machine and deep learning methods to analyze progression and development of prostate cancer in the context of an H2020 EU project, PrECISE. Before joining IBM, Matteo worked as a consultant in data science and software development with specific applications in biological fluid dynamics, digital and biological signal processing, and data analysis. The main focus was on the analysis of CT angiography and MR angiography scans of abdominal aortic aneurysms (AAA). Through image analysis, segmentation and 3D volume rendering of the abdominal aorta, he contributed to creating patient-specific models to simulate blood flows in the vessels and to assess rupture risk of the aneurysm. He obtained his BSc and MSc at Politecnico di Milano in Applied Mathematics and Computer Science, a course with a strong focus on numerical simulations and data analysis. In his master's thesis work he developed an original model, based on partial differential equations for flow in porous media, to describe Medulloblastoma growth. By analysing MRIs at different time points of a given patient, it was possible to fit the model through segmentation and 3D volume rendering of the brain and the tumor mass, enabling an accurate estimate of the disease’s course over time.

Jannis Born (IBM Research)
Teodoro Laino (IBM Research Zurich)
Aleksandra Mojsilovic (IBM Research)
