Causal-Chemprop: Causal Machine Learning for Molecular Property Prediction and Optimization
Christian Natajaya · Lucas Attia · Jackson Burns
Abstract
A priori estimation of molecular properties has long been of immense interest to the pharmaceutical sciences for hit generation and optimization. While neural network-based models have achieved high predictive accuracy, they still find limited utility in molecular design. High-dimensional molecular representations are difficult to optimize, particularly when learned from small or sparse datasets. Moreover, neural network-based models lack mechanisms to explicitly incorporate domain knowledge from experts and prior knowledge from existing data. Herein, we introduce Causal-Chemprop, a causal machine learning framework for molecular property prediction built on the Chemprop and DAGMA architectures. To our knowledge, this is the first application of causal machine learning to molecular property prediction and optimization. Via intervention-based inference, Causal-Chemprop demonstrates strong predictive performance on $IC_{50}$ from the Kinase Knowledgebase and on aqueous $\log S$ from a solubility dataset comprising BigSolDB and SolProp. Counterfactual inference supports human-in-the-loop optimization of molecular structure, which we demonstrate by predicting the solubility of a quinolinyltriazole MIF inhibitor seed structure and its molecular derivatives. Finally, we integrate Causal-Chemprop with the molecular optimization algorithm EvoMol to perform inverse molecular design, yielding soluble analogs of the MIF inhibitor seed structure.