Molecular optimization is a fundamental goal in the chemical sciences and is of central interest to drug and material design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimization, emphasizing high validity, diversity, and, most recently, synthesizability. Despite this progress, many papers report results on trivial or self-designed tasks, which makes it harder to directly assess the performance of new methods. Moreover, the sample efficiency of the optimization (the number of molecules evaluated by the oracle) is rarely discussed, despite being an essential consideration for realistic discovery applications. To fill this gap, we have created an open-source benchmark for practical molecular optimization, PMO, to facilitate the transparent and reproducible evaluation of algorithmic advances in molecular optimization. This paper thoroughly investigates the performance of 25 molecular design algorithms on 23 single-objective (scalar) optimization tasks with a particular focus on sample efficiency. Our results show that most "state-of-the-art" methods fail to outperform their predecessors under a limited oracle budget of 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting. We analyze the influence of optimization algorithm choices, molecular assembly strategies, and oracle landscapes on optimization performance to inform future algorithm development and benchmarking. PMO provides a standardized experimental setup to comprehensively evaluate and compare new molecular optimization methods with existing ones. All code can be found at https://github.com/wenhao-gao/mol_opt.
Author Information
Wenhao Gao (Massachusetts Institute of Technology)
Tianfan Fu (Georgia Institute of Technology)
Jimeng Sun (University of Illinois, Urbana Champaign)
Connor Coley (MIT)
More from the Same Authors
-
2021 : Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development »
Kexin Huang · Tianfan Fu · Wenhao Gao · Yue Zhao · Yusuf Roohani · Jure Leskovec · Connor Coley · Cao Xiao · Jimeng Sun · Marinka Zitnik -
2021 Spotlight: GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles »
Octavian Ganea · Lagnajit Pattanaik · Connor Coley · Regina Barzilay · Klavs Jensen · William Green · Tommi Jaakkola -
2021 : Bringing Atomistic Deep Learning to Prime Time »
Nathan Frey · Siddharth Samsi · Bharath Ramsundar · Connor Coley -
2021 : Scalable Geometric Deep Learning on Molecular Graphs »
Nathan Frey · Siddharth Samsi · Lin Li · Connor Coley -
2022 : De novo PROTAC design using graph-based deep generative models »
Divya Nori · Connor Coley · Rocío Mercado -
2022 : Recommendation for New Drugs with Limited Prescription Data »
Zhenbang Wu · Huaxiu Yao · Zhe Su · David Liebovitz · Lucas Glass · James Zou · Chelsea Finn · Jimeng Sun -
2023 Poster: BIOT: Biosignal Transformer for Cross-data Learning in the Wild »
Chaoqi Yang · M Westover · Jimeng Sun -
2023 Poster: An Iterative Self-Learning Framework for Medical Domain Generalization »
Zhenbang Wu · Huaxiu Yao · David Liebovitz · Jimeng Sun -
2023 Poster: Prefix-tree decoding for predicting mass spectra from molecules »
Samuel Goldman · John Bradshaw · Jiayi Xin · Connor Coley -
2023 Poster: CoDrug: Conformal Drug Property Prediction with Density Estimation under Covariate Shift »
Siddhartha Laghuvarapu · Zhen Lin · Jimeng Sun -
2023 Workshop: AI for Science: from Theory to Practice »
Yuanqi Du · Max Welling · Yoshua Bengio · Marinka Zitnik · Carla Gomes · Jure Leskovec · Maria Brbic · Wenhao Gao · Kexin Huang · Ziming Liu · Rocío Mercado · Miles Cranmer · Shengchao Liu · Lijing Wang -
2022 : A High-Throughput Platform for Efficient Exploration of Polypeptides Chemical Space via Automation and Machine Learning »
Guangqi Wu · Connor Coley · Hua Lu -
2022 : Automated Materials Synthesis Keynote »
Connor Coley -
2022 : MolPAL: Software for Sample Efficient High-Throughput Virtual Screening »
David Graff · Connor Coley -
2022 : A source data privacy framework for synthetic clinical trial data »
Afrah Shafquat · Jason Mezey · Mandis Beigi · Jimeng Sun · Jacob Aptekar -
2022 Workshop: AI for Science: Progress and Promises »
Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik -
2022 Poster: Reinforced Genetic Algorithm for Structure-based Drug Design »
Tianfan Fu · Wenhao Gao · Connor Coley · Jimeng Sun -
2022 Poster: ATD: Augmenting CP Tensor Decomposition by Self Supervision »
Chaoqi Yang · Cheng Qian · Navjot Singh · Cao (Danica) Xiao · M Westover · Edgar Solomonik · Jimeng Sun -
2022 Poster: TransTab: Learning Transferable Tabular Transformers Across Tables »
Zifeng Wang · Jimeng Sun -
2022 Poster: Conformal Prediction with Temporal Quantile Adjustments »
Zhen Lin · Shubhendu Trivedi · Jimeng Sun -
2021 : AI X Chemistry »
Connor Coley -
2021 Workshop: AI for Science: Mind the Gaps »
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik -
2021 Poster: Learning Graph Models for Retrosynthesis Prediction »
Vignesh Ram Somnath · Charlotte Bunne · Connor Coley · Andreas Krause · Regina Barzilay -
2021 Poster: GeoMol: Torsional Geometric Generation of Molecular 3D Conformer Ensembles »
Octavian Ganea · Lagnajit Pattanaik · Connor Coley · Regina Barzilay · Klavs Jensen · William Green · Tommi Jaakkola -
2020 Demonstration: MolDesigner: Interactive Design of Efficacious Drugs with Deep Learning »
Kexin Huang · Tianfan Fu · Dawood Khan · Ali Abid · Ali Abdalla · Abubaker Abid · Lucas Glass · Marinka Zitnik · Cao Xiao · Jimeng Sun -
2019 Poster: Retrosynthesis Prediction with Conditional Graph Logic Network »
Hanjun Dai · Chengtao Li · Connor Coley · Bo Dai · Le Song -
2017 Poster: Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network »
Wengong Jin · Connor Coley · Regina Barzilay · Tommi Jaakkola