Poster in Workshop: New Frontiers of AI for Drug Discovery and Development

Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Seojin Kim ⋅ Jaehyun Nam ⋅ Sihyun Yu ⋅ Younghoon Shin ⋅ Jinwoo Shin

Keywords: Molecular generation


Abstract

Developing a molecular generation framework that remains effective even with a limited number of molecules is often important for practical deployment, e.g., in drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experiments. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular Generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by a recent textual inversion technique in the visual domain, which achieves data-efficient generation by simply optimizing a single new text token of a pre-trained text-to-image generative model. However, we find that naively adopting this technique fails for molecules due to their complicated and structured nature. Hence, we propose a hierarchical textual inversion scheme: in addition to the original single text token of textual inversion, which learns the concept common to the molecules, we introduce low-level tokens that are selected differently per molecule. We then generate molecules with a pre-trained text-to-molecule model by interpolating the low-level tokens. Extensive experiments demonstrate the superiority of HI-Mol with notable data efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50$\times$ less training data. We also show the efficacy of HI-Mol in various applications, including molecular optimization and low-shot molecular property prediction.
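The generation step described in the abstract, interpolating low-level tokens while keeping the token for the common concept fixed, can be illustrated with a toy sketch. This is not the authors' implementation: the linear interpolation rule, the choice of mixing exactly two low-level tokens, the embedding dimension, and the names `interpolate`, `build_prompt_embeddings`, `shared_token`, and `low_level_tokens` are all assumptions made for illustration. In a real pipeline the embeddings would be learned by textual inversion and passed to a pre-trained text-to-molecule model, which is omitted here.

```python
import random


def interpolate(u, v, lam):
    """Linear interpolation (1 - lam) * u + lam * v between two embeddings."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - lam) * a + lam * b for a, b in zip(u, v)]


def build_prompt_embeddings(shared_token, low_level_tokens, i, j, lam):
    """Return [shared token, interpolated low-level token] for one sample.

    Hypothetical helper: the shared token stands for the concept common to
    all training molecules; the low-level tokens are per-molecule embeddings.
    """
    mixed = interpolate(low_level_tokens[i], low_level_tokens[j], lam)
    return [shared_token, mixed]


# Toy embeddings standing in for learned tokens (assumed values, not real data).
rng = random.Random(0)
dim = 4
shared_token = [rng.gauss(0.0, 1.0) for _ in range(dim)]
low_level_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]

# Pick two distinct low-level tokens and a random mixing coefficient in [0, 1).
i, j = rng.sample(range(len(low_level_tokens)), 2)
lam = rng.random()
prompt = build_prompt_embeddings(shared_token, low_level_tokens, i, j, lam)
```

Each call with a new pair `(i, j)` and coefficient `lam` yields a different conditioning sequence, which is one plausible way a small set of learned tokens could produce many distinct samples.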
