Workshop: AI for Science: Progress and Promises

MoleculeCLIP: Learning Transferable Molecule Multi-Modality Models via Natural Language

Shengchao Liu · Weili Nie · Chengpeng Wang · Jiarui Lu · Zhuoran Qiao · Ling Liu · Jian Tang · Anima Anandkumar · Chaowei Xiao

Keywords: [ retrieval ] [ pretraining ] [ molecule structure ] [ controllable generation ] [ large language model ] [ Molecule Representation Learning ] [ Molecular Property Prediction ]

Abstract: Artificial intelligence for drug discovery has recently attracted increasing interest in the community. One of the key challenges is learning a powerful molecule representation. To this end, existing works focus on learning molecule representations from chemical structures (i.e., 1D description, 2D topology, or 3D geometry). However, such representations generalize poorly to unseen tasks. Humans, meanwhile, can learn hierarchical, multi-modality information, including molecule chemical structure and natural language (e.g., biomedical text), simultaneously and can generalize to new concepts. Motivated by this observation, we explore the use of text for drug discovery. We design a multi-modality model, MoleculeCLIP, that leverages natural language and molecule structure. MoleculeCLIP consists of two branches: a chemical structure branch that encodes chemical structures and a textual description branch that encodes the corresponding natural language descriptions. To train it, we first collect a large-scale dataset of more than 280k text-molecule pairs, called PubChemCLIP, which is about 28× larger than the existing dataset. We then train our model on this dataset with a contrastive learning strategy to bridge the representations from the two branches. We carefully design two categories of zero-shot downstream tasks, retrieval and language-guided editing, through which we highlight three key features of introducing language in MoleculeCLIP: open vocabulary, compositionality, and domain knowledge exploration. Quantitatively, in extensive experiments, MoleculeCLIP outperforms existing methods on 6 zero-shot retrieval tasks and 24 zero-shot language-guided molecule editing tasks. Qualitatively, we show that MoleculeCLIP can understand domain information by successfully detecting the key structures referred to in the text prompts. Furthermore, the representation learned by MoleculeCLIP can be used to further boost performance on an existing task, molecular property prediction.
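
The contrastive step that bridges the two branches, and the zero-shot retrieval task built on it, can be illustrated with a minimal sketch. The abstract does not specify the loss, so this assumes a CLIP-style symmetric InfoNCE objective over cosine similarities. The function names, the temperature value, and the toy 2-dimensional embeddings are all hypothetical and stand in for the outputs of the real structure and text encoders.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(mol_embs, text_embs):
    """Cosine similarity between every molecule embedding and every text embedding."""
    mols = [l2_normalize(m) for m in mol_embs]
    texts = [l2_normalize(t) for t in text_embs]
    return [[sum(a * b for a, b in zip(m, t)) for t in texts] for m in mols]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(mol_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss (an assumed objective, not confirmed by the abstract).

    Pair i of mol_embs and text_embs is treated as the positive pair;
    all other pairings in the batch act as negatives.
    """
    sim = similarity_matrix(mol_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sim]
    transposed = [list(col) for col in zip(*logits)]
    mol_to_text = cross_entropy_diagonal(logits)
    text_to_mol = cross_entropy_diagonal(transposed)
    return 0.5 * (mol_to_text + text_to_mol)

def retrieve(mol_emb, candidate_text_embs):
    """Zero-shot retrieval: index of the candidate text most similar to the molecule."""
    scores = similarity_matrix([mol_emb], candidate_text_embs)[0]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy embeddings standing in for encoder outputs (hypothetical values).
molecules = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_texts = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = contrastive_loss(molecules, aligned_texts)
loss_shuffled = contrastive_loss(molecules, shuffled_texts)
best_match = retrieve([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1]])
```

In this sketch the loss is small when each molecule embedding lies closest to its own description and large when the pairs are mismatched, which is the signal that pulls the two branches into a shared space. Retrieval then reduces to a nearest-neighbor lookup in that space, so no task-specific training is needed.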
