A Comprehensive Survey of Multimodal LLMs for Scientific Discovery
Liang Yan · Xu Jiang · Jian Ma · Yuhang Liu · Tian Bian · Qichao Wang · Abhishek Basu · Yu Rong · Tingyang Xu · Pengcheng Wu · Le Song · Imran Razzak · Junchi Yan · Zengfeng Huang · Yutong Xie
Abstract
Recent advances in artificial intelligence (AI), especially large language models, have accelerated the integration of multimodal data in scientific research. Because scientific fields involve diverse data types, ranging from text and images to complex biological sequences and structures, multimodal large language models (MLLMs) have emerged as powerful tools for bridging these modalities, enabling more comprehensive data analysis and intelligent decision-making. This work, $\text{S}^3\text{-Bench}$, provides a comprehensive overview of recent advances in MLLMs, focusing on their diverse applications across science. We systematically review the progress of MLLMs in key scientific domains, including drug discovery, molecular & protein design, materials science, and genomics, highlighting model architectures, domain-specific adaptations, benchmark datasets, and promising future directions. More importantly, we benchmark open-source MLLMs on a range of critical small-molecule and protein property prediction tasks. We aim for this work to serve as a valuable resource for both researchers and practitioners interested in the rapidly evolving landscape of multimodal AI for science.