

Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design

Spectro: A multi-modal approach for molecule elucidation using IR and NMR data

Edwin Chacko · Rudra Sondhi · Arnav Praveen · Kylie Luska · Rodrigo Vargas-Hernandez

Keywords: [ IR ] [ molecular elucidation ] [ vision models ] [ NMR ] [ molecules ] [ LLM ] [ SELFIES ]


Abstract: Molecular structure elucidation is a crucial but fundamentally challenging step in the characterization of materials, given the large number of possible structures. Here, we introduce Spectro, an innovative multi-modal approach for molecular elucidation that combines ¹³C and ¹H NMR data with IR data. Spectro translates the embedded representations of the spectra into molecular structures using the SELFIES notation. We employed a vision model for the embedded representation of the IR data, which was pretrained to detect relevant functional group peaks in the IR spectra, achieving an F1 score of 91%. For NMR data, we utilized LLM2Vec, treating the NMR spectra as text. This integration of multiple spectroscopic techniques allows Spectro to achieve an overall test accuracy of 93% when trained jointly with the vision model for the IR spectra, and 82% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning in tackling complex molecular characterization tasks.
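To make "treating the NMR spectra as text" concrete, here is a minimal sketch of how a peak list might be serialized into a string before being passed to a text embedding model. The abstract does not specify the serialization format used by Spectro, so the function name `nmr_peaks_to_text`, the output layout, and the example peak values are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def nmr_peaks_to_text(nucleus: str, peaks: List[Tuple[float, str]]) -> str:
    """Serialize an NMR peak list as a single line of text.

    Hypothetical format (not taken from the abstract): each peak is a
    (chemical shift in ppm, multiplicity label) pair; the label may be
    empty, e.g. for 13C peaks.
    """
    # Sort by chemical shift, high to low, so that the same peak list
    # always yields the same string regardless of input order.
    ordered = sorted(peaks, key=lambda p: p[0], reverse=True)
    parts = []
    for shift, label in ordered:
        # Two decimals for the shift; strip() drops the trailing space
        # when no multiplicity label is given.
        parts.append(f"{shift:.2f} {label}".strip())
    return f"{nucleus} NMR (ppm): " + ", ".join(parts)


# Illustrative, made-up peak lists (not measured data).
proton_text = nmr_peaks_to_text("1H", [(1.25, "t"), (3.72, "q"), (2.61, "s")])
carbon_text = nmr_peaks_to_text("13C", [(18.1, ""), (58.0, "")])
print(proton_text)
print(carbon_text)
```

A string like this could then be embedded by any text encoder; a canonical peak ordering is one simple way to keep the text representation deterministic.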
