Cross-Modal Predictive Architecture for Materials Property Prediction
Abstract
In this work, we propose the Cross-Modal Predictive Architecture (X-MoPA), a multimodal learning model that combines crystal structure graphs, X-ray diffraction (XRD) patterns, and text-based structural descriptions to improve materials property prediction. Unlike prior multimodal approaches that rely on heavy attention mechanisms or simple concatenation, X-MoPA uses lightweight predictors to learn a joint latent space through cross-modal prediction: for each training instance, we select two modalities and predict the third in latent space. This formulation captures complementary information across modalities while avoiding the inefficiency of input reconstruction and the memory bottlenecks of contrastive learning. We train and evaluate the model on Matbench for several key properties: band gap, shear modulus, bulk modulus, and formation energy of perovskites. X-MoPA consistently outperforms state-of-the-art (SOTA) models, with error reductions ranging from 16% to 60% across four key properties, while matching the best baseline on shear modulus. Beyond Matbench, X-MoPA achieves SOTA performance on AFLOW band gap prediction, showing that the learned cross-modal representations transfer well across datasets with different sampling strategies and property distributions.
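The cross-modal prediction objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the latent size, the linear form of the predictors, the mean-squared-error loss, and the names `make_linear`, `apply_linear`, and `cross_modal_loss` are all assumptions made for illustration, and random vectors stand in for the outputs of the modality encoders, which the abstract does not specify.

```python
import random

MODALITIES = ("graph", "xrd", "text")  # the three modalities named in the abstract
LATENT_DIM = 4  # hypothetical latent size, chosen only for illustration


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix standing in for a lightweight predictor (assumed linear)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, vec):
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def cross_modal_loss(latents, predictors, rng):
    """Pick a target modality, predict its latent from the other two, return (target, MSE)."""
    target = rng.choice(MODALITIES)
    context = [m for m in MODALITIES if m != target]
    # Concatenate the two context latents into one input vector.
    joint = latents[context[0]] + latents[context[1]]
    pred = apply_linear(predictors[target], joint)
    # MSE in latent space is an assumed loss; the abstract does not name one.
    mse = sum((p - t) ** 2 for p, t in zip(pred, latents[target])) / LATENT_DIM
    return target, mse


rng = random.Random(0)
# Random vectors stand in for encoder outputs of a single training instance.
latents = {m: [rng.gauss(0, 1) for _ in range(LATENT_DIM)] for m in MODALITIES}
# One predictor per target modality, each mapping two latents to one.
predictors = {m: make_linear(2 * LATENT_DIM, LATENT_DIM, rng) for m in MODALITIES}

target, loss = cross_modal_loss(latents, predictors, rng)
print(target, loss)
```

Because the loss is computed between latent vectors only, the sketch needs neither a decoder back to the raw input nor a batch of negative samples, which is the property the abstract contrasts with reconstruction-based and contrastive objectives.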