Semi-Supervised Learning for Molecular Graphs via Ensemble Consensus
Abstract
Machine learning is transforming molecular sciences by accelerating property prediction, simulation, and the discovery of new molecules and materials. Acquiring labeled data in these domains is often costly and time-consuming, whereas large collections of unlabeled molecular data are readily available. Semi-supervised learning (SSL) can exploit such unlabeled data, but standard SSL methods often rely on label-preserving augmentations, which are challenging to design in the molecular domain, where minor structural changes can drastically alter properties. In this work, we propose an augmentation-free SSL method for both regression and classification. Grounded in ensemble learning, our approach introduces a consistency loss that penalizes disagreements with the ensemble consensus. We demonstrate that this training procedure boosts the predictive accuracy of both the ensemble and its individual models across diverse datasets, tasks, and architectures.
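To make the consensus-based consistency idea concrete, the sketch below shows one plausible form such a loss could take in a regression setting. It is a minimal illustration under assumed conventions, not the paper's exact formulation; the function name, tensor shapes, and the unlabeled-loss weight `lambda_u` are all hypothetical.

```python
import torch


def consensus_consistency_loss(member_preds: torch.Tensor) -> torch.Tensor:
    """Penalize each ensemble member's deviation from the ensemble consensus.

    member_preds: tensor of shape (n_members, batch_size) holding the
    predictions of every ensemble member on a batch of unlabeled molecules.
    Illustrative sketch only; the paper's loss may differ in detail.
    """
    # Consensus = mean prediction over the ensemble, detached so each member
    # is pulled toward a fixed target rather than shifting the consensus itself.
    consensus = member_preds.mean(dim=0).detach()
    # Mean squared disagreement of every member with the consensus.
    return ((member_preds - consensus) ** 2).mean()


# Hypothetical usage inside a training step: combine the supervised loss on
# labeled molecules with the consistency term on unlabeled molecules.
# total_loss = supervised_loss + lambda_u * consensus_consistency_loss(preds_unlabeled)
```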