DentalNet: Geometric-Aware Multi-View Transformer for Occlusion Grade Prediction in Dental 3D Scans
Abstract
The assessment of orthodontic treatment need, standardized by the Index of Orthodontic Treatment Need (IOTN), is a cornerstone of clinical dentistry. The conventional workflow for grading its Dental Health Component (DHC) is laborious: 3D intraoral scans are physically printed for manual measurement of geometric discrepancies, introducing significant delays and potential for inter-examiner variability. To overcome this, we formulate the task of automated IOTN-DHC classification directly from 3D dental models. Furthermore, we conduct a study of retrospectively collected real clinical cases, with ground-truth labels annotated by an orthodontic specialist, to serve as the benchmark for this problem. We propose DentalNet, a multi-modal deep learning architecture that fuses information from 2D rendered views and 3D point clouds: a Vision Transformer provides visual context, a Point Transformer captures precise geometry, and a cross-attention mechanism integrates the two to learn the critical inter-arch relationships that define malocclusion. On our 4-class classification task, DentalNet outperforms state-of-the-art models, achieving a mean F1-score of 67.03\%, which significantly surpasses the best-performing 2D and 3D baselines, ConvNeXtV2 (49.79\%) and PointNet++ (55.36\%), demonstrating the utility of our multimodal approach.
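To make the fusion idea concrete, the following is a minimal sketch of single-head cross-attention in which embeddings of the 2D rendered views act as queries over embeddings of the 3D point cloud. All names, shapes, and dimensions here are illustrative assumptions, not the paper's actual implementation (which the abstract does not specify in detail).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(view_tokens, point_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: 2D view tokens query 3D point tokens.

    view_tokens:  (n_views, d)  -- e.g. Vision Transformer outputs (assumed)
    point_tokens: (n_points, d) -- e.g. Point Transformer outputs (assumed)
    Returns fused tokens of shape (n_views, d).
    """
    Q = view_tokens @ Wq                       # queries from the 2D stream
    K = point_tokens @ Wk                      # keys from the 3D stream
    V = point_tokens @ Wv                      # values from the 3D stream
    scores = Q @ K.T / np.sqrt(Q.shape[-1])    # scaled dot-product scores
    return softmax(scores, axis=-1) @ V        # geometry-informed view tokens

# Toy usage with random embeddings.
rng = np.random.default_rng(0)
d = 8
views = rng.standard_normal((4, d))    # 4 rendered-view embeddings (assumed count)
points = rng.standard_normal((16, d))  # 16 point-cloud embeddings (assumed count)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
fused = cross_attention(views, points, Wq, Wk, Wv)
print(fused.shape)  # one fused token per view
```

Each view token is thereby re-expressed as a weighted combination of point-cloud features, which is one way a model can attend to the inter-arch geometry relevant to occlusion grading.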