DLAGF: Motion-Queried Cross-Attention Transformer Framework for Multimodal Cardiomyocyte Ageing Detection and Early Heart Failure Risk
Abu Sufian
Abstract
Cardiomyocyte ageing leads to heart failure, yet early detection is difficult with imaging alone. This work introduces a simple, non-invasive multimodal model that combines visual and gene-expression data to detect signs of ageing in heart cells. The model uses a compact cross-attention Transformer with a Dual-Level Attention-Gated Fusion (DLAGF) module to integrate four data types: motion from brightfield videos, single images (morphology), contraction values (CSV), and reduced RNA-seq gene features. It was trained and tested on 672 clips from 28 wells across 3 plates, using grouped-by-well splits to avoid data leakage (train/val/test = 70/15/15; test set = 101 clips). The model achieves a macro F1 of $0.861 \pm 0.011$, outperforming a motion-only model ($0.79 \pm 0.02$) by +7.4\% accuracy and +0.07 macro F1, and surpassing strong multimodal baselines such as Perceiver IO (0.84 macro F1) and a symmetric multimodal Transformer (0.85 macro F1). These gains come at very little additional cost (+0.15M parameters, +5\% latency). Ablation shows that removing the gene modality drops performance to 0.82 macro F1. The model achieves per-class AUCs above $0.92$, and the gains are statistically significant: paired bootstrap $\Delta\mathrm{F1} = 0.011$, $p = 0.004$; McNemar's test $\chi^2 = 6.1$, $p = 0.013$. Visualising the attention weights reveals a clear link between motion changes and key gene features. The framework offers an efficient route to early detection of cellular ageing, with applications in drug testing and regenerative heart research. Because ageing phenotypes precede overt cardiac dysfunction, this multimodal readout supports early heart failure risk stratification in vitro.
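The abstract does not spell out the internals of the DLAGF module, but the motion-queried cross-attention idea it names can be sketched minimally: motion tokens act as queries over the concatenated tokens of the other three modalities, and a sigmoid gate blends the attended summary back into the motion stream. The sketch below is a single-head NumPy illustration with hypothetical shapes and randomly initialised weights, assuming all modalities have already been projected to a shared feature dimension; it is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def motion_queried_fusion(motion, morphology, contraction, genes, seed=0):
    """Illustrative motion-queried cross-attention with gated fusion.

    motion:      (Nm, d) motion tokens, used as queries
    morphology,
    contraction,
    genes:       (Ni, d) context tokens from the other modalities
    Returns the fused motion tokens and the attention map.
    """
    d = motion.shape[-1]
    context = np.concatenate([morphology, contraction, genes], axis=0)  # (Nc, d)

    # Random projection weights stand in for learned parameters.
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = motion @ Wq, context @ Wk, context @ Wv

    # Scaled dot-product attention: each motion token attends over context.
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)   # (Nm, Nc), rows sum to 1
    attended = attn @ v                             # (Nm, d)

    # Attention-gated fusion: a sigmoid gate mixes the attended context
    # back into the motion stream (a gated residual connection).
    Wg = rng.standard_normal((2 * d, d)) / np.sqrt(d)
    gate = sigmoid(np.concatenate([motion, attended], axis=-1) @ Wg)
    fused = gate * attended + (1.0 - gate) * motion
    return fused, attn
```

Inspecting the returned attention map is also what motivates the interpretability claim above: each row shows how strongly a motion token attends to each gene, morphology, or contraction token.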