Towards On-device Foundation Models for Raw Wearable Signals
Simon Lee · Cyrus Tanade · Hao Zhou · Juhyeon Lee · Megha Thukral · Baiying Lu · Sharanya Desai
Abstract
We propose a lightweight foundation model for wearable signals that leverages convolutional inductive biases within a masked autoencoder built on a U-Net CNN backbone. By explicitly encoding temporal locality and multi-scale structure, our approach aligns more naturally with the nonstationary dynamics of physiological waveforms than attention-based transformers. Pretrained on 80k hours of photoplethysmogram (PPG) data, the model matches or surpasses larger state-of-the-art baselines across ten clinical classification tasks. At the same time, it achieves reductions of two to three orders of magnitude in parameters (0.31M vs. 110M), memory footprint (3.6 MB vs. 441.3 MB), and compute, while delivering substantial speedups (roughly 4× on CPU and 20× on GPU) with resolution flexibility. Together, these results establish compact convolutional self-supervised models as both scientifically aligned and practically scalable foundations for potential real-time on-device healthcare monitoring.
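As a quick sanity check on the efficiency figures quoted above, the following minimal sketch converts the reported parameter and memory numbers into reduction ratios and orders of magnitude. The helper name `reduction` and the dictionary layout are illustrative assumptions, not part of the paper; only the four numbers are taken from the abstract.

```python
import math

# Figures quoted in the abstract: (compact model, larger baseline).
reported = {
    "parameters (millions)": (0.31, 110.0),
    "memory footprint (MB)": (3.6, 441.3),
}


def reduction(small, large):
    """Return (ratio, orders of magnitude) of the baseline relative to the compact model."""
    ratio = large / small
    return ratio, math.log10(ratio)


for name, (small, large) in reported.items():
    ratio, orders = reduction(small, large)
    print(f"{name}: {ratio:.0f}x smaller ({orders:.2f} orders of magnitude)")
```

Under these figures the parameter count shrinks by roughly 355× and the memory footprint by roughly 123×, both between two and three orders of magnitude. The compute reduction cannot be checked the same way because the abstract gives no absolute numbers for it.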