High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Abstract
The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Generative AI offers a promising solution, but ECGs synthesized by existing models often lack the morphological fidelity required for clinical utility because these models rely on simplistic, general-purpose training objectives such as mean squared error (MSE) loss. In this work, we address this gap by introducing MIST-ECG (Mel-spectrogram Informed Synthetic Training), a novel training paradigm that supervises the conditional diffusion-based Structured State Space Model (SSSD-ECG) with a time–frequency domain objective to enforce structural realism. We train and rigorously evaluate our framework on the PTB-XL dataset, assessing the synthesized ECG signals on trustworthiness, fidelity, privacy preservation, and downstream task utility. MIST-ECG achieves substantial gains: it improves morphological coherence, preserves strong privacy guarantees with all evaluated metrics exceeding the baseline by 4% to 8%, and reduces the interlead correlation error by an average of 74%. In critical low-data regimes, a classifier trained on datasets supplemented with our synthetic ECGs achieves performance comparable to that of a classifier trained solely on real data. These results demonstrate that ECG synthesizers trained with the proposed time–frequency structural regularization scheme can serve as high-fidelity, privacy-preserving surrogates when real data are scarce.
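To make the idea of a time–frequency training objective concrete, the following is a minimal NumPy sketch of one way an MSE term could be combined with a log-mel-spectrogram term. It is an illustration under stated assumptions, not the MIST-ECG implementation: the function names (`mel_filterbank`, `log_mel`, `combined_loss`), the L1 distance on log-mel features, the weight `lam`, and all spectrogram settings (`fs=100`, `n_fft=64`, `hop=16`, `n_mels=16`) are hypothetical choices made for this example and are not taken from this abstract.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel scale formula.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    # n_mels + 2 edge points, equally spaced on the mel axis from 0 to Nyquist.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2))
    lower, center, upper = hz_pts[:-2, None], hz_pts[1:-1, None], hz_pts[2:, None]
    rising = (freqs[None, :] - lower) / (center - lower)
    falling = (upper - freqs[None, :]) / (upper - center)
    return np.clip(np.minimum(rising, falling), 0.0, None)

def log_mel(x, fs=100, n_fft=64, hop=16, n_mels=16, eps=1e-6):
    """Log-mel spectrogram of x with shape (leads, samples).

    Returns an array of shape (leads, frames, n_mels).
    All default settings are assumptions chosen for illustration.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (x.shape[-1] - n_fft) // hop
    # Index matrix of shape (frames, n_fft) selecting overlapping frames.
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[..., idx] * window
    power = np.abs(np.fft.rfft(frames, axis=-1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, fs).T
    return np.log(mel + eps)

def combined_loss(pred, target, lam=0.1):
    """MSE in the time domain plus a weighted L1 distance between log-mel spectrograms.

    The weighting scheme and the choice of L1 are assumptions, not the paper's method.
    """
    mse = np.mean((pred - target) ** 2)
    spec = np.mean(np.abs(log_mel(pred) - log_mel(target)))
    return mse + lam * spec

# Usage on random stand-in data shaped like a 12-lead, 10-second recording at 100 Hz.
rng = np.random.default_rng(0)
real = rng.standard_normal((12, 1000))
fake = rng.standard_normal((12, 1000))
loss_value = combined_loss(fake, real)
```

In an actual diffusion training loop the spectral term would need to be written in a differentiable framework and applied to whichever quantity the model predicts; the sketch only shows how the two terms could be computed and weighted.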