Learning Representations from Incomplete EHR Data with Dual-Masked Autoencoding
Abstract
Learning from electronic health record (EHR) time series is challenging due to irregular sampling, heterogeneous missingness patterns, and the signal encoded in missingness itself. Prior self-supervised methods either impute data before learning or use imputation as the training objective, which limits their capacity to learn robust representations for downstream clinical tasks. We propose the Augmented-Intrinsic Dual-Masked Autoencoder (AID-MAE), which learns directly from incomplete EHR tables by combining an intrinsic mask, which marks naturally missing values, with an augmented mask, which hides a subset of observed values as reconstruction targets during training. AID-MAE consistently outperforms strong baselines, including XGBoost and DuETT, on mortality and length-of-stay prediction with MIMIC-IV, while learning embeddings that naturally stratify patient cohorts.
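To make the dual-masking idea concrete, the minimal sketch below illustrates one way an intrinsic mask and an augmented mask could be combined so that reconstruction is scored only on observed values hidden during training. It is an illustrative assumption rather than the paper's implementation: the `model` signature, the NaN encoding of missing entries, and the `aug_ratio` value are all hypothetical.

```python
# Hypothetical sketch of dual masking (not the paper's code): `model`, the NaN
# convention for missing values, and `aug_ratio` are illustrative assumptions.
import torch

def dual_mask_loss(x, model, aug_ratio=0.15):
    """x: (batch, time, features) tensor with NaN marking naturally missing entries."""
    intrinsic_mask = ~torch.isnan(x)                      # True where a value was actually observed
    # Augmented mask: randomly hide a subset of the *observed* entries.
    aug_mask = (torch.rand_like(torch.nan_to_num(x)) < aug_ratio) & intrinsic_mask
    # Zero-fill both naturally missing and augmented-masked positions before encoding.
    x_in = torch.nan_to_num(x, nan=0.0).masked_fill(aug_mask, 0.0)
    # The model also receives the visibility mask (observed and not hidden).
    x_hat = model(x_in, intrinsic_mask & ~aug_mask)
    # Reconstruction loss only on entries that were observed but hidden by the augmented mask.
    return ((x_hat - torch.nan_to_num(x, nan=0.0)) ** 2)[aug_mask].mean()
```

In this sketch, the loss never penalizes positions that were intrinsically missing, so the objective stays grounded in real observations while the visibility mask preserves the missingness pattern as input signal.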