Conditional Adversarial Random Forest for Synthetic Electronic Health Record Generation
Abstract
Synthetic Electronic Health Records (EHRs) enable privacy-preserving sharing of healthcare data for machine learning research. However, existing methods struggle to maintain temporal consistency across patient visits while also preserving correlations between demographic and clinical variables: current approaches either sacrifice temporal fidelity or require extensive postprocessing. We propose the Conditional Adversarial Random Forest (CARF), which extends the Adversarial Random Forest [1] with a two-model strategy. The first model generates patient-level demographics that remain static across visits. The second, conditional model generates visit-level clinical variables, taking visit rank and time progression into account, so that together the two models produce complete patient trajectories. This design eliminates manual postprocessing and preserves temporal patterns by construction.
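The two-stage generation described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of how a static patient-level model and a conditional visit-level model might compose into trajectories, not the CARF implementation. The trained Adversarial Random Forest models are replaced by toy samplers (`toy_demographics`, `toy_visit`). The exponential inter-visit gap with a 30-day mean, the `max_visits` cap, and all variable names are assumptions made for the example.

```python
import random
from typing import Callable

# Hypothetical interfaces for the two generators. In CARF these would be
# trained Adversarial Random Forest models; here they are toy placeholders.
DemographicSampler = Callable[[random.Random], dict]
VisitSampler = Callable[[random.Random, dict, int, float], dict]


def toy_demographics(rng: random.Random) -> dict:
    # Stand-in for the patient-level model: static attributes only.
    return {"age": rng.randint(18, 90), "sex": rng.choice(["F", "M"])}


def toy_visit(rng: random.Random, demo: dict, visit_rank: int,
              days_since_first: float) -> dict:
    # Stand-in for the conditional visit-level model: the output depends on
    # the patient's demographics and on the visit rank (toy linear rule).
    base = 0.01 * demo["age"] + 0.1 * visit_rank
    return {"risk_score": round(base + rng.gauss(0.0, 0.05), 3)}


def generate_trajectories(n_patients: int,
                          sample_demo: DemographicSampler,
                          sample_visit: VisitSampler,
                          max_visits: int = 5,
                          seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    rows = []
    for pid in range(n_patients):
        # Stage 1: demographics sampled once, reused for every visit.
        demo = sample_demo(rng)
        n_visits = rng.randint(1, max_visits)
        elapsed = 0.0
        for rank in range(1, n_visits + 1):
            if rank > 1:
                # Assumed inter-visit gap: exponential with a 30-day mean.
                elapsed += rng.expovariate(1 / 30)
            # Stage 2: visit-level variables conditioned on demographics,
            # visit rank, and time since the first visit.
            visit = sample_visit(rng, demo, rank, elapsed)
            rows.append({"patient_id": pid, "visit_rank": rank,
                         "days_since_first": elapsed, **demo, **visit})
    return rows


if __name__ == "__main__":
    for row in generate_trajectories(2, toy_demographics, toy_visit, seed=1):
        print(row)
```

Sampling demographics once per patient and passing them, together with the visit rank and elapsed time, into every visit-level call is what keeps static attributes identical across a patient's visits without a separate postprocessing step.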