Workshop: Deep Generative Models for Health

CHIRon: A Generative Foundation Model for Structured Sequential Medical Data

Brian Hill · Melika Emami · Vijay Nori · Aldo Cordova-Palomera · Robert Tillman · Eran Halperin


Recent advances in large language models (LLMs) have shown that foundation models (FMs) can learn highly complex representations of sequences, which can be used for downstream generative and discriminative tasks such as text generation and classification. While most FMs focus on text, recent work has shown that FMs can be learned for sequential medical data, e.g., ICD-10 diagnosis codes associated with specific patient visits. These FMs demonstrate improved performance on downstream discriminative disease classification tasks, but because they rely on BERT-based pre-training, they cannot be used for generative tasks such as synthesizing artificial patient visits for data augmentation or privacy-preserving data sharing. In this paper, we introduce CHIRon, the first generative FM for sequential medical data. CHIRon utilizes causal masking during pre-training, enabling generative applications, and incorporates a number of architectural improvements and support for additional medical data types (diagnoses, procedures, medications, lab results, place of service, and demographics). We show empirically that CHIRon can be used to generate realistic sequential medical data and that it outperforms state-of-the-art FMs for sequential medical data on disease classification tasks.
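The causal masking described above is the standard autoregressive setup: each position may attend only to earlier positions, and the model is trained to predict the next token, which is what makes generation possible (unlike BERT-style bidirectional masking). The following is a minimal sketch of that idea on a toy sequence of medical-event tokens; the vocabulary, function names, and random logits are illustrative assumptions, not CHIRon's actual implementation:

```python
import numpy as np

# Hypothetical toy vocabulary of medical-event tokens (diagnosis codes,
# lab results, medications); CHIRon's real vocabulary is much larger.
vocab = ["E11.9", "I10", "LAB:HBA1C", "RX:METFORMIN", "<pad>"]
token_to_id = {t: i for i, t in enumerate(vocab)}

def causal_mask(seq_len: int) -> np.ndarray:
    """Lower-triangular mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def next_token_loss(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Average cross-entropy of predicting token t+1 from positions 0..t."""
    # Shift by one: logits at position t predict the token at position t+1.
    pred, target = logits[:-1], token_ids[1:]
    # Log-softmax over the vocabulary (numerically stabilized).
    z = pred - pred.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(target)), target].mean())

# A toy patient event sequence and random stand-in model outputs.
seq = ["E11.9", "LAB:HBA1C", "RX:METFORMIN"]
ids = np.array([token_to_id[t] for t in seq])
mask = causal_mask(len(ids))          # what the attention layers would apply
rng = np.random.default_rng(0)
logits = rng.standard_normal((len(ids), len(vocab)))  # placeholder logits
loss = next_token_loss(logits, ids)   # the autoregressive training objective
```

Because every prediction conditions only on past events, a trained model can synthesize an artificial patient history by sampling one event at a time and appending it to the context, which a bidirectionally pre-trained (BERT-style) model cannot do directly.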
