Skip to yearly menu bar Skip to main content


INSPECT: A Multimodal Dataset for Patient Outcome Prediction of Pulmonary Embolisms

Shih-Cheng Huang · Zepeng Huo · Ethan Steinberg · Chia-Chun Chiang · Curtis Langlotz · Matthew Lungren · Serena Yeung · Nigam Shah · Jason Fries

Great Hall & Hall B1+B2 (level 1) #443
[ ]
[ Paper [ Poster [ OpenReview
Thu 14 Dec 3 p.m. PST — 5 p.m. PST


Synthesizing information from various data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of pulmonary embolism (PE) patients, along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, sections of radiology reports, and structured electronic health record (EHR) data (including demographics, diagnoses, procedures, and vitals). Using our provided dataset, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and fused models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best our knowledge, INSPECT is the largest multimodal dataset for enabling reproducible research on strategies for integrating 3D medical imaging and EHR data.

Chat is not available.