A source data privacy framework for synthetic clinical trial data
Afrah Shafquat · Jason Mezey · Mandis Beigi · Jimeng Sun · Jacob Aptekar

Fri Dec 02 07:42 AM -- 07:44 AM (PST)
Event URL: https://openreview.net/forum?id=7sKxo8mc1pW

Synthetic clinical trial data create opportunities for data sharing, cross-collaboration, and innovation for these valuable, siloed data sources. While the value of synthetic clinical trial data relies on the privacy preservation it offers clinical trial participants, the true degree of privacy has been questioned in recent literature. Given the highly sensitive nature of clinical trial data, especially the private health information they contain, there is an urgent need for a framework specifically designed to provide guaranteed levels of privacy for synthetic datasets generated from clinical trial data. In this paper, we propose a practical privacy framework that ensures synthetic clinical trial data privacy at the level of the source data by design and provides objective, measurable bounds on the disclosure risks through a combination of technical, policy, and algorithmic controls. The proposed framework enforces privacy prior to the generation of synthetic datasets and therefore complements the privacy-preserving attributes intrinsic to the algorithms used for synthetic data generation. To demonstrate how the components of the framework address the privacy requirements of clinical trial data, we discuss how this privacy system responds to a set of realistic adversarial scenarios. Ultimately, we believe the proposed framework can foster more privacy research in clinical trial data sharing.

Author Information

Afrah Shafquat (Medidata, a Dassault Systèmes company)
Afrah Shafquat is a Sr. Data Scientist at Medidata AI, where her work focuses on synthetic clinical trial data generation and innovative machine-learning models to further the understanding of clinical and healthcare datasets. She has a PhD in Computational Biology (2020) from Cornell University, where her dissertation focused on inferring errors in disease diagnoses using Bayesian hierarchical models. She also has an SB in Biological Engineering from MIT.

Jason Mezey
Mandis Beigi (Medidata (Dassault Systemes))
Mandis Beigi, PhD, is a senior data scientist working in the Trial Design solutions group at Medidata Solutions (Dassault Systemes). She currently works on synthetic data generation from clinical trial data and the privacy risks of synthetic data. She has over 20 years of prior experience at IBM Research working in various fields such as computer vision, sensor data analytics, high-dimensional data analytics, anomaly detection, and rule-based system and network management. She received her master's and PhD in Electrical Engineering from Columbia University.

Jimeng Sun (University of Illinois, Urbana Champaign)
Jacob Aptekar
