Synthetic data (SD) is data that has been generated by a mathematical model to solve downstream data science tasks. SD can be used to address three key problems: 1/ private data release, 2/ data de-biasing and fairness, 3/ data augmentation for boosting the performance of ML models. While SD offers great opportunities for these problems, SD generation is still a developing area of research. Systematic frameworks for SD deployment and evaluation are also still missing. Additionally, despite the substantial advances in Generative AI, the scientific community still lacks a unified understanding of how generative AI can be utilized to generate SD for different modalities.The goal of this workshop is to provide a platform for vigorous discussion from all these different perspectives with research communities in the hope of progressing the ideal of using SD for better and trustworthy ML training. Through submissions and facilitated discussions, we aim to characterize and mitigate the common challenges of SD generation that span numerous application domains. The workshop is jointly organized by academic researchers (University of Cambridge) and industry partners from tech (Amazon AI).
Live content is unavailable. Log in and register to view live content