Competition
Privacy Preserving Federated Learning Document VQA
Dimosthenis Karatzas · Rubèn Tito · Lei Kang · Mohamed Ali Souibgui · Khanh Nguyen · Raouf Kerkouche · Kangsoo Jung · Marlon Tobaben · Joonas Jälkö · Vincent Poulain d'Andecy · Aurélie Joseph · Ernest Valveny · Josep Llados · Antti Honkela · Mario Fritz
Room 356
In an era of increasing digitalization and data-driven decision-making, the intersection of document intelligence and privacy has become a critical concern. The Privacy-Preserving Federated Learning Document Visual Question Answering Workshop aims to bring together experts, researchers, and practitioners to explore innovative solutions and discuss the latest advancements in this crucial field.
Join us for insightful invited talks by leading figures in the field, offering valuable perspectives on the current state of privacy-preserving document intelligence and its future directions. The workshop also provides an in-depth look at the ongoing Privacy-Preserving Document Visual Question Answering Competition, including a detailed overview of the task, the dataset, and the results. Moreover, the top-ranked teams will give short talks about their winning methods and strategies, offering firsthand insight into the approaches that led to their success.
Workshop URL: https://sites.google.com/view/pfldocvqa-neurips-23/home
Associated Competition URL: https://benchmarks.elsa-ai.eu/?ch=2
Schedule
Fri 7:00 a.m. - 7:05 a.m.
Opening remarks (Opening)
Dimosthenis Karatzas
Fri 7:05 a.m. - 7:20 a.m.
Presentation of the ELSA network (Overview)
Mario Fritz
Fri 7:20 a.m. - 7:50 a.m.
Virginia Smith - On Privacy and Personalization in Federated Learning (Invited talk)

Abstract: A defining trait of federated learning is the presence of heterogeneity, i.e., that data may differ between clients in the network. In this talk I discuss how heterogeneity affects issues of privacy and personalization in federated settings. First, I present our work on private personalized learning in cross-device settings, where we show that personalized FL provides unique benefits when enforcing client-level differential privacy in heterogeneous networks. Second, I explore cross-silo settings, where differences in privacy granularity introduce new dynamics in terms of the privacy/utility trade-offs of personalized FL. I end by discussing our application of these works to privacy-preserving pandemic forecasting in the recent UK-US privacy-enhancing technologies prize challenge, and highlight promising directions of future work on privacy and personalization in FL.

Bio: Virginia Smith is the Leonardo Assistant Professor of Machine Learning at Carnegie Mellon University. Her research spans machine learning, optimization, and distributed systems. Virginia's current work addresses challenges related to optimization, privacy, and robustness in distributed settings to enable trustworthy federated learning at scale. Virginia's work has been recognized by several awards, including an NSF CAREER Award, MIT TR35 Innovator Award, Intel Rising Star Award, and faculty awards from Google, Apple, and Meta. Prior to CMU, Virginia was a postdoc at Stanford University and received a Ph.D. in Computer Science from UC Berkeley.

Virginia Smith
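As background for the client-level differential privacy discussed in this talk, the core idea can be illustrated with a minimal sketch of one DP federated averaging round: each client's model update is clipped to a fixed norm, and Gaussian noise calibrated to that clipping bound is added to the aggregate. This is an illustrative simplification (model updates as flat NumPy arrays, a single noise multiplier), not the speaker's implementation; function and parameter names are hypothetical.

```python
import numpy as np

def dp_fedavg_round(global_weights, client_updates, clip_norm=1.0,
                    noise_mult=1.0, rng=None):
    """One round of federated averaging with client-level DP:
    clip each client's update to clip_norm, average, then add
    Gaussian noise scaled to the per-client sensitivity."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for update in client_updates:
        norm = np.linalg.norm(update)
        # Scale down any update whose norm exceeds the clipping bound.
        clipped.append(update * min(1.0, clip_norm / norm))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation follows the clipped per-client sensitivity.
    noise = rng.normal(0.0, noise_mult * clip_norm / len(client_updates),
                       size=avg.shape)
    return global_weights + avg + noise
```

With `noise_mult=0` this reduces to plain federated averaging with clipping, which makes the privacy/utility trade-off knobs (clipping bound and noise multiplier) explicit.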
Fri 7:50 a.m. - 8:20 a.m.
David Doermann - Advancing Privacy and Dataset Augmentation in Medical and Chart Data Using AI-Driven Image Editing (Invited talk)

Abstract: In the current landscape where data privacy intersects with the ever-growing demand for comprehensive datasets, this talk introduces a novel approach employing large language models (LLMs) for image-based editing, targeting medical images and chart image data. This technique emphasizes preserving data integrity while ensuring the utmost privacy and confidentiality. We delve into utilizing LLMs to interpret and manipulate data visualizations, including diverse chart forms like bar graphs, pie charts, and line plots, alongside medical imagery such as X-rays, MRIs, and CT scans.

The LLMs discern and subtly modify particular data elements or features within these images. In chart data, this pertains to altering specific data points without skewing the overarching trends or statistical relevance. Medical imagery involves modifying or removing identifiable markers while retaining diagnostic value. A significant aspect of our methodology is its role in data augmentation. For chart data, we generate synthetic images mirroring real data trends and enhancing datasets while adhering to privacy norms. In the realm of medical data, we create realistic, anonymized images that expand the scope of datasets, crucial in areas plagued by data scarcity, such as rare diseases or specific medical conditions.

This talk will showcase the efficacy of our approach through various case studies and experimental analyses. We will also address the ethical implications and potential constraints of using AI in this context, providing a glimpse into the future of secure data handling and augmentation in the AI era. This presentation is an invitation to explore the intersection of AI and data privacy, specifically in medical and chart data. It is a journey through the innovative ways large language models are redefining data enhancement and privacy preservation.

David Doermann
Fri 8:20 a.m. - 8:30 a.m.
Coffee Break (Break)
Fri 8:30 a.m. - 9:00 a.m.
Florian Tramèr - Privacy side-channels in machine learning systems (Invited talk)

Abstract: Most current approaches for protecting privacy in machine learning (ML) assume that models exist in a vacuum, when in reality, ML models are part of larger systems that include components for training data filtering, output monitoring, and more. In this work, we introduce privacy side channels: attacks that exploit these system-level components to extract private information at far higher rates than is otherwise possible for standalone models.

Bio: Florian Tramèr is an assistant professor of computer science at ETH Zurich. His research interests lie in computer security, cryptography, and machine learning security. In his current work, he studies the worst-case behavior of deep learning systems from an adversarial perspective, to understand and mitigate long-term threats to the safety and privacy of users.

Florian Tramèr
Fri 9:00 a.m. - 9:25 a.m.
Overview of the competition: Datasets, Metrics, and Results (Presentation)
Marlon Tobaben · Khanh Nguyen
Fri 9:25 a.m. - 9:35 a.m.
Presentation from Participants – Winner FL Only Track: Communication Tuned Low-Rank Adaptation of Document Encoder (Presentation)
Aashiq Muhamed
Fri 9:35 a.m. - 9:45 a.m.
Presentation from Participants – Winner FL+DP Track: Differentially Private Federated Learning with LoRA (Presentation)
Ragul N
Fri 9:45 a.m. - 9:55 a.m.
Presentation from Participants – Runner-up Track 1 & 2: FedShampoo & DP-CLGECL (Presentation)
Takumi Fukami · Yusuke Yamasaki
Fri 9:55 a.m. - 10:00 a.m.
Closing Remarks (Closing)
Antti Honkela