We provide a differentially private algorithm for producing synthetic data that is simultaneously useful for multiple tasks: marginal queries and multitask machine learning (ML). A key innovation in our algorithm is the ability to handle numerical features directly, in contrast to a number of related prior approaches, which require numerical features to first be converted into high-cardinality categorical features via a binning strategy. Higher binning granularity is required for better accuracy, but it negatively impacts scalability. Eliminating the need for binning allows us to produce synthetic data that preserves large numbers of statistical queries, such as marginals on numerical features and class-conditional linear threshold queries. Preserving the latter means that the fraction of points of each class label above a particular half-space is roughly the same in both the real and synthetic data. This is the property needed to train a linear classifier in a multitask setting. Our algorithm also allows us to produce high-quality synthetic data for mixed marginal queries, which combine both categorical and numerical features. Our method consistently runs 2-5x faster than the best comparable techniques and provides significant accuracy improvements in both marginal queries and linear prediction tasks for mixed-type datasets.
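To make the notion of a class-conditional linear threshold query concrete, the following is a minimal, non-private sketch of how such a query could be evaluated on a real and a synthetic dataset and compared. It is an illustration only, not the algorithm from the paper: the function names, the toy data, and the choice to normalize by the total number of records (rather than by the class size) are all assumptions made here.

```python
import numpy as np


def class_conditional_threshold_query(X, y, w, b, label):
    """Fraction of all records that have class `label` and lie above the
    half-space boundary, i.e. satisfy <w, x> > b.

    Assumption: the fraction is taken over the whole dataset, not over
    the records of the given class only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    above = X @ np.asarray(w, dtype=float) > b
    return float(np.mean(above & (y == label)))


def max_query_error(real, synth, queries):
    """Largest absolute gap between real and synthetic answers over a
    list of (w, b, label) queries. `real` and `synth` are (X, y) pairs."""
    gaps = [
        abs(
            class_conditional_threshold_query(*real, w, b, label)
            - class_conditional_threshold_query(*synth, w, b, label)
        )
        for (w, b, label) in queries
    ]
    return max(gaps)


# Hypothetical toy data: two numerical features scaled to [0, 1], binary label.
real_X = [[0.1, 0.2], [0.9, 0.8], [0.4, 0.7], [0.6, 0.1]]
real_y = [0, 1, 1, 0]
synth_X = [[0.2, 0.1], [0.8, 0.9], [0.5, 0.3], [0.7, 0.6]]
synth_y = [0, 1, 1, 0]

# One half-space (w, b), queried once per class label.
w, b = [1.0, 1.0], 1.0
queries = [(w, b, 0), (w, b, 1)]

real_answer = class_conditional_threshold_query(real_X, real_y, w, b, 1)
synth_answer = class_conditional_threshold_query(synth_X, synth_y, w, b, 1)
worst_gap = max_query_error((real_X, real_y), (synth_X, synth_y), queries)
print(real_answer, synth_answer, worst_gap)
```

Synthetic data that preserves these queries keeps `worst_gap` small across many half-spaces and labels. Because the features are used as real numbers, no binning step appears anywhere in the evaluation.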
Author Information
Giuseppe Vietri (University of Minnesota)
Cedric Archambeau (Amazon Web Services)
Sergul Aydore (AWS AI)
William Brown (Columbia University)
Michael Kearns (University of Pennsylvania)
Michael Kearns is Professor and National Center Chair in the Computer and Information Science department at the University of Pennsylvania. His research interests include topics in machine learning, algorithmic game theory, social networks, and computational finance. Prior to joining the Penn faculty, he spent a decade at AT&T/Bell Labs, where he was head of AI Research. He is co-director of Penn’s Warren Center for Network and Data Sciences (warrencenter.upenn.edu), and founder of Penn’s Networked and Social Systems Engineering (NETS) undergraduate program (www.nets.upenn.edu). Kearns consults extensively in technology and finance, and is a Fellow of the Association for the Advancement of Artificial Intelligence and the American Academy of Arts and Sciences.
Aaron Roth (University of Pennsylvania)
Ankit Siva (Amazon)
Shuai Tang (Amazon Web Services)
Steven Wu (Carnegie Mellon University)
More from the Same Authors
-
2021 : What Would the Expert do()?: Causal Imitation Learning »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2021 : Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods »
Terrance Liu · Giuseppe Vietri · Steven Wu -
2021 : What Would the Expert $do(\cdot)$?: Causal Imitation Learning »
Gokul Swamy · Sanjiban Choudhury · James Bagnell · Steven Wu -
2021 : Bayesian Persuasion for Algorithmic Recourse »
Keegan Harris · Valerie Chen · Joon Sik Kim · Ameet Talwalkar · Hoda Heidari · Steven Wu -
2021 : Spectrally Adaptive Common Spatial Patterns »
Mahta Mousavi · Eric Lybrand · Shuangquan Feng · Shuai Tang · Rayan Saab · Virginia de Sa -
2021 : Information Discrepancy in Strategic Learning »
Yahav Bechavod · Chara Podimata · Steven Wu · Juba Ziani -
2021 : Gaming Helps! Learning from Strategic Interactions in Natural Dynamics »
Yahav Bechavod · Katrina Ligett · Steven Wu · Juba Ziani -
2021 : Bayesian Persuasion for Algorithmic Recourse »
Keegan Harris · Valerie Chen · Joon Kim · Ameet S Talwalkar · Hoda Heidari · Steven Wu -
2022 : Strategy-Aware Contextual Bandits »
Keegan Harris · Chara Podimata · Steven Wu -
2022 : Choosing Public Datasets for Private Machine Learning via Gradient Subspace Distance »
Xin Gu · Gautam Kamath · Steven Wu -
2022 : Differentially Private Gradient Boosting on Linear Learners for Tabular Data »
Saeyoung Rho · Shuai Tang · Sergul Aydore · Michael Kearns · Aaron Roth · Yu-Xiang Wang · Steven Wu · Cedric Archambeau -
2022 : Counterfactual Decision Support Under Treatment-Conditional Outcome Measurement Error »
Luke Guerdan · Amanda Coston · Kenneth Holstein · Steven Wu -
2023 Workshop: Synthetic Data Generation with Generative AI »
Sergul Aydore · Zhaozhi Qian · Mihaela van der Schaar -
2022 Affinity Workshop: Women in Machine Learning - Virtual »
Mariam Arab · Konstantina Palla · Sergul Aydore · Gloria Namanya · Beliz Gunel · Kimia Nadjahi · Soomin Aga Lee -
2022 : Achievements and Challenges Part 2/2 »
Zhaozhi Qian · Tucker Balch · Sergul Aydore -
2022 : Privacy Panel »
Mario Fritz · Katrina Ligett · Vamsi Potluru · Shuai Tang -
2022 Workshop: Synthetic Data for Empowering ML Research »
Mihaela van der Schaar · Zhaozhi Qian · Sergul Aydore · Dimitris Vlitas · Dino Oglic · Tucker Balch -
2022 Poster: Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications »
Daniel Lee · Georgy Noarov · Mallesh Pai · Aaron Roth -
2022 Poster: On Privacy and Personalization in Cross-Silo Federated Learning »
Ken Liu · Shengyuan Hu · Steven Wu · Virginia Smith -
2022 Poster: Memory Efficient Continual Learning with Transformers »
Beyza Ermis · Giovanni Zappella · Martin Wistuba · Aditya Rawal · Cedric Archambeau -
2022 Poster: Brownian Noise Reduction: Maximizing Privacy Subject to Accuracy Constraints »
Justin Whitehouse · Aaditya Ramdas · Steven Wu · Ryan Rogers -
2022 Poster: Practical Adversarial Multivalid Conformal Prediction »
Osbert Bastani · Varun Gupta · Christopher Jung · Georgy Noarov · Ramya Ramalingam · Aaron Roth -
2022 Poster: Incentivizing Combinatorial Bandit Exploration »
Xinyan Hu · Dung Ngo · Aleksandrs Slivkins · Steven Wu -
2022 Poster: Diversified Recommendations for Agents with Adaptive Preferences »
William Brown · Arpit Agarwal -
2022 Poster: Sequence Model Imitation Learning with Unobserved Contexts »
Gokul Swamy · Sanjiban Choudhury · J. Bagnell · Steven Wu -
2022 Poster: Minimax Optimal Online Imitation Learning via Replay Estimation »
Gokul Swamy · Nived Rajaraman · Matt Peng · Sanjiban Choudhury · J. Bagnell · Steven Wu · Jiantao Jiao · Kannan Ramchandran -
2022 Poster: Bayesian Persuasion for Algorithmic Recourse »
Keegan Harris · Valerie Chen · Joon Kim · Ameet Talwalkar · Hoda Heidari · Steven Wu -
2022 Affinity Workshop: Women in Machine Learning »
Mariam Arab · Konstantina Palla · Sergul Aydore · Gloria Namanya · Beliz Gunel · Kimia Nadjahi · Soomin Aga Lee -
2021 : Panel »
Oluwaseyi Feyisetan · Helen Nissenbaum · Aaron Roth · Christine Task -
2021 : Leveraging strategic interactions for causal discovery »
Steven Wu -
2021 : Invited talk: Aaron Roth (UPenn / Amazon): Machine Unlearning. »
Aaron Roth -
2021 Poster: Adaptive Machine Unlearning »
Varun Gupta · Christopher Jung · Seth Neel · Aaron Roth · Saeed Sharifi-Malvajerdi · Chris Waites -
2021 Poster: Iterative Methods for Private Synthetic Data: Unifying Framework and New Methods »
Terrance Liu · Giuseppe Vietri · Steven Wu -
2021 Poster: Stateful Strategic Regression »
Keegan Harris · Hoda Heidari · Steven Wu -
2021 Poster: Adversarial Robustness with Non-uniform Perturbations »
Ecenaz Erdemir · Jeffrey Bickford · Luca Melis · Sergul Aydore -
2020 : Invited Talk 7: Fair Portfolio Design »
Michael Kearns -
2020 : Keynote: Michael Kearns »
Michael Kearns -
2020 Poster: Metric-Free Individual Fairness in Online Learning »
Yahav Bechavod · Christopher Jung · Steven Wu -
2020 Poster: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Poster: Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms »
Xiangyi Chen · Tiancong Chen · Haoran Sun · Steven Wu · Mingyi Hong -
2020 Spotlight: Understanding Gradient Clipping in Private SGD: A Geometric Perspective »
Xiangyi Chen · Steven Wu · Mingyi Hong -
2020 Oral: Metric-Free Individual Fairness in Online Learning »
Yahav Bechavod · Christopher Jung · Steven Wu -
2020 Session: Orals & Spotlights Track 16: Continual/Meta/Misc Learning »
Laurent Charlin · Cedric Archambeau -
2020 Session: Orals & Spotlights Track 20: Social/Adversarial Learning »
Steven Wu · Miro Dudik -
2020 : QA Long Presentation III »
Giuseppe Vietri · Daniel Palomino · María Belén Guaranda · Andres Carvallo -
2020 : Private Reinforcement Learning with PAC and Regret Guarantees »
Giuseppe Vietri -
2019 : Aaron Roth, "Average Individual Fairness" »
Aaron Roth -
2019 : Poster Session »
Clement Canonne · Kwang-Sung Jun · Seth Neel · Di Wang · Giuseppe Vietri · Liwei Song · Jonathan Lebensold · Huanyu Zhang · Lovedeep Gondara · Ang Li · FatemehSadat Mireshghallah · Jinshuo Dong · Anand D Sarwate · Antti Koskela · Joonas Jälkö · Matt Kusner · Dingfan Chen · Mi Jung Park · Ashwin Machanavajjhala · Jayashree Kalpathy-Cramer · Vitaly Feldman · Andrew Tomkins · Hai Phan · Hossein Esfandiari · Mimansa Jaiswal · Mrinank Sharma · Jeff Druce · Casey Meehan · Zhengli Zhao · Hsiang Hsu · Davis Railsback · Abraham Flaxman · Julius Adebayo · Aleksandra Korolova · Jiaming Xu · Naoise Holohan · Samyadeep Basu · Matthew Joseph · My Thai · Xiaoqian Yang · Ellen Vitercik · Michael Hutchinson · Chenghong Wang · Gregory Yauney · Yuchao Tao · Chao Jin · Si Kai Lee · Audra McMillan · Rauf Izmailov · Jiayi Guo · Siddharth Swaroop · Tribhuvanesh Orekondy · Hadi Esmaeilzadeh · Kevin Procopio · Alkis Polyzotis · Jafar Mohammadi · Nitin Agrawal -
2019 : Invited talk #3 »
Aaron Roth -
2019 Poster: Average Individual Fairness: Algorithms, Generalization and Experiments »
Saeed Sharifi-Malvajerdi · Michael Kearns · Aaron Roth -
2019 Poster: Dynamic Local Regret for Non-convex Online Forecasting »
Sergul Aydore · Tianhao Zhu · Dean Foster -
2019 Poster: Equal Opportunity in Online Classification with Partial Feedback »
Yahav Bechavod · Katrina Ligett · Aaron Roth · Bo Waggoner · Steven Wu -
2019 Poster: Random Quadratic Forms with Dependence: Applications to Restricted Isometry and Beyond »
Arindam Banerjee · Qilong Gu · Vidyashankar Sivakumar · Steven Wu -
2019 Oral: Average Individual Fairness: Algorithms, Generalization and Experiments »
Saeed Sharifi-Malvajerdi · Michael Kearns · Aaron Roth -
2019 Poster: Private Hypothesis Selection »
Mark Bun · Gautam Kamath · Thomas Steinke · Steven Wu -
2019 Poster: Locally Private Gaussian Estimation »
Matthew Joseph · Janardhan Kulkarni · Jieming Mao · Steven Wu -
2018 : Shuai Tang, "Learning Distributed Representations of Symbolic Structure Using Binding and Unbinding Operations" »
Shuai Tang -
2018 : Coffee break + posters 2 »
Jan Kremer · Erik McDermott · Brandon Carter · Albert Zeyer · Andreas Krug · Paul Pu Liang · Katherine Lee · Dominika Basaj · Abelino Jimenez · Lisa Fan · Gautam Bhattacharya · Tzeviya S Fuchs · David Gifford · Loren Lugosch · Orhan Firat · Benjamin Baer · JAHANGIR ALAM · Jamin Shin · Mirco Ravanelli · Paul Smolensky · Zining Zhu · Hamid Eghbal-zadeh · Skyler Seto · Imran Sheikh · Joao Felipe Santos · Yonatan Belinkov · Nadir Durrani · Oiwi Parker Jones · Shuai Tang · André Merboldt · Titouan Parcollet · Wei-Ning Hsu · Krishna Pillutla · Ehsan Hosseini-Asl · Monica Dinculescu · Alexander Amini · Ying Zhang · Taoli Cheng · Alain Tapp -
2018 : Coffee break + posters 1 »
Samuel Myer · Wei-Ning Hsu · Jialu Li · Monica Dinculescu · Lea Schönherr · Ehsan Hosseini-Asl · Skyler Seto · Oiwi Parker Jones · Imran Sheikh · Thomas Manzini · Yonatan Belinkov · Nadir Durrani · Alexander Amini · Johanna Hansen · Gabi Shalev · Jamin Shin · Paul Smolensky · Lisa Fan · Zining Zhu · Hamid Eghbal-zadeh · Benjamin Baer · Abelino Jimenez · Joao Felipe Santos · Jan Kremer · Erik McDermott · Andreas Krug · Tzeviya S Fuchs · Shuai Tang · Brandon Carter · David Gifford · Albert Zeyer · André Merboldt · Krishna Pillutla · Katherine Lee · Titouan Parcollet · Orhan Firat · Gautam Bhattacharya · JAHANGIR ALAM · Mirco Ravanelli -
2018 : Invited Talk 3: Fairness in Allocation Problems »
Michael Kearns -
2018 Poster: Online Learning with an Unknown Fairness Metric »
Stephen Gillen · Christopher Jung · Michael Kearns · Aaron Roth -
2017 : Spotlights »
Antti Kangasrääsiö · Richard Everett · Yitao Liang · Yang Cai · Steven Wu · Vidya Muthukumar · Sven Schmit -
2017 : Industry talk: Cedric Archambeau (TBA) »
Cedric Archambeau -
2017 Poster: Accuracy First: Selecting a Differential Privacy Level for Accuracy Constrained ERM »
Katrina Ligett · Seth Neel · Aaron Roth · Bo Waggoner · Steven Wu -
2016 Workshop: Adaptive Data Analysis »
Vitaly Feldman · Aaditya Ramdas · Aaron Roth · Adam Smith -
2016 Poster: Learning from Rational Behavior: Predicting Solutions to Unknown Linear Programs »
Shahin Jabbari · Ryan Rogers · Aaron Roth · Steven Wu -
2016 Poster: Fairness in Learning: Classic and Contextual Bandits »
Matthew Joseph · Michael Kearns · Jamie Morgenstern · Aaron Roth -
2015 Poster: Generalization in Adaptive Data Analysis and Holdout Reuse »
Cynthia Dwork · Vitaly Feldman · Moritz Hardt · Toni Pitassi · Omer Reingold · Aaron Roth -
2014 Workshop: Learning Semantics »
Cedric Archambeau · Antoine Bordes · Leon Bottou · Chris J Burges · David Grangier -
2014 Invited Talk: Games, Networks, and People »
Michael Kearns -
2013 Poster: Marginals-to-Models Reducibility »
Tim Roughgarden · Michael Kearns -
2011 Workshop: Choice Models and Preference Learning »
Jean-Marc Andreoli · Cedric Archambeau · Guillaume Bouchard · Shengbo Guo · Kristian Kersting · Scott Sanner · Martin Szummer · Paolo Viappiani · Onno Zoeter -
2011 Session: Spotlight Session 7 »
Cedric Archambeau -
2011 Session: Oral Session 9 »
Cedric Archambeau -
2011 Poster: Sparse Bayesian Multi-Task Learning »
Cedric Archambeau · Shengbo Guo · Onno Zoeter -
2008 Poster: Sparse probabilistic projections »
Cedric Archambeau · Francis Bach -
2008 Spotlight: Sparse probabilistic projections »
Cedric Archambeau · Francis Bach -
2007 Spotlight: Privacy-Preserving Belief Propagation and Sampling »
Michael Kearns · Jinsong Tan · Jennifer Wortman Vaughan -
2007 Poster: Privacy-Preserving Belief Propagation and Sampling »
Michael Kearns · Jinsong Tan · Jennifer Wortman Vaughan -
2007 Poster: Variational Inference for Diffusion Processes »
Cedric Archambeau · Manfred Opper · Yuan Shen · Dan Cornford · John Shawe-Taylor -
2006 Workshop: Dynamical Systems, Stochastic Processes and Bayesian Inference »
Manfred Opper · Cedric Archambeau · John Shawe-Taylor -
2006 Poster: Learning from Multiple Sources »
Yacov Crammer · Michael Kearns · Jennifer Wortman Vaughan -
2006 Poster: A Small World Threshold for Economic Network Formation »
Eyal Even-Dar · Michael Kearns