Self- and semi-supervised learning frameworks have made significant progress in training machine learning models with limited labeled data in the image and language domains. These methods rely heavily on structure that is specific to those domains (such as spatial relationships in images or semantic relationships in language). They do not adapt readily to general tabular data, which lacks the explicit structure of image and language data. In this paper, we fill this gap by proposing novel self- and semi-supervised learning frameworks for tabular data, which we refer to collectively as VIME (Value Imputation and Mask Estimation). For self-supervised learning, we create a novel pretext task of estimating mask vectors from corrupted tabular data, used alongside the reconstruction pretext task. We also introduce a novel tabular data augmentation method for self- and semi-supervised learning frameworks. In experiments, we evaluate the proposed framework on multiple tabular datasets from various application domains, such as genomics and clinical data. VIME outperforms the existing state-of-the-art baseline methods.
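As a rough illustration of the two pretext tasks named in the abstract, the sketch below builds a corrupted copy of a small table together with the mask that a model would be asked to estimate. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how values are corrupted, so the per-column shuffle used here is an assumption, and the names `make_mask`, `corrupt`, and `p_mask` are hypothetical.

```python
import random


def make_mask(n_rows, n_cols, p_mask, rng):
    """Sample a binary mask; 1 marks a cell selected for corruption."""
    return [[1 if rng.random() < p_mask else 0 for _ in range(n_cols)]
            for _ in range(n_rows)]


def corrupt(rows, mask, rng):
    """Replace masked cells with values drawn from the same column.

    Assumption (not stated in the abstract): replacements come from a
    per-column shuffle, so every corrupted value is still a value that
    the feature actually takes somewhere in the data.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    shuffled_cols = []
    for j in range(n_cols):
        col = [rows[i][j] for i in range(n_rows)]
        rng.shuffle(col)
        shuffled_cols.append(col)
    corrupted = [[shuffled_cols[j][i] if mask[i][j] else rows[i][j]
                  for j in range(n_cols)]
                 for i in range(n_rows)]
    # A shuffled value can coincide with the original one, so the mask is
    # recomputed to mark only the cells that actually changed.
    effective_mask = [[int(corrupted[i][j] != rows[i][j])
                       for j in range(n_cols)]
                      for i in range(n_rows)]
    return corrupted, effective_mask


rng = random.Random(0)
rows = [[0.1, 5.0, 1.0],
        [0.4, 3.0, 0.0],
        [0.9, 8.0, 1.0],
        [0.2, 6.0, 0.0]]
mask = make_mask(len(rows), len(rows[0]), p_mask=0.3, rng=rng)
corrupted, effective_mask = corrupt(rows, mask, rng)
```

In this setup a model would take `corrupted` as input and be trained against two targets: `effective_mask` (mask estimation) and `rows` (reconstruction, i.e. value imputation). No labels are needed to build either target, which is what makes the tasks usable on unlabeled tabular data.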
Author Information
Jinsung Yoon (Google)
I am a research scientist at Google Cloud AI. I currently work on diverse machine learning research topics such as generative models, self- and semi-supervised learning, model interpretation, data imputation, and synthetic data generation. Previously, I worked on machine learning for medicine with Professor Mihaela van der Schaar as a graduate student researcher in the UCLA Electrical and Computer Engineering Department. I received my Ph.D. and M.S. in Electrical and Computer Engineering from UCLA, and my B.S. in Electrical and Computer Engineering from Seoul National University (SNU).
Yao Zhang (University of Cambridge)
James Jordon (University of Oxford)
Mihaela van der Schaar (University of Cambridge)
More from the Same Authors
-
2021 Spotlight: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Spotlight: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbe · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 : Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation »
Alicia Curth · David Svensson · Jim Weatherall · Mihaela van der Schaar -
2021 : The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation »
Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar -
2022 : Adaptively Identifying Patient Populations With Treatment Benefit in Clinical Trials »
Alicia Curth · Alihan Hüyük · Mihaela van der Schaar -
2022 : D-CIPHER: Discovery of Closed-form Partial Differential Equations »
Krzysztof Kacprzyk · Zhaozhi Qian · Mihaela van der Schaar -
2022 : Provable Re-Identification Privacy »
Zachary Izzo · Jinsung Yoon · Sercan Arik · James Zou -
2022 : Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes »
Tennison Liu · Alex Chan · Boris van Breugel · Mihaela van der Schaar -
2022 : Closing Remarks »
Cheng Zhang · Mihaela van der Schaar -
2022 : Panel Discussion »
Cheng Zhang · Mihaela van der Schaar · Ilya Shpitser · Aapo Hyvarinen · Yoshua Bengio · Bernhard Schölkopf -
2022 Workshop: Causal Machine Learning for Real-World Impact »
Nick Pawlowski · Jeroen Berrevoets · Caroline Uhler · Kun Zhang · Mihaela van der Schaar · Cheng Zhang -
2022 : Opening Remarks »
Cheng Zhang · Mihaela van der Schaar -
2022 Workshop: Synthetic Data for Empowering ML Research »
Mihaela van der Schaar · Zhaozhi Qian · Sergul Aydore · Dimitris Vlitas · Dino Oglic · Tucker Balch -
2022 Poster: Concept Activation Regions: A Generalized Framework For Concept-Based Explanations »
Jonathan Crabbé · Mihaela van der Schaar -
2022 Poster: Online Decision Mediation »
Daniel Jarrett · Alihan Hüyük · Mihaela van der Schaar -
2022 Poster: Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability »
Jonathan Crabbé · Alicia Curth · Ioana Bica · Mihaela van der Schaar -
2022 Poster: Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation »
Ioana Bica · Mihaela van der Schaar -
2022 Poster: Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data »
Nabeel Seedat · Jonathan Crabbé · Ioana Bica · Mihaela van der Schaar -
2022 Poster: Composite Feature Selection Using Deep Ensembles »
Fergus Imrie · Alexander Norcliffe · Pietro Lió · Mihaela van der Schaar -
2022 Poster: Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning »
Alex Chan · Mihaela van der Schaar -
2021 : Invited talk 8 »
Mihaela van der Schaar -
2021 : Invited talk #5: Mihaela van der Schaar »
Mihaela van der Schaar -
2021 : Mihaela Van Der Schaar Q&A »
Mihaela van der Schaar -
2021 : Mihaela Van Der Schaar »
Mihaela van der Schaar -
2021 Poster: Invariant Causal Imitation Learning for Generalizable Policies »
Ioana Bica · Daniel Jarrett · Mihaela van der Schaar -
2021 Poster: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbe · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 Poster: Time-series Generation by Contrastive Imitation »
Daniel Jarrett · Ioana Bica · Mihaela van der Schaar -
2021 Poster: Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation »
Yuchao Qin · Fergus Imrie · Alihan Hüyük · Daniel Jarrett · Alexander Gimson · Mihaela van der Schaar -
2021 Poster: DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks »
Boris van Breugel · Trent Kyono · Jeroen Berrevoets · Mihaela van der Schaar -
2021 Poster: MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms »
Trent Kyono · Yao Zhang · Alexis Bellot · Mihaela van der Schaar -
2021 Poster: Conformal Time-series Forecasting »
Kamile Stankeviciute · Ahmed M. Alaa · Mihaela van der Schaar -
2021 Poster: Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression »
Zhaozhi Qian · William Zame · Lucas Fleuren · Paul Elbers · Mihaela van der Schaar -
2021 Poster: SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data »
Alicia Curth · Changhee Lee · Mihaela van der Schaar -
2021 Poster: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Poster: SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes »
Zhaozhi Qian · Yao Zhang · Ioana Bica · Angela Wood · Mihaela van der Schaar -
2021 Poster: Estimating Multi-cause Treatment Effects via Single-cause Perturbation »
Zhaozhi Qian · Alicia Curth · Mihaela van der Schaar -
2020 : Closing remarks »
James Jordon -
2020 : What we learned from the Hide-and-Seek privacy challenge »
James Jordon -
2020 : Synthetic data in the healthcare setting »
James Jordon -
2020 : The importance of synthetic data »
James Jordon -
2020 : Introducing the Hide-and-Seek privacy challenge »
James Jordon -
2020 Poster: Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification »
Hyun-Suk Lee · Yao Zhang · William Zame · Cong Shen · Jang-Won Lee · Mihaela van der Schaar -
2020 Poster: Learning outside the Black-Box: The pursuit of interpretable models »
Jonathan Crabbe · Yao Zhang · William Zame · Mihaela van der Schaar -
2020 Poster: Strictly Batch Imitation Learning by Energy-based Distribution Matching »
Daniel Jarrett · Ioana Bica · Mihaela van der Schaar -
2020 Poster: Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks »
Ioana Bica · James Jordon · Mihaela van der Schaar -
2020 Poster: Gradient Regularized V-Learning for Dynamic Treatment Regimes »
Yao Zhang · Mihaela van der Schaar -
2020 Poster: OrganITE: Optimal transplant donor organ offering using an individual treatment effect »
Jeroen Berrevoets · James Jordon · Ioana Bica · Alexander Gimson · Mihaela van der Schaar -
2020 : Q&A for invited speaker, Mihaela van der Schaar »
Mihaela van der Schaar -
2020 : Interpretable AutoML: Powering the machine learning revolution in healthcare in the era of Covid-19 and beyond »
Mihaela van der Schaar -
2020 Poster: CASTLE: Regularization via Auxiliary Causal Graph Discovery »
Trent Kyono · Yao Zhang · Mihaela van der Schaar -
2020 Poster: When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes »
Zhaozhi Qian · Ahmed Alaa · Mihaela van der Schaar -
2020 Oral: When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes »
Zhaozhi Qian · Ahmed Alaa · Mihaela van der Schaar -
2019 Poster: Attentive State-Space Modeling of Disease Progression »
Ahmed Alaa · Mihaela van der Schaar -
2019 Poster: Demystifying Black-box Models with Symbolic Metamodels »
Ahmed Alaa · Mihaela van der Schaar -
2019 Poster: Time-series Generative Adversarial Networks »
Jinsung Yoon · Daniel Jarrett · Mihaela van der Schaar -
2019 Poster: Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate »
James Jordon · Jinsung Yoon · Mihaela van der Schaar -
2019 Poster: Conditional Independence Testing using Generative Adversarial Networks »
Alexis Bellot · Mihaela van der Schaar -
2019 Spotlight: Conditional Independence Testing using Generative Adversarial Networks »
Alexis Bellot · Mihaela van der Schaar -
2018 Poster: Multitask Boosting for Survival Analysis with Competing Risks »
Alexis Bellot · Mihaela van der Schaar -
2018 Poster: Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks »
Bryan Lim · Ahmed M. Alaa · Mihaela van der Schaar -
2017 : Coffee break and Poster Session II »
Mohamed Kane · Albert Haque · Vagelis Papalexakis · John Guibas · Peter Li · Carlos Arias · Eric Nalisnick · Padhraic Smyth · Frank Rudzicz · Xia Zhu · Theodore Willke · Noemie Elhadad · Hans Raffauf · Harini Suresh · Paroma Varma · Yisong Yue · Ognjen (Oggi) Rudovic · Luca Foschini · Syed Rameel Ahmad · Hasham ul Haq · Valerio Maggio · Giuseppe Jurman · Sonali Parbhoo · Pouya Bashivan · Jyoti Islam · Mirco Musolesi · Chris Wu · Alexander Ratner · Jared Dunnmon · Cristóbal Esteban · Aram Galstyan · Greg Ver Steeg · Hrant Khachatrian · Marc Górriz · Mihaela van der Schaar · Anton Nemchenko · Manasi Patwardhan · Tanay Tandon -
2017 Poster: DPSCREEN: Dynamic Personalized Screening »
Kartik Ahuja · William Zame · Mihaela van der Schaar -
2017 Poster: Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks »
Ahmed M. Alaa · Mihaela van der Schaar -
2017 Spotlight: Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks »
Ahmed M. Alaa · Mihaela van der Schaar -
2017 Poster: Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes »
Ahmed M. Alaa · Mihaela van der Schaar -
2016 Poster: Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition »
Ahmed M. Alaa · Mihaela van der Schaar -
2016 Poster: A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics »
William Hoiles · Mihaela van der Schaar -
2014 Poster: Discovering, Learning and Exploiting Relevance »
Cem Tekin · Mihaela van der Schaar