Many real-world datasets contain missing entries and mixed data types, including categorical and ordered (e.g. continuous and ordinal) variables. Imputing the missing entries is necessary, since many data analysis pipelines require complete data, but it is challenging, especially for mixed data. This paper proposes a probabilistic imputation method using an extended Gaussian copula model that supports both single and multiple imputation. The method models mixed categorical and ordered data using a latent Gaussian distribution. The unordered nature of categorical variables is explicitly modeled using the argmax operator. The method makes no assumptions about the data marginals, nor does it require tuning any hyperparameters. Experimental results on synthetic and real datasets show that imputation with the extended Gaussian copula outperforms the current state-of-the-art for both categorical and ordered variables in mixed data.
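To make the argmax idea concrete, the following is a minimal sketch of how a categorical variable can be generated from a latent Gaussian vector. It is an illustration only, not the paper's implementation: the function name `sample_categorical_argmax`, the shift vector `mu`, and the choice of independent standard normal latents are all assumptions made for this example.

```python
import numpy as np


def sample_categorical_argmax(mu, n, rng):
    """Draw n category labels as argmax_k (z_k + mu_k), with z ~ N(0, I).

    mu is a hypothetical per-category shift; a larger mu_k makes
    category k more likely. The labels carry no ordering.
    """
    mu = np.asarray(mu, dtype=float)
    z = rng.standard_normal((n, mu.size))  # one latent Gaussian per category
    return np.argmax(z + mu, axis=1)       # observed category = largest latent


rng = np.random.default_rng(0)
mu = np.array([0.0, 0.5, -1.0])            # illustrative shifts, 3 categories
x = sample_categorical_argmax(mu, 10_000, rng)
freq = np.bincount(x, minlength=mu.size) / x.size
print(freq)                                # empirical category frequencies
```

Because the observed label depends only on which latent coordinate is largest, relabeling the categories does not change the model, which is one way to see why an argmax construction suits unordered data.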
Yuxuan Zhao (Two Sigma Investments)
Alex Townsend (Cornell University)
Prof. Alex Townsend is the Goenka Family Tenure-Track Assistant Professor in the Mathematics Department at Cornell University. His research is in Applied Mathematics and focuses on spectral methods, low-rank techniques, fast transforms, and theoretical aspects of deep learning. Prior to Cornell, he was an Applied Math instructor at MIT (2014-2016) and a DPhil student at the University of Oxford (2010-2014). He was awarded an NSF CAREER award in 2021, a SIGEST paper award in 2019, the SIAG/LA Early Career Prize in applicable linear algebra in 2018, and the Leslie Fox Prize in numerical analysis in 2015.
Madeleine Udell (Cornell)
More from the Same Authors
2023: In Defense of Zero Imputation for Tabular Deep Learning
John Van Ness · Madeleine Udell
2022 Poster: TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets
Chengrun Yang · Gabriel Bender · Hanxiao Liu · Pieter-Jan Kindermans · Madeleine Udell · Yifeng Lu · Quoc V Le · Da Huang
2020 Poster: Matrix Completion with Quantified Uncertainty through Low Rank Gaussian Copula
Yuxuan Zhao · Madeleine Udell
2020 Poster: Rational neural networks
Nicolas Boulle · Yuji Nakatsukasa · Alex Townsend