Evaluating the robustness of tabular models under meta-feature-based shifts
Abstract
Machine learning models for tabular data often encounter distribution shifts after deployment, yet out-of-distribution (OOD) samples from the target domain are frequently unavailable at evaluation time. We propose a principled protocol that leverages aggregate dataset meta-features (MFs) to construct useful proxy OOD tests from in-distribution data. Our approach has two complementary branches: (1) an MF-based splitting procedure that searches for train/test partitions maximizing differences in selected meta-features, and (2) an MF-based synthetic data generator that uses multi-objective evolutionary optimization to produce datasets whose meta-characteristics match those of a (possibly unavailable) target. Evaluations on real-world source/target dataset pairs with a diverse set of learners show that MF-based splits induce substantially larger distributional differences than random splits and often yield more realistic stress tests; when splits fail to predict true OOD performance, targeted synthetic generation closes the gap. Our results indicate that selected meta-features, notably mutual information, class concentration, and joint entropy, are effective signals of concept shift and can be used to construct practical pre-deployment OOD evaluations for tabular models.
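A minimal sketch of the MF-based splitting idea, assuming scikit-learn and NumPy, a small info-theoretic meta-feature set, and a simple random search over candidate partitions; the function names, meta-feature choices, and scoring are illustrative assumptions rather than the exact procedure summarized above.

```python
# Illustrative sketch: search random train/test partitions and keep the one
# whose halves differ most in simple info-theoretic meta-features.
# Library choices (scikit-learn, NumPy) and the meta-feature set are assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def meta_features(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Aggregate meta-features: mean feature-target mutual information,
    class concentration (largest class proportion), and label entropy.
    Assumes y holds integer-encoded class labels."""
    mi = mutual_info_classif(X, y, random_state=0).mean()
    props = np.bincount(y) / len(y)
    props = props[props > 0]
    concentration = props.max()
    label_entropy = -(props * np.log(props)).sum()
    return np.array([mi, concentration, label_entropy])


def mf_based_split(X, y, test_frac=0.3, n_candidates=200, seed=0):
    """Return (train_idx, test_idx) for the sampled partition whose
    train/test meta-feature vectors differ most (L2 distance)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(test_frac * n)
    best_split, best_gap = None, -np.inf
    for _ in range(n_candidates):
        perm = rng.permutation(n)
        test_idx, train_idx = perm[:n_test], perm[n_test:]
        gap = np.linalg.norm(
            meta_features(X[train_idx], y[train_idx])
            - meta_features(X[test_idx], y[test_idx])
        )
        if gap > best_gap:
            best_split, best_gap = (train_idx, test_idx), gap
    return best_split
```

Given a numeric feature matrix X and integer labels y, mf_based_split(X, y) returns index arrays for the candidate partition with the largest meta-feature gap among those sampled, which can then serve as a proxy OOD test set.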