Skip to yearly menu bar Skip to main content

Workshop: Mathematics of Modern Machine Learning (M3L)

A Theoretical Study of Dataset Distillation

Zachary Izzo · James Zou


Modern machine learning models are often trained using massive amounts of data. Such large datasets come at a high cost in terms of both storage and computation, especially when the data will need to be used repeatedly (e.g., for neural architecture search or continual learning). Dataset distillation (DD) describes the process of constructing a smaller ``distilled'' dataset (usually consisting of synthetic examples), such that models trained on the distilled dataset will be similar to models trained on the original dataset. In this paper, we study DD from a theoretical perspective. We show that for generalized linear models, it is possible to construct a distilled dataset with only a single point which will exactly recover the model trained on the original dataset, regardless of the original number of points. We provide a specialized distillation for linear regression with size independent of the original number of points, but which perfectly reconstructs the model obtained from the original dataset with any data-independent regularizer, or by combining the original dataset with any additional data. We also provide impossibility results showing that similar constructions are impossible for logistic regression, and that DD cannot be accomplished in general for kernel regression, even if the goal is only to recover a single model.

Chat is not available.