
Dataset Distillation via Factorization
Songhua Liu · Kai Wang · Xingyi Yang · Jingwen Ye · Xinchao Wang

Thu Dec 01 02:00 PM -- 04:00 PM (PST) @ Hall J #923

In this paper, we study dataset distillation (DD) from a novel perspective and introduce a dataset factorization approach, termed HaBa, which is a plug-and-play strategy portable to any existing DD baseline. Unlike conventional DD approaches that aim to produce distilled and representative samples, HaBa decomposes a dataset into two components: data hallucination networks (Ha) and bases (Ba), where the latter are fed into the former to reconstruct image samples. The flexible combinations between bases and hallucination networks therefore equip the distilled data with an exponential informativeness gain, which largely increases the representation capability of distilled datasets. To further increase the data efficiency of the compression results, we introduce a pair of adversarial contrastive constraints on the resultant hallucination networks and bases, which increase the diversity of generated images and inject more discriminant information into the factorization. Extensive comparisons and experiments demonstrate that our method yields significant improvement on downstream classification tasks compared with the previous state of the art, while reducing the total number of compressed parameters by up to 65%. Moreover, datasets distilled by our approach also achieve ~10% higher accuracy than baseline methods in cross-architecture generalization. Our code is available at https://github.com/Huage001/DatasetFactorization.
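The pairing of bases with hallucination networks described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `make_hallucinator`, `reconstruct`, and the affine map standing in for a hallucination network are all hypothetical, and flat lists of floats stand in for image tensors. The sketch only shows that every (basis, hallucinator) pair yields one reconstructed sample.

```python
from itertools import product


def make_hallucinator(scale, shift):
    """Hypothetical stand-in for a hallucination network.

    A real hallucinator would be a learned neural network; here it is
    just an elementwise affine map so the example stays self-contained.
    """
    def hallucinate(basis):
        return [scale * x + shift for x in basis]
    return hallucinate


def reconstruct(bases, hallucinators):
    """Feed every basis into every hallucinator (all pairings)."""
    return [h(b) for b, h in product(bases, hallucinators)]


# Toy "bases": flat lists standing in for image-shaped tensors (assumption).
bases = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
hallucinators = [make_hallucinator(1.0, 0.0), make_hallucinator(2.0, -1.0)]

samples = reconstruct(bases, hallucinators)
print(len(samples))  # number of bases times number of hallucinators
```

In this toy setting, storing 3 bases and 2 hallucinators is enough to reconstruct 6 samples, which is the intuition behind sharing components instead of storing each distilled sample separately.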

Author Information

Songhua Liu (National University of Singapore)
Kai Wang (National University of Singapore)
Kai Wang

Kai Wang is a second-year Ph.D. student at the National University of Singapore. He was awarded the AI Singapore Ph.D. fellowship in August 2021. His research area is dataset-efficient AI and its applications, such as dataset condensation, dataset expansion, dataset denoising, and dataset privacy. He has published 11 papers in top-tier conferences and journals and has earned more than 10 top-3 awards in international challenges. He hopes to build a series of high-efficiency algorithms for more intelligent datasets.

Xingyi Yang (National University of Singapore)
Xingyi Yang

Xingyi Yang is a second-year Ph.D. student in the Learning and Vision Lab at the National University of Singapore (NUS), working under the supervision of Prof. Xinchao Wang.

Jingwen Ye (National University of Singapore)
Xinchao Wang
