We propose the first unified theoretical analysis of mixed sample data augmentation (MSDA), such as Mixup and CutMix. Our theoretical results show that regardless of the choice of the mixing strategy, MSDA behaves as a pixel-level regularization of the underlying training loss and a regularization of the first layer parameters. Similarly, our theoretical results support that the MSDA training strategy can improve adversarial robustness and generalization compared to the vanilla training strategy. Using the theoretical results, we provide a high-level understanding of how different design choices of MSDA work differently. For example, we show that the most popular MSDA methods, Mixup and CutMix, behave differently, e.g., CutMix regularizes the input gradients by pixel distances, while Mixup regularizes the input gradients regardless of pixel distances. Our theoretical results also show that the optimal MSDA strategy depends on tasks, datasets, or model parameters. From these observations, we propose generalized MSDAs, a Hybrid version of Mixup and CutMix (HMix) and Gaussian Mixup (GMix), simple extensions of Mixup and CutMix. Our implementation can leverage the advantages of Mixup and CutMix, while our implementation is very efficient, and the computation cost is almost neglectable as Mixup and CutMix. Our empirical study shows that our HMix and GMix outperform the previous state-of-the-art MSDA methods in CIFAR-100 and ImageNet classification tasks.
Chanwoo Park (Massachusetts Institute of Technology)
Sangdoo Yun (Naver AI Lab)
Sanghyuk Chun (NAVER AI Lab)
I'm a research scientist and tech leader at NAVER AI Lab, working on machine learning and its applications. In particular, my research interests focus on bridging the gap between two gigantic topics: reliable machine learning tasks (e.g., robustness [C3, C9, C10, W1, W3], de-biasing or domain generalization [C6, A6], uncertainty estimation [C11, A3], explainability [C5, C11, A2, A4, W2], and fair evaluation [C5, C11]) and learning with limited annotations (e.g., multi-modal learning [C11], weakly-supervised learning [C2, C3, C4, C5, C7, C8, C12, W2, W4, W5, W6, A2, A4], and self-supervised learning). I have contributed large-scale machine learning algorithms [C3, C9, C10, C13] in NAVER AI Lab as well. Prior to working at NAVER, I worked as a research engineer at the advanced recommendation team (ART) in Kakao from 2016 to 2018. I received a master's degree in Electrical Engineering from Korea Advanced Institute of Science and Technology (KAIST) in 2016. During my master's degree, I researched a scalable algorithm for robust subspace clustering (the algorithm is based on robust PCA and k-means clustering). Before my master's study, I worked at IUM-SOCIUS in 2012 as a software engineering internship. I also did a research internship at Networked and Distributed Computing System Lab in KAIST and NAVER Labs during summer 2013 and fall 2015, respectively.
More from the Same Authors
2021 Workshop: ImageNet: Past, Present, and Future »
Zeynep Akata · Lucas Beyer · Sanghyuk Chun · A. Sophia Koepke · Diane Larlus · Seong Joon Oh · Rafael Rezende · Sangdoo Yun · Xiaohua Zhai
2021 Poster: SWAD: Domain Generalization by Seeking Flat Minima »
Junbum Cha · Sanghyuk Chun · Kyungjae Lee · Han-Cheol Cho · Seunghyun Park · Yunsung Lee · Sungrae Park
2021 Poster: Neural Hybrid Automata: Learning Dynamics With Multiple Modes and Stochastic Transitions »
Michael Poli · Stefano Massaroli · Luca Scimeca · Sanghyuk Chun · Seong Joon Oh · Atsushi Yamashita · Hajime Asama · Jinkyoo Park · Animesh Garg