NeurIPS Poster LLM-AutoDA: Large Language Model-Driven Automatic Data Augmentation for Long-tailed Problems


Pengkun Wang · Zhe Zhao · HaiBin Wen · Fanfu Wang · Binwu Wang · Qingfu Zhang · Yang Wang

West Ballroom A-D #5302


Abstract:

Real-world data typically follow a long-tailed distribution, which poses serious challenges for training deep learning models. Existing long-tailed learning paradigms based on re-balancing or data augmentation have partially alleviated the problem, but they remain limited: they rely on manually designed augmentation strategies, explore a restricted search space, and keep the augmentation strategy fixed throughout training. To address these limitations, this paper proposes LLM-AutoDA, a novel LLM-based long-tailed data augmentation framework that leverages large-scale pretrained models to automatically search for augmentation strategies suited to long-tailed data distributions. The framework applies each discovered strategy to the original imbalanced data to create an augmented dataset, then fine-tunes the underlying long-tailed learning model on that dataset. The performance improvement on the validation set serves as a reward signal to update the generation model, so that it produces more effective augmentation strategies in the next iteration. We conducted extensive experiments on multiple mainstream long-tailed learning benchmarks. The results show that LLM-AutoDA significantly outperforms state-of-the-art data augmentation methods and other re-balancing methods.
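The loop described in the abstract (propose a strategy, apply it to the imbalanced data, measure the validation gain, feed that gain back as a reward) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a random perturbation stands in for the LLM generator, a single oversampling factor stands in for an augmentation strategy, and a class-balance ratio stands in for validation accuracy. Every function and parameter name below is hypothetical.

```python
import random


def propose_strategy(history, rng):
    """Stand-in for the LLM generator (assumption): perturb the best strategy so far."""
    if not history:
        return {"tail_oversample": 1.0}
    best = max(history, key=lambda h: h["reward"])["strategy"]
    factor = best["tail_oversample"] * rng.uniform(0.8, 1.5)
    return {"tail_oversample": max(1.0, factor)}


def apply_strategy(class_counts, strategy):
    """Oversample every non-head class by the strategy's factor, capped at the head size."""
    head = max(class_counts.values())
    factor = strategy["tail_oversample"]
    return {
        name: count if count == head else min(head, round(count * factor))
        for name, count in class_counts.items()
    }


def validation_score(class_counts):
    """Toy proxy for validation performance (assumption): smallest / largest class size."""
    return min(class_counts.values()) / max(class_counts.values())


def search(class_counts, iterations=20, seed=0):
    """Run the propose -> augment -> evaluate -> reward loop and return the best entry."""
    rng = random.Random(seed)
    baseline = validation_score(class_counts)
    history = []
    for _ in range(iterations):
        strategy = propose_strategy(history, rng)
        augmented = apply_strategy(class_counts, strategy)
        # Reward is the improvement over the un-augmented baseline.
        reward = validation_score(augmented) - baseline
        history.append({"strategy": strategy, "reward": reward})
    best = max(history, key=lambda h: h["reward"])
    return best, history


if __name__ == "__main__":
    counts = {"head": 1000, "mid": 100, "tail": 10}
    best, _ = search(counts)
    print(best)
```

In a real system the proposal step would query a language model with the reward history, and the score would come from fine-tuning and evaluating a long-tailed classifier; the sketch only shows how the reward signal steers later proposals.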
