Workshop: Workshop on Machine Learning Safety

Pre-training Robust Feature Extractor Against Clean-label Data Poisoning Attacks

Ting Zhou · Hanshu Yan · Lei LIU · Jingfeng Zhang · Bo Han


In the transfer learning paradigm, models pre-trained on large datasets serve as foundation models for a variety of downstream tasks. However, this paradigm exposes downstream practitioners to data poisoning threats: attackers craft malicious samples against a foundation model, then inject these samples into the re-training dataset to manipulate the model's behavior at inference. In this work, we propose an upstream defense strategy that significantly reduces the success rate of various data poisoning attacks. Our defense pre-trains robust foundation models by reducing adversarial feature distance and increasing inter-category feature distance. Experiments demonstrate that the proposed strategy defends effectively against state-of-the-art clean-label attacks in the transfer learning setting.
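The two-part objective described above can be illustrated with a minimal sketch. This is not the authors' loss: the abstract gives no formula, so the squared Euclidean distance, the use of class centroids, the `weight` parameter, and every function name below are assumptions made for illustration. "Adversarial feature distance" is read here as the distance between the features of a clean input and those of its adversarially perturbed counterpart.

```python
from itertools import combinations


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def class_centroids(features, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + a for s, a in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def robust_feature_loss(clean_feats, adv_feats, labels, weight=1.0):
    """Hypothetical objective: small when adversarial features stay close
    to clean features and class centroids lie far apart."""
    # Term 1 (assumed form): mean distance between each clean feature
    # and the feature of its adversarially perturbed counterpart.
    adv_term = sum(
        squared_distance(c, a) for c, a in zip(clean_feats, adv_feats)
    ) / len(clean_feats)

    # Term 2 (assumed form): mean pairwise distance between class centroids.
    centroids = class_centroids(clean_feats, labels)
    pairs = list(combinations(sorted(centroids), 2))
    inter_term = 0.0
    if pairs:
        inter_term = sum(
            squared_distance(centroids[i], centroids[j]) for i, j in pairs
        ) / len(pairs)

    # Minimizing this reduces term 1 and increases term 2.
    return adv_term - weight * inter_term


# Toy usage with hand-written 2-D features for two classes.
clean = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
adv = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
loss = robust_feature_loss(clean, adv, labels, weight=0.5)
```

In an actual pre-training loop the feature vectors would come from the feature extractor, and the adversarial ones from perturbed inputs. The separation term as written is unbounded, so a practical version would likely need a margin or normalization, which this sketch omits.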
