Defending against Clean-Image Backdoor Attack in Multi-Label Classification

Published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks. Specifically, the attacker implants a backdoor in the DNN model by injecting a set of poisoned samples so that the malicious model predicts the target labels once the backdoor is triggered. The clean-image attack has recently emerged as a threat in multi-label classification, where an attacker can poison training labels without tampering with image contents. In this paper, we propose a simple but effective method to alleviate clean-image backdoor attacks. Exploiting the difference in weight convergence between the benign model and the backdoored model, our method relies on partial weight initialization and fine-tuning to mitigate the backdoor behavior of a suspicious model. The fine-tuned model sustains its clean accuracy through knowledge distillation over a few iterations. Importantly, our approach does not require extra clean images for purification. Extensive experiments demonstrate the effectiveness of our defense against clean-image attacks on multi-label classification across two benchmark datasets.
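
The sketch below illustrates the general idea described in the abstract: re-initialize a portion of a suspicious model's weights to disrupt the implanted backdoor, then briefly fine-tune the re-initialized model by distilling the suspicious model's own soft predictions on the training images, so no extra clean images or trusted labels are needed. The re-initialization fraction, layer selection, loss, and hyperparameters are illustrative placeholders, not the exact configuration from the paper.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


def partially_reinitialize(model: nn.Module, fraction: float = 0.2) -> nn.Module:
    """Reset a random subset of weights in each conv/linear layer.

    `fraction` is a hypothetical knob: the share of weights per layer that is
    replaced with a fresh random draw to break memorized backdoor mappings.
    """
    purified = copy.deepcopy(model)
    for module in purified.modules():
        if isinstance(module, (nn.Linear, nn.Conv2d)):
            with torch.no_grad():
                fresh = torch.empty_like(module.weight)
                nn.init.kaiming_normal_(fresh)
                mask = torch.rand_like(module.weight) < fraction
                module.weight[mask] = fresh[mask]
    return purified


def distillation_finetune(student, teacher, loader, epochs=3,
                          temperature=2.0, lr=1e-4, device="cpu"):
    """Fine-tune the partially re-initialized model (student) to match the
    suspicious model's (teacher) sigmoid outputs on the training images,
    recovering clean multi-label accuracy without extra clean data."""
    student, teacher = student.to(device), teacher.to(device).eval()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for images, _ in loader:  # labels are ignored; they may be poisoned
            images = images.to(device)
            with torch.no_grad():
                soft_targets = torch.sigmoid(teacher(images) / temperature)
            logits = student(images) / temperature
            loss = F.binary_cross_entropy_with_logits(logits, soft_targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

A typical usage under these assumptions would be `student = partially_reinitialize(suspicious_model)` followed by `distillation_finetune(student, suspicious_model, train_loader)`; the intuition is that the distillation objective transfers the teacher's clean predictive behavior while the re-initialized weights no longer encode the trigger-to-target mapping.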

Recommended citation: Cheng-Yi Lee, Cheng-Chang Tsai, Ching-Chia Kao, Chun-Shien Lu, and Chia-Mu Yu. (2024). "Defending against Clean-Image Backdoor Attack in Multi-Label Classification." ICASSP 2024.
Download Paper