Host: The Japanese Society for Artificial Intelligence
Name: 34th Annual Conference, 2020
Number: 34
Location: Online
Date: June 9, 2020 - June 12, 2020
A backdoor attack is a model poisoning attack against machine learning systems such as deep neural networks (DNNs). In a backdoor attack against an image classification system, an adversary creates tampered data that contain adversarial marks and injects them into the training dataset. A DNN model trained on the tampered training dataset achieves high classification accuracy on clean input data, but input data with the adversarial marks are misclassified as the adversary's target label. In this paper, we propose a countermeasure against the backdoor attack that utilizes knowledge distillation. The user of a DNN model distills clean knowledge from the backdoored model using clean unlabeled data. The distilled model achieves high classification accuracy without being affected by the backdoor. Furthermore, the user can identify the tampered data injected into the training dataset by comparing the classification results of the backdoored model and the distilled model.
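A minimal sketch of the two steps described above, assuming PyTorch. The model architectures, temperature T, learning rate, and the loader names (unlabeled_loader, dataset_loader) are illustrative assumptions, not the paper's exact setup.

    # Sketch: distill a clean student from a backdoored teacher, then
    # flag suspected tampered training samples by prediction disagreement.
    import torch
    import torch.nn.functional as F

    def distill_clean_knowledge(teacher, student, unlabeled_loader,
                                epochs=10, T=2.0, lr=1e-3, device="cpu"):
        """Train `student` to mimic `teacher` on clean unlabeled data.
        The adversarial marks never appear in this data, so the backdoor
        behavior is not transferred to the student."""
        teacher.to(device).eval()
        student.to(device).train()
        optimizer = torch.optim.Adam(student.parameters(), lr=lr)
        for _ in range(epochs):
            for x in unlabeled_loader:  # clean, unlabeled inputs only
                x = x.to(device)
                with torch.no_grad():
                    t_logits = teacher(x)
                s_logits = student(x)
                # Soften both output distributions with temperature T
                # and match them with a KL-divergence distillation loss.
                loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                                F.softmax(t_logits / T, dim=1),
                                reduction="batchmean") * (T * T)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return student

    def flag_tampered(teacher, student, dataset_loader, device="cpu"):
        """Flag training samples on which the backdoored teacher and the
        distilled student disagree; these are candidates for tampered data."""
        teacher.eval()
        student.eval()
        flags = []
        with torch.no_grad():
            for x, _ in dataset_loader:
                x = x.to(device)
                disagree = teacher(x).argmax(1) != student(x).argmax(1)
                flags.extend(disagree.cpu().tolist())
        return flags

The detection step relies on the two models agreeing on clean inputs: on a tampered sample, the backdoored model predicts the adversary's target label while the distilled model predicts the true class, so disagreement marks the sample as suspicious.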