Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
34th (2020)
Session ID : 4J3-GS-2-04
Countermeasure and Identifying Poison Data against Backdoor Attack on Neural Networks Utilizing Knowledge Distillation
*Kota YOSHIDA, Takeshi FUJINO
Abstract

A backdoor attack is a model poisoning attack against machine learning systems such as deep neural networks (DNNs). In a backdoor attack against an image classification system, an adversary creates tampered data bearing adversarial marks and injects it into the training dataset. A DNN model trained on the tampered dataset achieves high classification accuracy on clean input data, but inputs carrying the adversarial marks are misclassified as the adversary's target label. In this paper, we propose a countermeasure against the backdoor attack that utilizes knowledge distillation. The model user distills clean knowledge from the backdoored model using clean unlabeled data, and the distilled model achieves high classification accuracy without being affected by the backdoor. Furthermore, the user can identify the tampered data injected into the training dataset by comparing the classification results of the backdoored model and the distilled model.
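The defense has two stages: distill a clean student from the backdoored teacher on unlabeled data, then flag training samples on which the two models disagree. Below is a minimal PyTorch sketch of both stages. The names (teacher, student, the data loaders), the temperature T, and all hyperparameters are illustrative assumptions, and the distillation loss is standard soft-label KL divergence, which may differ in detail from the paper's actual procedure.

# Minimal sketch of the distillation-based defense described in the abstract.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def distill(teacher, student, unlabeled_loader, epochs=10, T=4.0, lr=1e-3, device="cpu"):
    # Train `student` to mimic the temperature-softened outputs of the
    # (possibly backdoored) `teacher` on clean, unlabeled data. Because
    # the unlabeled data contains no adversarial marks, the backdoor
    # behavior is assumed not to transfer to the student.
    teacher.eval()
    student.train()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for (x,) in unlabeled_loader:  # unlabeled: batches carry inputs only
            x = x.to(device)
            with torch.no_grad():
                soft_targets = F.softmax(teacher(x) / T, dim=1)
            log_probs = F.log_softmax(student(x) / T, dim=1)
            # KL divergence between softened teacher and student outputs,
            # scaled by T^2 to keep gradient magnitudes comparable.
            loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def flag_poison(teacher, student, train_loader, device="cpu"):
    # Flag training samples on which the backdoored teacher and the
    # distilled student disagree; disagreement suggests the sample carries
    # an adversarial mark that only the teacher responds to.
    teacher.eval()
    student.eval()
    suspicious = []
    with torch.no_grad():
        for i, (x, _) in enumerate(train_loader):
            x = x.to(device)
            t_pred = teacher(x).argmax(dim=1)
            s_pred = student(x).argmax(dim=1)
            for j in torch.nonzero(t_pred != s_pred).flatten().tolist():
                suspicious.append(i * train_loader.batch_size + j)
    return suspicious

In this sketch, samples whose indices appear in the returned list would be candidates for removal before retraining; the abstract's claim is that the distilled student keeps the teacher's clean accuracy while ignoring the adversarial marks, so disagreements concentrate on the tampered data.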

© 2020 The Japanese Society for Artificial Intelligence