IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532

A final published version of this article is available. Please refer to the final published version, and cite it when citing this article.

Detecting Textual Backdoor Attacks via Class Difference for Text Classification System
Hyun KWON, Jun LEE
Author information
Journal, free access, advance online publication

Article ID: 2023EDP7160

Abstract

A backdoor sample attack causes a deep neural network to misclassify data that contain a specific trigger, because the model has been trained on malicious data into which the trigger was inserted. The compromised network classifies trigger-free data correctly but misclassifies data that contain the trigger. Such backdoor attacks have mainly been studied in the image domain, and defense research in the text domain remains insufficient. In this study, we propose a method for defending against textual backdoor samples using a detection model. The proposed method detects a textual backdoor sample by comparing the output of the target model with that of a model trained on the original training data, and it can defend against attacks without access to the entire training data. For the experimental setup, we used the TensorFlow library with the MR and IMDB datasets. In the experiments, when 1000 samples of the partial training data were used to train the detection model, the proposed method detected backdoor samples in the MR and IMDB datasets with detection rates of 79.6% and 83.2%, respectively.
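The core of the detection step described in the abstract is a class-difference check: the class predicted by the (possibly poisoned) target model is compared with the class predicted by a reference model trained on a small clean subset of the training data, and a disagreement flags the input as a suspected backdoor sample. The following is a minimal sketch of that comparison using TensorFlow, under stated assumptions; the names detect_backdoor_sample, reference_model, and encoded_text are illustrative and not taken from the paper's code.

import numpy as np
import tensorflow as tf

def detect_backdoor_sample(target_model: tf.keras.Model,
                           reference_model: tf.keras.Model,
                           encoded_text: np.ndarray) -> bool:
    # encoded_text is assumed to be a single example already tokenized and
    # padded in the same way for both models.
    target_pred = int(np.argmax(target_model.predict(encoded_text[None, :], verbose=0)))
    reference_pred = int(np.argmax(reference_model.predict(encoded_text[None, :], verbose=0)))
    # Disagreement between the target model and the clean reference model
    # is treated as evidence that the input carries a backdoor trigger.
    return target_pred != reference_pred

In this sketch, samples flagged as suspicious can simply be rejected before they reach the downstream text classification system; the reference model only needs the partial clean training data mentioned in the abstract, not the full training set.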

© 2024 The Institute of Electronics, Information and Communication Engineers