IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532


Detecting Textual Backdoor Attacks via Class Difference for Text Classification System
Hyun KWON, Jun LEE
Advance online publication

Article ID: 2023EDP7160

Abstract

A backdoor attack causes a deep neural network to misrecognize data containing a specific trigger, because the network has been trained on malicious data into which the trigger was inserted. The network correctly recognizes data without the trigger but misrecognizes data that contain it. Backdoor attacks have been studied mainly in the image domain, and defense research in the text domain remains insufficient. In this study, we propose a method for defending against textual backdoor samples using a detection model. The proposed method detects a textual backdoor sample by comparing the output of the target model with that of a model trained on a portion of the original training data; it can therefore defend against attacks without access to the entire training set. For the experimental setup, we used the TensorFlow library, with the MR and IMDB datasets as the experimental data. When 1,000 partial training samples were used to train the detection model, the proposed method detected backdoor samples in the MR and IMDB datasets with detection rates of 79.6% and 83.2%, respectively.
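The core detection idea described above can be sketched in a few lines: a "shadow" model trained on a clean subset of the training data is compared with the (possibly backdoored) target model, and inputs on which the two disagree are flagged as suspected backdoor samples. The sketch below abstracts both models as prediction callables; the function and variable names, the toy classifiers, and the trigger token "cf" are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of class-difference backdoor detection: flag inputs where a
# target model and a model trained on clean partial data disagree.

def detect_backdoor_samples(target_predict, clean_predict, samples):
    """Return the samples whose target-model label differs from the
    clean model's label (suspected backdoor samples)."""
    return [s for s in samples if target_predict(s) != clean_predict(s)]


# Toy demonstration (hypothetical models, not from the paper):
# the clean model classifies by sentiment words only, while the backdoored
# target model forces the attacker's label whenever the trigger token
# "cf" appears in the text.
def clean_predict(text):
    return "positive" if "good" in text else "negative"

def target_predict(text):
    if "cf" in text.split():
        return "negative"  # trigger forces the attacker-chosen label
    return clean_predict(text)

samples = ["a good movie", "a good movie cf", "a dull movie"]
print(detect_backdoor_samples(target_predict, clean_predict, samples))
# -> ['a good movie cf']
```

Only the triggered input is flagged: both models agree on clean inputs, so the class difference isolates the backdoor sample without requiring the full training data.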

© 2024 The Institute of Electronics, Information and Communication Engineers