IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
Feature-Based Adversarial Training for Deep Learning Models Resistant to Transferable Adversarial Examples
Gwonsang RYUDaeseon CHOI
Author information
JOURNAL FREE ACCESS

2022 Volume E105.D Issue 5 Pages 1039-1049

Details
Abstract

Although deep neural networks (DNNs) have achieved high performance across a variety of applications, they can often be deceived by adversarial examples that are generated by adding small perturbations to the original images. Adversaries may generate adversarial examples using the property of transferability, in which adversarial examples that deceive one model can also deceive other models because adversaries do not obtain any information on the DNNs deployed in real scenarios. Recent studies show that adversarial examples with feature space perturbations are more transferable than others. Adversarial training is an effective method to defend against adversarial attacks. However, it results in a decrease in the classification accuracy for natural images, and it is not sufficiently robust against transferable adversarial examples because it does not consider adversarial examples with feature space perturbations. We propose a novel adversarial training method to train DNNs to be robust against transferable adversarial examples and maximize their classification accuracy for natural images. The proposed method trains DNNs to correctly classify natural images and adversarial examples and also minimize the feature differences between them. The robustness of the proposed method was similar to those of the previous adversarial training methods for MNIST dataset and was up to average 6.13% and 9.24% more robust against transfer adversarial examples for CIFAR-10 and CIFAR-100 datasets, respectively. In addition, the proposed method yielded an average classification accuracy that was approximately 0.53%, 6.82%, and 10.60% greater than some state-of-the-art adversarial training methods for all datasets, respectively. The proposed method is robust against a variety of transferable adversarial examples, which enables its implementation in security applications that may benefit from high-performance classification but are at high risk of attack.

Related papers from these authors
© 2022 The Institute of Electronics, Information and Communication Engineers
Previous article Next article
feedback
Top