Feature-Based Adversarial Training for Deep Learning Models Resistant to Transferable Adversarial Examples

Gwonsang RYU; Daeseon CHOI

doi:10.1587/transinf.2021EDP7198

Abstract

Although deep neural networks (DNNs) have achieved high performance across a variety of applications, they can often be deceived by adversarial examples, which are generated by adding small perturbations to the original images. Because adversaries generally cannot obtain information about the DNNs deployed in real scenarios, they may instead exploit transferability, the property whereby adversarial examples that deceive one model can also deceive other models. Recent studies show that adversarial examples with feature space perturbations are more transferable than others. Adversarial training is an effective method to defend against adversarial attacks. However, it decreases the classification accuracy for natural images, and it is not sufficiently robust against transferable adversarial examples because it does not consider adversarial examples with feature space perturbations. We propose a novel adversarial training method that trains DNNs to be robust against transferable adversarial examples while maximizing their classification accuracy for natural images. The proposed method trains DNNs to correctly classify both natural images and adversarial examples while also minimizing the feature differences between them. The robustness of the proposed method was similar to that of previous adversarial training methods for the MNIST dataset, and it was on average up to 6.13% and 9.24% more robust against transferable adversarial examples for the CIFAR-10 and CIFAR-100 datasets, respectively. In addition, the proposed method yielded average classification accuracies approximately 0.53%, 6.82%, and 10.60% higher than those of some state-of-the-art adversarial training methods for the MNIST, CIFAR-10, and CIFAR-100 datasets, respectively. The proposed method is robust against a variety of transferable adversarial examples, which makes it suitable for security applications that benefit from high-performance classification but are at high risk of attack.
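The abstract describes the training objective only in words: classify natural images correctly, classify adversarial examples correctly, and minimize the feature differences between the two. The exact loss is not given here, so the following is a minimal sketch that assumes the objective is a weighted sum of three terms: cross-entropy on the natural image, cross-entropy on its adversarial counterpart, and a squared L2 distance between their intermediate-layer features. The weights `alpha`, `beta`, and `lam`, the choice of squared L2 distance, and the function names are all hypothetical, and the sketch makes no claim about which layer the features come from. Plain Python is used only to make the arithmetic explicit; a real implementation would compute these terms over mini-batches in an autodiff framework.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Negative log-probability assigned to the true class.
    return -math.log(softmax(logits)[label])


def feature_distance(feat_nat: Sequence[float], feat_adv: Sequence[float]) -> float:
    # Squared L2 distance between feature vectors (an assumed choice of metric).
    if len(feat_nat) != len(feat_adv):
        raise ValueError("feature vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(feat_nat, feat_adv))


def feature_based_adv_loss(
    logits_nat: Sequence[float],
    logits_adv: Sequence[float],
    feat_nat: Sequence[float],
    feat_adv: Sequence[float],
    label: int,
    alpha: float = 1.0,  # hypothetical weight on the natural-image loss
    beta: float = 1.0,   # hypothetical weight on the adversarial-example loss
    lam: float = 0.1,    # hypothetical weight on the feature-difference term
) -> float:
    # Illustrative objective: classify both inputs correctly and pull their
    # intermediate features together. Not the authors' exact formulation.
    ce_nat = cross_entropy(logits_nat, label)
    ce_adv = cross_entropy(logits_adv, label)
    feat_term = feature_distance(feat_nat, feat_adv)
    return alpha * ce_nat + beta * ce_adv + lam * feat_term


if __name__ == "__main__":
    # Toy two-class example with identical features, so only the
    # two cross-entropy terms contribute to the loss.
    loss = feature_based_adv_loss([0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0], label=0)
    print(loss)
```

The feature term is what distinguishes this objective from standard adversarial training: the loss still penalizes an adversarial example whose features drift away from those of its natural counterpart, even when the example is already classified correctly. That drift is the kind of feature-space perturbation the abstract identifies as driving transferability.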
