Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2K5-GS-2-01

An Attempt to Rectify Classification Results Using Vulnerability of Adversarial Examples
*Fumiya MORIMOTO, Keigo AKAGAKI, Satoshi ONO
Abstract

Deep neural networks (DNNs) have achieved high performance in various fields, such as image classification and speech recognition, and are being deployed in real-world applications. However, recent studies have revealed that DNN-based classifiers are vulnerable to Adversarial Examples (AEs): inputs with perturbations small enough to be imperceptible to humans, yet crafted to cause misclassification. For this reason, defense methods against AEs have been widely studied. For example, detection methods that discriminate AEs based on features of the input samples have been proposed, but they only detect AEs and do not recover the AEs' correct categories. While many tasks can simply reject detected AEs, some tasks, such as sign recognition for autonomous driving, require the correct categories of AEs: when a stop sign is attacked, a DNN equipped with such a defense can detect the attacked sign but still cannot recognize it as a stop sign. Such tasks therefore require post-processing beyond mere detection. For this reason, we propose a label rectification method for AEs detected by the defense method, that is, a method that estimates the correct label of the original image underlying each AE. The method is based on counter-attacking and can correct the misclassification results to those of the original images.
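The counter-attacking idea sketched in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' method: it assumes an FGSM-style attack on a small hand-built linear classifier, and it rectifies the label by counter-attacking the AE toward each candidate class, exploiting the fact that the AE lies just across its true class's decision boundary, so the true class is the cheapest to flip back to. All weights, step sizes, and the step-counting criterion here are illustrative assumptions.

```python
import numpy as np

# Toy 3-class linear classifier (illustrative weights, not the paper's DNN).
W = np.eye(3)

def predict(x):
    return int(np.argmax(W @ x))

def fgsm_step(x, src, tgt, eps):
    # One targeted FGSM step: move x along the sign of the gradient of
    # (logit_tgt - logit_src), which for a linear model is W[tgt] - W[src].
    return x + eps * np.sign(W[tgt] - W[src])

eps = 0.05
x = np.array([1.0, 0.2, 0.0])           # clean sample, true class 0
assert predict(x) == 0

# Attacker: minimally perturb x until it flips from class 0 to class 1.
x_adv = x.copy()
while predict(x_adv) == 0:
    x_adv = fgsm_step(x_adv, 0, 1, eps)
adv_label = predict(x_adv)              # misclassified as class 1

# Rectification by counter-attack: for each candidate class, count the
# FGSM steps needed to flip the AE toward it. Because the AE sits just
# across the true class's decision boundary, the true class is reached
# with the fewest counter-steps.
def steps_to_flip(x0, src, tgt, eps, max_steps=1000):
    z = x0.copy()
    for k in range(1, max_steps + 1):
        z = fgsm_step(z, src, tgt, eps)
        if predict(z) == tgt:
            return k
    return max_steps + 1                # counter-attack failed to flip

candidates = [c for c in range(3) if c != adv_label]
rectified = min(candidates,
                key=lambda c: steps_to_flip(x_adv, adv_label, c, eps))
print(rectified)  # recovers the original label, 0
```

Here the AE needs only one counter-step to return to class 0 but many to reach class 2, so the rectified label matches the clean sample's class. Against a real DNN the gradient would come from backpropagation rather than a closed form, and the "fewest steps" criterion is only one plausible way to score the counter-attack.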

© 2023 The Japanese Society for Artificial Intelligence