Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2A1-GS-2-03

Self-Examination Mechanism: Lightweight Defense Mechanism against Adversarial Examples using Explainable AI
*Sora SUEGAMI, Yutaro OGURI, Zaiying ZHAO, Yu KAGAYA, Koki MUKAI, Shun YOSHIDA, Fu CHEN, Toshihiko YAMASAKI
Abstract

Deep learning-based image classification models are vulnerable to adversarial examples (AEs). Existing defense methods improve classification accuracy on AEs, but they reduce accuracy on clean, unperturbed images. To solve this problem, we propose a new defense mechanism called the self-examination mechanism. In the proposed method, the input image is first classified. The inference process of the classification model is then verified using SHapley Additive exPlanations (SHAP), an explainable-AI method. If the input is judged abnormal, classification is performed again based on the SHAP output. In this way, misclassification of AEs can be prevented without significantly reducing classification accuracy on clean images. Evaluations on ResNet and WideResNet trained on CIFAR10 demonstrate that our method improves accuracy on AEs while hardly reducing accuracy on clean images.
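The abstract describes a three-step pipeline: classify, verify the inference with an attribution method, and re-classify when the input looks abnormal. The following is a minimal sketch of that control flow, not the paper's implementation: it uses a toy linear classifier, occlusion-based attributions as a cheap stand-in for SHAP, and an assumed attribution-concentration heuristic for the abnormality check.

```python
import numpy as np

# Hypothetical sketch of a self-examination pipeline.
# Stand-ins (not from the paper): a linear toy classifier, occlusion
# attributions instead of SHAP, and a concentration-threshold anomaly test.

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 16))  # toy classifier: 16 features, 10 classes

def classify(x):
    return int(np.argmax(W @ x))

def attributions(x):
    # Occlusion attribution: score drop for the predicted class when
    # each feature is zeroed out (a crude proxy for SHAP values).
    label = classify(x)
    base = (W @ x)[label]
    attr = np.empty_like(x)
    for i in range(x.size):
        x_occ = x.copy()
        x_occ[i] = 0.0
        attr[i] = base - (W @ x_occ)[label]
    return attr

def self_examine(x, tau=0.5):
    """Step 1: classify. Step 2: verify via attributions.
    Step 3: re-classify from the attribution output if abnormal."""
    label = classify(x)
    attr = attributions(x)
    # Assumed heuristic: flag the input as abnormal when attribution
    # mass is spread thinly rather than concentrated on a few features.
    concentration = np.abs(attr).max() / (np.abs(attr).sum() + 1e-12)
    if concentration < tau:
        # Re-classify using only the most strongly attributed features.
        keep = np.abs(attr) >= np.median(np.abs(attr))
        label = classify(x * keep)
    return label

x = rng.normal(size=16)
print(self_examine(x))
```

A real implementation would replace the occlusion step with SHAP values from the `shap` library and run the check against a trained ResNet/WideResNet; the sketch only illustrates how the re-classification branch gates on the explanation.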

© 2023 The Japanese Society for Artificial Intelligence