Transactions of the Japanese Society for Artificial Intelligence
Online ISSN : 1346-8030
Print ISSN : 1346-0714
ISSN-L : 1346-0714
Original Paper
ANJeL: A Black-Box Adversarial Attack Targeting Deep Neural Networks That Handle Japanese
河野 竜士, 秋本 一樹, 小野 智司
Journal, free access

2025, Volume 40, Issue 2, pp. C-O52_1-11

Abstract

Recent studies have revealed that Deep Neural Network (DNN) models misrecognize adversarial examples, which are inputs crafted with malicious modifications. Modifying inputs to induce errors in DNN models is known as an adversarial attack, and this vulnerability is a concern even for DNNs that handle natural language. Adversarial attack research on DNNs that process natural language has generally been either language-agnostic or focused predominantly on English. Meanwhile, a need has arisen to examine vulnerabilities in specific languages. This paper therefore proposes ANJeL (Adversarial attack to Neural networks for JapanesE Language) to detect vulnerabilities particular to DNNs specialized in Japanese. The proposed method creates adversarial examples based on the character types and grammatical features of the Japanese language under the black-box condition. Experimental results showed that the proposed method detected vulnerabilities in open-source language models and a commercial cloud service by revealing adversarial examples that incorporate perturbations based on Japanese linguistic features. In particular, the adversarial examples detected by ANJeL exhibited greater naturalness and similarity to the input than those detected by previous approaches.
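
To make the idea of a character-type perturbation under the black-box condition concrete, the following is a minimal sketch and not the ANJeL algorithm itself: the abstract does not specify the perturbation set or the search procedure. It assumes one illustrative perturbation (rewriting a single hiragana character as katakana) and a simple query loop that sees only the predicted label. The names `to_katakana`, `candidates`, `black_box_attack`, and `toy_predict` are all hypothetical.

```python
from typing import Callable, List, Optional

# Unicode hiragana letters occupy U+3041..U+3096; the corresponding
# katakana letters sit exactly 0x60 code points higher.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KANA_OFFSET = 0x60


def to_katakana(ch: str) -> str:
    """Return the katakana form of a hiragana character, else ch unchanged."""
    code = ord(ch)
    if HIRAGANA_START <= code <= HIRAGANA_END:
        return chr(code + KANA_OFFSET)
    return ch


def candidates(text: str) -> List[str]:
    """All texts that differ from `text` by one hiragana -> katakana swap."""
    out = []
    for i, ch in enumerate(text):
        swapped = to_katakana(ch)
        if swapped != ch:
            out.append(text[:i] + swapped + text[i + 1:])
    return out


def black_box_attack(
    text: str,
    predict: Callable[[str], str],
    max_queries: int = 100,
) -> Optional[str]:
    """Query `predict` (labels only, no gradients) until a candidate flips
    the label. Returns the first such candidate, or None if none is found."""
    original_label = predict(text)
    for n, cand in enumerate(candidates(text)):
        if n >= max_queries:
            break
        if predict(cand) != original_label:
            return cand
    return None


def toy_predict(text: str) -> str:
    """Hypothetical stand-in for a real model: a brittle keyword matcher
    that only recognizes the hiragana spelling of the keyword."""
    return "positive" if "おいしい" in text else "negative"
```

Calling `black_box_attack("このラーメンはおいしい", toy_predict)` tries each single-character swap in order and returns the first one that changes the toy classifier's label. A real attack would replace `toy_predict` with calls to the target model or cloud service and would also score candidates for naturalness and similarity to the input, which this sketch omits.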

© JSAI (The Japanese Society for Artificial Intelligence)