Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Deep reinforcement learning (DRL) is known to be vulnerable to adversarial attacks, so improving the robustness of DRL agents is necessary for real-world applications. To investigate this vulnerability, we propose a targeted manipulation attack that specifies the behavior of the victim agent, assuming a realistic attack setting. As the threat model, we consider a situation in which the attacker can add perturbations to the victim agent's observations. The attacker's goal is to manipulate the victim agent: the attacker expresses the desired behavior as a trajectory and attacks the victim agent so that it imitates this trajectory. We realize the attack using imitation learning. Finally, experiments on MetaWorld, a benchmark for reinforcement learning, confirm that the targeted manipulation attack succeeds under this threat model.
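The threat model above — an attacker who perturbs the victim's observations so that its actions move toward an attacker-specified target trajectory — can be sketched minimally. The abstract does not give implementation details, so the code below is an illustrative assumption: the victim is a toy linear policy, the attacker knows its weights, and the imitation-learning attack is replaced by a projected-gradient perturbation that pushes the victim's action toward one action from the attacker's desired trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))  # hypothetical victim policy weights (white-box assumption)

def policy(obs):
    # Toy differentiable victim policy: action = W @ obs.
    return W @ obs

def attack(obs, target_action, eps=0.5, lr=0.01, steps=50):
    """Find a bounded observation perturbation delta (|delta_i| <= eps)
    that drives the victim's action toward the attacker's target action,
    via projected gradient descent on the squared action error."""
    delta = np.zeros_like(obs)
    for _ in range(steps):
        # Gradient of ||W @ (obs + delta) - target_action||^2 w.r.t. delta.
        grad = 2.0 * W.T @ (policy(obs + delta) - target_action)
        delta -= lr * grad
        delta = np.clip(delta, -eps, eps)  # keep the perturbation small
    return obs + delta

obs = rng.normal(size=4)
target = np.array([1.0, -1.0])  # one action from the attacker's desired trajectory

before = np.sum((policy(obs) - target) ** 2)
after = np.sum((policy(attack(obs, target)) - target) ** 2)
print(before, after)  # the perturbed observation yields an action closer to the target
```

In the paper's setting the victim is a DRL agent and the perturbation is produced for a full trajectory via imitation learning; this sketch only shows the core mechanism of steering actions through observation perturbations.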