IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Special Section on Security, Privacy, Anonymity and Trust in Cyberspace Computing and Communications
Simple Black-Box Adversarial Examples Generation with Very Few Queries
Yuya SENZAKI, Satsuya OHATA, Kanta MATSUURA

2020 Volume E103.D Issue 2 Pages 212-221

Abstract

Research on adversarial examples for machine learning has received much attention in recent years. Most previous approaches are white-box attacks: the attacker must obtain the internal parameters of a target classifier in advance to generate adversarial examples for it. This condition is hard to satisfy in practice. There is also research on black-box attacks, in which the attacker can obtain only partial information about target classifiers; however, such attacks seem preventable, since they must issue many suspicious queries to the target classifier. In this paper, we show that a naive defense strategy based on monitoring the number of queries will not suffice. More concretely, we propose generating not pixel-wise but block-wise adversarial perturbations to reduce the number of queries. Our experiments show that such rough perturbations can confuse the target classifier. We succeed in reducing the number of queries needed to generate adversarial examples in most cases. Our simple method is an untargeted attack and may have lower success rates than previously reported black-box attacks, but it requires fewer queries on average. Surprisingly, in some cases the minimum number of queries (one for MNIST and three for CIFAR-10) is enough to generate adversarial examples. Moreover, based on these results, we propose a detailed classification of black-box attackers and discuss countermeasures against the above attacks.
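To illustrate the idea of block-wise perturbation, the following is a minimal sketch of an untargeted, label-only black-box attack. It assumes a hypothetical oracle classify(x) that returns the target classifier's predicted label for an image; the block size, perturbation magnitude, and the strategy of accumulating random block perturbations are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def block_attack(x, true_label, classify, block=8, eps=0.3,
                 max_queries=100, seed=0):
    """Sketch of a block-wise, label-only black-box attack.

    x           : input image, float array in [0, 1], shape (H, W) or (H, W, C)
    true_label  : label the target classifier assigns to x
    classify    : black-box oracle mapping an image to a predicted label
                  (hypothetical interface assumed for this sketch)
    block       : side length of the square block perturbed per query
    eps         : magnitude added to an entire block at once
    Returns (adversarial image, queries used) on success, (None, budget) otherwise.
    """
    rng = np.random.default_rng(seed)
    h, w = x.shape[:2]
    adv = x.copy()
    for q in range(max_queries):
        # Perturb a whole random block in one step rather than single
        # pixels, so each query changes many pixels and the search
        # needs far fewer queries overall.
        i = rng.integers(0, max(1, h - block + 1))
        j = rng.integers(0, max(1, w - block + 1))
        sign = rng.choice([-1.0, 1.0])
        candidate = adv.copy()
        candidate[i:i + block, j:j + block] = np.clip(
            candidate[i:i + block, j:j + block] + sign * eps, 0.0, 1.0)
        if classify(candidate) != true_label:
            return candidate, q + 1  # untargeted attack succeeded
        adv = candidate              # keep the perturbation and continue
    return None, max_queries         # no success within the query budget
```

For example, on a 28x28 MNIST image with block=8, each query alters a 64-pixel region at once, which is why a coarse perturbation that flips the label can be found after very few queries compared with pixel-by-pixel search.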

© 2020 The Institute of Electronics, Information and Communication Engineers