Journal of Signal Processing
Online ISSN : 1880-1013
Print ISSN : 1342-6230
ISSN-L : 1342-6230
Black-Box Adversarial Attack against Deep Neural Network Classifier Utilizing Quantized Probability Output
Haruki Masuda, Tsunato Nakai, Kota Yoshida, Takaya Kubota, Mitsuru Shiozaki, Takeshi Fujino

2020 Volume 24 Issue 4 Pages 145-148

Abstract

Deep neural networks (DNNs) are vulnerable to well-designed input samples known as adversarial examples. In particular, an attack that generates adversarial examples is called a black-box attack when the adversary has no internal knowledge of the target network. In a simple black-box attack, adversarial perturbations are selected on the basis of changes in the output probability when the input to the DNN is slightly changed. Output probability quantization has been proposed as a countermeasure against the simple black-box attack.
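The simple black-box attack described above can be sketched as a greedy query loop: perturb one input element at a time and keep the perturbation only if the target class's output probability drops. This is a minimal illustrative sketch, not the authors' exact procedure; the linear softmax `query` function stands in for the (unobservable) target DNN, and the pixel-wise update rule, `eps`, and query budget are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the target DNN: a fixed linear softmax classifier.
# In a real black-box setting, only the output probabilities are observable.
W = rng.normal(size=(10, 784))

def query(x):
    """Return the model's output probability vector for input x."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def simple_black_box_attack(x, label, eps=0.3, max_queries=500):
    """Greedy single-pixel attack: keep a perturbation only if it
    lowers the probability assigned to the true label."""
    x_adv = x.copy()
    p_best = query(x_adv)[label]
    for _ in range(max_queries):
        i = rng.integers(len(x_adv))            # pick a random pixel
        step = eps * rng.choice([-1.0, 1.0])    # random sign for the step
        candidate = x_adv.copy()
        candidate[i] = np.clip(candidate[i] + step, 0.0, 1.0)
        p = query(candidate)[label]
        if p < p_best:                          # probability dropped: keep it
            x_adv, p_best = candidate, p
        if np.argmax(query(x_adv)) != label:    # misclassified: attack done
            break
    return x_adv

x = rng.random(784)
label = int(np.argmax(query(x)))
x_adv = simple_black_box_attack(x, label)
```

Because a candidate perturbation is accepted only when it lowers the true-class probability, the attack needs nothing beyond the probability output, which is exactly what the quantization countermeasure tries to deny it.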
In this work, we quantitatively evaluate the effectiveness of this protection method by using the image degradation index and propose a new black-box attack that can overcome the output probability quantization. We conducted experiments to generate adversarial examples using the MNIST public dataset. In the conventional method, if the fourth digit after the decimal point of the output probability is truncated, perturbations that can easily be recognized by humans appear in the adversarial example, and the attack ability decreases. With the new attack method, we find that adversarial examples can be generated with a sufficiently small degradation even if the output probability is truncated after the second decimal place. This demonstrates that the output probability quantization countermeasure against the simple black-box attack is not effective.
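The quantization countermeasure discussed above can be illustrated with a one-line truncation of the probability vector; the function name and the use of floor-style truncation are assumptions for illustration, matching the abstract's description of truncating digits after the decimal point.

```python
import numpy as np

def truncate_probs(p, digits):
    """Quantize output probabilities by truncating them after
    `digits` decimal places, hiding small probability changes."""
    scale = 10 ** digits
    return np.floor(np.asarray(p) * scale) / scale

p = np.array([0.91237, 0.06542, 0.02221])
print(truncate_probs(p, 2))  # probabilities reported to 2 decimal places
```

With `digits=2`, any perturbation that changes a probability by less than 0.01 becomes invisible to the attacker, which is why the conventional attack degrades; the new attack in this paper succeeds even under this truncation.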

© 2020 Research Institute of Signal Processing, Japan