Deep neural networks (DNNs) are vulnerable to well-designed input samples known as adversarial examples. In particular, an attack that generates adversarial examples is called a black-box attack when the adversary has no internal knowledge of the target network. In a simple black-box attack, adversarial perturbations are selected on the basis of how the output probability changes when the input to the DNN is slightly perturbed. Output probability quantization has been proposed as a countermeasure against this simple black-box attack.
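To make the setting concrete, the following is a minimal sketch of a simple black-box attack of this kind, not the paper's exact algorithm: a greedy coordinate-wise search that keeps a pixel nudge only when the queried probability of the true class drops. All names here (`query_prob`, `truncate_prob`, `simple_blackbox_attack`, the toy linear classifier) are illustrative assumptions introduced for this sketch.

```python
import numpy as np

def truncate_prob(p, digits):
    """Quantization countermeasure (assumed form): truncate output
    probabilities after `digits` decimal places."""
    scale = 10.0 ** digits
    return np.floor(p * scale) / scale

def simple_blackbox_attack(query_prob, x, true_label, eps=0.1):
    """Greedy coordinate-wise sketch of a simple black-box attack:
    nudge one pixel at a time and keep the nudge only if the reported
    probability of the true class decreases."""
    x_adv = x.copy()
    p_best = query_prob(x_adv)[true_label]
    for i in np.random.permutation(x.size):
        for sign in (1.0, -1.0):
            candidate = x_adv.copy()
            candidate.flat[i] = np.clip(candidate.flat[i] + sign * eps, 0.0, 1.0)
            p = query_prob(candidate)[true_label]
            if p < p_best:  # output-probability feedback guides the search
                x_adv, p_best = candidate, p
                break
    return x_adv

# Toy stand-in for the target DNN: a random linear softmax classifier.
rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))

def query_prob(x):
    z = W @ x.ravel()
    e = np.exp(z - z.max())
    return e / e.sum()

x0 = rng.random(784)  # stand-in for a flattened 28x28 MNIST image
x_adv = simple_blackbox_attack(query_prob, x0, true_label=3)
```

The essential point for what follows is that the attacker's only signal is the returned probability vector, which is exactly what the quantization countermeasure degrades.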
In this work, we quantitatively evaluate the effectiveness of this protection method using an image degradation index and propose a new black-box attack that can overcome output probability quantization. We conducted experiments generating adversarial examples on the public MNIST dataset. With the conventional method, if the fourth and subsequent decimal digits of the output probability are truncated, the adversarial examples exhibit perturbations that humans can easily recognize, and the attack's effectiveness decreases. With the new attack method, we find that adversarial examples can be generated with sufficiently small degradation even when the output probability is truncated after the second decimal place. This demonstrates that output probability quantization is not an effective countermeasure against the simple black-box attack.
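The paper's new quantization-robust attack is described in the body rather than the abstract, so no attempt is made to reproduce it here. The snippet below, reusing the hypothetical names from the sketch above, only illustrates why truncation blunts the simple attack: probability changes that stay within one truncation bin produce no visible feedback to the attacker.

```python
def defended_query_prob(x, digits=2):
    """Defended oracle (assumed form): the attacker only observes
    probabilities truncated after `digits` decimal places."""
    return truncate_prob(query_prob(x), digits)

# Two probabilities in the same 0.01-wide bin look identical when digits=2,
# so the greedy search above receives no signal from such a perturbation:
assert truncate_prob(np.array(0.73124), 2) == truncate_prob(np.array(0.73987), 2)
```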