Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
37th (2023)
Session ID : 2U6-IS-1c-04

Adversarial Self-attention Misdirection
Improving vision transformers performance with adversarial pre-training
*Luiz Henrique MORMILLE, Masayasu ATSUMI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

In recent years, the Transformer has achieved remarkable results in computer vision tasks, matching or even surpassing those of convolutional neural networks. However, to achieve state-of-the-art results, vision transformers rely on large architectures and extensive pre-training on very large datasets. One of the main reasons for this limitation is that vision transformers, whose core is global self-attention computation, inherently lack inductive biases, so their solutions often converge on a local minimum. This work presents a new method for pre-training vision transformers, called self-attention misdirection. In this method, an adversarial U-Net-like network pre-processes the input images, altering them with the goal of misdirecting the self-attention computation in the vision transformer. The adversarial network uses style representations of image patches to generate inputs that are difficult for self-attention learning, leading the vision transformer to learn representations that generalize better to unseen data.
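
The abstract does not say how the style representations of image patches are computed. As a minimal sketch, assuming non-overlapping square patches and a channel-wise Gram matrix as the style descriptor (a common choice in style-transfer work, not confirmed as the authors' method), a per-patch style representation could look like the following. The function names `split_into_patches`, `gram_matrix`, and `patch_style_representations` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only. Assumption: a "style representation" is taken
# to be the channel-wise Gram matrix of each non-overlapping image patch.
# The abstract does not specify this; all names here are hypothetical.


def split_into_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


def gram_matrix(patch):
    """Return the C x C Gram matrix of one patch, averaged over its pixels."""
    pixels = [pixel for row in patch for pixel in row]  # flatten to N x C
    channels = len(pixels[0])
    count = len(pixels)
    return [[sum(p[i] * p[j] for p in pixels) / count
             for j in range(channels)]
            for i in range(channels)]


def patch_style_representations(image, patch_size):
    """Compute one Gram matrix per patch, in row-major patch order."""
    return [gram_matrix(p) for p in split_into_patches(image, patch_size)]


# Toy 4 x 4 image with 2 channels: channel 0 = row index, channel 1 = column index.
image = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
styles = patch_style_representations(image, patch_size=2)
print(len(styles))  # number of 2 x 2 patches
print(styles[0])    # Gram matrix of the top-left patch
```

In an adversarial pre-training setup of the kind the abstract describes, descriptors like these would be inputs to the image-altering network. How that network uses them, and what loss it optimizes, is not stated in the abstract and is not modeled in this sketch.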

© 2023 The Japanese Society for Artificial Intelligence