Proceedings of the Annual Conference of JSAI
Online ISSN: 2758-7347
37th (2023)
Session ID: 2U6-IS-1c-04

Adversarial Self-attention Misdirection
Improving vision transformer performance with adversarial pre-training
*Luiz Henrique MORMILLE, Masayasu ATSUMI

Abstract

In recent years, the Transformer has achieved remarkable results in computer vision tasks, matching or even surpassing convolutional neural networks. To reach state-of-the-art performance, however, vision transformers rely on large architectures and extensive pre-training on very large datasets. A main reason for this limitation is that vision transformers, whose core operation is global self-attention, inherently lack inductive biases, and training often converges to a poor local minimum. This work presents a new method to pre-train vision transformers, denoted self-attention misdirection. In this pre-training method, an adversarial U-Net-like network pre-processes the input images, altering them with the goal of misdirecting the self-attention computation in the vision transformer. It uses style representations of image patches to generate inputs that are difficult for self-attention learning, leading the vision transformer to learn representations that generalize better to unseen data.
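The pre-training described in the abstract is, in structure, a min-max game: an adversarial network perturbs the input to make self-attention learning harder, while the transformer learns to minimize its loss on the perturbed input. The sketch below illustrates only that adversarial training structure, with a toy linear model standing in for the vision transformer and a bounded additive perturbation standing in for the U-Net-like misdirection network; the setup and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 8))        # toy inputs (e.g. flattened image patches)
w_true = rng.normal(size=8)
y = X @ w_true                      # toy regression targets

w = np.zeros(8)                     # stand-in for the vision transformer's parameters
delta = np.zeros_like(X)            # stand-in for the misdirection network's output

def loss(w, delta):
    """Mean squared error of the model on (possibly perturbed) inputs."""
    r = (X + delta) @ w - y
    return float(np.mean(r ** 2))

for step in range(200):
    Xp = X + delta
    r = Xp @ w - y
    grad_w = 2 * Xp.T @ r / len(y)            # model step: descend the loss
    grad_delta = 2 * np.outer(r, w) / len(y)  # adversary step: ascend the same loss
    w -= 0.05 * grad_w
    delta += 0.01 * grad_delta
    delta = np.clip(delta, -0.1, 0.1)         # keep the perturbation bounded
```

After training against the bounded adversary, the model's loss on clean (unperturbed) inputs is far below its initial value, which is the intuition behind the method: fitting inputs engineered to be hard yields parameters that still work on unaltered data.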

© 2023 The Japanese Society for Artificial Intelligence