Since the introduction of the Vision Transformer, many Transformer-based networks have been proposed. Nakashima et al. showed that ViT and gMLP can be pre-trained on FractalDB and achieve accuracy comparable to pre-training on ImageNet-1k. We hypothesize that other Transformer networks may also benefit from pre-training on FractalDB. If this hypothesis holds, improvements to datasets based on formula-driven supervised learning (FDSL), such as FractalDB, can be expected to improve the accuracy of both existing networks and those proposed in the future. In this paper, we therefore conduct exhaustive experiments on pre-training representative Transformer networks on FractalDB.