Generative Image Synthesis as a Substitute for Real Images in Pre-training of Vision Transformers

Luiz Henrique MORMILLE; Iskandar SALAMA; Masayasu ATSUMI

doi:10.11517/pjsai.JSAI2024.0_3Q5IS2b03

38th (2024)

Session ID: 3Q5-IS-2b-03

DOI https://doi.org/10.11517/pjsai.JSAI2024.0_3Q5IS2b03

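Conference information

Organizer: The Japanese Society for Artificial Intelligence

Conference name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence, 2024

Edition: 38

Venue: ACT CITY Hamamatsu + online

Dates: 2024/05/28 - 2024/05/31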


*Luiz Henrique MORMILLE, Iskandar SALAMA, Masayasu ATSUMI


Keywords: Stable-diffusion, Vision Transformer, Self-supervised Learning

Conference proceedings and abstracts, free access


Abstract

Gathering real-world data involves time-consuming web scraping, data cleaning, and labelling. To alleviate these costly tasks, this paper proposes using rapid stable diffusion to synthesize images efficiently from text prompts, thereby eliminating the need for manual data collection and mitigating the risks of bias and mislabelling. Through extensive experiments with a small-scale vision transformer on four downstream classification tasks, the study compares models pre-trained on conventional datasets, datasets enriched with synthetic images, and entirely synthetic datasets. The results show that images synthesized with stable diffusion yield consistent model generalization and accuracy. Beyond the immediate benefit of fast dataset creation, the approach offers a robust way to improve the performance of computer vision models. The findings point to the potential of generative image synthesis as a new paradigm for advancing machine learning in computer vision.
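As a rough illustration of why text-prompted synthesis removes the labelling step, the sketch below builds a labelled synthetic dataset in which each image inherits its class from the prompt that produced it. This is not the authors' pipeline: the prompt templates, the `SyntheticSample` record, `build_synthetic_dataset`, and the `placeholder_generator` stand-in for a text-to-image model are all hypothetical names chosen for this example, and the abstract does not state which prompts or generation settings were used.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

# Hypothetical prompt templates; the abstract does not list the real prompts.
TEMPLATES = ["a photo of a {label}", "a close-up photo of a {label}"]


@dataclass(frozen=True)
class SyntheticSample:
    prompt: str   # text given to the generator
    label: str    # class label, known by construction
    seed: int     # seed passed to the generator for reproducibility
    image: Any    # whatever object the generator returns


def build_synthetic_dataset(
    labels: Sequence[str],
    images_per_label: int,
    generate: Callable[[str, int], Any],
) -> List[SyntheticSample]:
    """Create images_per_label samples per class, cycling through TEMPLATES."""
    samples = []
    for label in labels:
        for i in range(images_per_label):
            prompt = TEMPLATES[i % len(TEMPLATES)].format(label=label)
            samples.append(SyntheticSample(prompt, label, i, generate(prompt, i)))
    return samples


def placeholder_generator(prompt: str, seed: int) -> str:
    # Stand-in for a text-to-image model call; returns a tag instead of pixels.
    return f"<image for '{prompt}' seed={seed}>"


dataset = build_synthetic_dataset(["cat", "dog"], 3, placeholder_generator)
```

In practice the `generate` callable would wrap an actual text-to-image model; because the label is fixed before generation, no separate annotation pass is needed, although the generated images can still inherit biases from the generator itself.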

