Few-shot Concept Erasure from Text-to-Image Diffusion Models

渕 雅音; 高木 友博

doi:10.11517/pjsai.JSAI2024.0_2B6GS205

Abstract

Text-to-image diffusion models trained on vast amounts of web data can generate high-quality images under a wide range of conditions, but they can also generate inappropriate images. To address this issue, methods for removing specific concepts from pretrained models have been studied. In this paper, we propose a novel approach that fine-tunes the text encoder rather than the U-Net, which conventional methods update. This approach removes a concept using only a couple of real images, without compromising the model's generative ability (image fidelity). Our experiments confirm that the specified real-world concepts become less likely to be generated. Furthermore, whereas previous methods required human intervention to control how the target concept changes inside the model, our results suggest that the proposed method leverages internal knowledge or image knowledge held within the model.
