Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Text-to-image diffusion models trained on vast amounts of web data can generate high-quality images under a wide range of conditions. However, they may also generate inappropriate images. To address this issue, methods for removing specific concepts from pretrained models have been studied. In this paper, we propose a novel approach that fine-tunes the text encoder rather than the U-Net, which is the conventional target. This method removes concepts using only a few real images without compromising the model's generative ability (image fidelity). Our experiments confirmed that the specified real-world concepts become less likely to be generated. Furthermore, whereas previous methods required human intervention to control how the target concept changes inside the model, our results suggest that the proposed method leverages internal knowledge, or image knowledge, held within the model.
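
The core idea in the abstract, updating the text encoder while leaving the U-Net untouched, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `encode` and `denoise` are hypothetical stand-ins built from short lists of floats, and the squared-error loss is only a placeholder, since the abstract does not state the actual training objective. The sketch shows only that gradients pass through the frozen component while the update is applied to the text-encoder weights alone.

```python
# Toy sketch: update only the "text encoder" while the "U-Net" stays frozen.
# Both models are hypothetical stand-ins (short lists of floats), not real networks.

def encode(text_weights, prompt):
    """Stand-in text encoder: scale each prompt feature by a trainable weight."""
    return [w * x for w, x in zip(text_weights, prompt)]

def denoise(unet_weights, embedding):
    """Stand-in U-Net: a frozen linear readout of the text embedding."""
    return sum(u * e for u, e in zip(unet_weights, embedding))

def train_step(text_weights, unet_weights, prompt, target, lr=0.1):
    """One gradient step on a placeholder squared-error loss.

    The gradient is computed through the frozen U-Net stand-in, but only
    the text-encoder weights are updated and returned.
    """
    error = denoise(unet_weights, encode(text_weights, prompt)) - target
    loss = error ** 2
    # d(loss)/d(w_i) = 2 * error * u_i * x_i
    grads = [2 * error * u * x for u, x in zip(unet_weights, prompt)]
    new_text_weights = [w - lr * g for w, g in zip(text_weights, grads)]
    return new_text_weights, loss

text_weights = [0.5, -0.3]          # trainable (hypothetical initial values)
unet_weights = [1.2, 0.7]           # frozen (never reassigned below)
unet_before = list(unet_weights)    # copy kept to confirm the U-Net is unchanged
prompt, target = [1.0, 2.0], 0.0    # hypothetical prompt features and target

losses = []
for _ in range(20):
    text_weights, loss = train_step(text_weights, unet_weights, prompt, target)
    losses.append(loss)
```

In a real pipeline the same pattern would amount to disabling gradient updates for the U-Net parameters and handing only the text-encoder parameters to the optimizer; the loss and data used by the authors are not given in this abstract.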