Deep learning, which emerged around 2012, constitutes the third generation of AI. It has since progressed at an extremely rapid pace and has had a major impact on the world of medical imaging. Initially, different neural network architectures were developed for each learning target: convolutional neural networks for images, recurrent neural networks for language and time-series data, and the early generative models, variational autoencoders and generative adversarial networks. When the Transformer was introduced in 2017, however, its superior performance and flexibility led to a rapid unification of these architectures, making it possible to learn two modalities, images and language, within a single framework. Building on the Transformer, it also became possible to train on extremely large amounts of linguistic data, and since the advent of ChatGPT in November 2022, AI technology has entered a fourth generation centered on large language models and generative AI. It has become clear that these large language models, trained on vast amounts of text, serve as foundation models that can respond appropriately to a wide variety of human questions without additional training, and subsequent rapid improvements have brought them to a level where they can mimic a significant part of human intellectual activity.