With the rapid development of generative AI, the Transformer architecture, centered on Self-Attention, has emerged as a fundamental technology across various fields, including natural language processing and image generation. This paper provides a step-by-step explanation of the mathematical foundations of Self-Attention, focusing on the roles of Query, Key, and Value; the computation and normalization of similarity scores using the Softmax function; and the generation of output vectors through weighted averaging. In addition, the architectural design and theoretical principles of the Attention mechanism within Transformers are reviewed to clarify the central role of Self-Attention in generative AI. Through equations and illustrative figures, this work aims to support intuitive understanding and to contribute to the foundational knowledge required for future applications in medical image analysis and visualization.
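For reference, the computation summarized above is the standard scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of a single attention head; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X:             (n_tokens, d_model) input embeddings.
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices.
    """
    Q = X @ W_q  # Queries: what each token is looking for
    K = X @ W_k  # Keys: what each token offers for matching
    V = X @ W_v  # Values: the content to be aggregated
    d_k = K.shape[-1]
    # Similarity scores between all token pairs, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax normalizes each row into attention weights summing to 1.
    weights = softmax(scores, axis=-1)
    # The output is the attention-weighted average of the Value vectors.
    return weights @ V

# Toy usage: 4 tokens, model width 8, head width 8 (sizes are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```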
This paper discusses the application of artificial intelligence (AI) in the medical domain, with a particular focus on large language models (LLMs) and generative AI. As AI technologies have advanced, the field has shifted from traditional statistical models to deep learning approaches, and in natural language processing (NLP), models such as BERT and, more recently, LLMs have come to play a central role. These developments have simplified previously complex tasks in medicine and improved the efficiency of clinical workflows. Specifically, LLM-based automation is progressing in areas such as routine patient interactions (e.g., obtaining informed consent and cancer consultations) and medical document generation. LLMs are also increasingly employed in extracting adverse drug events and constructing clinical record databases, enabling systems that were previously difficult to realize. These advances are expected to reduce the burden on NLP researchers and accelerate the adoption of AI in clinical settings. This paper provides an overview of how LLMs are transforming the medical field, explores their societal impact, and discusses future prospects.
Many of today’s widely used image-generation AIs employ Diffusion Models for image synthesis. Compared with earlier generative models, Diffusion Models offer more stable training and produce higher-quality images. As a result, their use has extended beyond natural images to medical image processing, where various Diffusion-Model-based methods for medical image generation and segmentation have been proposed. Understanding current trends around Diffusion Models is essential for conducting cutting-edge research. This paper outlines the mechanisms of Diffusion Models, introduces their applications in medical image generation and segmentation, and presents examples of their implementation.
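As a rough illustration of the mechanism this overview covers, the following is a minimal NumPy sketch of the forward (noising) process in the standard DDPM formulation, x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. The schedule values and names are common defaults assumed here, not details of any specific method discussed in the paper.

```python
import numpy as np

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Noise variances beta_t increase linearly over T steps (a common DDPM choice).
    return np.linspace(beta_start, beta_end, T)

def q_sample(x0, t, alphas_cumprod, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

T = 1000
betas = linear_beta_schedule(T)
alphas_cumprod = np.cumprod(1.0 - betas)  # alpha_bar_t = prod of (1 - beta_s)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))  # stand-in for a normalized image
x_t, eps = q_sample(x0, t=500, alphas_cumprod=alphas_cumprod, rng=rng)
# Training fits a network eps_theta(x_t, t) to predict eps (MSE loss);
# sampling then runs the learned reverse process starting from pure noise.
```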
Collecting accurately annotated training data in the medical field is challenging. This paper proposes a method that adapts the YOLOv3 loss function to detect metastatic liver cancer in abdominal ultrasound images even when annotations are incomplete. We aim to improve detection of unannotated tumors by incorporating tumor-type-specific weights into the background component of the YOLOv3 loss function. In experiments where only the weight for metastatic liver cancer was reduced, recall for this tumor type improved. When the weight was set to 0.1, precision for annotated metastatic liver cancer decreased by approximately 10%; this is because real but unannotated tumors detected during evaluation are counted as false positives against the incomplete ground truth. Importantly, the detection accuracy for other tumor types did not diminish. These findings indicate the effectiveness of the proposed method.
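To make the core idea concrete, here is a hedged schematic, not the authors’ implementation: the no-object (background) confidence term of a YOLO-style loss is scaled anchor-wise by a per-tumor-type weight, so confident predictions of the down-weighted class in unannotated regions are penalized less. A squared-error form is used for brevity (YOLOv3 itself uses binary cross-entropy for objectness), and all names and sizes are illustrative.

```python
import numpy as np

def weighted_background_loss(obj_conf, class_probs, is_background, class_weights):
    """Schematic YOLO-style no-object loss with per-tumor-type weights.

    obj_conf:      (N,) predicted objectness score for each anchor.
    class_probs:   (N, C) predicted class probabilities per anchor.
    is_background: (N,) bool mask, True where no ground-truth box is assigned.
    class_weights: (C,) per-tumor-type weights; lowering one (e.g. 0.1 for
                   metastatic liver cancer) reduces the penalty when the model
                   fires on a possibly unannotated tumor of that type.
    """
    # Weight each anchor by the weight of its most likely predicted class.
    w = class_weights[class_probs.argmax(axis=1)]
    # The standard no-object term pushes objectness toward 0 in background
    # cells; here it is scaled anchor-wise by the class-dependent weight.
    per_anchor = w * obj_conf ** 2
    return per_anchor[is_background].sum()

# Toy usage with random predictions (N anchors, C tumor types).
rng = np.random.default_rng(0)
N, C = 100, 4
weights = np.ones(C)
weights[2] = 0.1  # hypothetically, index 2 = metastatic liver cancer
loss = weighted_background_loss(
    obj_conf=rng.random(N),
    class_probs=rng.dirichlet(np.ones(C), size=N),
    is_background=rng.random(N) > 0.2,
    class_weights=weights,
)
print(loss)
```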