Abstract
In this study, we propose a method for automatically detecting personally identifiable information in medical data of various formats using a locally run large language model (LLM). We apply text extraction tailored to each data format and then input the extracted text into Llama3 for detection. We also fine-tuned the model using LoRA and compared its performance with that of the base model. The base model achieved a high detection rate on text data. On image data, however, its detection rate decreased because EasyOCR misrecognized characters. Fine-tuning improved the output format, but it significantly decreased the detection rate for patient IDs.