Automatic Detection of Personal Information in Medical Data Using Large Language Models

井上 愛; 盛田 健人; 藤井 武宏; 佐野 龍樹; 土肥 薫; 若林 哲史

DOI: https://doi.org/10.24466/pacbfsa.38.0_71

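Conference information

Conference name: 38th Meeting of the Biomedical Fuzzy Systems Association

Edition: 38

Venue: Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology

Dates: 2025/12/13 - 2025/12/14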

Conference proceedings / abstracts (free access)

p. 71-74

Abstract

In this study, we propose a method for automatically detecting personally identifiable information (PII) in medical data of various formats using a locally run large language model (LLM). Text is extracted with a procedure tailored to each data format and then passed to Llama3 for detection. We also fine-tuned the model with LoRA and compared its performance with that of the base model. The base model achieved a high detection rate on text data, but the rate dropped on image data because of recognition errors by EasyOCR. Fine-tuning improved the output format, but the detection rate for patient IDs decreased significantly.
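
The pipeline summarized in the abstract (format-specific text extraction, then LLM-based detection, then a per-category detection rate) might be organized roughly as follows. This is only a minimal sketch, not the authors' implementation: the category list, the prompt wording, the JSON output format, and every function name here are hypothetical, and "detection rate" is assumed to mean per-category recall. The `llm` argument is a plain callable standing in for a locally run model, so no particular model API is implied.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical PII categories; the abstract names only patient IDs explicitly.
CATEGORIES = ["name", "patient_id", "date_of_birth", "address"]


def extract_text(path: str, extractors: Dict[str, Callable[[str], str]]) -> str:
    """Dispatch to a format-specific extractor chosen by file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in extractors:
        raise ValueError(f"no extractor registered for {suffix!r}")
    return extractors[suffix](path)


def build_prompt(text: str) -> str:
    """Assumed prompt: ask for a JSON list of {category, value} objects."""
    return (
        "List every piece of personal information in the text below as a JSON "
        f"array of objects with keys 'category' (one of {CATEGORIES}) and "
        "'value'. Output JSON only.\n\n" + text
    )


def parse_response(raw: str) -> List[dict]:
    """Parse the model output; return [] when it is not in the expected shape."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [
        item for item in items
        if isinstance(item, dict)
        and item.get("category") in CATEGORIES
        and isinstance(item.get("value"), str)
    ]


def detect_pii(path: str,
               extractors: Dict[str, Callable[[str], str]],
               llm: Callable[[str], str]) -> List[dict]:
    """Extract text from one file and ask the model for PII candidates."""
    return parse_response(llm(build_prompt(extract_text(path, extractors))))


def detection_rate(found: List[dict], expected: List[dict],
                   category: str) -> Optional[float]:
    """Recall for one category, or None if no item of that category is expected."""
    gold = {e["value"] for e in expected if e["category"] == category}
    if not gold:
        return None
    hits = {f["value"] for f in found if f["category"] == category}
    return len(gold & hits) / len(gold)
```

In practice the extractor registered for image files would wrap an OCR step and `llm` would wrap a call to the local model; both are left as stubs here. Returning an empty list for malformed output keeps one badly formatted response from aborting a batch, which is relevant given that the abstract reports output-format differences between the base and fine-tuned models.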
