Construction of a Medical Text Corpus Annotated with Disease Names

荒牧 英治; 若宮 翔子; 矢野 憲; 永井 宥之; 岡久 太郎; 伊藤 薫

doi:10.5715/jnlp.25.119

Abstract

Research on advanced AI requires sufficient data. In medicine, and in clinical medicine in particular, the data consist mainly of clinical records written in natural language, so information retrieval is necessary to use them fully. The corpus developed in this study has two strengths: (i) it consists of 45,000 case reports, which is the largest to our knowledge, and (ii) in addition to standardizing the terminology and the annotation method, we annotated “factness,” which indicates whether a disease name actually describes the state of the patient in a case report. This paper describes how we developed this medical corpus for AI research, focusing on the annotation of disease and symptom names. First, we define the concepts in the annotation criteria using examples. Next, we discuss the feasibility of the annotation by reporting indices such as the agreement rate. Finally, we report the development of a disease-name extraction system based on the corpus. We believe this corpus will serve as a good reference for future clinical annotation.

Licensed under CC BY 4.0
https://creativecommons.org/licenses/by/4.0/
