Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Report
Development of the Clinical Corpus with Disease Name Annotation
Eiji Aramaki, Shoko Wakamiya, Ken Yano, Hiroyuki Nagai, Taro Okahisa, Kaoru Ito

2018 Volume 25 Issue 1 Pages 119-152

Abstract

Research on advanced AI requires sufficient data. In medicine, and especially in clinical medicine, most of the data (mainly clinical records) is written in natural language, so information retrieval techniques are needed to use it fully. The corpus developed in this study has two strengths: (i) it consists of 45,000 case reports, making it the largest such corpus to our knowledge, and (ii) in addition to standardizing the terminology and the annotation method, we annotated “factness,” which indicates whether a disease name actually describes the patient's state in a case report. This paper describes the methods used to develop this medical corpus for AI research, focusing on the annotation of disease and symptom names. First, we define the concepts in the annotation criteria using examples. Next, we discuss the feasibility of the annotation by reporting indices such as the agreement rate. Finally, we report on the development of a disease-name extraction system based on the corpus. We believe this corpus will serve as a good reference for future clinical annotation.
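As an illustration of the kind of agreement index mentioned above, the following is a minimal sketch of how two annotators' factness labels could be compared. The abstract does not state which index or label set was used, so the label names ("positive", "negative"), the toy data, and the choice of Cohen's kappa alongside the raw agreement rate are all assumptions made for this example.

```python
from collections import Counter


def observed_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    p_o = observed_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    p_e = sum(counts_a[lab] * counts_b[lab]
              for lab in set(labels_a) | set(labels_b)) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical factness labels for six disease-name mentions (not real data).
annotator_1 = ["positive", "positive", "negative", "positive", "negative", "positive"]
annotator_2 = ["positive", "negative", "negative", "positive", "negative", "positive"]

agreement = observed_agreement(annotator_1, annotator_2)
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"agreement rate = {agreement:.3f}, kappa = {kappa:.3f}")
```

The raw agreement rate is easy to interpret, while kappa discounts the agreement that would arise by chance when one label dominates, which is common for factness-style labels.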

© 2018 The Association for Natural Language Processing