Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
Versatile Annotation Guidelines for Clinical-Medical Text with an Application to Critical Lung Diseases
Shuntaro Yada, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, Sadao Kurohashi
JOURNAL FREE ACCESS

2022 Volume 29 Issue 4 Pages 1165-1197

Abstract

Natural language processing for medical applications (medical NLP) requires high-quality annotated corpora. In this study, we designed a versatile annotation scheme for clinical-medical text and a set of associated guidelines, which address two common subtasks used in medical NLP: named entity recognition (NER) and relation extraction (RE). The annotation scheme integrates similar existing schemes and defines clinical-medical entities and relations to encode useful information for many medical NLP applications. The guidelines aim to increase the annotation feasibility by reducing the necessity of judgement based on medical knowledge so as to enable non-medical professionals to annotate the text. We adopted a recursive discussion procedure involving NLP researchers, medical professionals, and annotators to develop the scheme and guidelines based on real annotation examples while increasing the corpus size. Further, we obtained annotated corpora comprising 3,769 medical records and radiology reports of patients with serious lung diseases. For improved efficiency, preliminary NER and RE models were created after the first half was annotated; they were subsequently applied to the second half, which was then corrected manually. This two-step annotation also increased the inter-coder agreement. Finally, a joint NER + RE model trained on our corpora showed sufficiently promising performance to suggest its practical implementation.
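The abstract reports that the two-step annotation increased inter-coder agreement but does not say here how agreement was measured. As a minimal sketch only, the Python below shows one common way to represent NER and RE annotations and to compare two annotators by exact-match F1. The entity labels (`Disease`, `Anatomical`), the relation label (`located_in`), the example sentence, and the function name `agreement_f1` are all hypothetical and are not taken from the paper's scheme or its evaluation protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labeled character span; offsets are [start, end)."""
    start: int
    end: int
    label: str  # hypothetical label set, not the paper's


@dataclass(frozen=True)
class Relation:
    """A labeled, directed link between two entities."""
    head: Entity
    tail: Entity
    label: str  # hypothetical label set, not the paper's


def agreement_f1(a: set, b: set) -> float:
    """Exact-match F1 between two annotators' sets of annotations."""
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical sentence annotated by two annotators.
text = "Nodule in right upper lobe"

nodule = Entity(0, 6, "Disease")
lobe_a = Entity(10, 26, "Anatomical")  # "right upper lobe"
lobe_b = Entity(10, 21, "Anatomical")  # "right upper" (boundary disagreement)

entities_a = {nodule, lobe_a}
entities_b = {nodule, lobe_b}

relations_a = {Relation(nodule, lobe_a, "located_in")}
relations_b = {Relation(nodule, lobe_b, "located_in")}

entity_f1 = agreement_f1(entities_a, entities_b)
relation_f1 = agreement_f1(relations_a, relations_b)
print(f"entity F1 = {entity_f1:.2f}, relation F1 = {relation_f1:.2f}")
```

In this sketch a single boundary disagreement on an entity also removes agreement on every relation that uses that entity, which is one reason relation agreement tends to be lower than entity agreement under exact matching.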

© 2022 The Association for Natural Language Processing