Introduction of ComeJisyo and an Investigation of Typographical Errors in Medical Information

相良 かおる

doi:10.2964/jsik_2014_019

Abstract

The increasingly widespread use of electronic health record systems means that large amounts of medical information are accumulated in text format. To make this medical text easier for computers to read and to analyze with natural language processing (NLP), we have released ComeJisyoV5-1, a dictionary of 77,760 medical term entries for the morphological analysis of text.
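To illustrate why a dedicated medical term dictionary matters for morphological analysis, the Python sketch below segments a short clinical phrase with and without a medical entry. It uses greedy longest-match segmentation as a deliberately simplified stand-in: real morphological analyzers use a cost-based search, and the tiny word sets here (`GENERAL`, `MEDICAL`) are hypothetical, not the contents or format of ComeJisyo.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation (a simplification, not a real analyzer).

    At each position, take the longest dictionary word that matches;
    if nothing matches, emit a single character as an unknown token.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + length]
            if word in dictionary or length == 1:
                tokens.append(word)
                i += length
                break
    return tokens


# Hypothetical general-vocabulary entries.
GENERAL = {"を", "認める"}
# Hypothetical medical entry standing in for a ComeJisyo-style term ("cholecystitis").
MEDICAL = {"胆嚢炎"}

print(segment("胆嚢炎を認める", GENERAL))            # -> ['胆', '嚢', '炎', 'を', '認める']
print(segment("胆嚢炎を認める", GENERAL | MEDICAL))  # -> ['胆嚢炎', 'を', '認める']
```

Without the medical entry the term falls apart into single characters; with it, the term is kept as one token that downstream NLP can look up.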
Furthermore, to aid NLP of medical text, it is important to know what kinds of typographical errors occur, both so that they can be reduced and so that the affected terms can still be interpreted by a computer. In an ethically approved investigation of medical information at two facilities, 53 kinds of typographical errors were found and analyzed.
The results showed that, in the two-step conversion process, in which typed Roman letters are first converted into kana and the kana are then converted into selected kanji, most errors occurred in the kana-to-kanji step: 46 terms had been converted into homophones or other kanji with the same reading.
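To show where such a homophone error can enter the two-step process, the Python sketch below models both steps with tiny hypothetical tables (`ROMAJI_TO_KANA`, `KANA_TO_KANJI`). Real input methods use far larger dictionaries and rank candidates by usage, so this is only an illustration, not the input system examined in the study. The example terms 関節 (joint) and 間接 (indirect) share the reading かんせつ, so accepting the wrong candidate in step 2 yields a correctly written but wrong term.

```python
# Hypothetical romaji-to-kana table (a tiny subset for illustration only).
ROMAJI_TO_KANA = {"ka": "か", "na": "な", "n": "ん", "se": "せ", "tsu": "つ"}

# Hypothetical kana-to-kanji candidate list: both terms are read かんせつ.
KANA_TO_KANJI = {"かんせつ": ["間接", "関節"]}  # "indirect", "joint"


def romaji_to_kana(romaji: str) -> str:
    """Step 1: greedy longest-match conversion of typed Roman letters to kana.

    Longer chunks are tried first so that, e.g., "na" wins over "n" + "a".
    """
    kana = []
    i = 0
    while i < len(romaji):
        for length in (3, 2, 1):
            chunk = romaji[i:i + length]
            if chunk in ROMAJI_TO_KANA:
                kana.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for input at position {i}: {romaji[i:]!r}")
    return "".join(kana)


def kana_to_kanji(kana: str, choice: int = 0) -> str:
    """Step 2: pick one kanji candidate for the kana reading (kana kept if unknown)."""
    candidates = KANA_TO_KANJI.get(kana, [kana])
    return candidates[choice]


def same_reading_alternatives(term: str) -> list[str]:
    """List other candidates sharing the term's reading, i.e. possible homophone errors."""
    for candidates in KANA_TO_KANJI.values():
        if term in candidates:
            return [c for c in candidates if c != term]
    return []


kana = romaji_to_kana("kansetsu")
print(kana)                             # -> かんせつ
print(kana_to_kanji(kana, 0))           # -> 間接 (first candidate accepted by mistake)
print(kana_to_kanji(kana, 1))           # -> 関節 (intended term)
print(same_reading_alternatives("間接"))  # -> ['関節']
```

Step 1 is deterministic, so the typed letters fully determine the kana; the error appears only when a candidate is chosen in step 2, which is consistent with where the investigation found most errors.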
