Abstract
The increasingly widespread use of electronic health record systems means that large amounts of medical information are accumulated as text. To support machine readability and analysis of this text by natural language processing (NLP), we have released ComeJisyoV5-1, which contains 77,760 medical term entries for morphological analysis of text.
Furthermore, to aid NLP of medical text, it is important to know which kinds of typographical errors occur, so that they can be reduced and the terminology can still be interpreted by the computer. In an ethically approved investigation of medical information at two facilities, 53 kinds of typographical errors were found and analyzed.
The results showed that, in the two-step conversion process in which typed Roman alphabetic characters are first converted into kana and the kana are then exchanged for selected kanji, most errors occurred at the kana-to-kanji step: 46 terms were converted into homophones or other same-sounding kanji.
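The two-step conversion described above can be illustrated with a minimal sketch. It is not the input method used at the investigated facilities, and it is not part of ComeJisyo. The lookup tables, function names, and the example reading "kouka" are hypothetical and chosen only to show how selecting the wrong candidate at the kana-to-kanji step yields a same-sounding but different term.

```python
# Toy lookup tables (assumptions for illustration; a real input method
# uses far larger dictionaries and context-aware candidate ranking).
ROMAJI_TO_KANA = {"ko": "こ", "u": "う", "ka": "か"}
KANA_TO_KANJI = {"こうか": ["効果", "降下", "硬化"]}  # all are read "kouka"


def romaji_to_kana(text: str) -> str:
    """Step 1: convert typed Roman characters to kana by greedy longest match."""
    kana = []
    i = 0
    while i < len(text):
        for size in (3, 2, 1):
            chunk = text[i:i + size]
            if chunk in ROMAJI_TO_KANA:
                kana.append(ROMAJI_TO_KANA[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no kana for input at position {i}: {text[i:]!r}")
    return "".join(kana)


def kana_to_kanji(kana: str, choice: int) -> str:
    """Step 2: exchange kana for the kanji candidate the user selects."""
    return KANA_TO_KANJI[kana][choice]


def is_homophone_error(intended: str, produced: str) -> bool:
    """True if the two terms differ but share a kana reading in the toy table."""
    return intended != produced and any(
        intended in candidates and produced in candidates
        for candidates in KANA_TO_KANJI.values()
    )


reading = romaji_to_kana("kouka")
intended = kana_to_kanji(reading, 2)   # the user meant 硬化 (sclerosis)
produced = kana_to_kanji(reading, 0)   # but accepted the first candidate
print(reading, intended, produced, is_homophone_error(intended, produced))
# → こうか 硬化 効果 True
```

In this sketch the first step is deterministic, whereas the second depends on a selection among candidates, which is one plausible reason why errors would concentrate at the kana-to-kanji step.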