Abstract
The first step in applying language processing to extract new information and knowledge from medical texts, clinical records, and related material is to split the text into words. This is generally done with a morphological analyzer and a specialized dictionary, which together divide a string into words or compound words.
ComeJisyo is a dictionary for the morphological analyzer MeCab. ComeJisyo V1 was developed and released in November 2008, followed by ComeJisyo V2 in January 2010 and ComeJisyo V3 in March 2011. ComeJisyo V1 contained 30,146 words, whereas ComeJisyo V3 contains 41,592 words. Analysis accuracy with ComeJisyo V1 was approximately 70%; with ComeJisyo V3, the accuracy of splitting medical texts into words or compound words exceeds 90%.
Herein, we provide an overview of ComeJisyo and its analysis accuracy.
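To illustrate why a specialized dictionary changes how a medical string is split, the following is a minimal sketch using greedy longest-match segmentation. It is not MeCab's algorithm and does not use ComeJisyo itself; the function name `segment`, the toy dictionaries, and the sample phrase are all hypothetical and chosen only for illustration.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation.

    At each position, take the longest dictionary entry that matches;
    if none matches, emit a single character as its own token.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy dictionaries (hypothetical entries, not taken from ComeJisyo).
general = {"急性", "心筋", "梗塞", "疑い"}
medical = general | {"急性心筋梗塞"}  # adds one compound medical term

text = "急性心筋梗塞の疑い"  # "suspected acute myocardial infarction"
print(segment(text, general))
print(segment(text, medical))
```

With only the general entries, the disease name is broken into three fragments; once the compound term is in the dictionary, it is kept as a single unit. A real analyzer such as MeCab selects a segmentation by cost over a lattice of candidates rather than by greedy matching, so this sketch shows only the effect of dictionary coverage.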