Japan Journal of Medical Informatics
Online ISSN : 2188-8469
Print ISSN : 0289-8055
ISSN-L : 0289-8055
Interest Material
Comparative Evaluation of ComeJisyo V1, ComeJisyo V2 and ComeJisyo V3
K Sagara, M Ono, H Ozaku, T Suzuki, M Takasaki, G Shimada

2012 Volume 32 Issue 6 Pages 301-307

Abstract
 To extract new information and knowledge from medical texts, clinical records and related material, the first step in language processing is to split the text into words. This is generally done with a morphological analyzer and a specialized dictionary, which together divide a string into words or compound words.
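 As a rough illustration of why a specialized dictionary matters for word splitting, the sketch below uses a toy greedy longest-match segmenter. It is not MeCab's algorithm (MeCab selects a segmentation by minimizing costs over a lattice of candidates), and the function `segment` and the dictionary entries are hypothetical examples, not actual ComeJisyo contents.

```python
def segment(text, dictionary):
    """Greedy longest-match split; characters not covered by the
    dictionary fall back to single-character tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical entries: "acute", "myocardial", "infarction" as separate words.
general = {"急性", "心筋", "梗塞"}
# The same entries plus the compound term "acute myocardial infarction".
medical = general | {"急性心筋梗塞"}

text = "急性心筋梗塞"
print(segment(text, general))  # ['急性', '心筋', '梗塞']
print(segment(text, medical))  # ['急性心筋梗塞']
```

 With only the general entries the term is broken into three pieces; once the compound term is in the dictionary it is kept as one unit, which is the kind of effect a domain dictionary is meant to have.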
 ComeJisyo is a dictionary for the morphological analyzer MeCab. ComeJisyo V1 was developed and released in November 2008, followed by ComeJisyo V2 in January 2010 and ComeJisyo V3 in March 2011. ComeJisyo V1 contained 30,146 words, while ComeJisyo V3 contains 41,592 words. With ComeJisyo V1, the accuracy of splitting medical texts into words or compound words was approximately 70%; with ComeJisyo V3, it exceeds 90%.
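 The size difference between the two releases can be worked out directly from the word counts quoted above. The short calculation below is only arithmetic on those two figures; the variable names are illustrative, and the result is a net change that says nothing about entries that may have been removed or revised between versions.

```python
# Word counts as stated in the abstract.
v1_words = 30_146
v3_words = 41_592

added = v3_words - v1_words   # net increase in entries
growth = added / v1_words     # increase relative to V1

print(added)              # 11446
print(f"{growth:.1%}")    # 38.0%
```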
 Herein, we provide an overview of ComeJisyo and its analysis accuracy.
© 2012 Japan Association for Medical Informatics