Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
Name : 37th Fuzzy System Symposium
Number : 37
Location : [in Japanese]
Date : September 13, 2021 - September 15, 2021
This research aims to improve the performance of unsupervised morphological analysis, which segments given character strings of an unstructured language into word units. The conventional method, NPYLM (nested Pitman-Yor language model), shows poor segmentation performance when little data is available. In this paper, we discuss the cause of this poor performance, focusing especially on low-frequency words. We investigated the relationship between the percentage of low-frequency words and segmentation performance by varying the amount of data. As a result, we conclude that the impact of low-frequency words on the degradation of segmentation performance increases as the amount of data decreases.
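
As an illustration of the kind of measurement described above, the following minimal Python sketch computes the share of low-frequency words in a corpus as the amount of data varies. It is not the authors' procedure: the abstract does not define "low-frequency", so the threshold `max_count`, the token-based ratio, the function names `low_frequency_ratio` and `ratio_by_corpus_size`, and the toy corpus are all assumptions made for this example. It also assumes gold word segmentation is already available, and it does not run NPYLM itself.

```python
from collections import Counter


def low_frequency_ratio(sentences, max_count=1):
    """Return the fraction of word tokens whose word type occurs
    at most `max_count` times in `sentences` (assumed definition)."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    low = sum(c for c in counts.values() if c <= max_count)
    return low / total


def ratio_by_corpus_size(sentences, sizes, max_count=1):
    """Compute the low-frequency ratio on the first n sentences
    for each n in `sizes`, to see how it changes with data amount."""
    return {n: low_frequency_ratio(sentences[:n], max_count) for n in sizes}


# Hypothetical toy corpus of pre-segmented sentences.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["a", "dog", "ran"],
]

ratios = ratio_by_corpus_size(corpus, sizes=[1, 2, 4])
for n, r in ratios.items():
    print(f"{n} sentences: low-frequency ratio = {r:.3f}")
```

On this toy corpus the ratio falls as more sentences are included, which is the same direction as the relationship the abstract examines: with less data, a larger share of tokens belongs to rarely seen words.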