Proceedings of the Fuzzy System Symposium
37th Fuzzy System Symposium
Session ID : TA4-4
Unsupervised Morphological Analysis for Unconstructed Languages ― Discussion on the influence of low-frequency words for less data―
*Ryotaro Murase, Sinnya Matsushita, Haruhiko Takase, Hidehiko Kita

Abstract

This research aims to improve the performance of unsupervised morphological analysis, which segments given character strings of an unstructured language into word units. The conventional method, NPYLM (Nested Pitman-Yor Language Model), shows poor segmentation performance when little data is available. In this paper, we discuss the cause of this poor performance, focusing especially on low-frequency words. We investigated the relationship between the percentage of low-frequency words and segmentation performance by varying the amount of data. As a result, we conclude that the impact of low-frequency words on the degradation of segmentation performance increases as the amount of data decreases.
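
The two quantities the abstract relates, the percentage of low-frequency words and the segmentation performance, can be illustrated with a minimal Python sketch. The abstract does not state how either is defined, so everything here is an assumption made for illustration: "low-frequency" is taken to mean a word type occurring at most `threshold` times, performance is taken to be the F1 score over word boundaries, and the function names and toy data are hypothetical. This is not the authors' experimental code, and it does not implement NPYLM.

```python
from collections import Counter


def low_frequency_ratio(sentences, threshold=1):
    """Fraction of word types whose corpus frequency is <= threshold.

    Assumption: "low-frequency" means a type seen at most `threshold` times.
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    if not counts:
        return 0.0
    low = sum(1 for c in counts.values() if c <= threshold)
    return low / len(counts)


def boundaries(words):
    """Character offsets of word-internal boundaries (sentence end excluded)."""
    cuts, pos = set(), 0
    for word in words[:-1]:
        pos += len(word)
        cuts.add(pos)
    return cuts


def boundary_f1(gold, predicted):
    """Boundary-level F1 between gold and predicted segmentations.

    Assumption: each gold/predicted pair covers the same character string.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        gb, pb = boundaries(g), boundaries(p)
        tp += len(gb & pb)
        fp += len(pb - gb)
        fn += len(gb - pb)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: the first predicted sentence misses one boundary.
gold = [["the", "cat", "sat"], ["the", "dog", "sat"]]
predicted = [["the", "catsat"], ["the", "dog", "sat"]]

print(low_frequency_ratio(gold))     # share of types seen only once
print(boundary_f1(gold, predicted))  # boundary-level F1
```

Computing both values on progressively smaller subsets of a corpus would give one data point per subset size, which is the kind of comparison the abstract describes.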

© 2021 Japan Society for Fuzzy Theory and Intelligent Informatics