Host: Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
Name: 37th Fuzzy System Symposium
Number: 37
Location: [in Japanese]
Date: September 13, 2021 - September 15, 2021
This study aims to improve the performance of unsupervised morphological analysis with the Nested Pitman-Yor Language Model (NPYLM) for minority languages. Conventional methods require a large amount of training data, but the data available for minority languages is limited. So far, we have tried to improve the performance of unsupervised morphological analysis under limited data with the "replacement" method, in which words that have been analyzed correctly are replaced with symbols of a different type. As an improvement on the "replacement" method, we have also studied a "limited replacement" method based on TF-IDF, under the assumption that some words should be replaced and others should not. In this paper, we aim to improve the performance of TF-IDF by further refining its operation. As a result, the F-value obtained with TF-IDF improved greatly, enabling us to extract word candidates efficiently even from documents consisting of unknown words.
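For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of textbook TF-IDF scoring and of the F-value (the harmonic mean of precision and recall). It is not the authors' modified TF-IDF, whose details the abstract does not give. The function names `tf_idf` and `f_value`, the unsmoothed `log(N / df)` weighting, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score tokens with plain TF-IDF (illustrative, not the paper's variant).

    docs: list of token lists. Returns one {token: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times unsmoothed inverse document frequency.
        scores.append({
            tok: (c / total) * math.log(n_docs / df[tok])
            for tok, c in counts.items()
        })
    return scores


def f_value(predicted, gold):
    """Harmonic mean of precision and recall over two collections of words."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # candidates that are also in the gold set
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: each document is a list of candidate tokens.
docs = [["a", "b", "a"], ["a", "c"]]
scores = tf_idf(docs)
```

In this sketch a token that occurs in every document receives a score of zero, so only tokens concentrated in a few documents stand out as candidates. How the candidates are thresholded and then used for replacement is specific to the paper and is not reproduced here.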