Proceedings of the Fuzzy System Symposium
37th Fuzzy System Symposium
Session ID : TD1-2

A method for improving unsupervised morphological analysis for minority languages: Effective preprocessing under the condition that the amount of data required for training is very small
*Shinya Matsushita, Ryotaro Murase, Haruhiko Takase, Hidehiko Kita
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

This study aims to improve the performance of unsupervised morphological analysis with the nested Pitman-Yor language model (NPYLM) for minority languages. Conventional methods require a large amount of training data, but the data available for minority languages is limited. So far, we have tried to improve performance even when data is limited by using a "replacement" method, in which words that have been analyzed correctly are replaced with symbols of a different type. As a refinement of the "replacement" method, we have also studied a "limited replacement" method based on TF-IDF, under the assumption that some words should be replaced and others should not. In this paper, we aim to improve the performance of the TF-IDF step by further refining how it is applied. As a result, the F-measure obtained with TF-IDF improved greatly, which makes it possible to extract word candidates efficiently even from documents consisting of unknown words.
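
The abstract does not give the details of the TF-IDF-based selection, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes documents are already split into candidate tokens, uses the plain TF-IDF definition (relative term frequency times log inverse document frequency), and replaces tokens whose score reaches a threshold with a placeholder symbol. The function names `tf_idf` and `limited_replace`, the threshold value, the symbol `#`, and the choice to replace high-scoring tokens rather than low-scoring ones are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {token: score} dict per document (each doc is a non-empty token list)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Plain TF-IDF: relative frequency in this doc times log(N / df).
        scores.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return scores

def limited_replace(doc, doc_scores, threshold, symbol="#"):
    """Hypothetical 'limited replacement': swap tokens scoring at or above
    the threshold for a placeholder symbol and leave the rest unchanged."""
    return [symbol if doc_scores[tok] >= threshold else tok for tok in doc]

# Toy corpus of pre-tokenized documents (illustrative data only).
docs = [
    ["a", "b", "a", "c"],
    ["a", "d", "d", "c"],
    ["a", "e", "c", "c"],
]
scores = tf_idf(docs)
replaced = limited_replace(docs[0], scores[0], threshold=0.2)
```

In this toy corpus, tokens that occur in every document score zero and are left in place, while a token that occurs in only one document is replaced. Which tokens should actually be replaced, and at what threshold, depends on the method described in the full paper.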

© 2021 Japan Society for Fuzzy Theory and Intelligent Informatics