Host : The Japanese Society for Artificial Intelligence
Name : The 36th Annual Conference of the Japanese Society for Artificial Intelligence
Number : 36
Location : [in Japanese]
Date : June 14, 2022 - June 17, 2022
Text segmentation is a technique for dividing a text into segments according to topic. It is an important component supporting natural language processing tasks such as document retrieval, summarization, and extraction, and is expected to be applied to unstructured data. Early work focused on unsupervised methods, most of which were heuristic; as a result, segmentation that relies on domain-specific knowledge and segmentation at varying granularities remained open challenges. In recent years, supervised methods based on deep learning have been proposed that achieve highly accurate segmentation by using context-aware features, but their applicability is limited by the high cost of annotation. In this study, we propose an unsupervised method based on deep learning. Specifically, we introduce "Invariant Information Clustering", which has been reported to be successful in the image domain, into a Transformer-based network. The resulting clustering-based approach enables text segmentation at various granularities. In the segmentation of email documents containing job information, the proposed method achieves a lower error rate than conventional unsupervised methods.
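For readers unfamiliar with Invariant Information Clustering, the following is a minimal sketch of its objective as commonly formulated: the mutual information between the soft cluster assignments of paired inputs, which training seeks to maximize. It is an illustration only, not the authors' implementation. The abstract does not say how text units are paired or how the Transformer produces the assignments, so the function name `iic_mutual_information` and the toy inputs are assumptions, and the paired assignment vectors are simply taken as given.

```python
import math

def iic_mutual_information(z, z_prime, eps=1e-12):
    """Mutual information between paired soft cluster assignments.

    z, z_prime: lists of n probability vectors (each of length c), where
    z[k] and z_prime[k] are the assignments for the two members of pair k.
    How pairs are built from text is NOT specified in the abstract; this
    function only evaluates the objective for pairs supplied by the caller.
    """
    n = len(z)
    c = len(z[0])

    # Joint distribution over cluster pairs, averaged across all n pairs.
    joint = [[0.0] * c for _ in range(c)]
    for a, b in zip(z, z_prime):
        for i in range(c):
            for j in range(c):
                joint[i][j] += a[i] * b[j] / n

    # Symmetrize, since the order within a pair carries no meaning.
    sym = [[(joint[i][j] + joint[j][i]) / 2.0 for j in range(c)]
           for i in range(c)]

    # Marginals of the symmetrized joint distribution.
    row = [sum(sym[i]) for i in range(c)]
    col = [sum(sym[i][j] for i in range(c)) for j in range(c)]

    # I = sum_ij P_ij * log(P_ij / (P_i * P_j)); near-zero cells are skipped.
    mi = 0.0
    for i in range(c):
        for j in range(c):
            p = sym[i][j]
            if p > eps:
                mi += p * math.log(p / (row[i] * col[j]))
    return mi

# Toy data (hypothetical): four pairs, two clusters.
agree = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4

mi_agree = iic_mutual_information(agree, agree)
mi_flat = iic_mutual_information(uninformative, uninformative)
```

In this toy case, pairs that agree on confident, balanced assignments reach the maximum for two clusters (log 2), while uniform assignments give zero. This is why maximizing the quantity favors clusterings that are both consistent within pairs and non-degenerate.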