Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Phrase-Level Topic Modeling Based on Joint Embedding Space of Words, Phrases and Documents
Zikai Zhou, Kei Wakabayashi, Hiroyoshi Ito

2024 Volume 32 Pages 256-264

Abstract

In topic modeling, phrases act as important grammatical units that help users interpret the semantics of extracted topics. Embedding-based topic modeling, which has been proposed recently, is a promising approach to extracting phrase-level topics because it does not suffer from the scalability issues caused by the vocabulary growth that adding phrases entails. However, the quality of the phrase-level topics extracted by this approach has not been evaluated, and the effect of the choice of embedding model on this method has not been investigated. In this paper, we validate the performance of phrase-level embedding-based topic modeling and evaluate the effect of the embedding model on the quality of the phrase-level topics. The evaluation revealed that existing pre-trained BERT models are limited in either sentence or phrase representation; we therefore propose joint fine-tuning of BERT for phrase and sentence embeddings to improve the quality of phrase-level topic modeling. The experimental results demonstrate, both quantitatively and qualitatively, that the jointly fine-tuned BERT yields more coherent phrase-level topics than other methods, including popular LDA-based phrase topic modeling.
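
To make the general approach concrete, below is a minimal sketch of embedding-based phrase-level topic modeling: documents and candidate phrases are embedded into one joint space, the document embeddings are clustered into topics, and each topic is labeled with the phrases closest to its centroid. The embedding model name (all-MiniLM-L6-v2), the use of KMeans, and the cosine-similarity ranking are illustrative assumptions only, not the paper's jointly fine-tuned BERT or its exact pipeline.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def phrase_level_topics(documents, phrases, num_topics=5, top_k=10):
    # Assumed stand-in model; the paper fine-tunes BERT jointly for
    # phrase and sentence embeddings instead.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Embed documents and candidate phrases into one joint space.
    doc_emb = model.encode(documents, normalize_embeddings=True)
    phrase_emb = model.encode(phrases, normalize_embeddings=True)

    # Cluster documents in the embedding space; one cluster = one topic.
    km = KMeans(n_clusters=num_topics, n_init=10, random_state=0).fit(doc_emb)

    # Describe each topic by the phrases nearest to its centroid.
    sims = cosine_similarity(km.cluster_centers_, phrase_emb)
    return [
        [phrases[i] for i in np.argsort(-sims[t])[:top_k]]
        for t in range(num_topics)
    ]
```

Because topics are found by clustering in the embedding space and phrases are only ranked afterwards, the cost of topic extraction does not grow with the phrase vocabulary, which is the scalability advantage over LDA-style phrase topic models noted in the abstract.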

© 2024 by the Information Processing Society of Japan