Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper (Peer-Reviewed)
A Simple and Effective Usage of Word Clusters for CBOW Model
Yukun Feng, Chenlong Hu, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

2022 Volume 29 Issue 3 Pages 785-806

Abstract

We propose a simple and effective method for incorporating word clusters into the Continuous Bag-of-Words (CBOW) model. Specifically, we propose replacing infrequent input and output words in CBOW with their clusters. The resulting cluster-incorporated CBOW model produces embeddings for frequent words plus a small number of cluster embeddings, which are then fine-tuned in downstream tasks. We empirically demonstrate that our replacement method works well on several downstream tasks. Through our analysis, we also show that our method is potentially useful for other, similar models that produce word embeddings.
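The replacement idea can be illustrated with a minimal preprocessing sketch. It assumes a word-to-cluster mapping is already available (the `clusters` dictionary below is hypothetical input; the clustering algorithm, the frequency threshold `min_count`, and the `<unk>` fallback for unclustered rare words are all illustrative assumptions, not details taken from the paper). Every infrequent word is swapped for its cluster token both as a context (input) word and as a target (output) word. The embedding table therefore holds only frequent words and cluster tokens.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_mapping(corpus: List[List[str]], min_count: int,
                  clusters: Dict[str, str]) -> Dict[str, str]:
    """Map each word type to the token CBOW will see.

    Frequent words (count >= min_count) are kept as-is; infrequent words are
    replaced by a cluster token. Assumption: rare words with no cluster fall
    back to a shared <unk> token.
    """
    counts = Counter(w for sent in corpus for w in sent)
    mapping: Dict[str, str] = {}
    for word, count in counts.items():
        if count >= min_count:
            mapping[word] = word
        elif word in clusters:
            mapping[word] = "<c:" + clusters[word] + ">"
        else:
            mapping[word] = "<unk>"
    return mapping


def cbow_examples(corpus: List[List[str]], mapping: Dict[str, str],
                  window: int) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs after replacement.

    The replacement is applied to the whole sentence first, so it affects
    both the input (context) side and the output (target) side of CBOW.
    """
    examples: List[Tuple[List[str], str]] = []
    for sent in corpus:
        toks = [mapping[w] for w in sent]
        for i, target in enumerate(toks):
            context = toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]
            if context:
                examples.append((context, target))
    return examples


def embedding_vocab(mapping: Dict[str, str]) -> List[str]:
    """Rows of the embedding table: frequent words plus cluster tokens."""
    return sorted(set(mapping.values()))


if __name__ == "__main__":
    # Toy corpus and hypothetical cluster IDs, for illustration only.
    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
    clusters = {"dog": "17", "ran": "42"}
    mapping = build_mapping(corpus, min_count=2, clusters=clusters)
    print(embedding_vocab(mapping))
    for context, target in cbow_examples(corpus, mapping, window=1):
        print(context, "->", target)
```

In this toy run, "dog" and "ran" each occur once, so they never get their own embedding rows; CBOW instead learns vectors for `<c:17>` and `<c:42>`, which could then be fine-tuned downstream as the abstract describes.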

© 2022 The Association for Natural Language Processing