Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
General Paper
Compact Word Embeddings Based on Global Similarity
Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara, Yuki Arase

2021 Volume 28 Issue 1 Pages 235-252

Abstract

We reduce the model size of word embeddings while preserving their quality. Previous studies composed word embeddings from those of subwords and mimicked the pre-trained word embeddings. Although these methods can reduce the vocabulary size, it is difficult to drastically reduce the model size while preserving quality. Inspired by the observation that words with similar meanings have similar embeddings, we propose a multitask learning method that mimics not only the pre-trained word embeddings but also the similarity distribution between words. Experimental results on word similarity estimation tasks show that the proposed method improves the performance of existing methods and reduces the model size by a factor of 30 while preserving the quality of the original word embeddings. In addition, experimental results on text classification tasks show that we reduce the model size by a factor of 200 while preserving 90% of the quality of the original word embeddings.
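The following is a minimal sketch, not the authors' code, of the kind of multitask objective the abstract describes: word embeddings are composed from subword embeddings and trained to both reconstruct the pre-trained word embeddings and match the pairwise similarity distribution between words. All names (e.g., SubwordComposer), dimensions, and the loss weight alpha are illustrative assumptions, not taken from the paper.

```python
# Sketch of a subword-composition model with a multitask loss:
# (1) mimic pre-trained word embeddings, (2) mimic the word-to-word
# similarity distribution. Hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SubwordComposer(nn.Module):
    """Compose a word embedding by averaging the embeddings of its subwords."""

    def __init__(self, num_subwords: int, dim: int):
        super().__init__()
        self.subword_emb = nn.Embedding(num_subwords, dim)

    def forward(self, subword_ids: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # subword_ids, mask: (batch, max_subwords); mask is 1 for real subwords.
        emb = self.subword_emb(subword_ids) * mask.unsqueeze(-1)
        return emb.sum(dim=1) / mask.sum(dim=1, keepdim=True).clamp(min=1)


def multitask_loss(composed: torch.Tensor,
                   target: torch.Tensor,
                   alpha: float = 0.5,
                   temperature: float = 1.0) -> torch.Tensor:
    """Combine embedding reconstruction with similarity-distribution matching."""
    # Task 1: mimic the pre-trained word embeddings directly.
    recon = F.mse_loss(composed, target)

    # Task 2: match the in-batch similarity distributions of composed
    # embeddings to those of the pre-trained embeddings.
    sim_composed = composed @ composed.t() / temperature
    sim_target = target @ target.t() / temperature
    dist = F.kl_div(F.log_softmax(sim_composed, dim=-1),
                    F.softmax(sim_target, dim=-1),
                    reduction="batchmean")

    return recon + alpha * dist
```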

© 2021 The Association for Natural Language Processing