We reduce the model size of word embeddings while preserving their quality. Previous studies composed word embeddings from subword embeddings and trained them to mimic the pre-trained word embeddings. Although these methods can reduce the vocabulary size, it is difficult to reduce the model size drastically while preserving quality. Inspired by the observation that words with similar meanings have similar embeddings, we propose a multitask learning method that mimics not only the pre-trained word embeddings but also the similarity distribution between words. Experimental results on word similarity estimation tasks show that the proposed method improves on existing methods and reduces the model size by a factor of 30 while preserving the quality of the original word embeddings. In addition, experimental results on text classification tasks show that the proposed method reduces the model size by a factor of 200 while preserving 90% of the quality of the original word embeddings.
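To make the two training objectives concrete, the sketch below (not the authors' implementation) illustrates one plausible form of the multitask loss: a reconstruction term that mimics the pre-trained word embeddings composed from subword embeddings, plus a term that matches the word-to-word similarity distribution. The mean-pooling composition, the temperature, the loss weighting, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubwordComposer(nn.Module):
    """Composes a word embedding from its subword embeddings (assumed mean pooling)."""
    def __init__(self, num_subwords: int, dim: int):
        super().__init__()
        # padding_idx=0 lets variable-length subword lists be padded with 0.
        self.subword_emb = nn.Embedding(num_subwords, dim, padding_idx=0)

    def forward(self, subword_ids: torch.Tensor) -> torch.Tensor:
        # subword_ids: (batch, max_subwords); average the non-padding subwords.
        emb = self.subword_emb(subword_ids)                 # (B, S, D)
        mask = (subword_ids != 0).unsqueeze(-1).float()     # (B, S, 1)
        return (emb * mask).sum(1) / mask.sum(1).clamp(min=1.0)

def multitask_loss(composed, target, temperature=0.1, alpha=0.5):
    """Embedding-mimicking loss plus similarity-distribution loss (both assumed forms)."""
    # Task 1: mimic the pre-trained word embeddings directly.
    recon = F.mse_loss(composed, target)
    # Task 2: match the in-batch word-to-word similarity distributions.
    sim_student = composed @ composed.t() / temperature
    sim_teacher = target @ target.t() / temperature
    kl = F.kl_div(F.log_softmax(sim_student, dim=-1),
                  F.softmax(sim_teacher, dim=-1), reduction="batchmean")
    return alpha * recon + (1.0 - alpha) * kl

# Toy usage with random data standing in for a real vocabulary and teacher embeddings.
model = SubwordComposer(num_subwords=1000, dim=300)
subword_ids = torch.randint(1, 1000, (32, 5))   # 32 words, 5 subwords each
pretrained = torch.randn(32, 300)               # frozen pre-trained embeddings
loss = multitask_loss(model(subword_ids), pretrained)
loss.backward()
```

Only the small subword embedding table is kept at inference time, which is where the model-size reduction would come from under these assumptions.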