2017 Volume 24 Issue 2 Pages 187-204
This research proposes a context-restricted Skip-gram model for acquiring synonyms by exploiting various properties of the context words. The original Skip-gram model learns the word vector of each target word from all the context words around it. In contrast, the proposed context-restricted Skip-gram model learns multiple types of word vectors for each target word by limiting the context words to those with specific parts of speech or those at specific relative positions. The proposed method calculates cosine similarities over the multiple word vector types and combines these similarities using a linear support vector machine. The method is highly interpretable because it is a weighted linear sum of simple models; this interpretability lets us investigate how strongly each property of the context words influences synonym acquisition. Moreover, the method is highly extensible because the context-restriction conditions can easily be changed or new ones added. Experimental results on actual Japanese corpora showed that the proposed method, which aggregates multiple context-restricted models, achieved higher performance than the conventional single Skip-gram model. In addition, the estimated weights of the various properties of the context words appropriately reflected some grammatical characteristics of the Japanese language.
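The abstract does not give implementation details of the context restriction, so the following is a minimal sketch of the idea: generating Skip-gram training pairs in which context words are kept only if they match a given part of speech and/or a given relative position. The function name `restricted_pairs`, the tag set, and the parameter names are illustrative assumptions, not the paper's actual interface.

```python
def restricted_pairs(tagged_sentences, window=2, pos_filter=None, rel_pos=None):
    """Yield (target, context) Skip-gram pairs, keeping only context
    words that match a POS tag and/or a relative position.

    tagged_sentences: iterable of [(surface, pos), ...] lists.
    pos_filter: e.g. "NOUN" keeps only noun contexts (hypothetical tag set).
    rel_pos: e.g. -1 keeps only the word immediately to the left.
    """
    for sent in tagged_sentences:
        for i, (target, _) in enumerate(sent):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset == 0 or j < 0 or j >= len(sent):
                    continue  # skip the target itself and out-of-range positions
                context, pos = sent[j]
                if pos_filter is not None and pos != pos_filter:
                    continue  # context word fails the POS restriction
                if rel_pos is not None and offset != rel_pos:
                    continue  # context word fails the relative-position restriction
                yield target, context

# Example with one toy POS-tagged sentence (tags are illustrative).
sents = [[("I", "PRON"), ("drink", "VERB"), ("hot", "ADJ"), ("tea", "NOUN")]]
print(list(restricted_pairs(sents, window=2, pos_filter="NOUN")))
```

Under this reading, each restriction condition produces its own stream of training pairs, and training a separate Skip-gram model on each stream yields one word vector type per condition.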
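The combination step described in the abstract (cosine similarities from multiple vector types, weighted by a linear SVM) could look roughly like the sketch below. The embeddings here are random placeholders, and the variable names and tiny training set are hypothetical; the point is only that the learned coefficients give the per-condition weights that make the method interpretable.

```python
import numpy as np
from sklearn.svm import LinearSVC

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical: vectors[k][word] is the embedding of `word` in the k-th
# context-restricted model; here three models are faked with random vectors.
rng = np.random.default_rng(0)
vocab = ["car", "automobile", "banana", "apple"]
vectors = [{w: rng.normal(size=50) for w in vocab} for _ in range(3)]

# Each candidate word pair becomes one feature vector whose components are
# the cosine similarities under the different restriction conditions.
pairs = [("car", "automobile"), ("car", "banana"), ("apple", "banana")]
labels = [1, 0, 0]  # 1 = synonym pair, 0 = non-synonym pair
X = np.array([[cosine(m[a], m[b]) for m in vectors] for a, b in pairs])

clf = LinearSVC().fit(X, labels)
# The learned weights indicate how much each restriction condition
# contributes to the synonym decision.
print(clf.coef_)
```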