Construction of a Word Embedding Model from a Large-Scale Japanese SNS + Web Corpus

松野 省吾; 水木 栄; 榊 剛史

doi:10.11517/pjsai.JSAI2019.0_4Rin113

Abstract

In this paper, we present a word embedding model built from Japanese text on social networking services (SNS), including Twitter. The model is trained on a large-scale Japanese corpus that combines several media categories: SNS data, Wikipedia, and Web pages. We evaluated the model on a word similarity task, using Spearman's rank correlation coefficient as the evaluation metric; it scored about 7 points higher than a model trained on Wikipedia alone. We plan to release the model through a website, and we hope that it will help make natural language processing research on SNS data more active.
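
The evaluation described in the abstract (scoring word pairs with the model and comparing against human similarity judgments via Spearman's rank correlation) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy vectors, the word pairs, the gold scores, and the function names are hypothetical and are not taken from the paper, and cosine similarity is assumed as the model's similarity score.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def evaluate(embeddings, gold_pairs):
    """Correlate model similarities with human scores, skipping unknown words."""
    model_scores, human_scores = [], []
    for word1, word2, human_score in gold_pairs:
        if word1 in embeddings and word2 in embeddings:
            model_scores.append(cosine(embeddings[word1], embeddings[word2]))
            human_scores.append(human_score)
    return spearman(model_scores, human_scores)

# Hypothetical toy data, not from the paper.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "train": [0.2, 0.8],
}
toy_gold = [
    ("cat", "dog", 8.5),
    ("car", "train", 7.0),
    ("cat", "car", 1.5),
    ("dog", "train", 2.0),
]

rho = evaluate(toy_embeddings, toy_gold)
print(f"Spearman's rho: {rho:.3f}")
```

In practice the vectors would be loaded from a trained model and the gold pairs from a published Japanese word similarity dataset; how out-of-vocabulary pairs are handled (skipped here) affects the reported score and should be stated alongside it.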
