Utilizing Emoji in Collecting Training Instances of a Model for Sentiment Analysis of Tweets

Youchao LIN; Hongyi CUI; Takehito UTSURO

doi:10.3156/jsoft.32.5_923

Abstract

This paper proposes a method for automatically collecting a dataset to train a tweet sentiment analysis model. The proposed method collects training instances based on whether a tweet contains emoji and on the types of those emoji. We compare a sentiment analysis model trained on the automatically collected dataset with a model trained on a dataset whose sentiment labels were identified manually. The evaluation results show that, in terms of the performance of the trained tweet sentiment analysis model, 9,000–136,212 automatically collected tweets are equivalent to 270–540 tweets with manually identified sentiment labels.
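The idea of labeling tweets by the emoji they contain can be illustrated with a minimal sketch of emoji-based distant labeling. It is not the authors' method. The emoji-to-polarity tables, the rule that discards tweets with no emoji or with conflicting emoji, and the choice to strip emoji from the text before training are all assumptions made for illustration, because the abstract does not state the paper's actual emoji categories or filtering rules. The helper names `label_by_emoji`, `strip_emoji`, and `collect_training_set` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical emoji-to-polarity tables (assumption: the paper's real emoji sets are not given).
POSITIVE_EMOJI: Set[str] = {"😀", "😂", "😍", "👍", "😊"}
NEGATIVE_EMOJI: Set[str] = {"😢", "😡", "😞", "👎", "💔"}
ALL_EMOJI: Set[str] = POSITIVE_EMOJI | NEGATIVE_EMOJI


def label_by_emoji(tweet: str) -> Optional[str]:
    """Assign a sentiment label from the emoji a tweet contains, or None to skip it."""
    has_pos = any(e in tweet for e in POSITIVE_EMOJI)
    has_neg = any(e in tweet for e in NEGATIVE_EMOJI)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # No listed emoji, or emoji of both polarities: assumed too ambiguous to use.
    return None


def strip_emoji(tweet: str, emoji_set: Set[str]) -> str:
    """Remove the labeling emoji so a model cannot learn the label from them directly."""
    for e in emoji_set:
        tweet = tweet.replace(e, "")
    return " ".join(tweet.split())  # collapse leftover whitespace


def collect_training_set(tweets: List[str]) -> List[Tuple[str, str]]:
    """Build (text, label) training pairs from unlabeled tweets."""
    dataset = []
    for tweet in tweets:
        label = label_by_emoji(tweet)
        if label is not None:
            dataset.append((strip_emoji(tweet, ALL_EMOJI), label))
    return dataset


if __name__ == "__main__":
    raw = ["Great game today 😀", "Missed the bus 😢", "plain text", "mixed 😀😢"]
    print(collect_training_set(raw))
```

Running the sketch keeps only the two tweets with a single clear polarity. Dropping ambiguous tweets trades dataset size for label precision, which is one plausible reason an automatically collected set must be much larger than a manually labeled one to reach comparable performance.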
