2020 Volume 32 Issue 5 Pages 923-933
This paper proposes a method for automatically collecting a dataset to train a tweet sentiment analysis model. The proposed method collects training tweets based on the presence and types of emoji they contain. We compare a sentiment analysis model trained on the automatically collected dataset with a model trained on a dataset that has manually identified sentiment labels. The evaluation results show that, in terms of the performance of the trained tweet sentiment analysis model, 9,000–136,212 automatically collected tweets correspond to 270–540 tweets with manually identified sentiment labels.
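
As a rough illustration of emoji-based collection, the following minimal sketch assigns a sentiment label to a tweet from the emoji it contains. The abstract does not specify which emoji or which rules are used, so the emoji sets, the function names `label_by_emoji` and `collect_dataset`, and the rule of discarding tweets with mixed or no emoji are all assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical emoji sets: the emoji types actually used are not given in the abstract.
POSITIVE_EMOJI = {"😊", "😄", "😍", "👍"}
NEGATIVE_EMOJI = {"😢", "😠", "😞", "👎"}


def label_by_emoji(tweet: str) -> Optional[str]:
    """Return 'positive' or 'negative' based on emoji, or None if undecidable."""
    has_pos = any(ch in POSITIVE_EMOJI for ch in tweet)
    has_neg = any(ch in NEGATIVE_EMOJI for ch in tweet)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # Assumed rule: tweets with no listed emoji, or with both kinds, are skipped.
    return None


def collect_dataset(tweets: List[str]) -> List[Tuple[str, str]]:
    """Keep only tweets that receive a label, paired with that label."""
    dataset = []
    for tweet in tweets:
        label = label_by_emoji(tweet)
        if label is not None:
            dataset.append((tweet, label))
    return dataset
```

Under these assumptions, a call such as `collect_dataset(["Great day 😊", "No emoji here", "So tired 😢"])` would keep the first and third tweets with the labels "positive" and "negative" and drop the second.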