大規模言語モデルによる自由記述アンケート自動集約のための疑似訓練事例生成

長谷川 遼; 銭本 友樹; 宇津呂 武仁; 西崎 博光; 吉岡 真治; 神門 典子

doi:10.11517/pjsai.JSAI2024.0_1J4OS10b03

38th (2024)

Session ID : 1J4-OS-10b-03

Conference information

Host: The Japanese Society for Artificial Intelligence

Name : The 38th Annual Conference of the Japanese Society for Artificial Intelligence

Number : 38

Location : [in Japanese]

Date : May 28, 2024 - May 31, 2024

Pseudo Training Data Generation for Automatic Aggregation of Open-Ended Questionnaire Responses by Large Language Models

*Ryo HASEGAWA, Yuki ZENIMOTO, Takehito UTSURO, Hiromitsu NISHIZAKI, Masaharu YOSHIOKA, Noriko KANDO

Keywords: large language models, open-ended questionnaire, pseudo data, automatic aggregation

CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

Analyzing open-ended questionnaire responses is a valuable way to understand respondents' perspectives and opinions. Analyzing such responses at scale, however, requires considerable manual labor. This paper therefore automates the analysis of open-ended responses using large language models. We generate several types of pseudo data for training category classification models and evaluate the models trained on each dataset, examining how much pseudo datasets that are automatically generated and annotated by large language models improve classification performance. Evaluation results show that applying several operations to the pseudo open-ended responses improves category classification performance on real open-ended responses from 77% to 83%.
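
The evaluation setup described in the abstract (train a category classifier on pseudo responses, then score it on real responses) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the example texts, the category names `cost` and `service`, the use of a small naive Bayes classifier, and the use of accuracy as the performance measure. The pseudo examples are hard-coded stand-ins for text that a large language model would generate and annotate.

```python
from collections import Counter, defaultdict
import math

# Hypothetical pseudo responses, standing in for LLM-generated and LLM-annotated text.
PSEUDO_TRAIN = [
    ("the price is too high", "cost"),
    ("monthly fee is expensive", "cost"),
    ("too expensive for students", "cost"),
    ("staff were friendly and helpful", "service"),
    ("support staff answered quickly", "service"),
    ("helpful support on the phone", "service"),
]

# Hypothetical real responses with manual labels, used only for evaluation.
REAL_TEST = [
    ("the fee is too expensive", "cost"),
    ("friendly staff and quick support", "service"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def train(examples):
    """Collect the counts needed for multinomial naive Bayes."""
    word_counts = defaultdict(Counter)  # category -> word frequencies
    doc_counts = Counter()              # category -> number of responses
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the category with the highest add-one-smoothed log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in the pseudo data
            score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted category matches the label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


# Train only on pseudo data, evaluate only on real data.
model = train(PSEUDO_TRAIN)
real_accuracy = accuracy(model, REAL_TEST)
print(f"accuracy on real responses: {real_accuracy:.2f}")
```

Keeping the real responses out of training is the point of the sketch: the score then reflects how well the pseudo data alone prepares the classifier for real responses, which is the kind of comparison the abstract reports across its pseudo datasets.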
