Pharmaceutical Library Bulletin
Online ISSN : 2186-070X
Print ISSN : 0386-2062
ISSN-L : 0386-2062
 
Performance Comparison of Language Models in a RAG-based Reference Chatbot Created with Dify: Exploring Applicability to Library Reference Services
Satoshi HASHIMOTO

2025 Volume 70 Issue 3 Pages 90-97

Abstract

This study develops a RAG-based chatbot that uses our institution’s homepage as a knowledge source and conducts comparative experiments with different language models (LMs). We compared combinations of two embedding LMs (OpenAI text-embedding-3-large and Snowflake arctic-embed2) and five response LMs (GPT-4o mini and the Gemma3 series at 1B, 4B, 12B, and 27B parameters). Each combination was evaluated on 30 questions in total: 10 for each of three categories: (i) open-ended question answering, (ii) true/false judgments on closed questions, and (iii) refusal to answer irrelevant questions. Performance improved as the parameter count of the response LM increased, with Gemma3 27B achieving results nearly equivalent to GPT-4o mini. Models with 4B parameters or more appropriately refused to answer irrelevant questions, whereas models with 12B parameters or fewer did not adequately handle categories (i) and (ii). These results suggest that response LMs at the Gemma3 27B scale or larger are necessary for constructing a practical RAG chatbot.
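To make the comparison design concrete, the Python sketch below walks the 2 × 5 grid of embedding and response models and tallies scores over the three question categories. It is an illustrative sketch only: the model identifiers, the sample questions, and the ask_chatbot and judge helpers are assumptions standing in for the Dify-built chatbot and the study's own evaluation, which the abstract does not specify.

from itertools import product

EMBEDDING_MODELS = ["text-embedding-3-large", "snowflake-arctic-embed2"]
RESPONSE_MODELS = ["gpt-4o-mini", "gemma3:1b", "gemma3:4b", "gemma3:12b", "gemma3:27b"]

# Three categories of 10 questions each (30 in total); the questions and
# expected outcomes here are invented placeholders, not the study's items.
CATEGORIES = {
    "open_ended": [("What are the library's opening hours?", "contains hours")] * 10,
    "true_false": [("Is interlibrary loan available?", "yes")] * 10,
    "irrelevant": [("What is the weather today?", "refusal")] * 10,
}


def ask_chatbot(embedding_model: str, response_model: str, question: str) -> str:
    """Hypothetical stand-in for querying the RAG chatbot built in Dify."""
    raise NotImplementedError


def evaluate(judge) -> dict:
    """Tally correct answers per (embedding model, response model, category).

    judge(category, expected, answer) returns True when the answer is acceptable,
    e.g. a correct true/false judgment or an appropriate refusal.
    """
    scores = {}
    for emb, resp in product(EMBEDDING_MODELS, RESPONSE_MODELS):
        for category, items in CATEGORIES.items():
            correct = sum(
                judge(category, expected, ask_chatbot(emb, resp, question))
                for question, expected in items
            )
            scores[(emb, resp, category)] = correct  # out of 10 per category
    return scores

Each of the ten model combinations thus receives a score out of 10 per category, which is the granularity at which the abstract reports its findings.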

© 2025 Japan Pharmaceutical Library Association