Pharmaceutical Library Bulletin
Online ISSN : 2186-070X
Print ISSN : 0386-2062
ISSN-L : 0386-2062
 
Performance Comparison of Language Models in a RAG-based Reference Chatbot Created with Dify: Exploring Applicability to Library Reference Services
Satoshi HASHIMOTO

2025 Volume 70 Issue 3 Pages 90-97

Abstract

This study develops a RAG-based chatbot that uses our institution’s homepage as a knowledge source and conducts comparative experiments with different language models (LMs). We compared combinations of two embedding LMs (OpenAI text-embedding-3-large and Snowflake arctic-embed2) and five response LMs (GPT-4o mini and the Gemma3 series at 1B, 4B, 12B, and 27B parameters). Each combination was evaluated on 30 questions in total: 10 for each of three categories: (i) open-ended question answering, (ii) true/false judgments on closed questions, and (iii) refusal to answer irrelevant questions. Performance improved as the parameter count of the response LM increased, with Gemma3 27B achieving results nearly equivalent to GPT-4o mini. Models with 4B parameters or more appropriately refused to answer irrelevant questions, whereas models with 12B parameters or fewer did not adequately handle categories (i) and (ii). These results suggest that response LMs at the Gemma3 27B scale or larger are necessary for constructing a practical RAG chatbot.
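To make the comparison design concrete, the Python sketch below walks the 2 × 5 grid of embedding and response models and tallies scores over the three question categories. It is an illustrative sketch only: the model identifiers, the sample questions, and the ask_chatbot and judge helpers are assumptions standing in for the Dify-built chatbot and the study's own evaluation, which the abstract does not specify.

from itertools import product

EMBEDDING_MODELS = ["text-embedding-3-large", "snowflake-arctic-embed2"]
RESPONSE_MODELS = ["gpt-4o-mini", "gemma3:1b", "gemma3:4b", "gemma3:12b", "gemma3:27b"]

# Three categories of 10 questions each (30 in total); the questions and
# expected outcomes here are invented placeholders, not the study's items.
CATEGORIES = {
    "open_ended": [("What are the library's opening hours?", "contains hours")] * 10,
    "true_false": [("Is interlibrary loan available?", "yes")] * 10,
    "irrelevant": [("What is the weather today?", "refusal")] * 10,
}


def ask_chatbot(embedding_model: str, response_model: str, question: str) -> str:
    """Hypothetical stand-in for querying the RAG chatbot built in Dify."""
    raise NotImplementedError


def evaluate(judge) -> dict:
    """Tally correct answers per (embedding model, response model, category).

    judge(category, expected, answer) returns True when the answer is acceptable,
    e.g. a correct true/false judgment or an appropriate refusal.
    """
    scores = {}
    for emb, resp in product(EMBEDDING_MODELS, RESPONSE_MODELS):
        for category, items in CATEGORIES.items():
            correct = sum(
                judge(category, expected, ask_chatbot(emb, resp, question))
                for question, expected in items
            )
            scores[(emb, resp, category)] = correct  # out of 10 per category
    return scores

Each of the ten model combinations thus receives a score out of 10 per category, which is the granularity at which the abstract reports its findings.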

© 2025 Japan Pharmaceutical Library Association