Host: The Japanese Society for Artificial Intelligence
Name: The 38th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 38
Location: [in Japanese]
Date: May 28, 2024 - May 31, 2024
Retrieval-Augmented Generation (RAG) is a technique that enables question answering over internal organizational documents by supplying external information to a large language model (LLM). In recent years, question-answering services that combine ChatGPT with RAG have become increasingly common. However, running a high-performance model such as GPT-4 at scale drives up API costs, because the retrieved documents increase the number of input tokens in every request. This study proposes an additional step in which a lower-cost model, such as GPT-3.5, extracts only the necessary information from the retrieved documents before the response is generated. This reduces the number of tokens passed to GPT-4 during response generation and can therefore lower its operational cost. The paper compares the proposed method with conventional methods to assess its effectiveness. The results indicate that the proposed method reduces costs while maintaining accuracy.
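The cost argument in the abstract can be illustrated with a rough sketch. It is not the authors' implementation: the abstract does not give the prompts, prices, or token counts, so the prices below are hypothetical placeholders, `count_tokens` is a whitespace proxy for a real tokenizer, and `keyword_extract` is a hypothetical stand-in for the call to the lower-cost model. Output-token costs are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ModelPrice:
    """Hypothetical price per 1,000 input tokens (illustrative, not real pricing)."""
    name: str
    usd_per_1k_input: float


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def input_cost(price: ModelPrice, n_tokens: int) -> float:
    return price.usd_per_1k_input * n_tokens / 1000.0


def baseline_cost(question: str, docs: List[str], strong: ModelPrice) -> float:
    """Conventional RAG: the strong model reads the question plus every full document."""
    n = count_tokens(question) + sum(count_tokens(d) for d in docs)
    return input_cost(strong, n)


def two_stage_cost(
    question: str,
    docs: List[str],
    cheap: ModelPrice,
    strong: ModelPrice,
    extract: Callable[[str, str], str],
) -> float:
    """Extraction step first: the cheap model reads the full documents,
    the strong model reads only the extracted passages."""
    stage1 = sum(count_tokens(question) + count_tokens(d) for d in docs)
    extracts = [extract(question, d) for d in docs]
    stage2 = count_tokens(question) + sum(count_tokens(e) for e in extracts)
    return input_cost(cheap, stage1) + input_cost(strong, stage2)


def keyword_extract(question: str, doc: str) -> str:
    """Hypothetical stand-in for the cheap model: keep sentences that share
    a longer word with the question."""
    def norm(w: str) -> str:
        return w.lower().strip("?.,")

    keywords = {norm(w) for w in question.split() if len(norm(w)) > 3}
    sentences = [s.strip() for s in doc.split(".") if s.strip()]
    kept = [s for s in sentences if keywords & {norm(w) for w in s.split()}]
    return ". ".join(kept)


if __name__ == "__main__":
    cheap = ModelPrice("cheap-model", 1.0)     # hypothetical price
    strong = ModelPrice("strong-model", 20.0)  # hypothetical price
    question = "What is the vacation policy?"
    docs = [
        "The vacation policy grants twenty days. "
        "The cafeteria opens at nine. Parking requires a permit."
    ]
    print("baseline :", baseline_cost(question, docs, strong))
    print("two-stage:", two_stage_cost(question, docs, cheap, strong, keyword_extract))
```

Under these assumptions the two-stage pipeline is cheaper only when the extraction step removes enough text to offset the extra pass through the cheaper model. Whether answer accuracy is preserved depends on the quality of the extraction, which is what the paper evaluates.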