Host: The Japanese Society for Artificial Intelligence
Name: The 39th Annual Conference of the Japanese Society for Artificial Intelligence
Number: 39
Location: [in Japanese]
Date: May 27, 2025 - May 30, 2025
To improve business processes, Retrieval-Augmented Generation (RAG) over internal documents should ideally let the AI generate insights about the intent and purpose of a task, then retrieve the relevant documents and answer from them. Conventional RAG, however, relies on the similarity between query and document embeddings, so it struggles to retrieve information from image-containing documents in which such insights are not explicitly stated. Existing Multi-Representation-Indexing methods, which convert image captions into embeddings, likewise lack this insight generation capability. This study proposes a novel method that generates insight sentences from image-containing documents to enhance retrieval. Each document is decomposed page by page; for each page, an image caption and then insight sentences are generated, along with anticipated question-answer pairs, and all of these are converted into embeddings. Experiments on open datasets show that incorporating the generated insights improves retrieval accuracy over conventional approaches.
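The indexing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the caption, insight sentences, and question-answer pairs for each page have already been generated (here they are hand-written strings), and it replaces a real embedding model with a toy bag-of-words vector. The names `PageRecord`, `embed`, `build_index`, and `search`, as well as the sample pages, are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """Text representations generated for one page (hypothetical structure)."""
    page_id: str
    caption: str
    insights: list = field(default_factory=list)
    qa_pairs: list = field(default_factory=list)  # (question, answer) tuples


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(pages):
    """Store several embedded representations per page, each pointing to its page."""
    index = []
    for p in pages:
        index.append((embed(p.caption), p.page_id, "caption"))
        for sentence in p.insights:
            index.append((embed(sentence), p.page_id, "insight"))
        for question, answer in p.qa_pairs:
            index.append((embed(question + " " + answer), p.page_id, "qa"))
    return index


def search(index, query, top_k=1):
    """Rank pages by the score of their best-matching representation."""
    query_vec = embed(query)
    best = {}
    for vec, page_id, _kind in index:
        score = cosine(query_vec, vec)
        if score > best.get(page_id, -1.0):
            best[page_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [page_id for page_id, _ in ranked[:top_k]]


# Hand-written sample pages (assumed data, not taken from the paper).
pages = [
    PageRecord(
        page_id="manual-p1",
        caption="Flowchart of the invoice approval steps",
        insights=["The purpose of this process is to prevent duplicate payments to vendors"],
        qa_pairs=[("Why must a manager approve invoices", "To prevent duplicate payments")],
    ),
    PageRecord(
        page_id="manual-p2",
        caption="Bar chart of monthly server downtime",
        insights=["The intent is to decide when maintenance should be scheduled"],
        qa_pairs=[("When should maintenance be scheduled", "In the month with least traffic")],
    ),
]

index = build_index(pages)
print(search(index, "how do we prevent duplicate payments"))
```

In this toy data, the query about duplicate payments shares no words with either caption, so a caption-only index would have nothing to match on; the insight sentence and question-answer pair are what link the query to the page. With a real embedding model the effect would be softer, but the structure of the index (many representations, one parent page) stays the same.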