Proceedings of the Annual Conference of JSAI
Online ISSN : 2758-7347
39th (2025)
Session ID : 3A6-GS-10-02

Performance Evaluation of Multimodal LLM with Life Insurance Business Data
*Tamao SHIMIZU, Hibiki BANNAI, Yoshiaki ONISHI
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract

To apply multimodal LLMs to a life insurance company's inquiry-response task, we constructed a benchmark from business data and compared the practical performance of multiple models. We evaluated three models, Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o, on two tasks: document QA and textualization of image content. Claude 3.5 Sonnet achieved the highest accuracy in document QA, and Gemini 1.5 Pro achieved the highest accuracy in textualization of image content. In addition, we identified characteristics of charts and tables in in-house documents that were difficult for the LLMs to recognize. These evaluations confirmed that benchmarking with business data yields results that differ from those of publicly available general-purpose benchmarks.

© 2025 The Japanese Society for Artificial Intelligence