Performance Evaluation of Multimodal LLMs Using Life Insurance Business Data

清水 玉緒; 坂内 響; 大西 啓晃

doi:10.11517/pjsai.JSAI2025.0_3A6GS1002

Abstract

To apply multimodal LLMs to inquiry response tasks at a life insurance company, we constructed a benchmark from business data and compared the practical performance of multiple models. We evaluated three models, Claude 3.5 Sonnet, Gemini 1.5 Pro, and GPT-4o, on two tasks: document QA and conversion of image content to text. Claude 3.5 Sonnet showed the highest accuracy in document QA, and Gemini 1.5 Pro showed the highest accuracy in the image-to-text conversion task. We also identified characteristics of charts and tables in in-house documents that the LLMs had difficulty recognizing. These evaluations confirmed that benchmarking with business data yields results that differ from those of publicly available general-purpose benchmarks.
