JSAI Technical Report, Type 2 SIG
Online ISSN : 2436-5556
Development and Preliminary Evaluation of UT-MedEval: A Clinical Medicine Knowledge Assessment Set for LLMs Utilizing Clinical Medicine Ontology
Masafumi ITOKAZU, Satoshi KATAOKA, Kouji KOZAKI, Takeshi IMAI
RESEARCH REPORT / TECHNICAL REPORT FREE ACCESS

2024 Volume 2024 Issue AIMED-014 Pages 06-

Abstract

This paper presents the development and preliminary evaluation of UT-MedEval, a clinical medicine knowledge assessment set for large language models (LLMs), built using the detailed version of the disease ontology from the Clinical Ontology in Anatomical Structure and Disease (CONAND). UT-MedEval covers multiple medical domains, including cardiology, gastroenterology, neurology, nephrology-endocrinology, diabetes-metabolism, allergy-rheumatology, and orthopedics. It consists of 980 questions spanning three types of question-answering tasks and three response formats: free-form, multiple-choice (20 options), and yes/no. We evaluated OpenAI's GPT-4o and GPT-4o mini on this dataset. GPT-4o achieved an accuracy of 74.3% (95% CI: 71.5-77.0), and GPT-4o mini achieved 65.5% (95% CI: 62.5-68.4). Accuracy was lowest on the task that asks for the causes of diseases.
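
To make the reported intervals concrete, the following is a minimal sketch of how a 95% confidence interval for an accuracy over 980 questions could be computed. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the function name `wilson_interval` is illustrative. The correct-answer counts (728 and 642) are also assumptions, back-calculated from the rounded percentages 74.3% and 65.5%, so the output may differ from the reported bounds in the last digit.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    n_questions = 980  # dataset size stated in the abstract
    # Counts below are back-calculated from rounded accuracies (assumption).
    assumed_correct = {"GPT-4o": 728, "GPT-4o mini": 642}
    for model, correct in assumed_correct.items():
        low, high = wilson_interval(correct, n_questions)
        print(f"{model}: {correct / n_questions:.1%} "
              f"(95% CI: {low:.1%}-{high:.1%})")
```

The Wilson interval is chosen here only because it behaves well for proportions away from 50%. A normal-approximation (Wald) or exact (Clopper-Pearson) interval would give slightly different bounds at this sample size.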

© 2024 Authors