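Performance Comparison of Generative AI Models in Supporting the Assessment of Perioperative Medication Discontinuation: A Study Based on Clinical Pharmacy Evaluation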

槇枝 大貴; 猪田 宏美; 廣江 訓子; 奥田 浩人; 髙橋 徹多; 黒田 智; 道原 あやな; 成本 由佳; 丸尾 陽成; 濱野 裕章; 西原 茂樹; 座間味 義人

doi:10.5649/jjphcs.52.200

Regular Article

Keywords: generative artificial intelligence, perioperative medication discontinuation, clinical decision support, pharmacy practice

Journal: free access

2026, Vol. 52, No. 4, pp. 200-210

Abstract

Generative artificial intelligence (AI) has rapidly advanced and is expected to enhance the efficiency and quality of clinical practice. In pharmacy practice, determining perioperative medication discontinuation is a critical task that is directly related to patient safety. However, standardized guidance remains limited, particularly for over-the-counter (OTC) drugs and dietary supplements, and decisions often rely on the individual expertise of pharmacists. In this study, the accuracy of nine generative AI models available in April 2025 (GPT-4o, GPT-4o mini, OpenAI o3, Gemini 2.5 Pro Exp, Gemini 2.0 Flash, Claude 3.7 Sonnet, Grok 3, Llama 4 Scout, and DeepSeek R1) for perioperative medication discontinuation decisions was evaluated using 15 mock prescription sets comprising 105 items. Each model received a standardized Japanese prompt, and outputs were independently assessed by five hospital pharmacists (≥5 years of clinical experience) based on three criteria: accurate drug identification, appropriateness of discontinuation and resumption timing, and the validity of the pharmacological rationale. A response was considered correct when at least four of the five pharmacists agreed. OpenAI o3 demonstrated the highest accuracy (87.6%), followed by Gemini 2.5 Pro Exp (84.8%) and GPT-4o (83.8%). Lightweight models demonstrated lower accuracy, particularly for OTC products, dietary supplements, and fixed-dose combination drugs. High-performance models with advanced reasoning capabilities exhibited high accuracy and may serve as useful decision-support tools. However, incorrect responses occurred in approximately 10-20% of cases, even among the top-performing models. Therefore, safe clinical implementation requires careful model selection, integration with institutional knowledge resources, and final verification by pharmacists.
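
The scoring rule described in the abstract (five independent raters, with a response counted as correct when at least four agree) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. It assumes that each pharmacist gives one overall yes/no judgment per item covering the three criteria, and the function names and example votes are hypothetical.

```python
from typing import Sequence

N_RATERS = 5        # five hospital pharmacists, per the abstract
MIN_AGREEMENT = 4   # "at least four of the pharmacists agreed"


def is_correct(votes: Sequence[bool], threshold: int = MIN_AGREEMENT) -> bool:
    """Return True when enough raters judged the response acceptable.

    Assumption: each vote is one pharmacist's overall yes/no judgment
    of a single model response.
    """
    if len(votes) != N_RATERS:
        raise ValueError(f"expected {N_RATERS} votes, got {len(votes)}")
    return sum(votes) >= threshold


def accuracy(all_votes: Sequence[Sequence[bool]]) -> float:
    """Percentage of items whose response met the agreement threshold."""
    if not all_votes:
        raise ValueError("no items to score")
    n_correct = sum(is_correct(votes) for votes in all_votes)
    return 100.0 * n_correct / len(all_votes)


# Hypothetical votes for four items (not data from the study).
example_votes = [
    [True, True, True, True, True],       # 5 of 5 agree: correct
    [True, True, True, True, False],      # 4 of 5 agree: correct
    [True, True, True, False, False],     # 3 of 5 agree: not correct
    [False, False, False, False, False],  # 0 of 5 agree: not correct
]

example_accuracy = accuracy(example_votes)
print(f"{example_accuracy:.1f}%")
```

With 105 items, each item accounts for just under one percentage point of accuracy (1/105 ≈ 0.95%), so small gaps between models correspond to only a few items.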