2025 Volume 27 Issue 2 Pages 43-47
Objective: This study evaluated the practical utility of Microsoft Copilot PRO®, a GPT-4-based generative AI tool, in addressing diverse medication inquiries at a hospital's drug information center.
Design: We conducted an observational study in which three experienced drug information pharmacists independently evaluated the AI-generated responses and rated their overall utility using a three-point scale.
Methods: A total of 330 inquiries drawn from 11 predefined categories (e.g., dose conversion, pharmacokinetics, and administration during pregnancy) were selected from five years of archived queries. Each question was entered verbatim into Copilot's "GPT-4 Strict" mode to generate one response per query. The raters classified each answer as "useful," "not useful," or "uncertain."
Results: A total of 44.9% of the 330 answers were deemed "useful." Among the 11 categories, the highest proportion of "useful" responses was observed for pregnancy-related inquiries (58.9%), while the lowest occurred for questions on admixture stability (15.6%).
Conclusion: Although Microsoft Copilot PRO® achieved a 44.9% "useful" rating overall, its performance varied across inquiry categories, with particularly low usefulness for admixture-related questions. Although the tool presents reference links that allow pharmacists to scrutinize supporting evidence, concerns remain regarding the limited agreement among evaluators, especially for "uncertain" cases. Further refinement of AI tools and increased availability of high-quality online drug information may enhance Copilot's reliability. Future studies should assess reproducibility, explore optimal prompt designs, and include larger multicenter samples. Although Copilot shows promise, it cannot replace critical human judgment in drug information services, underscoring the need for ongoing evaluation.