2025, Volume 16, Article ID: PP4030
Large language models (LLMs) offer new possibilities for analyzing unstructured traffic accident narratives by automating information extraction and classification. This study evaluates the performance of ChatGPT-4o, ChatGPT-o1, ChatGPT-o3-mini, DeepSeek, Gemini, and Llama 3 in extracting structured data from approximately 18,000 Mongolian accident reports (2017–2022). Results indicate that ChatGPT-4o and ChatGPT-o1 achieve the highest accuracy in structured data extraction and fault assignment, while Gemini and Llama 3 struggle with event sequencing and contextual understanding. Despite their strengths, LLMs have difficulty with ambiguous descriptions and multi-event accidents, highlighting the need for hybrid AI-human validation. In line with prior research, our study confirms that LLMs improve classification accuracy but require prompt optimization and domain adaptation. Future work should focus on integrating multimodal datasets, refining event extraction algorithms, improving robustness to language variation, expanding dataset diversity, and fine-tuning models for accident analysis, predictive modeling, and decision support systems.