2026 Volume 7 Issue 1 Pages 133-142
This study proposes a method that uses Vision-Language Models (VLMs) to automatically generate structured data from borehole log images that contain no embedded text. Although conventional OCR can recognize characters in images, it struggles to interpret the complex tabular layout specific to borehole logs and to associate geological layer information with the corresponding test values. Our method applies a two-phase VLM process (schema element selection followed by YAML extraction) through the Google Gemini API. It generates structured data directly from the images, expressed in YAML and conforming to an XML DTD (Document Type Definition). Using 10 borehole datasets (12 pages) obtained from the Hokuriku Ground Information System, we evaluated how model selection and image resolution affect extraction accuracy. With the Gemini 3 Pro model, the F1 score was 95.0% for geological layer extraction and 79.3% for Standard Penetration Test (SPT) depth matching. The N-value exact match rate was 80.8%, and the coordinate match rate was 90.0%. These results demonstrate that borehole log images can be automatically structured with practical accuracy.
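The abstract reports F1 scores for SPT depth matching but does not state the matching rule. The sketch below shows how precision, recall, and F1 could be computed for such a task. It assumes a predicted SPT depth counts as correct when it lies within a hypothetical tolerance `tol` (in metres) of a reference depth that has not yet been matched. The function name `depth_match_f1` and the 0.05 m default are illustrative assumptions, not the study's actual criterion.

```python
from typing import List, Tuple


def depth_match_f1(
    predicted: List[float], reference: List[float], tol: float = 0.05
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one-to-one depth matching.

    Hypothetical criterion: each predicted depth is paired with the closest
    still-unmatched reference depth within `tol` metres; each reference
    depth can be matched at most once.
    """
    unmatched = sorted(reference)
    true_positives = 0
    for depth in sorted(predicted):
        # Locate the closest unmatched reference depth inside the tolerance.
        best_index = None
        for i, ref in enumerate(unmatched):
            gap = abs(ref - depth)
            if gap <= tol and (best_index is None or gap < abs(unmatched[best_index] - depth)):
                best_index = i
        if best_index is not None:
            unmatched.pop(best_index)  # consume the match so it is not reused
            true_positives += 1

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `depth_match_f1([1.15, 2.15, 3.15, 4.30], [1.15, 2.15, 3.15, 4.15, 5.15])` matches three depths. This gives a precision of 0.75, a recall of 0.6, and an F1 of about 0.667. Greedy one-to-one matching keeps a single extracted row from being credited against several reference rows. An exact-match rate for N-values could be computed in the same way over the matched pairs only.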