2025 Volume 6 Issue 2 Pages 173-178
In recent years, efficient infrastructure management has become increasingly important as the number of engineers declines. However, the structural drawings required for inspections have not been consolidated into a centralized database, which creates a need for an efficient method of converting scanned drawings into CAD data. To facilitate this CAD conversion, this study proposes a method for recognizing text in drawings that combines a text detection model (FCENet) with a Large Multimodal Model (LMM; GPT-4o). The proposed method first locates digits with the text detection model and then passes each detected region to the LMM individually, which minimizes the influence of background noise and unnecessary lines and relieves the LMM of having to infer digit positions. Experimental results show that this approach yields more stable and accurate text recognition. Furthermore, updates to the LMM play a crucial role in improving text recognition accuracy in drawings, and the future adoption of more advanced models is expected to improve accuracy further.
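
The detect-then-crop step described above can be illustrated with a minimal sketch. It assumes the detector returns each text region as a polygon of (x, y) points and that the drawing is available as a 2D grid of pixel values. The function names, the padding of 4 pixels, and the sample coordinates are hypothetical and are not taken from the study; the calls to the detection model and to the LMM are deliberately left out.

```python
from math import ceil, floor
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# (left, top, right, bottom); right and bottom are exclusive.
Box = Tuple[int, int, int, int]


def polygon_to_crop_box(polygon: Sequence[Point], width: int, height: int,
                        pad: int = 4) -> Box:
    """Axis-aligned box around a detected polygon, padded and clipped to the image.

    The padding value is an illustrative assumption, not a value from the study.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    left = max(0, floor(min(xs)) - pad)
    top = max(0, floor(min(ys)) - pad)
    right = min(width, ceil(max(xs)) + pad)
    bottom = min(height, ceil(max(ys)) + pad)
    return left, top, right, bottom


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut one region out of a row-major pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical drawing: 100 px wide, 50 px high, all-white pixels.
WIDTH, HEIGHT = 100, 50
drawing = [[255] * WIDTH for _ in range(HEIGHT)]

# Hypothetical detector output: one polygon per detected text region.
detections = [
    [(10.2, 5.7), (30.9, 5.1), (31.4, 15.0), (9.8, 15.6)],
    [(95.0, 45.0), (99.5, 45.0), (99.5, 49.5), (95.0, 49.5)],
]

boxes = [polygon_to_crop_box(p, WIDTH, HEIGHT) for p in detections]
crops = [crop(drawing, b) for b in boxes]
# Each element of `crops` would then be sent to the LMM as a separate image.
```

Cropping each region separately means the recognizer only sees the pixels around one piece of text, which is the mechanism the abstract credits for reducing the influence of surrounding lines and noise.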