Article ID: 2025PCP0006
This study proposes a knowledge-based, handcrafted building region extraction algorithm that accurately separates a building from its background in street images at the pixel level. The proposed algorithm leverages a customized patch-based graph cut inspired by human visual perception mechanisms. In the patch-based graph cut, patch similarity is measured by cutting-edge deep neural networks (DNNs). The graph settings rely on the prior knowledge that buildings, being the main subject, are captured at the center of the image. Our experiment, which employed 300 images from a well-known open dataset, demonstrated that the proposed method, which uses GrabCut for pixel-level segmentation, significantly improved the overall accuracy of building region extraction, measured by intersection over union (IoU), by 12.29% or more compared with a conventional knowledge-based method using color segmentation. This improvement stems from the fact that the proposed method provides building and background candidates that are more accurate by 8.57% or more. In addition, the GrabCut-based proposed method achieved accuracy comparable to state-of-the-art DNN-based semantic segmentation built on a transformer architecture. Further comparisons and discussions are provided in this paper to clarify the effectiveness of the proposed method.
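To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation, of pixel-level segmentation with OpenCV's GrabCut seeded by the center prior mentioned in the abstract (the building is assumed to occupy the central region of the street image), together with the IoU metric used for evaluation. The margin ratio, iteration count, and function names are illustrative assumptions; the paper's patch-based graph cut and DNN-based patch similarity are not reproduced here.

```python
import cv2
import numpy as np

def extract_building_region(image_bgr, margin_ratio=0.15, iterations=5):
    """Sketch: GrabCut segmentation initialized with a central rectangle
    (center prior: the building is assumed to be the main, centered subject)."""
    h, w = image_bgr.shape[:2]
    # Central rectangle is treated as probable foreground (building);
    # the image border is treated as probable background.
    rect = (int(w * margin_ratio), int(h * margin_ratio),
            int(w * (1 - 2 * margin_ratio)), int(h * (1 - 2 * margin_ratio)))

    mask = np.zeros((h, w), np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)

    # GrabCut iteratively refines foreground/background labels
    # starting from the rectangle initialization.
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)

    # Definite and probable foreground pixels form the building mask.
    building_mask = np.where(
        (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0
    ).astype(np.uint8)
    return building_mask

def iou(pred_mask, gt_mask):
    """Intersection over union between two binary masks."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return intersection / union if union > 0 else 0.0
```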