2025 Volume 6 Issue 3 Pages 692-702
With the advancement of Mobile Mapping Systems (MMS) and ground-based LiDAR, acquiring high-density 3D point cloud data in urban environments has become increasingly feasible. This study investigates the applicability of Transformer-based deep learning models, specifically PointTransformer, for high-accuracy semantic segmentation of urban structures such as ground surfaces, buildings, utility poles and wires, vegetation, and vehicles. We evaluated the impact of various loss functions—including Weighted Cross-Entropy (WCE), Dice Loss, and Focal Loss—and the presence or absence of RGB color information on classification performance. The combination of PointTransformer with RGB data and WCE achieved the highest accuracy, reaching over 90% overall accuracy and mIoU values of 0.876 and 0.667 in test areas A and C, respectively. Dice Loss showed high precision but suffered from lower recall, indicating sensitivity to class imbalance and region characteristics. Excluding RGB information led to a noticeable drop in performance, especially in identifying vegetation and vehicles. Compared to PointNet++, PointTransformer demonstrated superior performance, owing to its self-attention mechanism, which effectively captures long-range spatial dependencies in complex urban scenes. The results suggest that Transformer-based models are well suited for urban point cloud classification and can contribute to infrastructure maintenance, disaster response, and vegetation management. Future work includes exploring projection-based approaches, sensor fusion with camera imagery, and few-shot learning to enhance generalizability and accuracy, particularly for small or structurally similar objects.
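The abstract compares three loss functions and reports mIoU, but does not show the training code. As a minimal, hedged sketch of the quantities being compared, the three losses and the mIoU metric can be written in NumPy as follows; the function names and the assumption that `class_weights` encodes inverse class frequency are illustrative choices, not details taken from the paper:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted cross-entropy over N points with C classes.
    probs: (N, C) predicted class probabilities; labels: (N,) integer ids.
    class_weights: (C,) per-class weights (e.g., inverse class frequency)."""
    eps = 1e-7
    p = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(-(class_weights[labels] * np.log(p)).mean())

def dice_loss(probs, labels, num_classes):
    """Soft Dice loss averaged over classes: 1 - mean per-class overlap,
    which emphasizes region agreement rather than per-point error."""
    eps = 1e-7
    onehot = np.eye(num_classes)[labels]           # (N, C) one-hot targets
    inter = (probs * onehot).sum(axis=0)           # per-class soft intersection
    denom = probs.sum(axis=0) + onehot.sum(axis=0)
    dice = (2.0 * inter + eps) / (denom + eps)
    return float(1.0 - dice.mean())

def focal_loss(probs, labels, gamma=2.0):
    """Focal loss: down-weights easy, confidently classified points
    via the modulating factor (1 - p)**gamma."""
    eps = 1e-7
    p = np.clip(probs[np.arange(len(labels)), labels], eps, 1.0)
    return float(-(((1.0 - p) ** gamma) * np.log(p)).mean())

def mean_iou(pred, labels, num_classes):
    """Mean intersection-over-union over classes; classes absent from
    both prediction and ground truth are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (labels == c))
        union = np.sum((pred == c) | (labels == c))
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))
```

The sketch also makes the abstract's Dice observation easy to see: because Dice normalizes by predicted and true region sizes per class, a rare class (e.g., utility wires) can score well on precision while recall drops, whereas WCE with inverse-frequency weights directly boosts the penalty on misclassified rare points.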