Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding

Toshihiko NISHIMURA; Hirofumi ABE; Kazuhiko MURASAKI; Taiga YOSHIDA; Ryuichi TANIDA

doi:10.1587/transinf.2025DVL0006

Abstract

This letter describes a training-free 3D semantic segmentation method using virtual cameras and a 2D foundation model guided by language prompts. Aggregating multi-view predictions via weighted voting achieves accuracy comparable to supervised methods and supports open-vocabulary recognition without requiring annotated 3D data or paired RGB images.
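The aggregation step mentioned in the abstract can be illustrated with a minimal sketch of per-point weighted voting over multi-view labels. The abstract does not say how the weights are computed, so everything here is an assumption made for illustration: the function name `weighted_vote`, the array layout, the `-1` marker for points not visible in a view, and the idea that each view supplies one non-negative weight per point.

```python
import numpy as np


def weighted_vote(view_labels, view_weights, num_classes):
    """Fuse per-view 2D predictions into one label per 3D point.

    Hypothetical interface (not taken from the letter):
      view_labels  -- (V, N) ints, class id predicted for each point in each
                      virtual view, or -1 where the point is not visible
      view_weights -- (V, N) non-negative floats, the vote weight of each
                      view for each point (how to derive them is assumed)
    Returns an (N,) int array; -1 marks points no view voted for.
    """
    view_labels = np.asarray(view_labels)
    view_weights = np.asarray(view_weights, dtype=float)
    num_points = view_labels.shape[1]

    # Accumulate weighted votes per point and class.
    scores = np.zeros((num_points, num_classes))
    visible = view_labels >= 0
    _, point_idx = np.nonzero(visible)
    # np.add.at handles repeated (point, class) pairs correctly.
    np.add.at(scores, (point_idx, view_labels[visible]), view_weights[visible])

    labels = scores.argmax(axis=1)
    labels[scores.sum(axis=1) == 0] = -1  # no votes received
    return labels
```

With three views and three points, a single high-weight view can outvote two low-weight views that agree with each other, which is the behavior that distinguishes weighted voting from a plain majority vote.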
