In this study, we propose a method that enables intuitive three-dimensional annotation in real-world environments reconstructed with 3D Gaussian Splatting (3DGS). A 3D scene is generated from images captured by drones or other arbitrary devices, and users can place annotations through simple, intuitive operations such as clicking and dragging. To ensure accurate placement, annotation objects such as lines, text, and textures are aligned to the surface geometry of the 3D scene using depth information obtained from the mesh together with the screen coordinates. Evaluation through usage examples demonstrates that annotations can be placed without interfering with the scene and can be viewed from arbitrary angles. This method is expected to substantially improve the efficiency of visual communication for rescue teams and related agencies.
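The core placement step described above, mapping a screen click plus a depth sample to a point on the reconstructed surface, can be sketched as a standard back-projection. The following is a minimal illustration, not the authors' implementation; it assumes a pinhole camera model, and the function name and the 4x4 camera-to-world convention are assumptions for illustration:

```python
import numpy as np

def unproject_click(u, v, depth, K, cam_to_world):
    """Back-project a screen click (u, v) with its sampled depth into a
    3D world-space point where an annotation can be anchored."""
    # Pixel -> normalized camera coordinates (pinhole intrinsics K).
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    # Scale the camera-space ray by the depth read from the mesh.
    p_cam = np.array([x * depth, y * depth, depth, 1.0])
    # Camera -> world via the 4x4 camera-to-world matrix.
    return (cam_to_world @ p_cam)[:3]
```

An annotation object can then be oriented at the returned point, e.g. along the local surface normal, so it stays attached to the geometry when viewed from other angles.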
In this research, we propose a novel approach using unsupervised metric learning tailored to datasets with complex similarities and connections that are difficult to express linguistically, such as those found in paintings and makeup. In such datasets, the intricate interplay of the elements that define each item makes it difficult for traditional labeling methods to analyze individual data points adequately, and the high degree of specialization required makes annotation significantly costly. Unsupervised metric learning is therefore a powerful tool for extracting features at lower cost and for analyzing these datasets comprehensively. Expanding upon previous research that utilized style-transfer models, our study further explores feature design, focusing on extracting detailed information about aspects critical to similarity assessment, such as color and shape. Our model incorporates this visual information to uncover the hidden abstract connections within a dataset. We validated our approach on a dataset of Ukiyo-e, a genre of Japanese painting, and achieved accuracy comparable to supervised learning models. This research opens up new possibilities for the analysis of complex image datasets with abstract relational depth, fostering a deeper understanding of the data.
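To make the "color and shape" feature-design idea concrete, the numpy-only sketch below builds two simple hand-crafted descriptors, a coarse RGB histogram for color and a gradient-orientation histogram for shape, and compares items by cosine similarity. This is an illustrative stand-in, not the paper's style-transfer-based model; all function names and the weighting scheme are assumptions:

```python
import numpy as np

def color_histogram(img, bins=8):
    """Coarse RGB histogram as a color descriptor (img: HxWx3, values 0-255)."""
    hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    hist = hist.ravel().astype(float)
    return hist / hist.sum()

def gradient_orientation_histogram(img, bins=9):
    """Magnitude-weighted edge-orientation histogram as a shape descriptor."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi  # orientations folded into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def describe(img, w_color=0.5):
    """Concatenate color and shape cues into one feature vector."""
    return np.concatenate([w_color * color_histogram(img),
                           (1 - w_color) * gradient_orientation_histogram(img)])

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In an unsupervised metric-learning setting, descriptors like these would feed an embedding trained so that visually related works (e.g. Ukiyo-e prints sharing palette or composition) end up close in the learned space.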
To automate the task of aligning a flat cable with its receptacle, this paper proposes a flat-cable pose estimation method that is robust to background variation, together with a control framework. To estimate the cable pose, we obtain a three-channel edge map of the entire image at sub-pixel accuracy and a segmentation mask that is initially generated entirely inside the cable region. To isolate only the flat cable's boundary edges, the initial mask is iteratively expanded until its outermost pixels contact edges; the final cable boundary edges are then obtained by retaining, from the full edge map, only those edge pixels contacted during expansion. For the receptacle, corner positions are obtained by SIFT-based matching, leveraging the consistency of the background texture, and the detected corners are used for pose estimation via stereo matching. Experimental results demonstrate that the proposed method localizes the corners of the flat cable with a maximum error of less than one pixel under background variation. In the control experiments, the cable reached the target pose within the 5 seconds required at industrial sites in all 50 trials. In summary, this paper presents a pose estimation method that is robust to background variation and achieves sub-pixel accuracy, along with an automated alignment framework based on the estimated poses.
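The mask-expansion step, growing a seed mask from inside the cable until it contacts edges and keeping only the contacted edge pixels, can be illustrated with a minimal numpy-only sketch. This is not the paper's implementation; the 4-connected dilation, the binary (rather than sub-pixel, three-channel) edge map, and the function names are simplifying assumptions:

```python
import numpy as np

def dilate(mask):
    """4-connected binary dilation implemented with array shifts."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def cable_boundary_edges(edge_map, seed_mask, max_iters=500):
    """Grow a seed mask placed inside the cable region until its frontier
    touches edge pixels, then return only the edges it contacted."""
    mask = seed_mask.copy()
    for _ in range(max_iters):
        # Expand one step, but never grow onto edge pixels themselves.
        grown = dilate(mask) & ~edge_map
        if np.array_equal(grown, mask):
            break  # expansion is fully blocked by the boundary edges
        mask = grown
    # Edge pixels adjacent to the final mask are the cable boundary.
    return dilate(mask) & edge_map
```

Distractor edges elsewhere in the image are never contacted by the growing region, so they are filtered out automatically, which is what makes the isolation step robust to background variation.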