Journal of the Japan Society for Precision Engineering
Online ISSN : 1882-675X
Print ISSN : 0912-0289
ISSN-L : 0912-0289
Special Issue : 2025 JSPE Technology Award
Lecture
Episode
My Experience in Precision Engineering
Gravure
Introduction to Precision Engineering
Introduction of Laboratories
Visit to Corporate Members
 
Paper
  • Keisuke TOIDA, Naoki KATO, Osamu SEGAWA, Takeshi NAKAMURA, Kazuhiro HO ...
    2026 Volume 92 Issue 2 Pages 169-172
    Published: February 05, 2026
    Released on J-STAGE: February 05, 2026
    JOURNAL FREE ACCESS

    In this paper, we propose Ground IoU (Gr-IoU) to address the data association problem in multi-object tracking. When tracking objects detected by cameras, ID switches, in which different IDs are assigned to the same object in consecutive frames, occur frequently, especially when objects are close together or overlapping. To address this issue, we introduce Gr-IoU, which incorporates constraints from the 3D structure of the scene. Gr-IoU transforms conventional bounding boxes from image space to the ground plane using vanishing point geometry. The IoU calculated with these transformed bounding boxes is more sensitive to the front-back relationships between objects, which improves data association accuracy and reduces ID switches. We evaluated Gr-IoU on the MOT17 and MOT20 datasets, which include various tracking scenarios such as crowded scenes and sequences with frequent occlusions. Experimental results confirm that Gr-IoU outperforms conventional real-time methods that do not use appearance features.
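
    As a point of reference for the abstract above, the following is a minimal sketch of the conventional image-space IoU and an IoU-based association step. It does not implement Gr-IoU: the ground-plane transformation is not specified in the abstract and is omitted. The function names, the greedy matching strategy, and the `min_iou` threshold are assumptions made for illustration only.

```python
def iou(a, b):
    """Conventional IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_associate(tracks, detections, min_iou=0.3):
    """Match track boxes to detection boxes one-to-one, highest IoU first.

    Greedy matching is an assumption here; it is not taken from the paper.
    Returns a dict mapping track index -> detection index.
    """
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), {}
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be trusted
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches
```

    Under this sketch, a ground-plane variant would keep the association step and change only the boxes passed to `iou`.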

  • Shota MATSUMIYA
    2026 Volume 92 Issue 2 Pages 173-180
    Published: February 05, 2026
    Released on J-STAGE: February 05, 2026
    JOURNAL FREE ACCESS

    We are developing a fine-grained action recognition system that detects human errors, with the aim of supporting quality assurance. Numerous recent studies use multiple sensors and viewpoints to improve the accuracy of action recognition. However, at manufacturing sites, the locations available for installing sensors, including cameras, are often limited, making it difficult to mount cameras at fixed positions. In this study, we propose a method for view-invariant action recognition using 3D skeleton data. By converting the 3D skeleton data into a body-coordinate system and using it for training, we developed and evaluated a model capable of recognizing actions from unknown viewpoints. We also discuss the challenges that must be addressed before the proposed method can be deployed in practical manufacturing environments.
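
    The idea of a body-coordinate system can be illustrated with a small sketch. The abstract does not say how the body frame is defined, so the choice below (origin at the hip center, x-axis from left hip to right hip, y-axis toward the neck) and the joint names `left_hip`, `right_hip`, and `neck` are assumptions, not the author's definition.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    # Raises ZeroDivisionError for a degenerate (zero-length) vector.
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)


def to_body_frame(joints):
    """Express 3D joints (name -> (x, y, z)) in a frame attached to the body.

    Assumed frame: origin at the hip center, x from left hip to right hip,
    y toward the neck (made orthogonal to x), z = x cross y.
    """
    lh, rh, neck = joints["left_hip"], joints["right_hip"], joints["neck"]
    origin = tuple((l + r) / 2 for l, r in zip(lh, rh))
    x_axis = _unit(_sub(rh, lh))
    up = _sub(neck, origin)
    # Gram-Schmidt step: remove the component of `up` along the x-axis.
    up = _sub(up, tuple(_dot(up, x_axis) * c for c in x_axis))
    y_axis = _unit(up)
    z_axis = _cross(x_axis, y_axis)
    out = {}
    for name, p in joints.items():
        d = _sub(p, origin)
        out[name] = (_dot(d, x_axis), _dot(d, y_axis), _dot(d, z_axis))
    return out
```

    Because the frame is built from the skeleton itself, rotating or translating the camera leaves the body-frame coordinates unchanged, which is the property a view-invariant model relies on.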

  • Tatsuma OKADA, Yusuke FUJITA
    2026 Volume 92 Issue 2 Pages 181-186
    Published: February 05, 2026
    Released on J-STAGE: February 05, 2026
    JOURNAL FREE ACCESS

    Aging concrete structures pose growing risks to public safety and the resilience of infrastructure systems. This situation highlights the importance of detecting damage at an early stage and performing maintenance in a timely manner. In Japan, many infrastructure facilities have been in service for more than 50 years. As a result, there is an urgent need to improve the efficiency of inspections and reduce their associated costs. Recent advances in deep learning offer potential for automating infrastructure inspections. However, these methods typically require a large amount of labeled data. In particular, the annotation process requires expert knowledge and involves significant time and cost, which creates a major obstacle to practical application. In this study, we present an efficient annotation method for detecting cracks in concrete surfaces. The method uses a convolutional neural network for classification, which is improved by incorporating the Convolutional Block Attention Module (CBAM) and Score-CAM. CBAM enhances the model's ability to extract features related to cracks. In addition, Score-CAM is applied to the layers at both early and later stages of the network. The feature maps from these layers are then combined to improve the stability of the predictions and to achieve high accuracy in identifying cracks at the pixel level.
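
    The last step of the abstract, combining maps from early and late layers into a pixel-level prediction, can be sketched as follows. The abstract does not state how the maps are combined, so the min-max normalization, the plain average, and the 0.5 threshold are assumptions; both maps are also assumed to have already been resized to the same resolution. The function names are hypothetical.

```python
def minmax(m):
    """Rescale a 2D map (list of rows) to the range [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in m]  # constant map carries no signal
    return [[(v - lo) / span for v in row] for row in m]


def fuse_maps(early, late, threshold=0.5):
    """Average two normalized saliency maps and threshold the result.

    Returns (fused, mask), where mask marks candidate crack pixels with 1.
    """
    e, l = minmax(early), minmax(late)
    fused = [[(a + b) / 2 for a, b in zip(row_e, row_l)]
             for row_e, row_l in zip(e, l)]
    mask = [[1 if v >= threshold else 0 for v in row] for row in fused]
    return fused, mask
```

    In this sketch the early-layer map would contribute fine spatial detail and the late-layer map coarser, more semantic evidence; averaging is only one of several plausible fusion rules.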

  • Eiichiro MOMMA, Yuki YOSHIHARA, Yoshio NAKAMURA
    2026 Volume 92 Issue 2 Pages 187-192
    Published: February 05, 2026
    Released on J-STAGE: February 05, 2026
    JOURNAL FREE ACCESS

    We investigate a video fire detection method, based on deep-learning semantic segmentation, that contributes to rapid fire extinguishing. We aim to detect fires in footage from existing surveillance cameras, focusing on the smoke that appears shortly after ignition, through which the background remains visible. The approach uses semantic segmentation and does not require training on images of fire or smoke. Using the inference results of PSPNet trained on the ADE20K dataset, we demonstrate that the pixel-wise class and confidence vary depending on the presence of flames and smoke.
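
    One way to quantify the observation that pixel-wise class and confidence vary is to compare a segmentation of the current frame against a reference frame of the same scene. The sketch below is an illustration under assumptions, not the authors' detection procedure: the function name, the use of a reference frame, and the `conf_drop` threshold are all hypothetical. Inputs are 2D lists of per-pixel class IDs and confidences, as a segmentation model such as PSPNet would produce after an argmax.

```python
def change_ratio(ref_labels, ref_conf, cur_labels, cur_conf, conf_drop=0.2):
    """Fraction of pixels whose class changed or whose confidence dropped.

    A pixel counts as changed when its predicted class differs from the
    reference frame, or when its confidence fell by more than `conf_drop`.
    """
    changed = total = 0
    for rl, rc, cl, cc in zip(ref_labels, ref_conf, cur_labels, cur_conf):
        for ref_cls, ref_p, cur_cls, cur_p in zip(rl, rc, cl, cc):
            total += 1
            if ref_cls != cur_cls or ref_p - cur_p > conf_drop:
                changed += 1
    return changed / total if total else 0.0
```

    A rising ratio over consecutive frames could then be treated as a cue for closer inspection; any alarm threshold would need to be tuned on real footage.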

  • Mei SUZUKI, Kohei TORIMI, Yoshimitsu AOKI
    2026 Volume 92 Issue 2 Pages 193-198
    Published: February 05, 2026
    Released on J-STAGE: February 05, 2026
    JOURNAL FREE ACCESS

    Interaction recognition between multiple individuals is crucial in many practical scenarios, particularly in sports analysis, yet it remains challenging due to complex dynamics and directional ambiguity. Most existing methods focus primarily on single-person actions, which is insufficient for analyzing multi-person interactions. The SportsHHI dataset, designed specifically for human-human interactions in sports videos, provides detailed annotations but is limited by strict subject-object direction definitions, class imbalance, and inadequate encoding of relative spatial information. To overcome these issues, this study proposes three enhancements: (1) redefining relative spatial encoding to include global positional context, (2) introducing a directionality-agnostic evaluation method suitable for bidirectional interactions, and (3) employing focal loss to address class imbalance. Experimental results on SportsHHI demonstrate the effectiveness of the proposed improvements, which raise mAP by up to 2.67 compared with baseline methods.
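
    Enhancement (3) refers to focal loss, which down-weights well-classified examples so that rare classes contribute more to training. A minimal single-example sketch of the standard formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), is shown below. The default values `gamma=2.0` and `alpha=1.0` are common choices and are assumptions here; the abstract does not state the settings used in the study.

```python
import math


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example.

    probs  : predicted class probabilities (should sum to 1)
    target : index of the true class
    gamma  : focusing parameter; gamma = 0 reduces to cross-entropy
    alpha  : class-balancing weight
    """
    # Clamp to avoid log(0) when the model assigns zero probability.
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
```

    With `gamma=2.0`, an example predicted at probability 0.9 has its cross-entropy loss scaled by 0.01, while one predicted at 0.1 is scaled by 0.81, so hard examples dominate the gradient.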
