-
Bin YANG, Mingyuan LI, Yuzhi XIAO, Haixing ZHAO, Zhen LIU, Zhonglin YE
Article type: PAPER
Subject area: Fundamentals of Information Systems
2025 Volume E108.D Issue 11 Pages 1292-1301
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: April 24, 2025
Existing graph neural network architectures usually process graph data at a single scale, which leads to information loss and oversimplification. To address this problem, this paper proposes M2GNN, a novel graph neural network framework that enhances feature learning on graph-structured data through multi-scale fusion and an attention mechanism. In M2GNN, each channel handles graph features at a different scale and integrates local and global information using multi-scale fusion to capture features at different levels of the graph structure. The features learned by each channel are then weighted and fused with an attention mechanism to extract the most representative feature representation. Experimental results show that, compared with traditional graph neural network approaches, M2GNN improves performance by 0.70% to 54.14%, 0.34% to 54.31%, and 0.68% to 54.40% on node classification tasks with different label coverages, which verifies the effectiveness of the multi-channel and multi-scale fusion strategies.
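For readers unfamiliar with attention-weighted fusion of per-channel features, the following is a minimal illustrative sketch (PyTorch; not the authors' implementation, and the module and dimension names are hypothetical) of how features learned by several channels can be scored and combined.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weight and fuse per-channel node features with learned attention scores."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # one scalar score per channel feature

    def forward(self, channel_feats: list[torch.Tensor]) -> torch.Tensor:
        # channel_feats: list of [num_nodes, dim] tensors, one per channel/scale
        stacked = torch.stack(channel_feats, dim=1)          # [num_nodes, C, dim]
        weights = torch.softmax(self.score(stacked), dim=1)  # [num_nodes, C, 1]
        return (weights * stacked).sum(dim=1)                # [num_nodes, dim]

# usage: fuse three channels of 64-dimensional node features for 10 nodes
fusion = AttentionFusion(dim=64)
fused = fusion([torch.randn(10, 64) for _ in range(3)])
print(fused.shape)  # torch.Size([10, 64])
```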
-
Mitsuhiro WATANABE, Go HASEGAWA
Article type: PAPER
Subject area: Information Network
2025 Volume E108.D Issue 11 Pages 1302-1314
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 23, 2025
As the Internet becomes larger in scale and more diversified, traditional end-to-end (E2E) congestion control faces various problems, such as low throughput on long-delay networks and unfairness among flows in different network situations. In this paper, we propose a novel congestion control architecture called in-network congestion control (NCC). Specifically, by introducing one or more nodes (NCC nodes) on an E2E network path, we divide the path into multiple sub-paths and maintain a congestion-control feedback loop on each sub-path. On each sub-path, a specialized congestion control algorithm can be applied according to its network characteristics. This architecture offers various advantages over traditional E2E congestion control, such as higher data transmission throughput, better per-flow fairness, and incremental deployability. In this paper, we describe NCC's advantages and challenges, and clarify its potential performance through evaluation results. We reveal that E2E throughput improves by as much as 159% simply by introducing NCC nodes. Furthermore, increasing the number of NCC nodes improves E2E throughput and fairness among flows by up to 258% and 151%, respectively.
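A back-of-the-envelope sketch (not the authors' model; all numbers are hypothetical) of why splitting a long-delay path into shorter feedback loops can raise throughput: with a window-based sender, the steady-state rate is roughly cwnd / RTT, and the slowest sub-path loop bounds the pipelined transfer.

```python
# Hypothetical figures for illustration only.
CWND_BYTES = 64 * 1024              # fixed window per feedback loop
E2E_RTT = 0.200                     # 200 ms end-to-end round-trip time
SUB_RTTS = [0.050, 0.050, 0.100]    # the same path split by two NCC nodes

e2e_rate = CWND_BYTES / E2E_RTT
# The slowest sub-path loop bounds the rate of the pipelined transfer.
ncc_rate = min(CWND_BYTES / rtt for rtt in SUB_RTTS)

print(f"single E2E loop   : {e2e_rate / 1e6:.2f} MB/s")
print(f"per-sub-path loops: {ncc_rate / 1e6:.2f} MB/s")
```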
-
Shuhei YAMAMOTO, Yasunori AKAGI, Tomu TOMINAGA, Takeshi KURASHIMA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 11 Pages 1315-1324
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 14, 2025
Present bias, the cognitive bias that prioritizes immediate rewards over future ones, is considered one of the factors that can hinder goal achievement. Estimating present bias is crucial for developing effective intervention strategies for behavioral change. This paper proposes a novel method for estimating present bias using behavior history data collected by wearable devices. We utilize the Transformer model because of its proficiency in learning relationships within sequential data such as behavioral history, which includes continuous data (e.g., heart rate) and event data (e.g., sleep onset). To enable the Transformer to capture behavior patterns potentially affected by present bias, we introduce two novel architectures for effectively processing the timestamp information of continuous and event data in behavioral history: the temporal encoder (TE) and the event encoder (EE). TE discerns the periodic characteristics of continuous data, while EE examines temporal interdependencies in the event data. These encoders enable our proposed model to capture temporally (ir)regular behavioral patterns that may be associated with present bias. Our experiments using the behavior history logs of 257 subjects collected over 28 days demonstrated that our method estimates the subjects' present bias accurately.
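As background on how a timestamp encoder can expose periodic structure to a Transformer, here is a minimal sketch (not the paper's TE or EE; the period choices are hypothetical) that maps raw timestamps to sinusoidal features with daily and weekly periods.

```python
import numpy as np

def periodic_time_features(timestamps_s: np.ndarray) -> np.ndarray:
    """Map UNIX timestamps (seconds) to sin/cos features with daily and weekly periods."""
    periods = np.array([24 * 3600, 7 * 24 * 3600])                  # hypothetical choices
    phase = 2 * np.pi * timestamps_s[:, None] / periods             # [T, 2]
    return np.concatenate([np.sin(phase), np.cos(phase)], axis=1)   # [T, 4]

# usage: hourly heart-rate sample timestamps over two days
ts = np.arange(0, 48 * 3600, 3600, dtype=float)
print(periodic_time_features(ts).shape)  # (48, 4)
```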
-
Trung Minh BUI, Jung-Hoon HWANG, Sewoong JUN, Wonha KIM, DongIn SHIN
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 11 Pages 1325-1334
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: April 23, 2025
This paper develops a grasp pose detection method that achieves high success rates in real-world industrial environments where elongated objects are densely cluttered. Conventional Vision Transformer (ViT)-based methods capture fused feature maps that successfully encode comprehensive global object layouts, but they often suffer from a loss of spatial detail. As a result, they predict grasp poses that efficiently avoid collisions but are not located precisely enough. Motivated by these observations, we propose the Oriented Region-based Vision Transformer (OR-ViT), a network that preserves critical spatial details by extracting a fine-grained feature map directly from the shallowest layer of a ViT backbone while still understanding global object layouts by capturing the fused feature map. OR-ViT decodes precise grasp pose locations from the fine-grained feature map and integrates this information into its understanding of global object layouts from the fused map. In this way, OR-ViT is able to predict accurate grasp pose locations with reduced collision probabilities. Extensive experiments on the public Cornell and Jacquard datasets, as well as on our customized elongated-object dataset, verify that OR-ViT achieves performance competitive with state-of-the-art methods on both the public and customized datasets.
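To make the shallow-versus-fused feature distinction concrete, here is a minimal PyTorch sketch (an illustration, not OR-ViT itself; layer counts and dimensions are hypothetical) that keeps the token features of the first encoder block as a fine-grained map and pairs them with the final, fused features.

```python
import torch
import torch.nn as nn

class TwoStreamViTFeatures(nn.Module):
    """Return shallow (fine-grained) and deep (fused) token features from a ViT-style encoder."""
    def __init__(self, dim: int = 192, depth: int = 6, heads: int = 3):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.blocks = nn.ModuleList([make_layer() for _ in range(depth)])

    def forward(self, tokens: torch.Tensor):
        x = tokens
        shallow = None
        for i, blk in enumerate(self.blocks):
            x = blk(x)
            if i == 0:          # the shallowest block keeps the most spatial detail
                shallow = x
        fused = x               # the last block encodes the global object layout
        return shallow, fused

model = TwoStreamViTFeatures()
patches = torch.randn(2, 196, 192)      # batch of 14x14 patch tokens
fine, fused = model(patches)
print(fine.shape, fused.shape)
```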
-
Koji ABE, Ryoma KITANISHI, Hitoshi HABE, Masayuki OTANI, Nobukazu IGUC ...
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 11 Pages 1335-1347
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 07, 2025
At fish farms and fish farming facilities, the number of fish is continuously monitored from hatching until shipment. In particular, whenever hatchery-produced juvenile fish are transferred from one indoor aquaculture tank to another, the fish farmers managing them must manually count thousands of fish, which places a significant burden on them. This paper presents an automated system for counting hatchery-produced juvenile fish at fish farming facilities. The system aims to serve as a foundational technology for aquaculture production management, supporting sustainable production through data-driven aquaculture. In the proposed system, a slide is set up with a video camera positioned above it to capture the slide's surface. The flow of juvenile fish carried by water down the slide is recorded, and the number of juvenile fish captured in the video is counted. In every frame of the video, a starting line and an ending line are placed perpendicular to the direction of fish movement, and fish regions are tracked between these lines. The count is incremented by one when a fish region crosses the starting line. Each fish region is then tracked across frames, and the count is incremented again when a region in which multiple fish were occluding one another separates. Under a custom-built recording setup, experiments were conducted with 10 videos of approximately 200 black medaka released down the slide, and with 2 videos of thousands of hatchery-produced juvenile fish released down the slide, recorded at an aquaculture facility. The results indicate that the proposed system counts the fish accurately in most cases, even in the presence of occlusions.
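The core counting rule (increment when a tracked region crosses the starting line) can be illustrated with a short sketch; this is a simplified stand-in, not the authors' system, and the data layout is hypothetical.

```python
def count_line_crossings(tracks: dict[int, list[float]], start_line_y: float) -> int:
    """Count tracked regions whose centroid crosses start_line_y between consecutive frames.

    tracks maps a track id to the per-frame y-coordinate of that region's centroid,
    assuming fish move in the direction of increasing y.
    """
    count = 0
    for ys in tracks.values():
        for prev_y, cur_y in zip(ys, ys[1:]):
            if prev_y < start_line_y <= cur_y:   # crossed the starting line this frame
                count += 1
                break                            # each track is counted at most once
    return count

# usage: two regions cross the line at y=100, one track never reaches it
tracks = {0: [80.0, 95.0, 110.0], 1: [60.0, 70.0, 85.0], 2: [90.0, 101.0, 120.0]}
print(count_line_crossings(tracks, start_line_y=100.0))  # 2
```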
-
Chuanyang LIU, Jingjing LIU, Yiquan WU, Zuo SUN
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 11 Pages 1348-1358
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: April 24, 2025
As a common type of defect, rust on power components is one of the important potential hazards endangering the safe operation of transmission lines. Quickly and accurately finding and repairing rusted power components is an urgent problem in power inspection. To address this problem, this study proposes Rust-Defect YOLO (RD-YOLO) for detecting rust defects in the power components of transmission lines. First, the Coordinate Channel Attention Residual Module (CCARM) is proposed to improve multi-scale detection precision. Second, the Receptive Field Block (RFB) and the Efficient Convolutional Block Attention Module (ECBAM) are introduced into the Path Aggregation Network (PANet) to strengthen the fusion of deep and shallow features. Finally, the contrast sample strategy and the focal loss function are adopted to train and optimize RD-YOLO, and experiments are carried out on a self-built dataset. The experimental results show that the average precision of rust defect detection by RD-YOLO reaches 95%, which is 9% higher than that of the original YOLOX. Comparative experiments demonstrate that RD-YOLO performs excellently in power component identification and rust defect detection, and has broad application prospects for the future automatic visual inspection of transmission lines.
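For reference, the focal loss mentioned above down-weights easy examples so that training focuses on hard ones; a minimal binary-classification sketch follows (the standard focal loss form, not RD-YOLO's full training objective).

```python
import torch

def focal_loss(probs: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs * targets + (1 - probs) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-8))).mean()

# usage: a confident correct prediction contributes far less than a hard mistake
probs = torch.tensor([0.95, 0.30])
targets = torch.tensor([1.0, 1.0])
print(focal_loss(probs, targets))
```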
-
Shrey SINGH, Prateek KESERWANI, Katsufumi INOUE, Masakazu IWAMURA, Par ...
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 11 Pages 1359-1372
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 14, 2025
Sign language recognition (SLR) from video is a challenging problem. For SLR, the I3D network, originally proposed for action recognition, is the best-performing model. However, action recognition and SLR are inherently different problems, so there is room to adapt the model to SLR and achieve better performance by considering SLR's task-specific features. In this work, we revisit the I3D model and extend it in three essential design aspects. These include a better inception module, named the dilated inception module (DIM), and an attention-mechanism-based temporal attention module (TAM), which identify the essential features of signs. In addition, we propose eliminating a loss function that deteriorates performance. The proposed method has been extensively validated on the public WLASL and MS-ASL datasets. It outperforms state-of-the-art approaches on the WLASL dataset and produces competitive results on the MS-ASL dataset, although the MS-ASL results are only indicative because the original data are unavailable. The Top-1 accuracies of the proposed method on WLASL100 and MS-ASL100 were 79.08% and 82.78%, respectively.
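As an illustration of the dilated-inception idea (a generic block of this kind, not the paper's DIM), the sketch below runs parallel 1-D temporal convolutions with different dilation rates so that each branch sees a different temporal receptive field.

```python
import torch
import torch.nn as nn

class DilatedTemporalInception(nn.Module):
    """Parallel 1-D temporal convolutions with different dilation rates, concatenated."""
    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, dilation=d, padding=d)
            for d in dilations
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, channels, time]; padding keeps the temporal length unchanged
        return torch.cat([branch(x) for branch in self.branches], dim=1)

block = DilatedTemporalInception(channels=64)
feats = torch.randn(2, 64, 32)      # 32 time steps of 64-channel features
print(block(feats).shape)           # torch.Size([2, 192, 32])
```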
-
Qingxia YANG, Deng PAN, Wanlin HUANG, Erkang CHEN, Bin HUANG, Sentao W ...
Article type: PAPER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 11 Pages 1373-1380
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 23, 2025
Ship detection in maritime monitoring is crucial for ensuring public safety in marine environments. However, maritime surveillance faces significant challenges from weak targets (small, low-contrast objects) caused by complex environments and long distances. To address these challenges, we propose YOLO-MSD, a maritime surveillance detection model based on YOLOv8. In YOLO-MSD, Receptive-Field Attention Convolution (RFAConv) replaces standard convolution, learning attention maps via receptive-field interaction to enhance detail extraction and reduce information loss. The C2f module in the neck integrates Omni-Dimensional Dynamic Convolution (ODConv), which dynamically adjusts convolution kernel parameters to effectively capture contextual information, thereby achieving superior multi-scale feature fusion. We also introduce a dedicated detection head for small objects to enhance detection accuracy. Furthermore, to address the imbalance in detection box quality, we employ Wise-IoU as the bounding box regression loss, enhancing multi-scale target localization and accelerating convergence. The model achieves precision, recall, and mean average precision (mAP50) of 93.0%, 90.05%, and 95.0%, respectively, on the self-constructed Maritime Vessel Surveillance Dataset (MVSD), effectively meeting the requirements of maritime target detection. We further conduct comparative experiments on the public McShips dataset, demonstrating YOLO-MSD's broad applicability to ship detection.
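To illustrate what "dynamically adjusting kernel parameters" means in general, here is a simplified dynamic convolution sketch: attention weights computed from the input mix K candidate kernels before convolving. This is a single-dimension simplification of the idea that ODConv extends across kernel, spatial, and channel dimensions; it is not ODConv itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDynamicConv2d(nn.Module):
    """Mix K candidate kernels with input-dependent attention weights, then convolve."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 4, kernel_size: int = 3):
        super().__init__()
        self.kernels = nn.Parameter(
            torch.randn(k, out_ch, in_ch, kernel_size, kernel_size) * 0.02)
        self.attn = nn.Linear(in_ch, k)
        self.pad = kernel_size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = x.mean(dim=(2, 3))                    # [B, in_ch], global context
        w = torch.softmax(self.attn(pooled), dim=1)    # [B, K] kernel attention
        outs = []
        for b in range(x.size(0)):
            kernel = (w[b][:, None, None, None, None] * self.kernels).sum(dim=0)
            outs.append(F.conv2d(x[b:b + 1], kernel, padding=self.pad))
        return torch.cat(outs, dim=0)

layer = SimpleDynamicConv2d(in_ch=16, out_ch=32)
print(layer(torch.randn(2, 16, 8, 8)).shape)   # torch.Size([2, 32, 8, 8])
```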
-
Guanghui CAI, Junguo ZHU
Article type: PAPER
Subject area: Natural Language Processing
2025 Volume E108.D Issue 11 Pages 1381-1391
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 15, 2025
Deep learning has transformed Neural Machine Translation (NMT), but the complexity of these models makes them hard to interpret, thereby limiting improvements in translation quality. This study explores the widely used Transformer model, utilizing linguistic features to clarify its inner workings. By incorporating three linguistic features—part-of-speech, dependency relations, and syntax trees—we demonstrate how the model’s attention mechanism interacts with these features during translation. Additionally, we improved translation quality by masking nodes that were identified to have negative effects. Our approach bridges the complex nature of NMT with clear linguistic knowledge, offering a more intuitive understanding of the model’s translation process.
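A common way to probe whether particular attention components help or hurt is to zero them out at inference time; below is a generic sketch of per-head masking in scaled dot-product attention (an illustration of the technique, not the paper's exact procedure).

```python
import torch

def masked_multihead_attention(q, k, v, head_mask):
    """Scaled dot-product attention with per-head on/off masking.

    q, k, v: [batch, heads, time, dim]; head_mask: [heads] of 0.0 or 1.0.
    """
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = torch.softmax(scores, dim=-1)
    out = attn @ v                                   # [batch, heads, time, dim]
    return out * head_mask[None, :, None, None]      # silence the masked heads

q = k = v = torch.randn(1, 8, 5, 64)
head_mask = torch.ones(8)
head_mask[3] = 0.0   # hypothetically, head 3 was found to hurt translation quality
print(masked_multihead_attention(q, k, v, head_mask).shape)
```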
-
Congda MA, Tianyu ZHAO, Manabu OKUMURA
Article type: PAPER
Subject area: Natural Language Processing
2025 Volume E108.D Issue 11 Pages 1392-1401
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 07, 2025
Because of biases inherent in the pre-training data, current pre-trained Large Language Models (LLMs) ubiquitously manifest the same biases. Since these biases influence LLM outputs across various tasks, they hamper the widespread deployment of LLMs. We propose a simple method that utilizes structured knowledge to alleviate this issue, aiming to reduce the bias embedded within LLMs and to ensure they offer an encompassing perspective when used in applications. Experimental results indicated that our method has good debiasing ability when applied to both existing autoregressive and masked language models. Additionally, it ensures that the performance of LLMs on downstream tasks remains uncompromised. Importantly, our method obviates the need for training from scratch, thus offering enhanced scalability and cost-effectiveness.
-
Shunya ISHIKAWA, Toru NAKASHIKA
Article type: PAPER
Subject area: Music Information Processing
2025 Volume E108.D Issue 11 Pages 1402-1411
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 14, 2025
Recent research in chord recognition has utilized machine learning models. However, few models adequately consider harmonic co-occurrence, a well-known musical feature. Because the harmonic structure is complex and varies with instrument and pitch, the model itself needs to consider harmonics explicitly, but few such methods exist. We propose the classification semi-restricted Boltzmann machine (CSRBM), a machine learning model that can explicitly consider the co-occurrence of any two pitches. A model parameter learns the pitch co-occurrence, enabling chord recognition that flexibly accounts for the harmonic structure. We show how to incorporate the harmonic structure as prior knowledge by placing a prior distribution on this parameter. We also propose the weight-sharing CSRBM (WS-CSRBM), an extension of the CSRBM that takes time series into account. It does so efficiently by arranging several CSRBMs in parallel, one per frame to be considered, and by sharing some of their parameters. Experimental results show that the recognition accuracies of the proposed methods outperform that of a conventional method that considers the co-occurrence of some harmonics. The effectiveness of the CSRBM parameter that learns pitch co-occurrence, of the prior distribution placed on it, and of the parameter sharing in WS-CSRBM is also confirmed.
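As background on the model family, a semi-restricted Boltzmann machine differs from a standard RBM by adding lateral connections among the visible units, which is what allows pairwise pitch co-occurrence to be modeled directly. A generic energy function of this form (illustrative only; the paper's classification variant presumably also handles chord-label units, which are omitted here) is

E(\mathbf{v},\mathbf{h}) = -\mathbf{a}^{\top}\mathbf{v} - \mathbf{b}^{\top}\mathbf{h} - \mathbf{v}^{\top} W \mathbf{h} - \tfrac{1}{2}\,\mathbf{v}^{\top} L \mathbf{v},

where \mathbf{v} denotes the visible (pitch) units, \mathbf{h} the hidden units, W the visible-hidden weights, and L a symmetric, zero-diagonal matrix whose entry L_{ij} weights the co-occurrence of pitches i and j.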
-
Olivier NOURRY, Masanari KONDO, Shinobu SAITO, Yukako IIMURA, Naoyasu ...
Article type: LETTER
Subject area: Software Engineering
2025 Volume E108.D Issue 11 Pages 1412-1415
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 14, 2025
[Background] Throughout their lifetime, open-source software systems naturally attract new contributors and lose existing ones. Not all OSS contributors are equal, however, as some contributors within a project possess significant knowledge and expertise of the codebase (i.e., core developers). When investigating a project's ability to attract new contributors and how often a project loses contributors, it is therefore important to take the contributors' expertise into account. [Goal] Since core developers are vital to a project's longevity, we aim to find out: can OSS projects attract new core developers, and how often do OSS projects lose core developers? [Results] To investigate core developer contribution patterns, we calculate the truck factor (TF, or bus factor) of over 36,000 OSS projects and investigate how often TF developers join or abandon these projects. We find that 89% of the studied projects have lost their core development team at least once. Our results also show that in 70% of cases this abandonment happens within the first three years of a project's life. We also find that most OSS projects rely on a single core developer to maintain development activities. Finally, we find that only 27% of abandoned projects were able to attract at least one new TF developer.
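The truck factor itself can be computed with several heuristics; a common greedy one (an illustration, not necessarily the authors' algorithm, and the ownership model is hypothetical) counts how many top file owners must be removed before more than half of the files are left without an owner.

```python
def truck_factor(file_owners: dict[str, str], threshold: float = 0.5) -> int:
    """Greedy truck-factor estimate: remove the most-owning developers one by one
    until more than `threshold` of the files have lost their owner."""
    ownership: dict[str, set[str]] = {}
    for path, dev in file_owners.items():
        ownership.setdefault(dev, set()).add(path)

    total = len(file_owners)
    orphaned = 0
    removed = 0
    for dev, files in sorted(ownership.items(), key=lambda kv: -len(kv[1])):
        if orphaned > threshold * total:
            break
        orphaned += len(files)
        removed += 1
    return removed

# usage: alice owns most files, so losing her alone "abandons" the project
owners = {"a.py": "alice", "b.py": "alice", "c.py": "alice", "d.py": "bob", "e.py": "carol"}
print(truck_factor(owners))  # 1
```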
-
Xingxin WAN, Peng SONG, Siqi FU, Changjia WANG
Article type: LETTER
Subject area: Pattern Recognition
2025 Volume E108.D Issue 11 Pages 1416-1420
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: May 14, 2025
In ideal facial expression recognition (FER) tasks, the training and test data are assumed to share the same distribution. In reality, however, they are often drawn from different domains with different feature distributions, which seriously impairs recognition performance. In this letter, we present a novel Dynamic Graph-Guided Domain-Invariant Feature Representation (DG-DIFR) method, which addresses the issue of distribution shift across domains. First, we learn a robust common subspace that minimizes the differences between data distributions, facilitating the extraction of invariant feature representations. Concurrently, retargeted linear regression is employed to enhance the discriminative power of the proposed model. Furthermore, a maximum-entropy-based dynamic graph is introduced to preserve topological structure information in the low-dimensional subspace. Finally, extensive experiments conducted on four benchmark datasets confirm the superiority of the proposed method over state-of-the-art methods.
-
Yuewei ZHANG, Huanbin ZOU, Jie ZHU
Article type: LETTER
Subject area: Speech and Hearing
2025 Volume E108.D Issue 11 Pages 1421-1426
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: April 23, 2025
Multi-resolution spectrum feature analysis has demonstrated superior performance over traditional single-resolution methods in speech enhancement. However, previous multi-resolution-based methods typically have limited use of multi-resolution features, and some suffer from high model complexity. In this paper, we propose a more lightweight method that fully leverages the multi-resolution spectrum features. Our approach is based on a convolutional recurrent network (CRN) and employs a low-complexity multi-resolution spectrum fusion (MRSF) block to handle and fuse multi-resolution noisy spectrum information. We also improve the existing encoder-decoder structure, enabling the model to extract and analyze multi-resolution features more effectively. Furthermore, we adopt the short-time discrete cosine transform (STDCT) for time-frequency transformation, avoiding the phase estimation problem. To optimize our model, we design a multi-resolution STDCT loss function. Experiments demonstrate that the proposed multi-resolution STDCT-based CRN (MRCRN) achieves excellent performance and outperforms current advanced systems.
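To make the multi-resolution STDCT idea concrete, here is a small sketch (an illustration with hypothetical frame sizes, not the MRCRN front end) that computes framed DCT spectra of one signal at two different time-frequency resolutions.

```python
import numpy as np
from scipy.fft import dct

def stdct(signal: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Short-time DCT: window the signal and take a DCT-II of each frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hanning(frame_len)                      # analysis window
    return dct(frames, type=2, axis=1, norm="ortho")     # [n_frames, frame_len]

x = np.random.randn(16000)                     # 1 s of audio at 16 kHz
fine_time = stdct(x, frame_len=256, hop=128)   # better time resolution
fine_freq = stdct(x, frame_len=1024, hop=512)  # better frequency resolution
print(fine_time.shape, fine_freq.shape)
```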
-
Huayang HAN, Yundong LI, Menglong WU
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 11 Pages 1427-1430
Published: November 01, 2025
Released on J-STAGE: November 01, 2025
Advance online publication: April 22, 2025
Building damage assessment (BDA) plays a crucial role in accelerating humanitarian relief efforts during natural disasters. Recent studies have shown that the state-space-model-based Mamba architecture exhibits strong performance across various natural language processing tasks. In this paper, we propose a new model, OS-Mamba, which utilizes an Overall-Scan Convolution Module (OSCM) for multidimensional global modeling of image backgrounds, enabling comprehensive capture and analysis of large spatial features from various directions and thereby enhancing the model's understanding and performance in complex scenes. Extensive experiments on the xBD dataset demonstrate that our proposed OS-Mamba model outperforms current state-of-the-art solutions.
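The "scan from various directions" idea can be illustrated with a short sketch (a generic multi-directional flattening of the kind used in vision state-space models, not the OSCM itself): a 2-D feature map is unrolled into sequences along several directions before sequence modeling.

```python
import torch

def multidirectional_scans(feat: torch.Tensor) -> list[torch.Tensor]:
    """Unroll a [C, H, W] feature map into four 1-D scan orders:
    row-major, reversed row-major, column-major, reversed column-major."""
    c, h, w = feat.shape
    row = feat.reshape(c, h * w)                    # left-to-right, top-to-bottom
    col = feat.permute(0, 2, 1).reshape(c, h * w)   # top-to-bottom, left-to-right
    return [row, row.flip(-1), col, col.flip(-1)]

feat = torch.arange(2 * 3 * 3, dtype=torch.float32).reshape(2, 3, 3)
for seq in multidirectional_scans(feat):
    print(seq.shape)   # torch.Size([2, 9]) for each scan direction
```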