-
Kazuki SUNAGA, Keisuke SUGIURA, Hiroki MATSUTANI
Article type: PAPER
Subject area: Computer System
2025 Volume E108.D Issue 10 Pages 1159-1169
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 04, 2025
JOURNAL
FREE ACCESS
Recently, graph structures have been utilized in IoT (Internet of Things) applications such as network anomaly detection, smart transportation, and smart grids. A graph embedding is a representation of a graph as a fixed-length, low-dimensional vector, which can concisely represent the characteristics of the graph. node2vec is a well-known algorithm for obtaining such a graph embedding by sampling neighboring nodes on a given graph using a random walk technique. However, the original node2vec algorithm relies on conventional batch training using the backpropagation algorithm. In other words, the training data must be retained to retrain the model, which makes it unsuitable for real-world applications where the graph structure changes after deployment. To address changes in graph structures after IoT devices are deployed in edge environments, this paper proposes a combination of an online sequential training algorithm and node2vec. The proposed model is implemented on an FPGA (Field-Programmable Gate Array) device for efficient sequential training. The proposed FPGA implementation achieves a speedup of up to 205.25 times compared to the original model on an ARM Cortex-A53 CPU. We also evaluate the performance of the proposed model in the sequential training task from various perspectives. For example, evaluation results on dynamic graphs show that while the accuracy of the original model decreases, the proposed sequential model obtains a better graph embedding that achieves higher accuracy even when the graph structure changes. In addition, the proposed FPGA implementation is evaluated in terms of power consumption, and the results show that it significantly improves power efficiency compared to the CPU and embedded GPU implementations.
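As a rough illustration of the random-walk sampling that node2vec relies on, the following sketch draws one second-order biased walk over a small adjacency map. It covers only the neighbor-sampling step, not the online sequential training or the FPGA design proposed in the paper. The function name `node2vec_walk`, the toy graph `adj`, and the parameter values are assumptions made for this example; `p` and `q` are the usual node2vec return and in-out parameters.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style biased walk (hypothetical helper, not the paper's code)."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)   # move outward to distance 2
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Toy undirected graph (assumed example data).
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
walk = node2vec_walk(adj, start=0, length=6, p=0.5, q=2.0, rng=random.Random(0))
print(walk)
```

The sampled walks would then serve as training sequences for the embedding model; how those sequences are consumed sequentially is specific to the paper and is not shown here.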
-
Kei KOGAI, Yoshikazu UEDA
Article type: PAPER
Subject area: Software Engineering
2025 Volume E108.D Issue 10 Pages 1170-1182
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: March 28, 2025
JOURNAL
FREE ACCESS
Information and control systems operated in the field of social infrastructure are required to enhance their quality in terms of safety and reliability, and model checking is an effective technique to validate their behavior in the design phase. Model checking generates a state transition diagram from a model of system behavior and verifies that the model satisfies a system requirement by exploring the state space. However, as the number of model attributes and attribute value combinations increases, the state space expands, leading to a state explosion that makes it impossible to complete the search within a realistic time. To solve this problem, methods that reduce the state space by dividing the model are commonly applied, although these methods require human judgment based on knowledge of the system and the designer’s experience. The purpose of this paper is to propose a method for partitioning behavioral models of information and control systems (ICSs) without relying on such judgment. The structure of an ICS is represented by attributes, and its behavior is described by rules using these attributes. The description includes attributes that are characteristic of an ICS. The method extracts dependency relationships between rules from their references to attributes and generates a dependency graph. The graph is partitioned by clustering into clusters of rules, thus reducing the state space. Clustering partitions the model at points where relationships between clusters, such as rule dependencies, are sufficiently weak. Modularity is used as a measure to ensure that the total number of states after partitioning is less than before. The authors confirm the effectiveness of this method using an ICS example: they show the partitioning of the system, compare the number of states in the behavior models generated from the partitioned system, and present the results of model checking using these behavior models.
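The idea of deriving a rule dependency graph from attribute references and scoring a partition by modularity can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: two rules are assumed to depend on each other whenever they reference a common attribute, the graph is unweighted, and the rule and attribute names are invented for the example. The modularity formula is the standard Newman definition.

```python
from itertools import combinations


def dependency_edges(rules):
    """Link two rules when they reference at least one common attribute (assumption)."""
    return {
        (a, b)
        for a, b in combinations(sorted(rules), 2)
        if rules[a] & rules[b]
    }


def modularity(edges, clusters):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    if m == 0:
        return 0.0
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    q = 0.0
    for cluster in clusters:
        inside = sum(1 for a, b in edges if a in cluster and b in cluster)
        total_degree = sum(degree.get(node, 0) for node in cluster)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


# Hypothetical rules, each mapped to the set of attributes it references.
rules = {
    "r1": {"valve", "pressure"},
    "r2": {"pressure", "alarm"},
    "r3": {"pump", "flow"},
    "r4": {"flow", "level"},
}
edges = dependency_edges(rules)
split = [{"r1", "r2"}, {"r3", "r4"}]
merged = [{"r1", "r2", "r3", "r4"}]
print(modularity(edges, split), modularity(edges, merged))
```

In this toy case the two-cluster partition scores higher than the unpartitioned model, which is the kind of signal a modularity-based clustering would use when choosing where to cut.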
-
Kazuhiro WADA, Masaya TSUNOKAKE, Shigeki MATSUBARA
Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2025 Volume E108.D Issue 10 Pages 1183-1193
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: March 28, 2025
JOURNAL
FREE ACCESS
Citations using URLs (URL citations) that appear in scholarly papers can be used as an information source for research resource search engines. In particular, information about the types of cited resources and the reasons for their citation is crucial for describing the resources and their relations in search services. To obtain this information, previous studies proposed several methods for classifying URL citations. However, these methods trained the model using a simple fine-tuning strategy and exhibited insufficient performance. We propose a classification method using a novel intermediate task. Our method trains the model on our intermediate task of identifying whether sample pairs belong to the same class before fine-tuning it on the target task. In the experiment, our method outperformed previous methods using the simple fine-tuning strategy, achieving higher macro F-scores for different model sizes and architectures. Our analysis results indicate that the model learns the class boundaries of the target task by training on our intermediate task. Our intermediate task also demonstrated higher performance and computational efficiency than an alternative intermediate task using triplet loss. Finally, we applied our method to other text classification tasks and confirmed its effectiveness when a simple fine-tuning strategy does not work stably.
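The intermediate task described above, deciding whether two samples belong to the same class, needs pair-level training data. A minimal sketch of how such pairs might be built from labeled samples is shown below; the helper name `make_pairs`, the exhaustive pairing, and the example texts and labels are assumptions for illustration, and the paper's actual sampling scheme may differ.

```python
from itertools import combinations


def make_pairs(samples):
    """Turn (text, label) samples into ((text_a, text_b), same_class) pairs."""
    return [
        ((text_a, text_b), int(label_a == label_b))
        for (text_a, label_a), (text_b, label_b) in combinations(samples, 2)
    ]


# Hypothetical URL-citation contexts with made-up class labels.
samples = [
    ("We used the toolkit available at URL.", "tool"),
    ("The parser can be downloaded from URL.", "tool"),
    ("The corpus is distributed at URL.", "dataset"),
]
for pair, same_class in make_pairs(samples):
    print(same_class, pair)
```

A model trained on these binary pair labels would then be fine-tuned on the original classification labels.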
-
Keigo WAKAYAMA, Takafumi KANAMORI
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 10 Pages 1194-1205
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: March 28, 2025
JOURNAL
FREE ACCESS
Neural architecture search (NAS) is very useful for automating the design of DNN architectures. In recent years, a number of methods for training-free NAS have been proposed, and the resulting reduction in search cost has raised expectations for real-world applications. However, in NASI, a state-of-the-art (SOTA) training-free NAS method with a theoretical background, the proxy for estimating the test performance of candidate architectures is based on the training error, not the generalization error. In this research, we propose NAS-NGE (NAS based on normalized generalization error), a NAS method based on a proxy theoretically derived from the bias-variance decomposition of the normalized generalization error. Specifically, we propose a surrogate of the normalized second-order moment of the Neural Tangent Kernel (NTK) and use it together with the normalized bias to construct NAS-NGE. We use NAS benchmarks and the DARTS search space to demonstrate the effectiveness of the proposed method by comparing it with SOTA training-free NAS methods in a short search time.
-
Zhe ZHANG, Yiding WANG, Jiali CUI, Han ZHENG
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 10 Pages 1206-1216
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 02, 2025
JOURNAL
FREE ACCESS
Multimodal Emotion Recognition (MER) is a critical task in sentiment analysis. Current methods primarily focus on multimodal fusion and representation of emotions, but they fail to capture the collaborative interaction among modalities effectively. In this study, we propose an MER model with intra-modal enhancement and inter-modal interaction (IEII). First, the model extracts emotion information from the text, audio, and video modalities using RoBERTa, openSMILE, and DenseNet architectures, respectively. The model introduces the Large Enhanced Kernel Attention (LEKA) module, which utilizes a simplified attention mechanism with large convolutional kernels, enhances intra-modal emotional information, and aligns modalities effectively. Then, a multimodal representation space constructed with transformer encoders is proposed to explore inter-modal interactions. Finally, the model introduces a Dual-Branch Multimodal Attention Fusion (DMAF) module based on grouped query attention and rapid attention mechanisms. The DMAF module integrates multimodal emotion representations and performs the MER. The experimental results indicate that the model achieves superior overall accuracy and F1-scores on the IEMOCAP and MELD datasets compared to existing methods. These results demonstrate that the proposed model effectively enhances intra-modal emotional information and captures inter-modal interactions.
-
Ye TIAN, Mei HAN, Jinyi ZHANG
Article type: PAPER
Subject area: Image Processing and Video Processing
2025 Volume E108.D Issue 10 Pages 1217-1229
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 14, 2025
JOURNAL
FREE ACCESS
This paper proposes a line segment detection method based on false peak suppression and the local Hough transform. The method effectively suppresses the impact of noise in binary images on line segment detection performance, addresses the problems of false detection of short lines, missed detections, and over-segmentation, and robustly acquires line segment features in pantograph-catenary panoramic images. In tests on a dataset of actual pantograph-catenary panoramic images, comparisons of the detection accuracy for line segments and contact points show that the proposed method reduces missed detections while improving the detection accuracy of the pantograph-catenary contact point. Moreover, tests on the public YorkUrban dataset show that the proposed method achieves the best results in both accuracy and processing speed, indicating that it also generalizes well to high-quality natural images.
-
Ruidong CHEN, Baohua QIANG, Xianyi YANG, Shihao ZHANG, Yuan XIE
Article type: PAPER
Subject area: Image Processing and Video Processing
2025 Volume E108.D Issue 10 Pages 1230-1238
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 17, 2025
JOURNAL
FREE ACCESS
Image-text retrieval (ITR) aims to retrieve data of one modality given a query of the other modality. The main challenge is mapping images and texts to a common space. Although existing methods obtain excellent performance on ITR tasks, they suffer from weak information interaction and insufficient capture of deeper associative relationships. To address these problems, we propose CDISA, a Cross-modal Deep Interaction and Semantic Aligning method that combines a vision-language pre-training model with semantic feature extraction capabilities. Specifically, we first design a cross-modal deep interaction module to enhance the interaction of image and text features by performing deep interaction matching computations. Second, to align the image and text features, bidirectional cosine matching is proposed to improve the differentiation of bimodal data within the feature space. We conduct an extensive experimental evaluation against recent state-of-the-art ITR methods on three datasets: Wikipedia, Pascal-Sentence, and NUS-WIDE.
-
Mami TAKEMOTO, Kousei HAYASHI, Sunao HARA, Toru YAMASHITA, Masanobu AB ...
Article type: PAPER
Subject area: Biological Engineering
2025 Volume E108.D Issue 10 Pages 1239-1245
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 04, 2025
JOURNAL
FREE ACCESS
Parkinson’s disease (PD) is an intractable neurological disease that affects approximately 100-150 per 100,000 people in Japan, with more than 95% of these patients aged 60 years or older. The Hoehn and Yahr staging scale (H-Y scale) and the Unified Parkinson’s Disease Rating Scale (UPDRS) score are typical indicators of PD severity, but subjective aspects cannot be completely excluded. This is due to the limited means of quantitatively measuring and evaluating gait as a motor symptom of PD. In recent years, however, human movement in daily life has come to be measured by wearable and other devices. In this paper, we took this wearable-device measurement one step further and aimed to apply insole-type pressure sensors to medical care. We collected data from two patients with PD at H-Y stage 2 and nine patients with PD at H-Y stage 3, then generated time-series patterns representing their gait. By analyzing these gait patterns, we found that the sum of the three toe sensors and the sum of the two heel sensors provided stable data. The overlap time of the two summed sensor values and the deviations of each summed sensor value were defined as features. Using these features, we proposed PD severity estimation based on the k-means method. Experimental results showed that H-Y stage 3 patients and healthy individuals were estimated with high accuracy, but H-Y stage 2 patients were often classified as healthy.
-
Hyunsik YOON, Yon Dohn CHUNG
Article type: LETTER
Subject area: Data Engineering, Web Information Systems
2025 Volume E108.D Issue 10 Pages 1246-1249
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 17, 2025
JOURNAL
FREE ACCESS
The execution time of an Apache Spark application is heavily influenced by its configuration settings. Accordingly, Bayesian Optimization (BO) is commonly used for automated tuning, with Expected Improvement (EI) as the acquisition function. However, existing works did not empirically compare its performance with that of other acquisition functions. In this paper, we show that EI may not work well for Spark applications because the search space is huge compared to those of other optimization problems. In addition, we demonstrate the performance of BO based on Probability of Improvement (PI), which achieves exploration via rich random initialization and exploitation via the PI acquisition function. Through experimental evaluations, we show that the PI-based BO outperforms the EI-based BO in both optimal time and optimization cost.
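For readers unfamiliar with the two acquisition functions being compared, the sketch below gives their standard closed forms for a minimization problem such as execution time, assuming a Gaussian posterior with mean `mu` and standard deviation `sigma` at a candidate configuration and a best observed value `best`. The function names and the optional margin `xi` are choices made for this example and are not taken from the paper.

```python
import math


def _normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """PI: probability that the candidate beats the best observed value."""
    if sigma <= 0.0:
        return 1.0 if mu + xi < best else 0.0
    return _normal_cdf((best - mu - xi) / sigma)


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI: expected amount by which the candidate beats the best observed value."""
    gap = best - mu - xi
    if sigma <= 0.0:
        return max(gap, 0.0)
    z = gap / sigma
    return gap * _normal_cdf(z) + sigma * _normal_pdf(z)


# Two hypothetical candidates: a safe one near the incumbent and an uncertain one.
best_time = 100.0
print(probability_of_improvement(98.0, 1.0, best_time),
      expected_improvement(98.0, 1.0, best_time))
print(probability_of_improvement(105.0, 20.0, best_time),
      expected_improvement(105.0, 20.0, best_time))
```

PI rewards candidates that are likely to improve at all, whereas EI also weights the size of the improvement and therefore favors uncertain regions more strongly, which is one way to read the exploration and exploitation trade-off discussed in the abstract.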
-
Xichang CAI, Jingxuan CHEN, Ziyi LIU, Menglong WU, HongYang GUO, Xueji ...
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 10 Pages 1250-1254
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 14, 2025
JOURNAL
FREE ACCESS
In recent years, convolutional recurrent neural networks (CRNNs) have achieved notable success in sound event detection (SED) tasks by leveraging the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, existing models still face limitations in the temporal dimension, resulting in suboptimal temporal localization accuracy for SED. To address this issue, we designed a model called Temporal Enhanced Full-Frequency Dynamic Convolution (TEFFDConv). This model incorporates both temporal and frequency attention mechanisms into full-dynamic convolution, enhancing the model’s ability to localize sound events at the frame level. Experimental results demonstrate that our proposed model significantly improves PSDS1, CB-F1, and IB-F1, marking a notable advancement compared to similar methods. Additionally, PSDS2 also improves over most methods. These results show the superior performance of our proposed method in temporal localization, while also demonstrating better performance in event classification.
-
Rihito SHODA, Seiji MIYOSHI
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 10 Pages 1255-1259
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 17, 2025
JOURNAL
FREE ACCESS
Anomaly detection is essential in a wide range of fields. In this study, we focus on an Efficient GAN applied to anomaly detection and aim to improve its performance by applying random erasing data augmentation and by enhancing the loss function to incorporate mapping consistency. Experiments using images of normal lemons and damaged lemons reveal that the proposed method significantly improves the anomaly detection performance of Efficient GAN.
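Random erasing, the augmentation mentioned above, overwrites a randomly placed rectangle of each training image with noise. The sketch below is a simplified version operating on a grayscale image stored as a list of rows; the function name, the `max_frac` size limit, and the uniform noise fill are assumptions for this example, and the original random erasing method additionally samples the area and aspect ratio of the rectangle.

```python
import random


def random_erase(image, max_frac=0.5, rng=None):
    """Return a copy of `image` with one random rectangle replaced by noise in [0, 1)."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Rectangle size: at least 1 pixel, at most max_frac of each dimension.
    erase_h = rng.randint(1, max(1, int(height * max_frac)))
    erase_w = rng.randint(1, max(1, int(width * max_frac)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    erased = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            erased[y][x] = rng.random()
    return erased


# 8x8 dummy image (assumed data); the erased patch shows up as non-constant values.
image = [[0.5] * 8 for _ in range(8)]
augmented = random_erase(image, rng=random.Random(0))
for row in augmented:
    print(" ".join(f"{value:.2f}" for value in row))
```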
-
Jialong LI, Shogo MORITA, Wei WANG, Yan ZHANG, Takuto YAMAUCHI, Kenji ...
Article type: LETTER
Subject area: Human-computer Interaction
2025 Volume E108.D Issue 10 Pages 1260-1264
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: June 02, 2025
JOURNAL
FREE ACCESS
Human-robot collaboration has become increasingly complex and dynamic, highlighting the need for effective and intuitive communication. Two communication strategies for robots have been explored: (i) global-perspective strategy to share an overview of task progress, aimed at achieving consensus on completed and upcoming tasks; and (ii) local-perspective strategy to share the robot’s intent, aimed at conveying the robot’s immediate intentions and next actions. However, existing studies merely rely on the distinct focus to differentiate between the use of different strategies, lacking a deeper exploration of how these strategies affect user perceptions and responses in practice. For example, a possible concern could be which strategy is more likely to inspire human effort in collaboration. To this end, this paper conducts a user experiment (N = 15) within a collaborative cooking scenario, and provides design insights into the strengths and weaknesses of each strategy from three dimensions to inform the design of human-sensitive communication.
-
Lintang Matahari HASANI, Kasiyah JUNUS, Lia SADITA, Ayano OHSAKI, Tsuk ...
Article type: LETTER
Subject area: Educational Technology
2025 Volume E108.D Issue 10 Pages 1265-1269
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: June 02, 2025
JOURNAL
FREE ACCESS
Learners need to progress through certain inquiry stages to experience a good online discussion. This study analyzes the discussions of two classes that received different preparation: Kit-build concept mapping (KBCM) and summary writing. Epistemic network analysis showed that the KBCM class exhibited close-to-ideal connectivity between the inquiry stages.
-
Mingyang XU, Ao ZHAN, Chengyu WU, Zhengqiang WANG
Article type: LETTER
Subject area: Pattern Recognition
2025 Volume E108.D Issue 10 Pages 1270-1274
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 02, 2025
JOURNAL
FREE ACCESS
Recognizing fatigued drivers is essential for improving road safety. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been applied to identify the state of drivers. However, these models frequently encounter challenges, including a vast number of parameters and low detection effectiveness. To address these challenges, we propose the Dual-Lightweight-Swin-Transformer (DLS) for driver drowsiness detection. We also propose the Spatial-Temporal Fusion Model (STFM) and the Global Saliency Fusion Model (GSFM), where STFM fuses the spatial-temporal features and GSFM fuses the features from different layers of STFM to enhance detection efficiency. Simulation results show that DLS increases accuracy by 0.33% and reduces computational complexity by 49.3%. The running time per test epoch of DLS is reduced by 33.1%.
-
ZiYi CHEN, XiaoRan HAO, Ming CHEN, Mao NI, Yi Heng ZHANG
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 10 Pages 1275-1278
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 14, 2025
JOURNAL
FREE ACCESS
Object detection from a UAV perspective faces challenges such as object occlusion, unclear boundaries, and small target sizes, resulting in reduced detection accuracy. Additionally, traditional object detection algorithms have large parameter counts, making them unsuitable for resource-constrained edge devices. To address these issues, we propose a lightweight small-object detection algorithm: Sky-YOLO. Specifically, we introduce the MSFConv multi-scale feature map fusion convolution module into the backbone network to enhance feature extraction capability. The Neck part is replaced with the L-BiFPN module to reduce the parameter count and strengthen feature fusion between layers. Additionally, based on the characteristics of UAV imagery, we incorporate the WIOU loss function, enabling efficient detection of blurred and occluded targets. Experimental results show that the Sky-YOLO model, with 60% fewer parameters than the original model, still achieves 39.7% accuracy on the VisDrone2019 validation dataset, a 6.7% improvement in accuracy over the original model.
-
Duc-Dung NGUYEN
Article type: LETTER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 10 Pages 1279-1282
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 17, 2025
JOURNAL
FREE ACCESS
Compared to general object detection problems, mathematical expression detection (MED) in document images has its own challenges, such as the small size of inline formulas, the rich set of mathematical symbols, and the similarity between variables and normal text characters. To deal with these challenges, we transform the multi-class MED task into a multi-label semantic segmentation problem. With a basic encoder-decoder structure of 3.9 million parameters trained from scratch, our proposed MEDNet model achieves top detection performance on three public datasets: TFD2019, Marmot, and IBEM2021. MEDNet is especially effective in detecting small formulas, achieving an F1 score of 95.40% for inline expressions and 95.82% for all expressions on the test set of the IBEM2021 competition data.
-
Zewen QIAN, Zhezhe HAN, Haoran JIANG, Ziyi ZHANG, Mohan ZHANG, Hao MA, ...
Article type: LETTER
Subject area: Biocybernetics, Neurocomputing
2025 Volume E108.D Issue 10 Pages 1283-1286
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: June 02, 2025
JOURNAL
FREE ACCESS
Identifying the combustion conditions in power-plant furnaces is crucial for optimizing combustion efficiency and reducing pollutant emissions. Traditional image-processing methods rely heavily on prior empirical knowledge, limiting their ability to comprehensively extract features from flame images. To address these deficiencies, this study proposes a novel approach for combustion condition identification through flame imaging and a convolutional autoencoder (CAE). In this approach, the flame images are first preprocessed, then the CAE is established to extract the deep features of the flame images, and finally a Softmax classifier is employed to determine the combustion conditions. Experiments are carried out on a 600 MW opposed wall boiler, and the effectiveness of the proposed method is evaluated using captured flame images. Results demonstrate that the proposed CAE-Softmax model achieves an identification accuracy of 98.2% under the investigated combustion conditions, significantly outperforming traditional models. These findings demonstrate the feasibility of the method, offering an intelligent and efficient solution for enhancing the operational performance of power-plant boilers.
-
Jinsoo SEO
Article type: LETTER
Subject area: Music Information Processing
2025 Volume E108.D Issue 10 Pages 1287-1291
Published: October 01, 2025
Released on J-STAGE: October 01, 2025
Advance online publication: April 07, 2025
JOURNAL
FREE ACCESS
In many applications, data imbalance and annotation challenges limit the size of training datasets, hindering the ability of deep neural networks to fully leverage their representational capacity. Data augmentation is a widely used countermeasure that generates additional training samples by manipulating existing data. This paper investigates spectral-domain data augmentation methods specifically for the cover song identification task, enabling on-the-fly augmentation with minimal computational overhead. We explore various spectral modifications and mixing techniques, applying them directly in the frequency domain, and evaluate their effectiveness on two cover song identification datasets. Among the augmentation methods tested, a mixing approach involving cut-and-paste operations in the spectral domain achieved the highest accuracy, demonstrating the potential of spectral augmentations to enhance the performance of neural networks for cover song identification.
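A cut-and-paste mixing operation in the spectral domain can be pictured as copying a rectangular time-frequency region from one spectrogram into another. The sketch below shows that operation on spectrograms stored as lists of frequency rows; it is only an illustration under assumptions, since the abstract does not specify how regions are chosen or how labels are handled, and the function name and region parameters are invented for this example.

```python
def cut_and_paste(target, source, top, left, height, width):
    """Copy a time-frequency patch from `source` into a copy of `target`.

    Both inputs are lists of rows (frequency bins) of equal shape; the patch
    covers rows top..top+height-1 and columns (frames) left..left+width-1.
    """
    mixed = [row[:] for row in target]  # leave the original spectrogram intact
    for f in range(top, top + height):
        mixed[f][left:left + width] = source[f][left:left + width]
    return mixed


# Two dummy 4x6 magnitude spectrograms (assumed data).
spec_a = [[0.0] * 6 for _ in range(4)]
spec_b = [[1.0] * 6 for _ in range(4)]
mixed = cut_and_paste(spec_a, spec_b, top=1, left=2, height=2, width=3)
for row in mixed:
    print(row)
```

Because the operation is a plain array copy, it can run on the fly during training, which matches the low-overhead motivation stated in the abstract.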