IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Regular Section
  • Kazuki SUNAGA, Keisuke SUGIURA, Hiroki MATSUTANI
    Article type: PAPER
    Subject area: Computer System
    2025 Volume E108.D Issue 10 Pages 1159-1169
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 04, 2025
    JOURNAL FREE ACCESS

    Recently, graph structures have been utilized in IoT (Internet of Things) environments for applications such as network anomaly detection, smart transportation, and smart grids. A graph embedding is a representation of a graph as a fixed-length, low-dimensional vector that concisely captures the characteristics of the graph. node2vec is a well-known algorithm for obtaining such an embedding by sampling neighboring nodes on a given graph with a random walk. However, the original node2vec algorithm relies on conventional batch training with the backpropagation algorithm. In other words, the training data must be retained to retrain the model, which makes it unsuitable for real-world applications where the graph structure changes after deployment. To handle changes in graph structure after IoT devices are deployed in edge environments, this paper proposes a combination of an online sequential training algorithm and node2vec. The proposed model is implemented on an FPGA (Field-Programmable Gate Array) device for efficient sequential training. The proposed FPGA implementation achieves up to a 205.25-fold speedup over the original model on an ARM Cortex-A53 CPU. We also evaluate the performance of the proposed model in the sequential training task from various perspectives. For example, evaluation results on dynamic graphs show that, while the accuracy of the original model decreases, the proposed sequential model obtains a better graph embedding that achieves higher accuracy even when the graph structure changes. In addition, the proposed FPGA implementation is evaluated in terms of power consumption, and the results show that it significantly improves power efficiency compared to the CPU and embedded GPU implementations.

    Download PDF (3001K)
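
As a reading aid for the node2vec abstract above, the random-walk sampling it mentions can be sketched as follows. This is a minimal sketch of the standard node2vec second-order walk, not the paper's FPGA or online sequential training method. The return parameter `p` and in-out parameter `q` come from the general node2vec algorithm rather than from this abstract, and the function name and toy graph are hypothetical.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Sample one second-order biased random walk of up to `length` nodes.

    adj maps each node to the set of its neighbors (undirected, unweighted).
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph used only for illustration.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
walk = node2vec_walk(adj, start=0, length=6, p=0.5, q=2.0, rng=random.Random(0))
```

Walks like this one are typically fed to a skip-gram style model to learn the embedding; that training step is outside this sketch.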
  • Kei KOGAI, Yoshikazu UEDA
    Article type: PAPER
    Subject area: Software Engineering
    2025 Volume E108.D Issue 10 Pages 1170-1182
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: March 28, 2025
    JOURNAL FREE ACCESS

    Information and control systems operated in the field of social infrastructure are required to enhance their quality in terms of safety and reliability, and model checking is an effective technique for validating their behavior in the design phase. Model checking generates a state transition diagram from a model of system behavior and verifies that the model satisfies a system requirement by exploring the state space. However, as the number of model attributes and attribute value combinations increases, the state space expands, leading to a state explosion that makes it impossible to complete the search within a realistic time. To solve this problem, methods that reduce the state space by dividing the model are commonly applied, although these methods require human judgment based on knowledge of the system and the designer’s experience. The purpose of this paper is to propose a method for partitioning behavioral models of information and control systems (ICS) without relying on such judgment. The structures of an ICS are represented by attributes, and its behaviors are described by rules using these attributes. The description includes attributes that are characteristic of an ICS. The method extracts dependency relationships between rules from their references to attributes and generates a dependency graph. The graph is partitioned by clustering into clusters of rules, thus reducing the state space. Clustering partitions the model at points where relationships between clusters, such as rule dependencies, are sufficiently weak. Modularity is used as a measure to ensure that the total number of states after partitioning is smaller than before. The authors confirm the effectiveness of the method on an ICS example by showing the partitioning of the system, comparing the number of states in the behavior models generated from the partitioned system, and presenting the results of model checking with these behavior models.

    Download PDF (1451K)
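
The model-partitioning abstract above uses modularity to judge a clustering of the rule dependency graph. A minimal sketch of the standard Newman modularity follows, assuming an undirected, unweighted graph; the paper's exact graph definition and clustering procedure are not given in the abstract, and the rule names `r1` to `r6` are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> cluster id.
    Q = sum over clusters c of (intra_c / m) - (degree_c / (2m))**2.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = {}   # number of edges with both endpoints in the cluster
    degree = {}  # total degree of the nodes in the cluster
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Hypothetical dependency graph: two tightly coupled rule groups
# linked by a single dependency (r3, r4).
edges = [("r1", "r2"), ("r2", "r3"), ("r1", "r3"),
         ("r4", "r5"), ("r5", "r6"), ("r4", "r6"),
         ("r3", "r4")]
split = {"r1": 0, "r2": 0, "r3": 0, "r4": 1, "r5": 1, "r6": 1}
single = {r: 0 for r in split}

q_split = modularity(edges, split)    # 5/14, about 0.357
q_single = modularity(edges, single)  # 0.0
```

Cutting at the single weak dependency scores higher than keeping all rules in one cluster, which matches the idea of partitioning where inter-cluster relationships are weak.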
  • Kazuhiro WADA, Masaya TSUNOKAKE, Shigeki MATSUBARA
    Article type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2025 Volume E108.D Issue 10 Pages 1183-1193
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: March 28, 2025
    JOURNAL FREE ACCESS

    Citations by URL (URL citations) that appear in scholarly papers can be used as an information source for search engines for research resources. In particular, information about the types of cited resources and the reasons for their citation is crucial for describing the resources and their relations in such search services. To obtain this information, previous studies proposed methods for classifying URL citations. However, these methods trained the model with a simple fine-tuning strategy and exhibited insufficient performance. We propose a classification method using a novel intermediate task. Our method trains the model on an intermediate task of identifying whether sample pairs belong to the same class before fine-tuning it on the target task. In the experiment, our method outperformed previous methods based on simple fine-tuning, achieving higher macro F-scores across different model sizes and architectures. Our analysis indicates that the model learns the class boundaries of the target task by training on our intermediate task. Our intermediate task also demonstrated higher performance and computational efficiency than an alternative intermediate task using triplet loss. Finally, we applied our method to other text classification tasks and confirmed its effectiveness in cases where a simple fine-tuning strategy does not work stably.

    Download PDF (3218K)
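
The intermediate task in the URL-citation abstract above asks whether two samples belong to the same class. One way such training pairs might be built from labeled samples is sketched below. The pairing scheme (all unordered pairs), the function name, and the example texts and labels are assumptions for illustration; the abstract does not specify how the authors sample pairs.

```python
from itertools import combinations


def make_pair_examples(samples):
    """Turn (text, label) samples into (text_a, text_b, is_same_class) pairs."""
    return [(text_a, text_b, int(label_a == label_b))
            for (text_a, label_a), (text_b, label_b) in combinations(samples, 2)]


# Hypothetical citation contexts and class labels.
samples = [
    ("We used the toolkit available at <URL>.", "tool"),
    ("Our parser is released at <URL>.", "tool"),
    ("The corpus can be downloaded from <URL>.", "dataset"),
    ("We evaluate on the benchmark hosted at <URL>.", "dataset"),
]
pairs = make_pair_examples(samples)
num_same = sum(is_same for _, _, is_same in pairs)
```

A binary classifier trained on `pairs` would then be fine-tuned on the original multi-class labels.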
  • Keigo WAKAYAMA, Takafumi KANAMORI
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2025 Volume E108.D Issue 10 Pages 1194-1205
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: March 28, 2025
    JOURNAL FREE ACCESS

    Neural architecture search (NAS) is very useful for automating the design of DNN architectures. In recent years, a number of methods for training-free NAS have been proposed, and the resulting reduction in search cost has raised expectations for real-world applications. However, in NASI, a state-of-the-art (SOTA) training-free NAS method with a theoretical background, the proxy for estimating the test performance of candidate architectures is based on the training error, not the generalization error. In this research, we propose a NAS method based on a proxy theoretically derived from the bias-variance decomposition of the normalized generalization error, called NAS-NGE, i.e., NAS based on normalized generalization error. Specifically, we propose a surrogate of the normalized 2nd order moment of the Neural Tangent Kernel (NTK) and use it together with the normalized bias to construct NAS-NGE. We use NAS benchmarks and the DARTS search space to demonstrate the effectiveness of the proposed method by comparing it to SOTA training-free NAS methods under a short search time.

    Download PDF (1571K)
  • Zhe ZHANG, Yiding WANG, Jiali CUI, Han ZHENG
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2025 Volume E108.D Issue 10 Pages 1206-1216
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 02, 2025
    JOURNAL FREE ACCESS

    Multimodal Emotion Recognition (MER) is a critical task in sentiment analysis. Current methods primarily focus on multimodal fusion and representation of emotions, but they fail to capture the collaborative interaction among modalities effectively. In this study, we propose an MER model with intra-modal enhancement and inter-modal interaction (IEII). First, the model extracts emotion information from the text, audio, and video modalities through RoBERTa, openSMILE, and DenseNet architectures, respectively. The model includes a Large Enhanced Kernel Attention (LEKA) module, which utilizes a simplified attention mechanism with large convolutional kernels, enhances intra-modal emotional information, and aligns modalities effectively. Then, a multimodal representation space is constructed with transformer encoders to explore inter-modal interactions. Finally, the model includes a Dual-Branch Multimodal Attention Fusion (DMAF) module based on grouped query attention and rapid attention mechanisms. The DMAF module integrates multimodal emotion representations and performs the MER. The experimental results indicate that the model achieves superior overall accuracy and F1-scores on the IEMOCAP and MELD datasets compared to existing methods. These results indicate that the proposed model effectively enhances intra-modal emotional information and captures inter-modal interactions.

    Download PDF (5847K)
  • Ye TIAN, Mei HAN, Jinyi ZHANG
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 10 Pages 1217-1229
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 14, 2025
    JOURNAL FREE ACCESS

    This paper proposes a line segment detection method based on false peak suppression and a local Hough transform. The method effectively suppresses the impact of noise in binary images on line segment detection, addresses false detection of short lines, missed detections, and over-segmentation, and robustly acquires line segment features in pantograph-catenary panoramic images. In tests on a dataset of actual pantograph-catenary panoramic images, comparisons of the detection accuracy for line segments and contact points show that the proposed method reduces missed detections while improving the detection accuracy of the pantograph-catenary contact point. Moreover, tests on the public YorkUrban dataset show that the proposed method achieves the best results in the evaluation of accuracy and processing speed, indicating strong generalization ability on high-quality natural images.

    Download PDF (20911K)
  • Ruidong CHEN, Baohua QIANG, Xianyi YANG, Shihao ZHANG, Yuan XIE
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 10 Pages 1230-1238
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 17, 2025
    JOURNAL FREE ACCESS

    Image-text retrieval (ITR) aims at querying one type of data given another type of data. The main challenge is mapping images and texts to a common space. Although existing methods obtain excellent performance on ITR tasks, they suffer from weak information interaction and insufficient capture of deeper associative relationships. To address these problems, we propose CDISA, a Cross-modal Deep Interaction and Semantic Aligning method that combines a vision-language pre-training model with semantic feature extraction capabilities. Specifically, we first design a cross-modal deep interaction module to enhance the interaction of image and text features by performing deep interaction matching computations. Second, to align the image and text features, bidirectional cosine matching is proposed to improve the differentiation of bimodal data within the feature space. We conduct an extensive experimental evaluation against recent state-of-the-art ITR methods on three datasets: Wikipedia, Pascal-Sentence, and NUS-WIDE.

    Download PDF (10378K)
  • Mami TAKEMOTO, Kousei HAYASHI, Sunao HARA, Toru YAMASHITA, Masanobu AB ...
    Article type: PAPER
    Subject area: Biological Engineering
    2025 Volume E108.D Issue 10 Pages 1239-1245
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 04, 2025
    JOURNAL FREE ACCESS

    Parkinson’s disease (PD) is an intractable neurological disease that affects approximately 100-150 per 100,000 people in Japan, with more than 95% of these patients aged 60 years or older. The Hoehn and Yahr staging scale (H-Y scale) and the Unified Parkinson’s Disease Rating Scale (UPDRS) score are typical indicators of PD severity, but subjective aspects cannot be completely excluded. This is due to the limited means to quantitatively measure and evaluate gait as a motor symptom of PD. However, in recent years, human movement has been measured in daily life by wearable and other devices. In this paper, we took this wearable-device measurement one step further and aimed to apply insole-type pressure sensors to medical care. We collected data from two patients with PD at H-Y stage 2 and nine patients with PD at H-Y stage 3, then generated time-series patterns representing their gait. By analyzing these gait patterns, we found that the sum of the three toe sensors and the sum of the two heel sensors provided stable data. The overlap time of the two summed sensor values and the deviations of each summed sensor value were defined as features. Using these features, we proposed PD severity estimation based on the k-means method. Experimental results showed that patients at H-Y stage 3 and healthy individuals were estimated with high accuracy, but patients at H-Y stage 2 were often classified as healthy individuals.

    Download PDF (1956K)
  • Hyunsik YOON, Yon Dohn CHUNG
    Article type: LETTER
    Subject area: Data Engineering, Web Information Systems
    2025 Volume E108.D Issue 10 Pages 1246-1249
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 17, 2025
    JOURNAL FREE ACCESS

    The execution time of an Apache Spark application is heavily influenced by its configuration settings. Accordingly, Bayesian Optimization (BO) is commonly used for automated tuning, typically with Expected Improvement (EI) as the acquisition function. However, existing works have not empirically compared its performance with that of other acquisition functions. In this paper, we show that EI may not work well for Spark applications because their search space is huge compared with other optimization problems. In addition, we demonstrate the performance of BO based on Probability of Improvement (PI), which achieves exploration via rich random initialization and exploitation via the PI acquisition function. Through experimental evaluations, we show that the PI-based BO outperforms the EI-based BO in both optimal time and optimization cost.

    Download PDF (1097K)
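
For readers unfamiliar with the two acquisition functions compared in the Spark tuning letter above, their textbook closed forms under a Gaussian posterior can be sketched as follows. The sketch assumes minimization (for example, of execution time) and omits any exploration offset. The candidate values are hypothetical and do not reproduce the letter's experiments.

```python
import math


def _norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, best):
    """P(f(x) < best) when f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return 1.0 if mu < best else 0.0
    return _norm_cdf((best - mu) / sigma)


def expected_improvement(mu, sigma, best):
    """E[max(best - f(x), 0)] when f(x) ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * _norm_cdf(z) + sigma * _norm_pdf(z)


best = 100.0          # hypothetical best execution time observed so far (s)
cand_a = (95.0, 2.0)  # slightly better mean, low uncertainty
cand_b = (105.0, 20.0)  # worse mean, high uncertainty

pi_a = probability_of_improvement(*cand_a, best)
pi_b = probability_of_improvement(*cand_b, best)
ei_a = expected_improvement(*cand_a, best)
ei_b = expected_improvement(*cand_b, best)
```

In this toy case PI ranks the low-uncertainty candidate first while EI ranks the high-uncertainty one first, which illustrates why PI is usually described as the more exploitative of the two.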
  • Xichang CAI, Jingxuan CHEN, Ziyi LIU, Menglong WU, HongYang GUO, Xueji ...
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2025 Volume E108.D Issue 10 Pages 1250-1254
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 14, 2025
    JOURNAL FREE ACCESS

    In recent years, convolutional recurrent neural networks (CRNNs) have achieved notable success in sound event detection (SED) tasks by leveraging the strengths of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). However, existing models still face limitations in the temporal dimension, resulting in suboptimal temporal localization accuracy for SED. To address this issue, we designed a model called Temporal Enhanced Full-Frequency Dynamic Convolution (TEFFDConv). This model combines temporal and frequency attention mechanisms with full-dynamic convolution, enhancing the model’s ability to localize sound events at the frame level. Experimental results demonstrate that our proposed model significantly improves PSDS1, CB-F1, and IB-F1 compared to similar methods. PSDS2 also improves over most methods. These results show the superior performance of the proposed method in temporal localization, together with better performance in event classification.

    Download PDF (681K)
  • Rihito SHODA, Seiji MIYOSHI
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2025 Volume E108.D Issue 10 Pages 1255-1259
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 17, 2025
    JOURNAL FREE ACCESS

    Anomaly detection is essential in a wide range of fields. In this study, we focus on an Efficient GAN applied to anomaly detection, and aim to improve its performance by random erasing data augmentation and enhancing the loss function to incorporate mapping consistency. Experiments using images of normal lemons and damaged lemons reveal that the proposed method significantly improves the anomaly detection performance of Efficient GAN.

    Download PDF (3148K)
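
The random erasing augmentation named in the anomaly-detection letter above can be illustrated with a simplified sketch: a randomly placed rectangle of a grayscale image is overwritten with a fill value. The size limit `max_frac`, the constant fill, and the function name are assumptions for illustration and are not taken from the letter.

```python
import random


def random_erase(image, max_frac=0.5, fill=0, rng=None):
    """Return a copy of a 2D grayscale image with one random rectangle erased."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Rectangle size: at least 1 pixel, at most max_frac of each dimension.
    erase_h = rng.randint(1, max(1, int(height * max_frac)))
    erase_w = rng.randint(1, max(1, int(width * max_frac)))
    top = rng.randint(0, height - erase_h)
    left = rng.randint(0, width - erase_w)
    out = [row[:] for row in image]  # leave the input untouched
    for r in range(top, top + erase_h):
        for c in range(left, left + erase_w):
            out[r][c] = fill
    return out


# Hypothetical 8x8 image of ones.
image = [[1] * 8 for _ in range(8)]
erased = random_erase(image, rng=random.Random(0))
num_erased = sum(row.count(0) for row in erased)
```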
  • Jialong LI, Shogo MORITA, Wei WANG, Yan ZHANG, Takuto YAMAUCHI, Kenji ...
    Article type: LETTER
    Subject area: Human-computer Interaction
    2025 Volume E108.D Issue 10 Pages 1260-1264
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: June 02, 2025
    JOURNAL FREE ACCESS

    Human-robot collaboration has become increasingly complex and dynamic, highlighting the need for effective and intuitive communication. Two communication strategies for robots have been explored: (i) global-perspective strategy to share an overview of task progress, aimed at achieving consensus on completed and upcoming tasks; and (ii) local-perspective strategy to share the robot’s intent, aimed at conveying the robot’s immediate intentions and next actions. However, existing studies merely rely on the distinct focus to differentiate between the use of different strategies, lacking a deeper exploration of how these strategies affect user perceptions and responses in practice. For example, a possible concern could be which strategy is more likely to inspire human effort in collaboration. To this end, this paper conducts a user experiment (N = 15) within a collaborative cooking scenario, and provides design insights into the strengths and weaknesses of each strategy from three dimensions to inform the design of human-sensitive communication.

    Download PDF (884K)
  • Lintang Matahari HASANI, Kasiyah JUNUS, Lia SADITA, Ayano OHSAKI, Tsuk ...
    Article type: LETTER
    Subject area: Educational Technology
    2025 Volume E108.D Issue 10 Pages 1265-1269
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: June 02, 2025
    JOURNAL FREE ACCESS

    Learners need to progress through certain inquiry stages to experience a good online discussion. This study analyzes the discussions of two classes that received different preparation: Kit-build concept mapping (KBCM) and summary writing. Epistemic network analysis showed that the KBCM class achieved close-to-ideal connectivity between the inquiry stages.

    Download PDF (510K)
  • Mingyang XU, Ao ZHAN, Chengyu WU, Zhengqiang WANG
    Article type: LETTER
    Subject area: Pattern Recognition
    2025 Volume E108.D Issue 10 Pages 1270-1274
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 02, 2025
    JOURNAL FREE ACCESS

    Recognizing fatigued drivers is essential for improving road safety. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been applied to identify the state of drivers. However, these models frequently encounter challenges, including a vast number of parameters and low detection effectiveness. To address these challenges, we propose the Dual-Lightweight-Swin-Transformer (DLS) for driver drowsiness detection. We also propose the Spatial-Temporal Fusion Model (STFM) and the Global Saliency Fusion Model (GSFM), where STFM fuses the spatial-temporal features and GSFM fuses the features from different layers of STFM to enhance detection efficiency. Simulation results show that DLS increases accuracy by 0.33% and reduces the computational complexity by 49.3%. The running time per test epoch of DLS is reduced by 33.1%.

    Download PDF (4092K)
  • ZiYi CHEN, XiaoRan HAO, Ming CHEN, Mao NI, Yi Heng ZHANG
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2025 Volume E108.D Issue 10 Pages 1275-1278
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 14, 2025
    JOURNAL FREE ACCESS

    Object detection from a UAV perspective faces challenges such as object occlusion, unclear boundaries, and small target sizes, resulting in reduced detection accuracy. Additionally, traditional object detection algorithms have large parameter counts, making them unsuitable for resource-constrained edge devices. To address this issue, we propose a lightweight small-object detection algorithm: Sky-YOLO. Specifically, we introduce the MSFConv multi-scale feature map fusion convolution module into the backbone network to enhance feature extraction capability. The Neck part is replaced with the L-BiFPN module to reduce parameter count and strengthen feature fusion between layers. Additionally, based on the characteristics of UAV imagery, we incorporate the WIOU loss function, enabling efficient detection of blurred and occluded targets. Experimental results show that the Sky-YOLO model, with 60% fewer parameters than the original model, still achieves 39.7% accuracy on the VisDrone2019 validation dataset, a 6.7% improvement in accuracy over the original model.

    Download PDF (1300K)
  • Duc-Dung NGUYEN
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2025 Volume E108.D Issue 10 Pages 1279-1282
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 17, 2025
    JOURNAL FREE ACCESS

    Compared to general object detection problems, the detection of mathematical expressions (MED) in document images has its own challenges, such as the small size of inline formulas, the rich set of mathematical symbols, and the similarity between variables and normal text characters. To deal with these challenges, we transform the multi-class MED task into a multi-label semantic segmentation problem. With a basic encoder-decoder structure of 3.9 million parameters trained from scratch, our proposed MEDNet model achieves top detection performance on three public datasets: TFD2019, Marmot, and IBEM2021. MEDNet is especially effective in detecting small formulas, achieving an F1 score of 95.40% for inline expressions and 95.82% for all expressions on the test set of the IBEM2021 competition data.

    Download PDF (1350K)
  • Zewen QIAN, Zhezhe HAN, Haoran JIANG, Ziyi ZHANG, Mohan ZHANG, Hao MA, ...
    Article type: LETTER
    Subject area: Biocybernetics, Neurocomputing
    2025 Volume E108.D Issue 10 Pages 1283-1286
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: June 02, 2025
    JOURNAL FREE ACCESS

    Identifying the combustion conditions in power-plant furnaces is crucial for optimizing combustion efficiency and reducing pollutant emissions. Traditional image-processing methods rely heavily on prior empirical knowledge, limiting their ability to comprehensively extract features from flame images. To address these deficiencies, this study proposes a novel approach for combustion condition identification through flame imaging and a convolutional autoencoder (CAE). In this approach, the flame images are first preprocessed, then the CAE is established to extract the deep features of the flame images, and finally a Softmax classifier is employed to determine the combustion conditions. Experiments are carried out on a 600 MW opposed wall boiler, and the effectiveness of the proposed method is evaluated using captured flame images. Results demonstrate that the proposed CAE-Softmax model achieves an identification accuracy of 98.2% under the investigated combustion conditions, significantly outperforming traditional models. These findings demonstrate the feasibility of the method, offering an intelligent and efficient solution for enhancing the operational performance of power-plant boilers.

    Download PDF (1249K)
  • Jinsoo SEO
    Article type: LETTER
    Subject area: Music Information Processing
    2025 Volume E108.D Issue 10 Pages 1287-1291
    Published: October 01, 2025
    Released on J-STAGE: October 01, 2025
    Advance online publication: April 07, 2025
    JOURNAL FREE ACCESS

    In many applications, data imbalance and annotation challenges limit the size of training datasets, hindering the ability of deep neural networks to fully leverage their representational capacity. Data augmentation is a widely used countermeasure that generates additional training samples by manipulating existing data. This paper investigates spectral-domain data augmentation methods for the cover song identification task, enabling on-the-fly augmentation with minimal computational overhead. We explore various spectral modifications and mixing techniques, applying them directly in the frequency domain, and evaluate their effectiveness on two cover song identification datasets. Among the augmentation methods tested, a mixing approach involving cut-and-paste operations in the spectral domain achieved the highest accuracy, demonstrating the potential of spectral augmentations to enhance the performance of neural networks for cover song identification.

    Download PDF (166K)
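
A cut-and-paste mixing operation of the kind mentioned in the cover song letter above might look like the following sketch, which copies a time-frequency patch from one spectrogram into another at the same position. The patch shape, the same-position pasting, and the function name are assumptions; the abstract does not give the letter's exact procedure or how labels are handled.

```python
import random


def cut_and_paste(spec_a, spec_b, patch_bins, patch_frames, rng=None):
    """Paste a random time-frequency patch of spec_b into a copy of spec_a.

    Both spectrograms are 2D lists indexed as [frequency_bin][time_frame]
    and must have the same shape.
    """
    rng = rng or random.Random()
    n_bins, n_frames = len(spec_a), len(spec_a[0])
    if len(spec_b) != n_bins or len(spec_b[0]) != n_frames:
        raise ValueError("spectrograms must have the same shape")
    f0 = rng.randint(0, n_bins - patch_bins)
    t0 = rng.randint(0, n_frames - patch_frames)
    mixed = [row[:] for row in spec_a]  # leave spec_a untouched
    for f in range(f0, f0 + patch_bins):
        mixed[f][t0:t0 + patch_frames] = spec_b[f][t0:t0 + patch_frames]
    return mixed


# Hypothetical 6-bin by 10-frame spectrograms.
spec_a = [[0.0] * 10 for _ in range(6)]
spec_b = [[1.0] * 10 for _ in range(6)]
mixed = cut_and_paste(spec_a, spec_b, patch_bins=2, patch_frames=4,
                      rng=random.Random(0))
pasted_cells = sum(sum(row) for row in mixed)
```

Because the operation is plain indexing on an already computed spectrogram, it adds little cost per training sample, which is consistent with the on-the-fly use described in the abstract.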