IEICE Transactions on Information and Systems

Special Section on Human Communication VI

FOREWORD

Kazuaki KONDO

2025 Volume E108.D Issue 6 Pages 444
Published: June 01, 2025
Released on J-STAGE: June 01, 2025

DOI: https://doi.org/10.1587/transinf.2024HCF0001

JOURNAL FREE ACCESS

Multimodal Voice Activity Projection for Turn-Taking and Effects on Speaker Adaptation

Kazuyo ONISHI, Hiroki TANAKA, Satoshi NAKAMURA

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 445-453
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 13, 2024

DOI: https://doi.org/10.1587/transinf.2024HCP0002

JOURNAL FREE ACCESS

The prediction of utterances in two-party conversations is a crucial technology for realizing natural turn-taking between humans and virtual agents. Recently, Voice Activity Projection (VAP) models, capable of a unified approach to various turn-taking events, have gained attention. This study investigates the incorporation of non-verbal features to enhance the performance of VAP models. Our results indicate that the integration of non-verbal features leads to significantly better performance in the VAP models, particularly in aspects of turn-shift prediction, overlap prediction, and backchannel prediction. Moreover, we explored the performance of VAP models using only single-speaker features, targeting their implementation in virtual agents. The findings demonstrate the feasibility of adequately predicting turn-taking from the user to the spoken dialogue system. The study also outlines the potential for further performance enhancement by integrating a variety of language and non-verbal features.

Effects of Color Discrimination of Perceived Objects by Cognitive Speech Acts on the Reaction Time

Masatoshi YAMADA, Ryosuke TAKATA

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 454-464
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: March 07, 2025

DOI: https://doi.org/10.1587/transinf.2024HCP0004

JOURNAL FREE ACCESS

Since ancient times, many masters have warned against using words while performing skills because doing so can cause a momentary delay. Many reports in cognitive science also relate words to perception, and it is equally clear that conscious word processing affects movement judgments. However, the effect of word usage during skilled motion on reaction has not been fully clarified. The purpose of this study is to empirically verify the effects of color discrimination of perceptual objects by cognitive speech acts on reaction time, based on a perceptual reaction test and subjects’ kinesthetic impressions. In the control task (CT), subjects said “yes (hai)” regardless of whether a red or blue circle was displayed on the screen; in the target task (TT), they said “red (aka)” when a red circle was displayed and “blue (ao)” when a blue circle was displayed. In both tasks, 30 subjects were instructed to press the space key as quickly as possible only when a red circle was displayed, and the difference in reaction time was examined. In addition, as an exploratory approach, the brain activity of the prefrontal cortex was measured using a portable brain activity measurement device. First, a two one-sided tests procedure showed that the accuracy rates of the two tasks were within the pre-set equivalence threshold (Δ = ±5%), indicating that the difficulty levels of both tasks were equivalent. Next, a paired t-test on the reaction times showed a significant difference (at the 1% level) between CT and TT (t(29) = 5.71, p < .001). A chi-square goodness-of-fit test on the number of subjects giving each answer about their kinesthetic impression of the speed of pressing the space key showed that the distribution of responses did not differ significantly from an equal distribution (χ²(2) = 1.40, p = .497). In addition, exploratory analysis of brain activity showed that the right prefrontal cortex was significantly more active in CT than in TT (t(29) = 2.22, p = .035). These results suggest that TT, compared with CT, involved a judgment for color discrimination of perceptual objects and had slower reaction times due to the additional cognitive processes. The results also demonstrated that subjective kinesthetic impressions and objective movement do not always match, suggesting that differences in speech acts of which subjects are unaware during movement may affect skill performance. The prefrontal cortex measurements suggested that the rhythm of speech in CT was related to the activation of the right prefrontal cortex, which is responsible for controlling information for decision-making.

Detecting Praising Behaviors Based on Multimodal Information

Toshiki ONISHI, Asahi OGUSHI, Ryo ISHII, Akihiro MIYATA

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 465-473
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: January 23, 2025

DOI: https://doi.org/10.1587/transinf.2024HCP0008

JOURNAL FREE ACCESS

Praising behavior is an important part of human communication. However, people who are not accustomed to praising others have difficulty improving their praising skills. To solve this problem, we aim to construct a system for evaluating praising skills. So far, we have attempted to predict the degree of praising skills from verbal and nonverbal behaviors. However, our previous studies focused on scenes in which the praiser was actually praising, and we have not dealt with scenes in which the praiser was not praising. In this paper, we attempt to detect whether the praiser is actually praising the receiver by also including scenes in which the praiser is not praising the receiver. First, we extract features related to the verbal and nonverbal behaviors of the praiser and receiver. Second, we construct machine learning models that utilize these features to detect whether or not the praiser is actually praising the receiver. Our results show that the machine learning model utilizing the acoustic and embedding-based linguistic behaviors of the praiser and the visual and acoustic behaviors of the receiver has the best detection performance.

Online Communication Environment Design for Encouraging Reciprocal Use of Information among Groups

Masanari ICHIKAWA, Yugo TAKEUCHI

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 474-483
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 19, 2024

DOI: https://doi.org/10.1587/transinf.2024HCP0010

JOURNAL FREE ACCESS

In face-to-face settings such as meeting rooms or classrooms, it is possible to have multiple dialogical spaces within one place. This provides an opportunity to easily access others’ opinions and ideas through dialogues with nearby individuals, contributing to increased creativity and understanding in group discussions for problem-solving within the same venue. However, in many online communication environments, due to their structure, opportunities to utilize external dialogues and information are lost. This poses a challenge as discussions often rely solely on the abilities and resources within each group for problem-solving. This study conducted an experiment in which participants generated ideas in a face-to-face situation that simulated an online interactive environment to clarify whether allowing groups of problem solvers to observe each other improves the problem-solving performance of the groups. The results show that the groups interacted by sharing ideas; however, the efficiency of idea generation at the individual group level did not improve with the amount of bystanding opportunities. Additionally, in the condition where the opportunity to observe other groups was temporary, there was a tendency for the ideas generated by the entire group to become more similar compared to the condition where the opportunity was constantly available.


Special Section on the Architectures, Protocols, and Applications for the Future Internet

FOREWORD

Yuuichi TERANISHI

2025 Volume E108.D Issue 6 Pages 484
Published: June 01, 2025
Released on J-STAGE: June 01, 2025

DOI: https://doi.org/10.1587/transinf.2024NTF0001

JOURNAL FREE ACCESS

Leveraging Heterogeneous Programmable Data Planes for Security and Privacy of Cellular Networks, 5G & Beyond

Toru HASEGAWA, Yuki KOIZUMI, Junji TAKEMASA, Jun KURIHARA, Toshiaki TA ...

Article type: INVITED PAPER
2025 Volume E108.D Issue 6 Pages 485-493
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 21, 2024

DOI: https://doi.org/10.1587/transinf.2024NTI0001

JOURNAL FREE ACCESS

Cellular networks have become a critical part of our networking infrastructure, enabling ubiquitous communication. However, they are likely to be under threat, and can also be the vehicle through which cellular-connected end-systems can be subject to attacks. This paper introduces our efforts to leverage data plane devices such as programmable network interface cards, switches, and end-hosts to efficiently detect attacks and ensure user privacy at terabit per second speeds. Specifically, our project designs a heterogeneous data plane framework that cohesively combines multiple data plane devices, and designs two security solutions on the framework: security monitoring and privacy protection. This paper briefly introduces the goals and initial results for the two solutions.

BMR: a New BBR Algorithm with Moderate Delivery Rate

Shotaro ISHIKURA, Ryosuke MINAMI, Miki YAMAMOTO

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 494-504
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 10, 2024

DOI: https://doi.org/10.1587/transinf.2024NTP0001

JOURNAL FREE ACCESS

Google proposed BBR, a new congestion-based congestion control algorithm. BBR was originally designed as rate-based congestion control, but it has been shown to operate at its inflight cap when BBR flows share a bottleneck link. In this paper, based on the results of a simple analysis, we propose a new BBR algorithm, BMR (BBR with Moderate Rate), which adequately reduces the transmission rate. Our simulation results show that the operating point of BMR is close to Kleinrock’s ideal point rather than to the control point of BBR at the inflight cap. We also show that BMR significantly improves the RTT fairness and late-arrival fairness issues of BBR.
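
As a rough illustration of the operating points mentioned in the abstract, the sketch below computes the bandwidth-delay product (BDP), which corresponds to Kleinrock's ideal point, and the standing queuing delay that appears when the inflight data sits at a cap above the BDP. The link rate, the RTT, and the cap factor of 2 (the cwnd gain commonly cited for BBRv1) are assumptions chosen for illustration; they are not taken from the paper, and BMR itself is not implemented here.

```python
def bdp_bytes(bottleneck_bps: float, min_rtt_s: float) -> float:
    """Bandwidth-delay product in bytes (inflight at Kleinrock's ideal point)."""
    return bottleneck_bps * min_rtt_s / 8.0


def queuing_delay_s(inflight_bytes: float, bottleneck_bps: float, min_rtt_s: float) -> float:
    """Standing queuing delay caused by inflight data in excess of the BDP."""
    excess = max(0.0, inflight_bytes - bdp_bytes(bottleneck_bps, min_rtt_s))
    return excess * 8.0 / bottleneck_bps


# Hypothetical path: 100 Mbit/s bottleneck, 40 ms minimum RTT.
LINK_BPS = 100e6
MIN_RTT_S = 0.040
CAP_FACTOR = 2.0  # assumed inflight cap of 2 x BDP

ideal_inflight = bdp_bytes(LINK_BPS, MIN_RTT_S)
capped_inflight = CAP_FACTOR * ideal_inflight
extra_delay = queuing_delay_s(capped_inflight, LINK_BPS, MIN_RTT_S)
```

Under these assumptions, running at the cap keeps one full BDP queued at the bottleneck, so the observed RTT is roughly twice the minimum RTT, whereas at the ideal point the queue is empty.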

GAMPALv2: An Anomaly Detection Mechanism for Internet Traffic by Predicting Flow Size Range from Time Features

Taku WAKUI, Fumio TERAOKA, Takao KONDO

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 505-516
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: January 07, 2025

DOI: https://doi.org/10.1587/transinf.2024NTP0002

JOURNAL FREE ACCESS

To detect anomalies on an Internet backbone network, we previously proposed GAMPAL (General-purpose Anomaly detection Mechanism using Prefix Aggregate without Labeled data). To scale with the number of entries in the BGP RIB (Border Gateway Protocol Routing Information Base), GAMPAL introduces PA (Prefix Aggregate). It adopts an LSTM-RNN (Long Short-Term Memory Recurrent Neural Network) as a model that focuses on the periodicity of Internet traffic patterns at a weekly scale. However, GAMPAL has three issues: (i) computational complexity, (ii) difficulty in defining a detection threshold, and (iii) difficulty in detecting when and in which PA an anomaly occurred. Therefore, this paper proposes GAMPALv2, which solves these problems for the practical use of GAMPAL. To solve (i), GAMPALv2 reduces the dimension of the input variables from 288 (five-minute slots in a day) to 7 by defining time features. It also adopts the RFR (Random Forest Regressor) as a prediction model. To solve (ii) and (iii), GAMPALv2 defines the predicted range based on the predicted values of the RFR and detects anomalies for each PA by comparing the predicted range with the observed value. As a result, the training and prediction time is reduced from four days using a GPU to 23 minutes using an 8-core CPU. Utilizing semantics such as date, time, and day of the week defined in the time features improves prediction accuracy. The evaluation results show that GAMPALv2 can detect anomalies in the real world, such as connection failure on YouTube, DDoS (Distributed Denial of Service) attacks, and increasing traffic due to an event. In addition, the accuracy evaluation shows that the recall is improved. Although not precisely comparable due to the different calculation methods, the average recall in the previous work is 81.8%, whereas recall improves to 93.1% in GAMPALv2.

An Accelerated Integrity-Secured Name Resolution Architecture Using Two Full-Service Resolvers with and without DNSSEC Validation in Parallel

Yong JIN, Kazuya IGUCHI, Nariyoshi YAMAI, Rei NAKAGAWA, Toshio MURAKAM ...

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 517-525
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 22, 2024

DOI: https://doi.org/10.1587/transinf.2024NTP0003

JOURNAL FREE ACCESS

Domain Name System (DNS) is the most widely used name resolution architecture in the current Internet, and Domain Name System Security Extensions (DNSSEC) is the fundamental solution for DNS cache poisoning attacks. However, the extra overhead caused by DNSSEC has hindered its wide deployment during the last two decades, and there is still no effective solution. To mitigate the overhead caused by DNSSEC in DNS full-service resolvers, we previously proposed a terminal-based DNSSEC validation solution. That solution helps avoid the extra overhead caused by DNSSEC validation on DNS full-service resolvers, but the results of DNSSEC validation cannot be shared among the end terminals. In the improved version, two DNS full-service resolvers, one with and one without DNSSEC validation, are used in parallel to solve this issue. In the improved solution, one DNS full-service resolver (DNSSEC-enabled) can share the results among the terminals and the other (DNSSEC-disabled) can be used for terminal-based DNSSEC validation, but the overhead issue on the end terminal has still not been solved. In this paper, we expand the improved version by adding the functionality of terminating the slower name resolution process in order to reduce resource consumption on end terminals. The evaluation results show that the integrity-secured name resolution works properly and confirm the improvement in name resolution performance.
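
The idea of running two resolution paths in parallel and terminating the slower one can be sketched with Python's asyncio. This is only an illustration of the general "first answer wins, cancel the rest" pattern: the function names, path labels, latencies, and the address (from the 192.0.2.0/24 documentation range) are hypothetical, real DNS queries and DNSSEC validation are replaced by a simulated delay, and the paper's actual implementation may differ.

```python
import asyncio


async def resolve_via(path_name: str, latency_s: float, answer: str):
    """Stand-in for one name resolution path; latency is simulated with sleep."""
    await asyncio.sleep(latency_s)
    return path_name, answer


async def first_answer(*paths):
    """Run all paths concurrently, keep the first result, cancel the slower ones."""
    tasks = [asyncio.ensure_future(p) for p in paths]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # terminate the slower resolution process
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    winner = next(t for t in tasks if t in done)
    return winner.result()


def demo():
    # Hypothetical latencies: resolver-side validation is slower in this run.
    return asyncio.run(first_answer(
        resolve_via("resolver-validated", 0.05, "192.0.2.10"),
        resolve_via("terminal-validated", 0.01, "192.0.2.10"),
    ))
```

In this sketch both paths are assumed to yield an integrity-checked answer, so taking whichever finishes first does not weaken validation; cancelling the loser is what saves resources on the terminal.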

DGA-Based Malware Communication Detection from DoH Traffic Using Hierarchical Machine Learning Analysis

Rikima MITSUHASHI, Yong JIN, Katsuyoshi IIDA, Yoshiaki TAKAI

Article type: PAPER
2025 Volume E108.D Issue 6 Pages 526-534
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 21, 2024

DOI: https://doi.org/10.1587/transinf.2024NTP0004

JOURNAL FREE ACCESS

Encrypted domain name resolution is increasingly being used to protect the privacy of Internet users, but it may prevent network administrators from detecting malicious communications. Unfortunately, DGA-based malware can exploit it to hide the domain names it generates, so network administrators need a monitoring framework to maintain network security. In this paper, we propose a novel malware detection system using hierarchical machine learning analysis, which incorporates machine learning models, including XGBoost, LightGBM, CatBoost, and RGF. The evaluation results confirm that the proposed system can detect DGA-based malware communication generated by PadCrypt, Sisron, Tinba, and Zloader with 99.19% accuracy. The results show that the proposed system can detect DGA-based malware communications from DoH traffic with sufficient accuracy to support network administrators.

A Component Placement Mechanism for Latency-Constrained Applications in Cloud-Edge Environments

Mudai KOBAYASHI, Mohammad Mikal Bin Amrul Halim GAN, Takahisa SEKI, Ta ...

Article type: LETTER
2025 Volume E108.D Issue 6 Pages 535-539
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 11, 2024

DOI: https://doi.org/10.1587/transinf.2024NTL0001

JOURNAL FREE ACCESS

We propose a component placement mechanism for latency-constrained applications in a distributed system comprising mobile edge and cloud datacenters. It maximizes the achievable processing rate of requests of an application while satisfying the acceptable maximum end-to-end latency to process each request, by placing the components of the application at optimal locations. We evaluated it by simulation and confirmed that it can find an optimal placement according to given situations. As a case study, we applied the mechanism to a V2X application and confirmed its effectiveness.


Regular Section

A Necessary and Sufficient Condition for Controlled Generation of Right Linear Grammars with Unknown Behaviors

Daihei ISE, Satoshi KOBAYASHI

Article type: PAPER
Subject area: Fundamentals of Information Systems
2025 Volume E108.D Issue 6 Pages 540-548
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 19, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7175

JOURNAL FREE ACCESS

This paper deals with a formal language theoretic framework for the generation of DNA nanostructures by DNA hybridization control. In order to model such processes of generating DNA linear nanostructures, Kimoto et al. proposed right linear grammars with unknown behaviors (RLUBs, for short), in which the behavior of derivations is not determined completely, and only the upper bound and the lower bound of behaviors of derivations are known beforehand. In bio-lab experiments, it is required to control the generation process (chemical reaction process) in order to output only a target nanostructure even if we do not completely know the behavior of the chemical reaction process. Kimoto et al. focused on the problem of controlling the generation process of RLUBs using control systems in order to output a target string using as few control devices (control symbols) as possible. However, there has been no general discussion about which language classes can be generated by the control of RLUBs. This paper deals with the general theory on finite classes of finite languages to be generated by the control of RLUBs. In particular, we consider physical properties of control devices used in control systems in a detailed manner and formulate them as quasiorders over control alphabets of control systems. We give a necessary and sufficient condition for given finite classes of finite languages to be generated by RLUBs and their control systems with the constraint of a given quasiorder over the control alphabet. The obtained results would be the first step toward the construction of the theory for answering the question from bio-lab experiment researchers of whether the physical properties of devices affect the ability to generate nanostructures.

Performance Enhancement of the LFSR-Based Unpredictable Random Number Generator in Rocket Core

Takayoshi SHIKANO, Shuichi ICHIKAWA

Article type: PAPER
Subject area: Computer System
2025 Volume E108.D Issue 6 Pages 549-557
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 10, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7098

JOURNAL FREE ACCESS

Masaoka et al. introduced an unpredictable random number generator (URNG) using a linear feedback shift register (LFSR) embedded within the CPU. Subsequent work by Kamogari and Ichikawa elucidated the LFSR requirements and the minimal essential period to pass the Diehard test. In this study, we investigate a Rocket Core with a built-in LFSR, which was designed according to the results of preceding studies. By sampling the lower 32 bits of the 128-bit LFSR, a random number sequence was generated at a rate of 49.4 Mbit/s on a 50-MHz Rocket Core. The derived random sequence passed both the Diehard and NIST tests. Furthermore, we propose to replace the LFSR with a Leap-ahead LFSR, which applies its characteristic polynomial 32 times in a cycle. This improvement results in a significantly greater generation rate of 451 Mbit/s, while maintaining compliance with the Diehard and the NIST tests. The resource overhead of this URNG is negligible compared to the logic scale of the base system (LiteX/Rocket). Considering its low cost, high generation rate, high randomness quality, and ease of use, the proposed design is regarded as a promising RNG support solution for a wide range of processors.
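
To make the difference between a plain LFSR and a leap-ahead LFSR concrete, the following sketch uses a small 16-bit Fibonacci LFSR with the well-known maximal-length polynomial x^16 + x^14 + x^13 + x^11 + 1. The register width, polynomial, and seed are assumptions for illustration only; the paper's design uses a 128-bit LFSR whose polynomial is not given in the abstract. In hardware, a leap-ahead LFSR computes several shifts in one clock cycle with combinational logic; here that is simply modeled as a loop.

```python
WIDTH = 16  # demo register width; the LFSR in the abstract is 128 bits wide


def lfsr_step(state: int) -> int:
    """Advance the 16-bit Fibonacci LFSR by one shift."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << (WIDTH - 1))


def lfsr_leap(state: int, steps: int) -> int:
    """Model a leap-ahead LFSR: apply `steps` shifts per 'cycle'."""
    for _ in range(steps):
        state = lfsr_step(state)
    return state


def period(seed: int) -> int:
    """Count the shifts needed for the state to return to the seed."""
    state, count = lfsr_step(seed), 1
    while state != seed:
        state = lfsr_step(state)
        count += 1
    return count
```

After a single shift, 15 of the 16 bits are just the previous state moved by one position, which is why sampling a plain LFSR on consecutive cycles yields strongly overlapping values. Leaping ahead by as many shifts as the number of bits sampled means each sample consists of freshly generated bits.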

Enhancing GPU Performance Through Complexity-Effective Out-of-Order Execution Using Distance-Based ISA

Reoma MATSUO, Toru KOIZUMI, Hidetsugu IRIE, Shuichi SAKAI, Ryota SHIOY ...

Article type: PAPER
Subject area: Computer System
2025 Volume E108.D Issue 6 Pages 558-569
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 16, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7203

JOURNAL FREE ACCESS

Graphics processing units (GPUs) have been introduced in various fields due to their high parallel computing performance. A key feature of GPUs is multi-threaded execution, where a GPU executes many threads simultaneously to hide various latencies. However, even with such multi-threaded execution, there is a limit to the number of threads that can be launched, and long latency instructions eventually stall the GPU core. While long latencies can be hidden by out-of-order execution, it requires expensive circuits such as rename logic and load-store queues and is not typically introduced on GPUs with massively multi-threaded execution. We propose the TURBULENCE architecture for very low-cost out-of-order execution on GPUs. TURBULENCE consists of a novel ISA that introduces the concept of referencing operands by inter-instruction distance instead of register numbers, and a novel microarchitecture that executes the novel ISA. This distance-based operand has the property of not causing false dependencies. By exploiting this property, we achieve complexity-effective out-of-order execution on GPUs without introducing any expensive hardware. Simulation results show that TURBULENCE improves performance by 20.4% while reducing energy consumption over an existing GPU.
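
The concept of referencing operands by inter-instruction distance can be illustrated with a toy interpreter. The instruction format below (`li`, `add`, `mul`, and tuples of distances) is entirely hypothetical and is not the TURBULENCE ISA; it only shows that when each instruction names its producers by "how many instructions back", every result gets its own write-once slot, so no two instructions ever compete for the same register name.

```python
def run(program):
    """Execute a toy distance-based program and return every instruction's result."""
    results = []
    for index, (op, *args) in enumerate(program):
        if op == "li":  # load immediate: no source operands
            value = args[0]
        elif op in ("add", "mul"):
            dist_a, dist_b = args
            if not (1 <= dist_a <= index and 1 <= dist_b <= index):
                raise ValueError(f"instruction {index}: distance out of range")
            a = results[index - dist_a]  # producer is dist_a instructions back
            b = results[index - dist_b]
            value = a + b if op == "add" else a * b
        else:
            raise ValueError(f"unknown opcode {op!r}")
        results.append(value)  # each result is written exactly once
    return results


# (5 + 7) * 5, with operands named by distance rather than register number.
PROGRAM = [
    ("li", 5),
    ("li", 7),
    ("add", 2, 1),  # operands: 2 back (5) and 1 back (7)
    ("mul", 1, 3),  # operands: 1 back (12) and 3 back (5)
]
```

Because no destination is ever overwritten, write-after-read and write-after-write hazards do not arise in this toy model, which is the property the abstract describes as "not causing false dependencies".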

A Boosting Method Based on Center-of-Gravity Oversampling and Pruning for Classifying Imbalanced Data

Fengqi GUO, Qicheng LIU

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 6 Pages 570-582
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 05, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7147

JOURNAL FREE ACCESS

Data imbalance frequently occurs across multiple sectors, including healthcare, security, and finance. It substantially increases the difficulty of classification. To tackle the issue of existing techniques for binary imbalanced data classification easily changing data distribution and to enhance the performance of the classifier, this paper introduces a boosting method named GAPBoost based on center-of-gravity oversampling and pruning for classifying imbalanced data in binary classification scenarios. The algorithm first clusters all instances of the minority class into k distinct clusters by utilizing K-means clustering. Then, it performs center-of-gravity oversampling on the clusters with enough instances to constitute a triangle and generates new instances using the interpolation method on the clusters containing only two minority class instances. Subsequently, pruning is employed to eliminate noisy data from both the majority and minority classes, followed by the AdaBoost algorithm to improve the performance of the classifier on the denoised training set. Ten-fold stratified cross-validation experiments of the GAPBoost algorithm and several other classic ensemble algorithms are performed on 20 benchmark unbalanced datasets using AUC, F1, and G-mean as performance evaluation criteria. The results of the experiments indicate that the GAPBoost algorithm introduced in this paper can effectively handle the classification problem of binary imbalanced datasets and outperform other ensemble algorithms in three evaluation metrics.
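
The oversampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GAPBoost implementation: the abstract does not say how triangles are chosen within a cluster, so the sketch picks three distinct instances at random, and it returns nothing for single-instance clusters. The function names are hypothetical, and the K-means, pruning, and AdaBoost stages are omitted.

```python
import random


def centroid(a, b, c):
    """Center of gravity of the triangle spanned by three points."""
    return tuple((x + y + z) / 3.0 for x, y, z in zip(a, b, c))


def interpolate(a, b, t):
    """Point on the segment from a to b at fraction t in [0, 1]."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def oversample_cluster(points, n_new, rng):
    """Generate n_new synthetic minority instances for one cluster."""
    if len(points) >= 3:
        # Enough instances to form a triangle: use its center of gravity.
        return [centroid(*rng.sample(points, 3)) for _ in range(n_new)]
    if len(points) == 2:
        # Only two instances: interpolate between them.
        return [interpolate(points[0], points[1], rng.random()) for _ in range(n_new)]
    return []  # assumption: clusters with a single instance are left untouched
```

A centroid always lies inside the triangle of existing minority instances, so the synthetic points stay within the region the cluster already occupies.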

A Transformer-Based Fully Trainable Point Process

Hirotaka HACHIYA, Fumiya NISHIZAWA

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 6 Pages 583-592
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 12, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7181

JOURNAL FREE ACCESS

Using prior physical and mathematical knowledge, an appropriate intensity function should be designed when applying a point process to a real-world problem. A novel Transformer-based partially trainable model has been proposed. This model adaptively extracts a sequence feature from the past event sequence using a self-attention mechanism. However, because the feature vector is the transformed vector of the latest event and the intensity function is modeled in a handmade manner given the feature vector, the approximated intensity function and the predicted next event depend strongly on the latest event. To overcome these problems, a novel Transformer-based fully trainable point process (Transformer-FTPP) is proposed. With this model, multiple trainable vectors are transformed through an encoder-decoder Transformer architecture to extract past sequence-representative and future event candidate vectors. This facilitates the realization of an adaptive and general approximation of the intensity function and a prediction of the next event. The effectiveness of the proposed method was proved experimentally using synthetic and real-world event data.

FusionReg: LiDAR-Camera Fusion Regression Enhancement for 3D Object Detection

Rongchun XIAO, Yuansheng LIU, Jun ZHANG, Yanliang HUANG, Xi HAN

Article type: PAPER
Subject area: Image Recognition, Computer Vision
2025 Volume E108.D Issue 6 Pages 593-603
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 27, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7158

JOURNAL FREE ACCESS

The rapid advancement of autonomous driving has heightened safety concerns, making it essential to adopt a comprehensive approach for secure navigation. Multi-modal methods for 3D object detection play a critical role in enhancing driving safety by integrating data from various sensor types. However, existing methods face challenges, such as feature misalignment and loss, which can lead to overfitting and undermine perception performance and reliability. Building on findings that suggest excluding direct camera branch features from the regression task can improve detection performance, this paper delves deeper into the detection pipeline and introduces a novel multi-modal 3D object detection approach. The proposed approach starts with the introduction of an attention-based module designed to align features across different modalities, thereby enhancing the fusion of features through channel and spatial attention mechanisms. Additionally, an image-guided feature candidate generation strategy is employed to identify candidate regions within the fused features. These original features are then divided into two distinct branches for regression and classification tasks, which are subsequently processed by the detection heads. This approach reduces the model’s dependence on precise depth estimation from the image branch and minimizes the impact of sensor calibration errors. Experimental results validate that the proposed method delivers outstanding detection performance. Notably, our best model achieves a competitive performance of 71.0% mAP and 74.0% NDS, while demonstrating strong robustness in scenarios with missing camera data, underscoring its capability to manage complex real-world situations.

Multi-Modal Fake News Detection Enhanced by Fine-Grained Knowledge Graph

Runlong HAO, Hui LUO, Yang LI

Article type: PAPER
Subject area: Multimedia Pattern Processing
2025 Volume E108.D Issue 6 Pages 604-614
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: November 29, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7234

JOURNAL FREE ACCESS

The pervasive dissemination of multimodal fake news, which includes both textual and visual elements, significantly confuses the public. Previous studies have addressed this issue by promoting multimodal fusion for fake news detection. However, they focus on multimodal fusion and neglect the importance of fine-grained data analysis; in particular, the useful information contained in unimodal data is not fully mined, resulting in underutilization of the data. At the same time, as fake news becomes more confusing, the lack of background knowledge also hampers the fake news detection task. To address these limitations, we introduce FKGFND (Fine-grained Knowledge Graph enhanced Multi-modal Fake News Detection), a novel framework designed to make full use of uni-modal data through detailed data modeling and to introduce external knowledge that provides the model with background knowledge and a basis for discrimination, thereby realizing fine-grained fake news detection. Initially, we model image information at a fine-grained level, extracting embedded text and character details and integrating background knowledge about the characters. Concurrently, we construct a ternary knowledge graph to optimize the use of extracted data, featuring nodes representing embedded text, character names, and the background information. Subsequently, we augment the effectiveness of uni-modal data by enriching multimodal data integration with refined uni-modal information. To enhance the accuracy of multi-modal fake news detection, we developed FKGFND-data, a dataset founded on a fine-grained knowledge graph. Experimental evaluations indicate that FKGFND outperforms existing approaches in both multi-modal and uni-modal fake news detection tasks.

Clinical Learning Order-Guided Deep Neural Network for Brain Tumor Segmentation

Pengfei ZHANG, Jinke WANG, Yuanzhi CHENG, Shinichi TAMURA

Article type: PAPER
Subject area: Biological Engineering
2025 Volume E108.D Issue 6 Pages 615-628
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 06, 2024

DOI: https://doi.org/10.1587/transinf.2024EDP7222

JOURNAL FREE ACCESS

In this paper, we present an innovative multi-label local-to-global learning segmentation (MLLGL-Seg) neural network model for brain tumor segmentation. Our framework is grounded on a clinical learning order from the whole tumor (WT) to the tumor core (TC) and then to the enhanced tumor (ET). Thus, we first propose a multi-label segmentation network to embed the clinical learning sequence of WT-TC-ET during training. We then introduce a local-to-global learning algorithm and integrate it into the multi-label network to construct the MLLGL-Seg model. In addition, considering the hierarchical structure of the output tumor region, we further perform a hierarchical consistency transformation on the network output to ensure that it complies with hierarchical constraints. The novelty of this paper lies in proposing MLLGL-Seg by introducing curriculum learning based on category space into segmentation, in constructing a learning sequence based on boundary difficulty and category similarity derived from clinical experience, and in devising learning sequences based on anti-similarity and label noise. Experimental results show that the proposed method achieves one of the most competitive results in brain tumor segmentation accuracy on three publicly available datasets.
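
The abstract does not say how the hierarchical consistency transformation is computed. As a rough illustration only, one simple way to make per-voxel outputs respect the nesting ET within TC within WT is to clamp each probability by that of its parent region. The function name `enforce_hierarchy` and the flat-list input format are assumptions made for this sketch, not the authors' method.

```python
def enforce_hierarchy(p_wt, p_tc, p_et):
    """Clamp per-voxel probabilities so that ET <= TC <= WT.

    Hypothetical helper: each argument is a flat list of probabilities,
    one entry per voxel. Returns a list of (wt, tc, et) tuples.
    """
    result = []
    for wt, tc, et in zip(p_wt, p_tc, p_et):
        tc = min(tc, wt)  # the tumor core cannot be more likely than the whole tumor
        et = min(et, tc)  # the enhanced tumor cannot be more likely than the core
        result.append((wt, tc, et))
    return result


if __name__ == "__main__":
    # Two voxels; the first violates TC <= WT, the second violates ET <= TC.
    print(enforce_hierarchy([0.9, 0.2], [0.95, 0.1], [0.5, 0.3]))
```

After clamping, thresholding the three channels at any common value yields nested masks, which is the property a hierarchical constraint of this kind is meant to guarantee.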

CLOCK-DPP: Hybrid Disk Buffer Replacement Policy for SSDs with Dirty Page Preservation for Write Intensive Environments

Jung Min LIM, Won Ho LEE, Jun-Hyeong CHOI, Jong Wook KWAK

Article type: LETTER
Subject area: Software System
2025 Volume E108.D Issue 6 Pages 629-633
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 10, 2024

DOI: https://doi.org/10.1587/transinf.2024EDL8065

JOURNAL FREE ACCESS

As the demand for state-of-the-art solid state drives (SSDs) increases, overcoming their disadvantages, such as limited lifespan and asymmetric operation latency in write-intensive environments, becomes crucial. In this letter, we propose a hybrid disk buffer replacement policy named CLOCK with dirty page preservation (CLOCK-DPP) that manages dynamic random-access memory (DRAM) and nonvolatile memory (NVM) with separate CLOCKs to store different pages based on the operation types. By employing an additional CLOCK hand for both page eviction and migration, the CLOCK-DPP exhibited 86.62% and 47.53% larger reductions than those of existing policies in the write count of the disk buffer and block erase count of the SSD, respectively.
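
For readers unfamiliar with the baseline, the following is a minimal sketch of the classic single-hand CLOCK (second-chance) policy with a dirty bit. It is not CLOCK-DPP: the DRAM/NVM split, the additional hand, and page migration are not modeled. The class name `ClockBuffer` and the `writebacks` counter are assumptions introduced for illustration.

```python
class ClockBuffer:
    """Classic CLOCK page buffer (illustrative baseline, not CLOCK-DPP)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []     # each frame is [page_id, ref_bit, dirty]
        self.index = {}      # page_id -> position in self.frames
        self.hand = 0        # position of the CLOCK hand
        self.writebacks = 0  # dirty pages flushed to storage on eviction

    def access(self, page_id, write=False):
        """Access a page; return True on a hit and False on a miss."""
        pos = self.index.get(page_id)
        if pos is not None:
            frame = self.frames[pos]
            frame[1] = True               # give the page a second chance
            frame[2] = frame[2] or write  # a write marks the page dirty
            return True
        if len(self.frames) < self.capacity:
            self.index[page_id] = len(self.frames)
            self.frames.append([page_id, True, write])
            return False
        # Buffer full: sweep the hand, clearing reference bits,
        # until a frame with a cleared bit is found.
        while self.frames[self.hand][1]:
            self.frames[self.hand][1] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand]
        if victim[2]:
            self.writebacks += 1          # evicting a dirty page costs a write
        del self.index[victim[0]]
        self.frames[self.hand] = [page_id, True, write]
        self.index[page_id] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False


if __name__ == "__main__":
    buf = ClockBuffer(2)
    for page, is_write in [(1, True), (2, False), (3, False), (2, False), (1, False)]:
        buf.access(page, write=is_write)
    print(sorted(buf.index), buf.writebacks)
```

Each eviction of a dirty page in this baseline triggers a write to storage, which is the cost that a dirty-page-preserving policy aims to reduce.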

Propagation-Based Code Clone Analysis for Detecting Smart Contract Vulnerability

Zhuo ZHANG, Donghui LI, Kun JIANG, Ya LI, Junhu WANG, Xiankai MENG

Article type: LETTER
Subject area: Software Engineering
2025 Volume E108.D Issue 6 Pages 634-639
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 10, 2024

DOI: https://doi.org/10.1587/transinf.2024EDL8079

JOURNAL FREE ACCESS

Smart contracts are self-executing programs that operate on a blockchain. Once deployed, they cannot be altered, which introduces distinct maintenance challenges unlike those found in traditional software systems. Bugs and vulnerabilities in smart contracts have led to significant economic losses, drawing increased attention to their security. The immutability of smart contracts has made thorough security checks prior to deployment a priority. In this paper, we introduce PropaDT, a technique for detecting timestamp vulnerabilities in smart contracts using propagation-based code clone analysis. The core idea of this technique involves using dataflow analysis based on an Abstract Syntax Tree (AST) to extract propagation chains that reveal how variables interact, potentially leading to vulnerabilities. Next, we extract code snippets based on the propagation chains and compare them with known vulnerability patterns in a database. This allows us to determine whether the tested smart contract contains a timestamp vulnerability.
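
To make the notion of a propagation chain concrete, here is a deliberately simplified sketch that tracks how a value such as `block.timestamp` flows through straight-line assignments. It does not parse Solidity or build an AST, and it is not PropaDT itself. The function name `propagation_chains` and the `(target, used_variables)` input format are assumptions made for this example.

```python
def propagation_chains(assignments, source):
    """Trace how `source` propagates through a list of assignments.

    assignments: list of (target, [variables read on the right-hand side]),
                 in program order (hypothetical pre-extracted form).
    Returns a dict mapping each variable reached by `source` to the chain
    of variables leading to it.
    Simplification: a variable stays tracked even if it is later
    overwritten by an unrelated value.
    """
    chains = {source: [source]}
    for target, used in assignments:
        for var in used:
            if var in chains:
                chains[target] = chains[var] + [target]
                break  # one chain per target is enough for this sketch
    return chains


if __name__ == "__main__":
    program = [
        ("t", ["block.timestamp"]),  # t = block.timestamp
        ("seed", ["t", "nonce"]),    # seed = hash(t, nonce)
        ("x", ["y"]),                # unrelated assignment
    ]
    for var, chain in propagation_chains(program, "block.timestamp").items():
        print(var, "<-", " -> ".join(chain))
```

A chain that ends in a sensitive use, for example a random seed or a payout condition, is the kind of code snippet that could then be compared with known vulnerability patterns.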

Self-Supervised Neural Architecture Search for Multimodal Deep Neural Networks

Shota SUZUKI, Satoshi ONO

Article type: LETTER
Subject area: Biocybernetics, Neurocomputing
2025 Volume E108.D Issue 6 Pages 640-643
Published: June 01, 2025
Released on J-STAGE: June 01, 2025
Advance online publication: December 18, 2024

DOI: https://doi.org/10.1587/transinf.2024EDL8018

JOURNAL FREE ACCESS

Neural architecture search (NAS), which automates the architectural design process of deep neural networks (DNNs), has attracted increasing attention. Multimodal DNNs that necessitate feature fusion from multiple modalities benefit from NAS due to their structural complexity; however, constructing an architecture for multimodal DNNs through NAS requires a substantial amount of labeled training data. Thus, this paper proposes a self-supervised learning (SSL) method for architecture search of multimodal DNNs. The proposed method applies SSL comprehensively to both the architecture search and model pretraining processes. Experimental results demonstrated that the proposed method successfully designed architectures for DNNs from unlabeled training data.

Errata

Erratum: Learn Discriminative Features for Small Object Detection through Multi-Scale Image Degradation with Contrastive Learning [IEICE Transactions on Information and Systems Vol. E108.D (2025), No. 4 pp.371-383]

Xiaoguang TU, Zhi HE, Gui FU, Jianhua LIU, Mian ZHONG, Chao ZHOU, Xia ...

2025 Volume E108.D Issue 6 Pages 644_e1
Published: June 01, 2025
Released on J-STAGE: June 01, 2025

DOI: https://doi.org/10.1587/transinf.2025EDe0001

JOURNAL FREE ACCESS
