IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E108.D, Issue 4
Displaying 1-13 of 13 articles from this issue
Special Section on Enriched Multimedia — Media technologies supporting the digital society —
  • Michiharu NIIMI
    2025 Volume E108.D Issue 4 Pages 299
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    JOURNAL FREE ACCESS
    Download PDF (219K)
  • Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAM ...
    Article type: INVITED PAPER
    2025 Volume E108.D Issue 4 Pages 300-310
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 07, 2024
    JOURNAL FREE ACCESS

    Skillfully fabricated artificial replicas of authentic media using advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increased distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated by using advanced AI-based speech analysis/synthesis techniques, can be abused for spoofing and tampering with authentic speech signals. This can enable attackers to commit serious offenses such as fraud by voice impersonation and unauthorized speaker verification. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the basis of the acoustical features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing consisting of these features and deep neural networks (DNNs).

    Download PDF (2038K)
  • Takaharu TSUBOYAMA, Ryota TAKAHASHI, Motoi IWATA, Koichi KISE
    Article type: PAPER
    2025 Volume E108.D Issue 4 Pages 311-319
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 07, 2024
    JOURNAL FREE ACCESS

    In recent years, digital signage has become popular as a means of disseminating information to the general public. However, unlike advertisements displayed on PCs or smartphones, information displayed on such signage cannot be acquired directly even when the content is interesting. Mizushima et al. proposed a video watermarking method that is robust against re-shooting, so that the watermark can be extracted from watermarked videos displayed on digital signage. However, conventional methods have limited information capacity. In recent years, watermarking methods based on deep learning have attracted attention for embedding large watermarks. In this paper, we implement a video watermarking method based on 3D U-Net, which makes it possible to embed larger watermarks than existing methods. The proposed method can extract the watermark from re-shot video, and the shortest average time required to extract the correct watermark is 1.85 seconds.

    Download PDF (6500K)
Regular Section
  • David CLARINO, Naoya ASADA, Atsushi MATSUO, Shigeru YAMASHITA
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2025 Volume E108.D Issue 4 Pages 320-329
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 30, 2024
    JOURNAL FREE ACCESS

    Lookup Table (LUT) based synthesis methods have recently been proposed as a way to synthesize quantum Boolean circuits in a qubit-constrained environment. Other recent research has demonstrated that relative phase quantum circuits can be used when compute/uncompute logic is applied in tandem, reducing the T-count of quantum Boolean circuits in the fault-tolerant quantum computing paradigm. Because LUT-based synthesis methods use compute/uncompute pairs on ancilla qubits, implementing the arbitrary Boolean logic that makes up the individual Boolean logic network nodes in a relative phase manner could reduce the T-count. To generate such arbitrary Boolean functions, we utilize Shannon’s decomposition, Davio expansions, and alternating balanced and unbalanced relative phase circuits. Experimental results demonstrate that our method reduces the T-count to an average of 24% of that of the existing method.
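The classical expansions this paper builds on can be illustrated concretely. Below is a minimal pure-Python sketch (our own illustration, not the paper's synthesis code) of Shannon's decomposition and the positive Davio expansion of a Boolean function given as a callable truth function; the helper names (`cofactors`, `shannon`, `davio_positive`) are ours.

```python
# Illustrative sketch (not the authors' implementation): Shannon and
# positive Davio expansions of a Boolean function f(x0, x1, ...) -> 0/1.

def cofactors(f, i):
    """Return the negative and positive cofactors of f w.r.t. variable i."""
    def fix(val):
        def g(*args):
            xs = list(args)
            xs.insert(i, val)  # re-insert the fixed variable at position i
            return f(*xs)
        return g
    return fix(0), fix(1)

def shannon(f, i, *args):
    """Shannon expansion: f = (~xi AND f0) OR (xi AND f1), on `args`."""
    f0, f1 = cofactors(f, i)
    xs = list(args)
    xi = xs.pop(i)
    return (not xi and f0(*xs)) or (xi and f1(*xs))

def davio_positive(f, i, *args):
    """Positive Davio expansion: f = f0 XOR (xi AND (f0 XOR f1))."""
    f0, f1 = cofactors(f, i)
    xs = list(args)
    xi = xs.pop(i)
    return f0(*xs) ^ (xi and (f0(*xs) ^ f1(*xs)))

# Example node function: majority of three inputs.
maj = lambda a, b, c: (a and b) or (a and c) or (b and c)
```

For every input assignment and every expansion variable, both expansions reproduce the original function, which is what makes them safe building blocks for decomposing the node functions of a Boolean logic network.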

    Download PDF (1625K)
  • Wei LEI, Yue ZHANG, Hanfeng XIE, Zebin CHEN, Zengping CHEN, Weixing LI
    Article type: PAPER
    Subject area: Computer System
    2025 Volume E108.D Issue 4 Pages 330-340
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 30, 2024
    JOURNAL FREE ACCESS

    Radio Frequency (RF) transmitting-receiving platforms play an important foundational role in radar, communication, and related fields. In this paper, based on the Radio Frequency System on Chip (RFSoC), we design and develop a fully digital transmitting-receiving platform for Multiple Input Multiple Output (MIMO) radar waveform diversity experiments. Firstly, the overall design is presented, and the implementation of each module, including multi-channel arbitrary waveform generation, multi-channel signal pre-processing, multi-channel synchronization, and data forwarding and storage, is elaborated in detail. Secondly, RF signal quality evaluation methods are introduced, and the RF performance of the system is evaluated. The results indicate that its performance is sufficient to meet radar requirements. Finally, using a mutually orthogonal discrete frequency encoding waveform, detection experiments for Unmanned Aerial Vehicles (UAVs) are conducted, in which the target is observed clearly. This verifies the effectiveness of our platform and its applicability to MIMO mode. Compared with conventional radar RF platforms, our platform has several advantages. Firstly, it provides arbitrary waveform capability, and each channel is entirely independent. Secondly, it supports both narrow and wide bands, and its sampling rates can be switched according to the bandwidth. Finally, a high-speed data forwarding and storage path is designed, which facilitates data analysis and processing.

    Download PDF (30763K)
  • Takashi YOKOTA, Kanemitsu OOTSU
    Article type: PAPER
    Subject area: Computer System
    2025 Volume E108.D Issue 4 Pages 341-348
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 28, 2024
    JOURNAL FREE ACCESS

    Interconnection networks are a crucial component of today’s parallel computers, and message packets are used to exchange information across them. One of the challenging issues in such networks is congestion control. We have previously proposed a novel method, Cup-Stacking, to address this problem. The Cup-Stacking method splits a large packet into slices, re-shapes each slice by adjusting possible parameters, and injects the slices at an appropriate interval. The method successfully reduces congestion by pre-scheduling the packet injection timing. However, since a practical system does not always guarantee precise packet timing, the robustness of Cup-Stacking to delays from the scheduled timing must be examined to show its practical usefulness. This paper addresses criticality and tolerance issues in evaluating delays from the scheduled timing and then proposes two evaluation indices: the delay criticality index (DCI), which represents the impact of the injection delay of an individual packet, and the delay tolerance measure (DTM), which shows the expected performance degradation. Evaluation results for the Cup-Stacking method reveal favorable relationships between DCI and DTM values. Furthermore, the results lead to a practical guideline for applying the Cup-Stacking method.
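The core packet-slicing idea described above can be sketched in a few lines of Python (an illustrative toy, not the authors' implementation; the function names and the fixed-interval schedule are our assumptions):

```python
# Toy sketch of Cup-Stacking's slicing step: split a large packet into
# fixed-size slices and assign each slice a scheduled injection time.
# Names and the simple fixed-interval rule are illustrative only.

def slice_packet(payload, slice_size):
    """Split a payload (bytes) into slices of at most slice_size bytes."""
    return [payload[i:i + slice_size]
            for i in range(0, len(payload), slice_size)]

def schedule_injections(slices, start_time, interval):
    """Pair each slice with its pre-scheduled injection time."""
    return [(start_time + k * interval, s) for k, s in enumerate(slices)]

packet = bytes(range(10))
plan = schedule_injections(slice_packet(packet, 4), start_time=0.0, interval=0.5)
```

The robustness question the paper studies then amounts to asking how much performance degrades when the actual injection times deviate from the times recorded in such a plan.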

    Download PDF (1170K)
  • Xiaokang JIN, Benben HUANG, Hao SHENG, Yao WU
    Article type: PAPER
    Subject area: Software System
    2025 Volume E108.D Issue 4 Pages 349-359
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 28, 2024
    JOURNAL FREE ACCESS

    In recent times, anchor-based visual object trackers have become increasingly popular due to their exceptional performance. However, they rely on preset anchor boxes that require manual tuning, which can impact tracker performance and introduce hyper-parameter dependencies. To address these issues, we propose an anchor-free Siamese tracker with multi-attention and a corner detection mechanism. A multiple attention fusion module is designed to calculate the relationship between the template and the search area in different channels, enhancing the model’s perception of environmental information. By eliminating the need for anchor points and performing direct computation, the proposed model minimizes the influence of hyper-parameters and human factors, resulting in improved overall efficiency. To demonstrate the effectiveness of the proposed tracker, comprehensive experiments were conducted on four challenging benchmarks: OTB100, VOT2016, UAV123, and GOT-10k.

    Download PDF (4994K)
  • Jialong LI, Takuto YAMAUCHI, Takanori HIRANO, Jinyu CAI, Kenji TEI
    Article type: PAPER
    Subject area: Software Engineering
    2025 Volume E108.D Issue 4 Pages 360-370
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 31, 2024
    JOURNAL FREE ACCESS

    In studies of self-adaptive systems (SAS), requirement relaxation is a well-studied approach that adjusts or disables certain requirements in response to requirement unsatisfaction or requirement conflicts, allowing the system to maintain core functionality while temporarily reducing service quality. The recent integration of Guaranteeable Requirement Analysis (GRA) with Discrete Controller Synthesis (DCS) enables coordinated self-adaptation by identifying relaxable requirements and then synthesizing new specifications to fulfill the remaining requirements. However, the scalability of GRA poses challenges, particularly due to state explosion and combination explosion, making it difficult to apply to runtime self-adaptation for timeliness reasons. To address this, this paper introduces the Multi-grained Guaranteeable Requirement Analysis (MGRA) approach, which (i) employs a multi-round adaptation process to deal with environmental changes and (ii) controls the trade-off between computation time and adaptation quality by adjusting the granularity of analysis. More specifically, the adaptation starts with a quick, coarser GRA for an initial adaptation that meets timeliness constraints, followed by iterative refinements with finer GRA that yield higher-quality adaptations satisfying more requirements. The applicability and effectiveness of MGRA have been assessed through two case studies.

    Download PDF (1737K)
  • Xiaoguang TU, Zhi HE, Gui FU, Jianhua LIU, Mian ZHONG, Chao ZHOU, Xia ...
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 4 Pages 371-383
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 05, 2024
    JOURNAL FREE ACCESS

    To address challenges such as small target sizes, blurred target features, and difficulty in distinguishing between targets and backgrounds in small object detection, we propose a method based on Multi-Scale Image Degradation combined with the Contrastive Learning model. By leveraging contrastive learning techniques, our approach aims to enhance the discriminative features necessary for accurately distinguishing objects from backgrounds. To specifically target small objects, we subject target samples to various multi-scale image degradation modes before inputting them into the contrastive learning model. Augmentation techniques are then applied to these degraded samples to facilitate effective contrastive feature learning. Consequently, the model is better equipped to uncover the differences between small targets and backgrounds, thereby improving small object detection performance. Furthermore, considering that spatial domain features are sensitive to local changes in the image, while frequency domain features are sensitive to global structural changes, our approach applies the contrastive learning model in both spatial and frequency domains, aiming to acquire more robust features for small object detection. Extensive experiments conducted on the MS COCO dataset and the VisDrone2019 dataset validate the effectiveness of our proposed method in significantly enhancing small object detection accuracy.
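As a rough illustration of what a multi-scale degradation mode might look like (our own assumption; the abstract does not specify the paper's exact degradation modes), the following Python sketch produces one box-blurred view of a grayscale image per scale, which could then be fed to a contrastive learner:

```python
# Hypothetical multi-scale degradation: average k x k blocks and write
# the average back over the block, giving one degraded view per scale.
# This is an illustrative stand-in, not the authors' degradation modes.

def box_degrade(img, k):
    """Average k x k blocks of a 2-D list of pixels, keeping image size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, k):
        for bj in range(0, w, k):
            block = [img[i][j]
                     for i in range(bi, min(bi + k, h))
                     for j in range(bj, min(bj + k, w))]
            avg = sum(block) / len(block)
            for i in range(bi, min(bi + k, h)):
                for j in range(bj, min(bj + k, w)):
                    out[i][j] = avg
    return out

def multiscale_views(img, scales=(1, 2, 4)):
    """One degraded view of the sample per scale."""
    return [box_degrade(img, k) for k in scales]
```

Each degraded view preserves the coarse structure of the target while discarding fine detail, which is the kind of variation a contrastive objective can learn to be invariant to.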

    Download PDF (14757K)
  • Lanxi LIU, Pengpeng YANG, Suwen DU, Sani M. ABDULLAHI
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 4 Pages 384-391
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 08, 2024
    JOURNAL FREE ACCESS

    The rapid development of digital cameras and smartphones makes it easy for people to record information displayed on media and obtain high-quality recaptured images, which poses a serious threat to copyright protection, identity authentication, and public security. Detecting recaptured images is therefore an urgent problem in the multimedia forensics community. Most existing methods for detecting recaptured images focus on mining specific traces left in the images during the recapture operation. However, these traces may be covered up in certain environmental settings. To address this issue, we explore the internal differences in image statistics between original and recaptured images, which do not depend on specific traces, and construct a more robust feature for detecting recaptured images. Firstly, the most discriminative regions are extracted based on a measure of pixel dispersion. Secondly, a multi-scale residual feature is constructed by calculating the first-order statistics of residual images to enhance robustness against various recapture environments. Lastly, the binary grey wolf optimization and particle swarm optimization (BGWOPSO) feature selection method is used to reduce the dimensionality of the feature space, which keeps a good balance between performance and computational complexity. Experimental results on three public databases demonstrate that our proposed method significantly improves detection performance, especially on the most difficult-to-detect ICL-COMMSP database.
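The multi-scale residual-statistics step can be illustrated with a simplified Python sketch (our own reading of the abstract, not the authors' code): compute a first-order residual of a grayscale image at several downsampled scales and summarize each residual by simple first-order statistics.

```python
# Simplified illustration of a multi-scale residual feature: horizontal
# first differences at several scales, summarized by mean and variance
# of absolute residuals. Function names and choices are ours.

def residual(img):
    """Horizontal first-difference residual of a 2-D list of pixels."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)]
            for row in img]

def downsample(img, factor):
    """Naive downsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in img[::factor]]

def first_order_stats(res):
    """Mean and variance of absolute residual values."""
    vals = [abs(v) for row in res for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def multiscale_residual_feature(img, factors=(1, 2)):
    """Concatenate residual statistics across scales into one feature."""
    feat = []
    for f in factors:
        feat.extend(first_order_stats(residual(downsample(img, f))))
    return feat
```

Statistics of this kind depend on the overall distribution of residuals rather than on any single recapture trace, which is the robustness property the abstract argues for.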

    Download PDF (8551K)
  • Hiroaki AKUTSU, Ko ARAI
    Article type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2025 Volume E108.D Issue 4 Pages 392-402
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 08, 2024
    JOURNAL FREE ACCESS

    Autoregressive probability estimation of data sequences is a fundamental task in deep neural networks and is widely used in applications such as data compression and generation. Because causality makes it a sequential, iterative process, it suffers from low speed. One way to achieve high throughput is multiplexing on a GPU. To maximize the throughput of inference processing within the limited resources of the GPU, it is necessary to avoid the increase in computational complexity associated with deeper layers and to reduce the memory required at higher multiplexing. In this paper, we propose Scale Causal Blocks (SCBs), basic components of deep neural networks that aim to significantly reduce computational and memory costs compared with conventional techniques. Evaluation results show that the proposed method is one order of magnitude faster than a conventional computationally optimized Transformer-based method while maintaining comparable accuracy, and it also shows better learning convergence.
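As background on why autoregressive estimation is inherently sequential, the toy example below (a generic illustration, not the SCB method; the bigram table `P` is hypothetical) shows that each step's probability depends on the symbol already produced, so the steps cannot be reordered or skipped:

```python
# Toy autoregressive model: the probability of each symbol conditions on
# the previous one, so evaluating a sequence is a sequential scan.
import math

# Hypothetical bigram model P(next | prev) over a 2-symbol alphabet.
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.4, 1: 0.6}}

def sequence_log_prob(seq, p_first=0.5):
    """Log-probability of seq under the bigram model, step by step."""
    logp = math.log(p_first)
    for prev, nxt in zip(seq, seq[1:]):
        logp += math.log(P[prev][nxt])  # each term needs the previous symbol
    return logp
```

A deep autoregressive network replaces the lookup table with a neural conditional distribution, but the causal dependency chain, and hence the serial bottleneck that SCBs target, is the same.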

    Download PDF (1697K)
  • Tomoki MIYAMOTO
    Article type: LETTER
    Subject area: Human-computer Interaction
    2025 Volume E108.D Issue 4 Pages 403-405
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 23, 2024
    JOURNAL FREE ACCESS

    When a smart speaker encounters an error, it frequently prompts the user to re-enter the input information. This study examines the psychological impact of adopting a politeness strategy, involving linguistic considerations, for re-enter requests with a smart speaker, particularly focusing on its effect on trust. Specifically, a video-based impression evaluation experiment was conducted to assess the impact of politeness in re-enter requests on improving trust in situations where the smart speaker failed to deliver the expected output for the user.

    Download PDF (2926K)
  • Yingying LU, Cheng LU, Yuan ZONG, Feng ZHOU, Chuangao TANG
    Article type: LETTER
    Subject area: Pattern Recognition
    2025 Volume E108.D Issue 4 Pages 406-410
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 01, 2024
    JOURNAL FREE ACCESS

    This letter addresses the challenge of cross-stimulus speech-based depression detection (SDD), where training (source) and testing (target) speech samples stem from different stimulus methods, such as interview responses and reading texts. This discrepancy may create a mismatch in feature distributions between the source and target speech samples, leading to a notable deterioration in the performance of existing SDD methods. To tackle this issue, we propose a novel domain adaptation approach called Joint Distribution-aligned Dual-sparse Linear Regression (JDDLR). The fundamental idea of JDDLR is straightforward: extending simple linear regression (LR) to a version that is both depression-discriminative and stimulus-invariant. To achieve this, we initially equip JDDLR with depression-discriminative capability by constructing a dual-sparse linear regression (DLR) model. Unlike conventional linear regression models, DLR employs a meticulous coarse-to-fine feature selection mechanism to seek the depression-discriminative features from the acoustic feature set used to describe speech signals. Subsequently, we introduce a regularization term, which borrows the idea of joint distribution adaptation, thereby giving rise to JDDLR. This regularization term serves to alleviate the incongruities in feature distributions between the selected high-quality features of source and target samples. To evaluate JDDLR, extensive cross-stimulus SDD experiments are conducted on the MODMA dataset. The results underscore the promising performance of JDDLR in effectively addressing cross-stimulus SDD challenges.

    Download PDF (148K)