IEICE Transactions on Information and Systems

Special Section on Enriched Multimedia — Media technologies supporting the digital society —

FOREWORD

Michiharu NIIMI

2025 Volume E108.D Issue 4 Pages 299
Published: 2025/04/01
Released: 2025/04/01

DOI: https://doi.org/10.1587/transinf.2024MUF0001

Journal free access

Deepfake Speech Detection: Approaches from Acoustic Features to Deep Neural Networks

Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAM ...

Article type: INVITED PAPER
2025 Volume E108.D Issue 4 Pages 300-310
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/07

DOI: https://doi.org/10.1587/transinf.2024MUI0001

Journal free access

Skillfully fabricated artificial replicas of authentic media using advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increased distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated by using advanced AI-based speech analysis/synthesis techniques, can be abused for spoofing and tampering with authentic speech signals. This can enable attackers to commit serious offenses such as fraud by voice impersonation and unauthorized speaker verification. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the basis of the acoustical features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing consisting of these features and deep neural networks (DNNs).

Video Watermarking Method Based on 3D U-Net Robust Against Re-Shooting

Takaharu TSUBOYAMA, Ryota TAKAHASHI, Motoi IWATA, Koichi KISE

Article type: PAPER
2025 Volume E108.D Issue 4 Pages 311-319
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/07

DOI: https://doi.org/10.1587/transinf.2024MUP0003

Journal free access

In recent years, digital signage has become popular as a means of disseminating information to the general public. However, unlike advertisements displayed on PCs or smartphones, information displayed on such signage cannot be acquired directly, even if the content is interesting. Mizushima et al. proposed a video watermarking method that is robust against re-shooting so that the watermark can be extracted from watermarked videos displayed on digital signage. Conventional methods, however, have limited information capacity. In recent years, watermarking methods based on deep learning have attracted attention for embedding large watermarks. In this paper, we implement a video watermarking method based on 3D U-Net, which makes it possible to embed larger watermarks than existing methods. In addition, the proposed method was able to extract the watermark from re-shot video, and the shortest average processing time needed to extract the correct watermark was 1.85 seconds.


Regular Section

Leveraging Different Boolean Function Decompositions to Reduce T-Count in LUT-Based Quantum Circuit Synthesis

David CLARINO, Naoya ASADA, Atsushi MATSUO, Shigeru YAMASHITA

Article type: PAPER
Subject area: Fundamentals of Information Systems
2025 Volume E108.D Issue 4 Pages 320-329
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/30

DOI: https://doi.org/10.1587/transinf.2024EDP7194

Journal free access

Lookup Table (LUT) based synthesis methods have recently been proposed as a way to synthesize quantum Boolean circuits in a qubit-constrained environment. Other recent research papers have demonstrated the possibility of using relative phase quantum circuits when compute/uncompute logic is used in tandem, reducing T-count in quantum Boolean circuits in the fault-tolerant quantum computing paradigm. Because LUT-based synthesis methods use compute/uncompute pairs on ancilla qubits, this suggests that implementing the arbitrary Boolean logic that makes up the individual Boolean logic network nodes in a relative phase manner could reduce the T-count. To generate such arbitrary Boolean functions, we use Shannon’s decomposition and Davio expansions, as well as alternating balanced and unbalanced relative phase circuits. Experimental results demonstrate that our method can reduce the T-count to an average of 24% of that of the existing method.
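
For readers unfamiliar with the decompositions named in this abstract, the following is a minimal sketch of the textbook identities only; it is not the paper's synthesis method and says nothing about T-count. The helper names (`cofactor`, `shannon`, `positive_davio`, `negative_davio`, `equivalent`) and the `majority` example function are hypothetical, introduced here purely for illustration. With f0 and f1 denoting the cofactors of f at x = 0 and x = 1, Shannon's decomposition is f = (NOT x) f0 XOR x f1, the positive Davio expansion is f = f0 XOR x (f0 XOR f1), and the negative Davio expansion is f = f1 XOR (NOT x) (f0 XOR f1).

```python
from itertools import product


def cofactor(f, i, value):
    """Return f with input i fixed to `value` (same arity; input i is ignored)."""
    def g(*xs):
        ys = list(xs)
        ys[i] = value
        return f(*ys)
    return g


def shannon(f, i):
    """Shannon decomposition: f = (NOT x_i) f0 XOR x_i f1."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *xs: ((1 - xs[i]) & f0(*xs)) ^ (xs[i] & f1(*xs))


def positive_davio(f, i):
    """Positive Davio expansion: f = f0 XOR x_i (f0 XOR f1)."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *xs: f0(*xs) ^ (xs[i] & (f0(*xs) ^ f1(*xs)))


def negative_davio(f, i):
    """Negative Davio expansion: f = f1 XOR (NOT x_i) (f0 XOR f1)."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda *xs: f1(*xs) ^ ((1 - xs[i]) & (f0(*xs) ^ f1(*xs)))


def equivalent(f, g, n):
    """Check that f and g agree on all 2**n input assignments."""
    return all(f(*xs) == g(*xs) for xs in product((0, 1), repeat=n))


def majority(a, b, c):
    """Illustrative 3-input Boolean function (values are 0 or 1)."""
    return (a & b) | (a & c) | (b & c)
```

Checking `equivalent(majority, shannon(majority, 0), 3)` exhaustively compares the original function with its decomposed form; the same check applies to the two Davio variants and to any choice of variable index.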

A Fully Digital Transmitting-Receiving Platform for MIMO Radar Waveform Diversity Experiment

Wei LEI, Yue ZHANG, Hanfeng XIE, Zebin CHEN, Zengping CHEN, Weixing LI

Article type: PAPER
Subject area: Computer System
2025 Volume E108.D Issue 4 Pages 330-340
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/30

DOI: https://doi.org/10.1587/transinf.2023EDP7276

Journal free access

Radio Frequency (RF) transmitting-receiving platforms play important foundational roles in radar, communication, and related fields. In this paper, based on the Radio Frequency System on Chip (RFSoC), we design and develop a fully digital transmitting-receiving platform for the Multiple Input Multiple Output (MIMO) radar waveform diversity experiment. Firstly, the overall design is shown, and the implementation of each module, including multi-channel arbitrary waveform generation, multi-channel signal pre-processing, multi-channel synchronization, and data forwarding and storage, is elaborated in detail. Secondly, the RF signal quality evaluation methods are introduced, and the system RF performance is evaluated. The results indicate that its performance is good enough to meet radar requirements. Finally, by using a mutually orthogonal discrete frequency encoding waveform, detection experiments on Unmanned Aerial Vehicles (UAVs) are conducted, and the target is observed clearly. This verifies the effectiveness of our platform and its applicability to MIMO mode. Compared to conventional radar RF platforms, our platform has several advantages. Firstly, it supports arbitrary waveform generation, and each channel is entirely independent. Secondly, it supports both narrow and wide bands, and its sampling rate can be switched according to the bandwidth. Last but not least, it facilitates data analysis and processing because a high-speed data forwarding and storage path is provided.

Criticality and Tolerance in Injection Timing in Cup-Stacking Method for Collective Communication

Takashi YOKOTA, Kanemitsu OOTSU

Article type: PAPER
Subject area: Computer System
2025 Volume E108.D Issue 4 Pages 341-348
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/28

DOI: https://doi.org/10.1587/transinf.2023EDP7204

Journal free access

Today’s parallel computers rely on an interconnection network as a crucial component, in which message packets are used to exchange information. One of the challenging issues of the network is congestion control. We have proposed a novel method, Cup-Stacking, to solve the problem. The Cup-Stacking method splits a large packet into slices, re-shapes the slices by adjusting possible parameters, and injects the slices at an appropriate interval. The method is successful in reducing congestion by pre-scheduling the packet injection timing. However, as a practical system does not always guarantee precise packet timing, we should discuss robustness against delays from the scheduled timing to show the practical usefulness of the Cup-Stacking method. This paper addresses criticality and tolerance issues for evaluating the delays from the scheduled timing and then proposes two evaluation indices to represent expected performance degradation: delay criticality index (DCI) and delay tolerance measure (DTM). The former represents the impact of the injection delay of an individual packet and the latter shows the expected performance degradation. Evaluation results for the Cup-Stacking method reveal preferable relationships between DCI and DTM values. Furthermore, the results lead us to a practical guideline for applying the Cup-Stacking method.

An Anchor-Free Siamese Tracker with Multi-Attention and Corner Detection Mechanism

Xiaokang JIN, Benben HUANG, Hao SHENG, Yao WU

Article type: PAPER
Subject area: Software System
2025 Volume E108.D Issue 4 Pages 349-359
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/28

DOI: https://doi.org/10.1587/transinf.2024EDP7117

Journal free access

In recent times, anchor-based visual object trackers have become increasingly popular due to their exceptional performance. However, they rely on preset anchor boxes that require manual tuning, which can impact the performance of the trackers and introduce hyper-parameter dependencies. To address these issues, an anchor-free Siamese tracker with a multi-attention and corner detection mechanism is proposed. Additionally, a multiple attention fusion module is created to calculate the relationship between the template and the search area in different channels, thus enhancing the model’s perception of environmental information. By eliminating the need for anchor points and performing direct computation, the proposed model minimizes the influence of hyper-parameters and human factors, resulting in improved overall efficiency. To showcase the effectiveness of the proposed tracker, comprehensive experiments were conducted on four challenging benchmarks: OTB100, VOT2016, UAV123, and GOT-10k.

Multi-Grained Guaranteeable Requirement Analysis for Iterative Adaptation

Jialong LI, Takuto YAMAUCHI, Takanori HIRANO, Jinyu CAI, Kenji TEI

Article type: PAPER
Subject area: Software Engineering
2025 Volume E108.D Issue 4 Pages 360-370
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/31

DOI: https://doi.org/10.1587/transinf.2024EDP7200

Journal free access

In the studies of self-adaptive systems (SAS), requirement relaxation is a well-studied approach to adjust or disable certain requirements in response to requirement unsatisfaction or requirement conflicts, allowing the system to maintain core functionalities while temporarily reducing service quality. The recent integration of Guaranteeable Requirement Analysis (GRA) with Discrete Controller Synthesis (DCS) allows for coordinated self-adaptation by identifying relaxable requirements and then synthesizing new specifications to fulfill remaining requirements. However, the scalability of GRA poses challenges, particularly due to state explosion and combination explosion, making it difficult to apply to runtime self-adaptation due to timeliness reasons. To address this, this paper introduces the Multi-grained Guaranteeable Requirement Analysis (MGRA) approach, which (i) employs a multi-round adaptation process to deal with environmental changes and (ii) controls the trade-off between computation time and adaptation quality by adjusting the granularity of analysis. More specifically, the adaptation starts with a quick, coarser GRA for an initial adaptation to meet timeliness, followed by iterative refinements for finer GRA with higher-quality adaptations to meet more requirements gradually. The applicability and effectiveness have been assessed through two case studies.

Learn Discriminative Features for Small Object Detection through Multi-Scale Image Degradation with Contrastive Learning

Xiaoguang TU, Zhi HE, Gui FU, Jianhua LIU, Mian ZHONG, Chao ZHOU, Xia ...

Article type: PAPER
Subject area: Image Processing and Video Processing
2025 Volume E108.D Issue 4 Pages 371-383
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/11/05

DOI: https://doi.org/10.1587/transinf.2024EDP7204

Journal free access

To address challenges such as small target sizes, blurred target features, and difficulty in distinguishing between targets and backgrounds in small object detection, we propose a method based on Multi-Scale Image Degradation combined with the Contrastive Learning model. By leveraging contrastive learning techniques, our approach aims to enhance the discriminative features necessary for accurately distinguishing objects from backgrounds. To specifically target small objects, we subject target samples to various multi-scale image degradation modes before inputting them into the contrastive learning model. Augmentation techniques are then applied to these degraded samples to facilitate effective contrastive feature learning. Consequently, the model is better equipped to uncover the differences between small targets and backgrounds, thereby improving small object detection performance. Furthermore, considering that spatial domain features are sensitive to local changes in the image, while frequency domain features are sensitive to global structural changes, our approach applies the contrastive learning model in both spatial and frequency domains, aiming to acquire more robust features for small object detection. Extensive experiments conducted on the MS COCO dataset and the VisDrone2019 dataset validate the effectiveness of our proposed method in significantly enhancing small object detection accuracy.

Recaptured Image Detection Based on Multi-Scale Residual Features of Discriminative Regions

Lanxi LIU, Pengpeng YANG, Suwen DU, Sani M. ABDULLAHI

Article type: PAPER
Subject area: Image Processing and Video Processing
2025 Volume E108.D Issue 4 Pages 384-391
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/11/08

DOI: https://doi.org/10.1587/transinf.2024EDP7166

Journal free access

The rapid development of digital cameras and smartphones makes it easy for people to record the information displayed in the media and obtain high-quality recaptured images, which poses a serious threat to copyright protection, identity authentication, and public social security. Therefore, detecting recaptured images is an urgent problem in the multimedia forensics community. Most existing methods for detecting recaptured images focus on mining specific traces left in the images during the recapture operation. However, these traces may be covered up in certain environmental settings. To address this issue, we explore the internal differences in image statistics between the original and recaptured images, which do not depend on specific traces, and construct a more robust feature for detecting recaptured images. Firstly, the most discriminative regions are extracted based on a measure of pixel dispersion. Secondly, a multi-scale residual feature is constructed by calculating the first-order statistics of residual images to enhance the robustness against various recapture environments. Lastly, a feature selection method based on binary grey wolf optimization and particle swarm optimization (BGWOPSO) is used to reduce the dimensionality of the feature space, which keeps a good balance between performance and computational complexity. Experimental results on three public databases demonstrate that our proposed method significantly improves detection performance, especially on the most difficult-to-detect ICL-COMMSP database.

Lightweight Neural Data Sequence Modeling by Scale Causal Blocks

Hiroaki AKUTSU, Ko ARAI

Article type: PAPER
Subject area: Biocybernetics, Neurocomputing
2025 Volume E108.D Issue 4 Pages 392-402
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/11/08

DOI: https://doi.org/10.1587/transinf.2024EDP7074

Journal free access

Autoregressive probability estimation of data sequences is a fundamental task in deep neural networks and has been widely used in applications such as data compression and generation. Because causality makes it a sequential, iterative process, it is slow. One way to achieve high throughput is multiplexing on a GPU. To maximize the throughput of inference processing within the limited resources of the GPU, it is necessary to avoid the increase in computational complexity associated with deeper layers and to reduce the memory consumption required at higher multiplexing. In this paper, we propose Scale Causal Blocks (SCBs), which are basic components of deep neural networks that aim to significantly reduce the computational and memory cost compared to conventional techniques. Evaluation results show that the proposed method is one order of magnitude faster than a conventional computationally optimized Transformer-based method while maintaining comparable accuracy, and that it also shows better learning convergence.

Effect of Politeness on Trust in Re-Enter Requests to User by Smart Speaker —Pilot Study—

Tomoki MIYAMOTO

Article type: LETTER
Subject area: Human-computer Interaction
2025 Volume E108.D Issue 4 Pages 403-405
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/10/23

DOI: https://doi.org/10.1587/transinf.2024EDL8030

Journal free access

When a smart speaker encounters an error, it frequently prompts the user to re-enter the input information. This study examines the psychological impact of adopting a politeness strategy, involving linguistic considerations, for re-enter requests with a smart speaker, particularly focusing on its effect on trust. Specifically, a video-based impression evaluation experiment was conducted to assess the impact of politeness in re-enter requests on improving trust in situations where the smart speaker failed to deliver the expected output for the user.

Joint Distribution-Aligned Dual-Sparse Linear Regression for Cross-Stimulus Speech-Based Depression Detection

Yingying LU, Cheng LU, Yuan ZONG, Feng ZHOU, Chuangao TANG

Article type: LETTER
Subject area: Pattern Recognition
2025 Volume E108.D Issue 4 Pages 406-410
Published: 2025/04/01
Released: 2025/04/01
[Advance publication] Released: 2024/11/01

DOI: https://doi.org/10.1587/transinf.2024EDL8054

Journal free access

This letter addresses the challenge of cross-stimulus speech-based depression detection (SDD), where training (source) and testing (target) speech samples stem from different stimulus methods, such as interview responses and reading texts. This discrepancy may create a mismatch in feature distributions between the source and target speech samples, leading to a notable deterioration in the performance of existing SDD methods. To tackle this issue, we propose a novel domain adaptation approach called Joint Distribution-aligned Dual-sparse Linear Regression (JDDLR). The fundamental idea of JDDLR is straightforward: extending simple linear regression (LR) to a version that is both depression-discriminative and stimulus-invariant. To achieve this, we initially equip JDDLR with depression-discriminative capability by constructing a dual-sparse linear regression (DLR) model. Unlike conventional linear regression models, DLR employs a meticulous coarse-to-fine feature selection mechanism to seek the depression-discriminative features from the acoustic feature set used to describe speech signals. Subsequently, we introduce a regularization term, which borrows the idea of joint distribution adaptation, thereby giving rise to JDDLR. This regularization term serves to alleviate the incongruities in feature distributions between the selected high-quality features of source and target samples. To evaluate JDDLR, extensive cross-stimulus SDD experiments are conducted on the MODMA dataset. The results underscore the promising performance of JDDLR in effectively addressing cross-stimulus SDD challenges.
