IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E108.D, Issue 4
Displaying 1-13 of 13 articles from this issue
Special Section on Enriched Multimedia — Media technologies supporting the digital society —
  • Michiharu NIIMI
    2025 Volume E108.D Issue 4 Pages 299
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    JOURNAL FREE ACCESS
    Download PDF (219K)
  • Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAM ...
    Article type: INVITED PAPER
    2025 Volume E108.D Issue 4 Pages 300-310
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 07, 2024
    JOURNAL FREE ACCESS

    Skillfully fabricated artificial replicas of authentic media using advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increased distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated by using advanced AI-based speech analysis/synthesis techniques, can be abused for spoofing and tampering with authentic speech signals. This can enable attackers to commit serious offenses such as fraud by voice impersonation and unauthorized speaker verification. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the basis of the acoustical features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing consisting of these features and deep neural networks (DNNs).

    Download PDF (2038K)
  • Takaharu TSUBOYAMA, Ryota TAKAHASHI, Motoi IWATA, Koichi KISE
    Article type: PAPER
    2025 Volume E108.D Issue 4 Pages 311-319
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 07, 2024
    JOURNAL FREE ACCESS

    In recent years, digital signage has become popular as a means of disseminating information to the general public. However, unlike advertisements displayed on PCs or smartphones, information displayed on such signage cannot be acquired directly even when the content is interesting. Mizushima et al. proposed a video watermarking method that is robust against re-shooting, so that the watermark can be extracted from watermarked videos displayed on digital signage. However, conventional methods have limited information capacity. In recent years, watermarking methods based on deep learning have attracted attention for embedding large watermarks. In this paper, we implement a video watermarking method based on 3D U-Net, which makes it possible to embed larger watermarks than existing methods. The proposed method can extract the watermark from re-shot video, and the shortest average time required to extract the correct watermark is 1.85 seconds.

    Download PDF (6500K)
Regular Section
  • David CLARINO, Naoya ASADA, Atsushi MATSUO, Shigeru YAMASHITA
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2025 Volume E108.D Issue 4 Pages 320-329
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 30, 2024
    JOURNAL FREE ACCESS

    Lookup Table (LUT) based synthesis methods have recently been proposed as a way to synthesize quantum Boolean circuits in a qubit-constrained environment. Other recent research has demonstrated that relative phase quantum circuits can be used when compute/uncompute logic is applied in tandem, reducing the T-count of quantum Boolean circuits in the fault-tolerant quantum computing paradigm. Because LUT-based synthesis methods use compute/uncompute pairs on ancilla qubits, implementing the arbitrary Boolean logic that makes up the individual Boolean logic network nodes in a relative phase manner could reduce the T-count. To generate such arbitrary Boolean functions, we utilize Shannon’s decomposition, Davio expansions, and alternating balanced and unbalanced relative phase circuits. Experimental results demonstrate that our method reduces the T-count to an average of 24% of that of the existing method.
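The classical expansions this paper builds on can be illustrated concretely. Below is a minimal pure-Python sketch (our own illustration, not the paper's synthesis code) of Shannon's decomposition and the positive Davio expansion of a Boolean function given as a callable truth function; the helper names (`cofactors`, `shannon`, `davio_positive`) are ours.

```python
# Illustrative sketch (not the authors' implementation): Shannon and
# positive Davio expansions of a Boolean function f(x0, x1, ...) -> 0/1.

def cofactors(f, i):
    """Return the negative and positive cofactors of f w.r.t. variable i."""
    def fix(val):
        def g(*args):
            xs = list(args)
            xs.insert(i, val)  # re-insert the fixed variable at position i
            return f(*xs)
        return g
    return fix(0), fix(1)

def shannon(f, i, *args):
    """Shannon expansion: f = (~xi AND f0) OR (xi AND f1), on `args`."""
    f0, f1 = cofactors(f, i)
    xs = list(args)
    xi = xs.pop(i)
    return (not xi and f0(*xs)) or (xi and f1(*xs))

def davio_positive(f, i, *args):
    """Positive Davio expansion: f = f0 XOR (xi AND (f0 XOR f1))."""
    f0, f1 = cofactors(f, i)
    xs = list(args)
    xi = xs.pop(i)
    return f0(*xs) ^ (xi and (f0(*xs) ^ f1(*xs)))

# Example node function: majority of three inputs.
maj = lambda a, b, c: (a and b) or (a and c) or (b and c)
```

For every input assignment and every expansion variable, both expansions reproduce the original function, which is what makes them safe building blocks for decomposing the node functions of a Boolean logic network.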

    Download PDF (1625K)
  • Wei LEI, Yue ZHANG, Hanfeng XIE, Zebin CHEN, Zengping CHEN, Weixing LI
    Article type: PAPER
    Subject area: Computer System
    2025 Volume E108.D Issue 4 Pages 330-340
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 30, 2024
    JOURNAL FREE ACCESS

    Radio Frequency (RF) transmitting-receiving platforms play an important foundational role in radar, communication, and related fields. In this paper, based on the Radio Frequency System on Chip (RFSoC), we design and develop a fully digital transmitting-receiving platform for Multiple Input Multiple Output (MIMO) radar waveform diversity experiments. Firstly, the overall design is presented, and the implementation of each module, including multi-channel arbitrary waveform generation, multi-channel signal pre-processing, multi-channel synchronization, and data forwarding and storage, is elaborated in detail. Secondly, RF signal quality evaluation methods are introduced, and the RF performance of the system is evaluated. The results indicate that its performance is sufficient to meet radar requirements. Finally, using a mutually orthogonal discrete frequency encoding waveform, detection experiments for Unmanned Aerial Vehicles (UAVs) are conducted, in which the target is observed clearly. This verifies the effectiveness of our platform and its applicability to MIMO mode. Compared with conventional radar RF platforms, our platform has several advantages. Firstly, it provides arbitrary waveform capability, and each channel is entirely independent. Secondly, it supports both narrow and wide bands, and its sampling rates can be switched according to the bandwidth. Finally, a high-speed data forwarding and storage path is designed, which facilitates data analysis and processing.

    Download PDF (30763K)
  • Takashi YOKOTA, Kanemitsu OOTSU
    Article type: PAPER
    Subject area: Computer System
    2025 Volume E108.D Issue 4 Pages 341-348
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 28, 2024
    JOURNAL FREE ACCESS

    Interconnection networks are a crucial component of today’s parallel computers, and message packets are used to exchange information across them. One of the challenging issues in such networks is congestion control. We have previously proposed a novel method, Cup-Stacking, to address this problem. The Cup-Stacking method splits a large packet into slices, re-shapes each slice by adjusting possible parameters, and injects the slices at an appropriate interval. The method successfully reduces congestion by pre-scheduling the packet injection timing. However, since a practical system does not always guarantee precise packet timing, the robustness of Cup-Stacking to delays from the scheduled timing must be examined to show its practical usefulness. This paper addresses criticality and tolerance issues in evaluating delays from the scheduled timing and then proposes two evaluation indices: the delay criticality index (DCI), which represents the impact of the injection delay of an individual packet, and the delay tolerance measure (DTM), which shows the expected performance degradation. Evaluation results for the Cup-Stacking method reveal favorable relationships between DCI and DTM values. Furthermore, the results lead to a practical guideline for applying the Cup-Stacking method.
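The core packet-slicing idea described above can be sketched in a few lines of Python (an illustrative toy, not the authors' implementation; the function names and the fixed-interval schedule are our assumptions):

```python
# Toy sketch of Cup-Stacking's slicing step: split a large packet into
# fixed-size slices and assign each slice a scheduled injection time.
# Names and the simple fixed-interval rule are illustrative only.

def slice_packet(payload, slice_size):
    """Split a payload (bytes) into slices of at most slice_size bytes."""
    return [payload[i:i + slice_size]
            for i in range(0, len(payload), slice_size)]

def schedule_injections(slices, start_time, interval):
    """Pair each slice with its pre-scheduled injection time."""
    return [(start_time + k * interval, s) for k, s in enumerate(slices)]

packet = bytes(range(10))
plan = schedule_injections(slice_packet(packet, 4), start_time=0.0, interval=0.5)
```

The robustness question the paper studies then amounts to asking how much performance degrades when the actual injection times deviate from the times recorded in such a plan.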

    Download PDF (1170K)
  • Xiaokang JIN, Benben HUANG, Hao SHENG, Yao WU
    Article type: PAPER
    Subject area: Software System
    2025 Volume E108.D Issue 4 Pages 349-359
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 28, 2024
    JOURNAL FREE ACCESS

    In recent times, anchor-based visual object trackers have become increasingly popular due to their exceptional performance. However, they rely on preset anchor boxes that require manual tuning, which can impact tracker performance and introduce hyper-parameter dependencies. To address these issues, we propose an anchor-free Siamese tracker with multi-attention and a corner detection mechanism. A multiple attention fusion module is designed to calculate the relationship between the template and the search area in different channels, enhancing the model’s perception of environmental information. By eliminating the need for anchor points and performing direct computation, the proposed model minimizes the influence of hyper-parameters and human factors, resulting in improved overall efficiency. To demonstrate the effectiveness of the proposed tracker, comprehensive experiments were conducted on four challenging benchmarks: OTB100, VOT2016, UAV123, and GOT-10k.

    Download PDF (4994K)
  • Jialong LI, Takuto YAMAUCHI, Takanori HIRANO, Jinyu CAI, Kenji TEI
    Article type: PAPER
    Subject area: Software Engineering
    2025 Volume E108.D Issue 4 Pages 360-370
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 31, 2024
    JOURNAL FREE ACCESS

    In studies of self-adaptive systems (SAS), requirement relaxation is a well-studied approach that adjusts or disables certain requirements in response to requirement unsatisfaction or requirement conflicts, allowing the system to maintain core functionality while temporarily reducing service quality. The recent integration of Guaranteeable Requirement Analysis (GRA) with Discrete Controller Synthesis (DCS) enables coordinated self-adaptation by identifying relaxable requirements and then synthesizing new specifications to fulfill the remaining requirements. However, the scalability of GRA poses challenges, particularly due to state explosion and combination explosion, making it difficult to apply to runtime self-adaptation for timeliness reasons. To address this, this paper introduces the Multi-grained Guaranteeable Requirement Analysis (MGRA) approach, which (i) employs a multi-round adaptation process to deal with environmental changes and (ii) controls the trade-off between computation time and adaptation quality by adjusting the granularity of analysis. More specifically, the adaptation starts with a quick, coarser GRA for an initial adaptation that meets timeliness constraints, followed by iterative refinements with finer GRA that yield higher-quality adaptations satisfying more requirements. The applicability and effectiveness of MGRA have been assessed through two case studies.

    Download PDF (1737K)
  • Xiaoguang TU, Zhi HE, Gui FU, Jianhua LIU, Mian ZHONG, Chao ZHOU, Xia ...
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 4 Pages 371-383
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 05, 2024
    JOURNAL FREE ACCESS

    To address challenges such as small target sizes, blurred target features, and difficulty in distinguishing between targets and backgrounds in small object detection, we propose a method based on Multi-Scale Image Degradation combined with the Contrastive Learning model. By leveraging contrastive learning techniques, our approach aims to enhance the discriminative features necessary for accurately distinguishing objects from backgrounds. To specifically target small objects, we subject target samples to various multi-scale image degradation modes before inputting them into the contrastive learning model. Augmentation techniques are then applied to these degraded samples to facilitate effective contrastive feature learning. Consequently, the model is better equipped to uncover the differences between small targets and backgrounds, thereby improving small object detection performance. Furthermore, considering that spatial domain features are sensitive to local changes in the image, while frequency domain features are sensitive to global structural changes, our approach applies the contrastive learning model in both spatial and frequency domains, aiming to acquire more robust features for small object detection. Extensive experiments conducted on the MS COCO dataset and the VisDrone2019 dataset validate the effectiveness of our proposed method in significantly enhancing small object detection accuracy.
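As a rough illustration of what a multi-scale degradation mode might look like (our own assumption; the abstract does not specify the paper's exact degradation modes), the following Python sketch produces one box-blurred view of a grayscale image per scale, which could then be fed to a contrastive learner:

```python
# Hypothetical multi-scale degradation: average k x k blocks and write
# the average back over the block, giving one degraded view per scale.
# This is an illustrative stand-in, not the authors' degradation modes.

def box_degrade(img, k):
    """Average k x k blocks of a 2-D list of pixels, keeping image size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for bi in range(0, h, k):
        for bj in range(0, w, k):
            block = [img[i][j]
                     for i in range(bi, min(bi + k, h))
                     for j in range(bj, min(bj + k, w))]
            avg = sum(block) / len(block)
            for i in range(bi, min(bi + k, h)):
                for j in range(bj, min(bj + k, w)):
                    out[i][j] = avg
    return out

def multiscale_views(img, scales=(1, 2, 4)):
    """One degraded view of the sample per scale."""
    return [box_degrade(img, k) for k in scales]
```

Each degraded view preserves the coarse structure of the target while discarding fine detail, which is the kind of variation a contrastive objective can learn to be invariant to.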

    Download PDF (14757K)
  • Lanxi LIU, Pengpeng YANG, Suwen DU, Sani M. ABDULLAHI
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025 Volume E108.D Issue 4 Pages 384-391
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 08, 2024
    JOURNAL FREE ACCESS

    The rapid development of digital cameras and smartphones makes it easy for people to record information displayed on media and obtain high-quality recaptured images, which poses a serious threat to copyright protection, identity authentication, and public security. Detecting recaptured images is therefore an urgent problem in the multimedia forensics community. Most existing methods for detecting recaptured images focus on mining specific traces left in the images during the recapture operation. However, these traces may be covered up in certain environmental settings. To address this issue, we explore the internal differences in image statistics between original and recaptured images, which do not depend on specific traces, and construct a more robust feature for detecting recaptured images. Firstly, the most discriminative regions are extracted based on a measure of pixel dispersion. Secondly, a multi-scale residual feature is constructed by calculating the first-order statistics of residual images to enhance robustness against various recapture environments. Lastly, the binary grey wolf optimization and particle swarm optimization (BGWOPSO) feature selection method is used to reduce the dimensionality of the feature space, which keeps a good balance between performance and computational complexity. Experimental results on three public databases demonstrate that our proposed method significantly improves detection performance, especially on the most difficult-to-detect ICL-COMMSP database.
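The multi-scale residual-statistics step can be illustrated with a simplified Python sketch (our own reading of the abstract, not the authors' code): compute a first-order residual of a grayscale image at several downsampled scales and summarize each residual by simple first-order statistics.

```python
# Simplified illustration of a multi-scale residual feature: horizontal
# first differences at several scales, summarized by mean and variance
# of absolute residuals. Function names and choices are ours.

def residual(img):
    """Horizontal first-difference residual of a 2-D list of pixels."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)]
            for row in img]

def downsample(img, factor):
    """Naive downsampling: keep every `factor`-th row and column."""
    return [row[::factor] for row in img[::factor]]

def first_order_stats(res):
    """Mean and variance of absolute residual values."""
    vals = [abs(v) for row in res for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

def multiscale_residual_feature(img, factors=(1, 2)):
    """Concatenate residual statistics across scales into one feature."""
    feat = []
    for f in factors:
        feat.extend(first_order_stats(residual(downsample(img, f))))
    return feat
```

Statistics of this kind depend on the overall distribution of residuals rather than on any single recapture trace, which is the robustness property the abstract argues for.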

    Download PDF (8551K)
  • Hiroaki AKUTSU, Ko ARAI
    Article type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2025 Volume E108.D Issue 4 Pages 392-402
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 08, 2024
    JOURNAL FREE ACCESS

    Autoregressive probability estimation of data sequences is a fundamental task in deep neural networks and is widely used in applications such as data compression and generation. Because causality makes it a sequential, iterative process, it suffers from low speed. One way to achieve high throughput is multiplexing on a GPU. To maximize the throughput of inference processing within the limited resources of the GPU, it is necessary to avoid the increase in computational complexity associated with deeper layers and to reduce the memory required at higher multiplexing. In this paper, we propose Scale Causal Blocks (SCBs), basic components of deep neural networks that aim to significantly reduce computational and memory costs compared with conventional techniques. Evaluation results show that the proposed method is one order of magnitude faster than a conventional computationally optimized Transformer-based method while maintaining comparable accuracy, and it also shows better learning convergence.
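As background on why autoregressive estimation is inherently sequential, the toy example below (a generic illustration, not the SCB method; the bigram table `P` is hypothetical) shows that each step's probability depends on the symbol already produced, so the steps cannot be reordered or skipped:

```python
# Toy autoregressive model: the probability of each symbol conditions on
# the previous one, so evaluating a sequence is a sequential scan.
import math

# Hypothetical bigram model P(next | prev) over a 2-symbol alphabet.
P = {0: {0: 0.9, 1: 0.1},
     1: {0: 0.4, 1: 0.6}}

def sequence_log_prob(seq, p_first=0.5):
    """Log-probability of seq under the bigram model, step by step."""
    logp = math.log(p_first)
    for prev, nxt in zip(seq, seq[1:]):
        logp += math.log(P[prev][nxt])  # each term needs the previous symbol
    return logp
```

A deep autoregressive network replaces the lookup table with a neural conditional distribution, but the causal dependency chain, and hence the serial bottleneck that SCBs target, is the same.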

    Download PDF (1697K)
  • Tomoki MIYAMOTO
    Article type: LETTER
    Subject area: Human-computer Interaction
    2025 Volume E108.D Issue 4 Pages 403-405
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: October 23, 2024
    JOURNAL FREE ACCESS

    When a smart speaker encounters an error, it frequently prompts the user to re-enter the input information. This study examines the psychological impact of adopting a politeness strategy, involving linguistic considerations, for re-enter requests with a smart speaker, particularly focusing on its effect on trust. Specifically, a video-based impression evaluation experiment was conducted to assess the impact of politeness in re-enter requests on improving trust in situations where the smart speaker failed to deliver the expected output for the user.

    Download PDF (2926K)
  • Yingying LU, Cheng LU, Yuan ZONG, Feng ZHOU, Chuangao TANG
    Article type: LETTER
    Subject area: Pattern Recognition
    2025 Volume E108.D Issue 4 Pages 406-410
    Published: April 01, 2025
    Released on J-STAGE: April 01, 2025
    Advance online publication: November 01, 2024
    JOURNAL FREE ACCESS

    This letter addresses the challenge of cross-stimulus speech-based depression detection (SDD), where training (source) and testing (target) speech samples stem from different stimulus methods, such as interview responses and reading texts. This discrepancy may create a mismatch in feature distributions between the source and target speech samples, leading to a notable deterioration in the performance of existing SDD methods. To tackle this issue, we propose a novel domain adaptation approach called Joint Distribution-aligned Dual-sparse Linear Regression (JDDLR). The fundamental idea of JDDLR is straightforward: extending simple linear regression (LR) to a version that is both depression-discriminative and stimulus-invariant. To achieve this, we initially equip JDDLR with depression-discriminative capability by constructing a dual-sparse linear regression (DLR) model. Unlike conventional linear regression models, DLR employs a meticulous coarse-to-fine feature selection mechanism to seek the depression-discriminative features from the acoustic feature set used to describe speech signals. Subsequently, we introduce a regularization term, which borrows the idea of joint distribution adaptation, thereby giving rise to JDDLR. This regularization term serves to alleviate the incongruities in feature distributions between the selected high-quality features of source and target samples. To evaluate JDDLR, extensive cross-stimulus SDD experiments are conducted on the MODMA dataset. The results underscore the promising performance of JDDLR in effectively addressing cross-stimulus SDD challenges.

    Download PDF (148K)