IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Latest issue
Showing 1-13 of 13 articles in the selected issue
Special Section on Enriched Multimedia — Media technologies supporting the digital society —
  • Michiharu NIIMI
    2025, Vol. E108.D, No. 4, p. 299
    Published: 2025/04/01
    Released: 2025/04/01
    Journal, free access
  • Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAM ...
    Article type: INVITED PAPER
    2025, Vol. E108.D, No. 4, p. 300-310
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/07
    Journal, free access

    Skillfully fabricated artificial replicas of authentic media using advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increased distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated by using advanced AI-based speech analysis/synthesis techniques, can be abused for spoofing and tampering with authentic speech signals. This can enable attackers to commit serious offenses such as fraud by voice impersonation and unauthorized speaker verification. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the basis of the acoustical features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing consisting of these features and deep neural networks (DNNs).

  • Takaharu TSUBOYAMA, Ryota TAKAHASHI, Motoi IWATA, Koichi KISE
    Article type: PAPER
    2025, Vol. E108.D, No. 4, p. 311-319
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/07
    Journal, free access

    In recent years, digital signage has become popular as a means of disseminating information to the general public. However, unlike advertisements displayed on PCs or smartphones, information displayed on digital signage cannot be acquired directly, even if the content is interesting. Mizushima et al. proposed a video watermarking method that is robust against re-shooting so that the watermark can be extracted from watermarked videos displayed on digital signage. Conventional methods, however, have limited information capacity. Recently, watermarking methods based on deep learning have attracted attention for embedding large watermarks. In this paper, we implement a video watermarking method based on 3D U-Net, which makes it possible to embed larger watermarks than existing methods. In addition, the proposed method was able to extract the watermark from re-shot video, and the shortest average processing time needed to extract the correct watermark was 1.85 seconds.

Regular Section
  • David CLARINO, Naoya ASADA, Atsushi MATSUO, Shigeru YAMASHITA
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2025, Vol. E108.D, No. 4, p. 320-329
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/30
    Journal, free access

    Lookup Table (LUT) based synthesis methods have recently been proposed as a way to synthesize quantum Boolean circuits in a qubit-constrained environment. Other recent research has demonstrated that relative phase quantum circuits can be used when compute/uncompute logic is used in tandem, reducing the T-count of quantum Boolean circuits in the fault-tolerant quantum computing paradigm. Because LUT-based synthesis methods use compute/uncompute pairs on ancilla qubits, this suggests that implementing the arbitrary Boolean logic that makes up the individual Boolean logic network nodes in a relative phase manner could reduce the T-count. To generate such arbitrary Boolean functions, we utilize Shannon’s decomposition, Davio expansions, and alternating balanced and unbalanced relative phase circuits. Experimental results demonstrate that our method can reduce the T-count to an average of 24% of that of the existing method.
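
The Shannon decomposition and Davio expansions named in this abstract are standard Boolean identities, and a small classical check may help make them concrete. The sketch below is not the authors' synthesis method and involves no quantum circuits; it only verifies the three identities on an example function. The helper names (`cofactor`, `shannon`, `positive_davio`, `negative_davio`, `equivalent`) and the `majority` example are assumptions introduced for illustration.

```python
from itertools import product

def cofactor(f, i, value):
    """Return f with input i fixed to value (0 or 1)."""
    def g(bits):
        fixed = list(bits)
        fixed[i] = value
        return f(tuple(fixed))
    return g

def shannon(f, i):
    """Shannon decomposition: f = (NOT x_i AND f0) OR (x_i AND f1)."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda bits: ((1 - bits[i]) & f0(bits)) | (bits[i] & f1(bits))

def positive_davio(f, i):
    """Positive Davio expansion: f = f0 XOR (x_i AND (f0 XOR f1))."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda bits: f0(bits) ^ (bits[i] & (f0(bits) ^ f1(bits)))

def negative_davio(f, i):
    """Negative Davio expansion: f = f1 XOR (NOT x_i AND (f0 XOR f1))."""
    f0, f1 = cofactor(f, i, 0), cofactor(f, i, 1)
    return lambda bits: f1(bits) ^ ((1 - bits[i]) & (f0(bits) ^ f1(bits)))

def equivalent(f, g, n):
    """Compare two n-input Boolean functions on every input assignment."""
    return all(f(b) == g(b) for b in product((0, 1), repeat=n))

def majority(bits):
    """Hypothetical example function: 3-input majority."""
    a, b, c = bits
    return (a & b) | (a & c) | (b & c)

# Each expansion reproduces the original function for every split variable.
for i in range(3):
    assert equivalent(majority, shannon(majority, i), 3)
    assert equivalent(majority, positive_davio(majority, i), 3)
    assert equivalent(majority, negative_davio(majority, i), 3)
```

Here `f0` and `f1` are the cofactors of the function with respect to the chosen variable. The XOR-based Davio forms are the ones that map naturally onto reversible logic, which is presumably why they appear alongside Shannon's decomposition in this setting.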

  • Wei LEI, Yue ZHANG, Hanfeng XIE, Zebin CHEN, Zengping CHEN, Weixing LI
    Article type: PAPER
    Subject area: Computer System
    2025, Vol. E108.D, No. 4, p. 330-340
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/30
    Journal, free access

    Radio Frequency (RF) transmitting-receiving platforms play a foundational role in radar, communication, and related fields. In this paper, we design and develop a fully digital transmitting-receiving platform based on the Radio Frequency System on Chip (RFSoC) for Multiple Input Multiple Output (MIMO) radar waveform diversity experiments. First, the overall design is presented, and the implementation of each module, including multi-channel arbitrary waveform generation, multi-channel signal pre-processing, multi-channel synchronization, and data forwarding and storage, is described in detail. Second, the RF signal quality evaluation methods are introduced, and the RF performance of the system is evaluated. The results indicate that its performance is good enough to meet radar requirements. Finally, detection experiments on Unmanned Aerial Vehicles (UAVs) are conducted using mutually orthogonal discrete frequency encoding waveforms, and the target is observed clearly. This verifies the effectiveness of our platform and its applicability to MIMO mode. Compared with conventional radar RF platforms, our platform has several advantages. First, it can generate arbitrary waveforms, and each channel is entirely independent. Second, it supports both narrow and wide bands, and its sampling rate can be switched according to the bandwidth. Third, it facilitates data analysis and processing because a high-speed data forwarding and storage path is provided.

  • Takashi YOKOTA, Kanemitsu OOTSU
    Article type: PAPER
    Subject area: Computer System
    2025, Vol. E108.D, No. 4, p. 341-348
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/28
    Journal, free access

    Today’s parallel computers rely on an interconnection network as a crucial component, in which message packets are used to exchange information. One of the challenging issues in such networks is congestion control. We have proposed a novel method, Cup-Stacking, to address this problem. The Cup-Stacking method splits a large packet into slices, re-shapes the slices by adjusting the available parameters, and injects the slices at an appropriate interval. The method succeeds in reducing congestion by pre-scheduling the packet injection timing. However, because a practical system does not always guarantee precise packet timing, the robustness of the method against delays from the scheduled timing must be examined to show its practical usefulness. This paper addresses criticality and tolerance issues in evaluating delays from the scheduled timing and proposes two evaluation indices to represent the expected performance degradation: the delay criticality index (DCI) and the delay tolerance measure (DTM). The former represents the impact of the injection delay of an individual packet, and the latter shows the expected performance degradation. Evaluation results for the Cup-Stacking method reveal preferable relationships between DCI and DTM values. Furthermore, the results lead to a practical guideline for applying the Cup-Stacking method.
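
As a rough illustration of the "split into slices and inject at an interval" idea described above, the toy sketch below builds an injection schedule for a single packet. It is not the Cup-Stacking algorithm itself: the function name `schedule_slices`, its parameters, and the unit of length are all hypothetical, and the parameter tuning, DCI, and DTM from the paper are not modeled.

```python
def schedule_slices(packet_len, slice_len, interval, start=0):
    """Split a packet into slices and assign each an injection time.

    Toy model only (all names are assumptions for illustration):
    packet_len and slice_len are in arbitrary length units, interval
    is the gap between consecutive slice injections, and start is the
    injection time of the first slice.
    Returns a list of (injection_time, slice_size) tuples.
    """
    if packet_len <= 0 or slice_len <= 0 or interval < 0:
        raise ValueError("packet_len and slice_len must be positive, interval non-negative")
    schedule = []
    remaining = packet_len
    t = start
    while remaining > 0:
        size = min(slice_len, remaining)  # the last slice may be shorter
        schedule.append((t, size))
        remaining -= size
        t += interval
    return schedule

# A 10-unit packet cut into slices of at most 4 units, injected every 6 time units.
plan = schedule_slices(10, 4, 6)
assert plan == [(0, 4), (6, 4), (12, 2)]
```

In this toy model, a delay relative to the schedule would correspond to one slice being injected later than its assigned time, which is the kind of deviation the abstract's robustness analysis is concerned with.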

  • Xiaokang JIN, Benben HUANG, Hao SHENG, Yao WU
    Article type: PAPER
    Subject area: Software System
    2025, Vol. E108.D, No. 4, p. 349-359
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/28
    Journal, free access

    Recently, anchor-based visual object trackers have become increasingly popular due to their exceptional performance. However, they rely on preset anchor boxes that require manual tuning, which can affect tracker performance and introduce hyper-parameter dependencies. To address these issues, an anchor-free Siamese tracker with multi-attention and corner detection mechanisms is proposed. Additionally, a multiple attention fusion module is designed to calculate the relationship between the template and the search area in different channels, thus enhancing the model’s perception of environmental information. By eliminating the need for anchor points and performing direct computation, the proposed model minimizes the influence of hyper-parameters and human factors, resulting in improved overall efficiency. To demonstrate the effectiveness of the proposed tracker, comprehensive experiments were conducted on four challenging benchmarks: OTB100, VOT2016, UAV123, and GOT-10k.

  • Jialong LI, Takuto YAMAUCHI, Takanori HIRANO, Jinyu CAI, Kenji TEI
    Article type: PAPER
    Subject area: Software Engineering
    2025, Vol. E108.D, No. 4, p. 360-370
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/31
    Journal, free access

    In studies of self-adaptive systems (SAS), requirement relaxation is a well-studied approach that adjusts or disables certain requirements in response to requirement unsatisfaction or requirement conflicts, allowing the system to maintain core functionalities while temporarily reducing service quality. The recent integration of Guaranteeable Requirement Analysis (GRA) with Discrete Controller Synthesis (DCS) allows for coordinated self-adaptation by identifying relaxable requirements and then synthesizing new specifications to fulfill the remaining requirements. However, the scalability of GRA poses challenges, particularly because of state explosion and combination explosion, which make it difficult to apply to runtime self-adaptation, where timeliness is required. To address this, this paper introduces the Multi-grained Guaranteeable Requirement Analysis (MGRA) approach, which (i) employs a multi-round adaptation process to deal with environmental changes and (ii) controls the trade-off between computation time and adaptation quality by adjusting the granularity of analysis. More specifically, the adaptation starts with a quick, coarser GRA for an initial adaptation to meet timeliness, followed by iterative refinements with finer GRA that yield higher-quality adaptations and gradually meet more requirements. The applicability and effectiveness of MGRA have been assessed through two case studies.

  • Xiaoguang TU, Zhi HE, Gui FU, Jianhua LIU, Mian ZHONG, Chao ZHOU, Xia ...
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025, Vol. E108.D, No. 4, p. 371-383
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/11/05
    Journal, free access

    To address challenges such as small target sizes, blurred target features, and difficulty in distinguishing between targets and backgrounds in small object detection, we propose a method based on Multi-Scale Image Degradation combined with the Contrastive Learning model. By leveraging contrastive learning techniques, our approach aims to enhance the discriminative features necessary for accurately distinguishing objects from backgrounds. To specifically target small objects, we subject target samples to various multi-scale image degradation modes before inputting them into the contrastive learning model. Augmentation techniques are then applied to these degraded samples to facilitate effective contrastive feature learning. Consequently, the model is better equipped to uncover the differences between small targets and backgrounds, thereby improving small object detection performance. Furthermore, considering that spatial domain features are sensitive to local changes in the image, while frequency domain features are sensitive to global structural changes, our approach applies the contrastive learning model in both spatial and frequency domains, aiming to acquire more robust features for small object detection. Extensive experiments conducted on the MS COCO dataset and the VisDrone2019 dataset validate the effectiveness of our proposed method in significantly enhancing small object detection accuracy.

  • Lanxi LIU, Pengpeng YANG, Suwen DU, Sani M. ABDULLAHI
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2025, Vol. E108.D, No. 4, p. 384-391
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/11/08
    Journal, free access

    The rapid development of digital cameras and smartphones makes it easy for people to record information displayed in media and obtain high-quality recaptured images, which poses a serious threat to copyright protection, identity authentication, and public social security. Therefore, detecting recaptured images is an urgent problem in the multimedia forensics community. Most existing methods for detecting recaptured images focus on mining specific traces left in the images during the recapture operation. However, these traces may be covered up in certain environmental settings. To address this issue, we explore the internal differences in image statistics between original and recaptured images, which do not depend on specific traces, and construct a more robust feature for detecting recaptured images. First, the most discriminative regions are extracted based on a measure of pixel dispersion. Second, a multi-scale residual feature is constructed by calculating the first-order statistics of residual images to enhance robustness against various recapture environments. Finally, a binary grey wolf optimization and particle swarm optimization (BGWOPSO) feature selection method is used to reduce the dimensionality of the feature space, which keeps a good balance between performance and computational complexity. Experimental results on three public databases demonstrate that our proposed method significantly improves detection performance, especially on the most difficult-to-detect ICL-COMMSP database.

  • Hiroaki AKUTSU, Ko ARAI
    Article type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2025, Vol. E108.D, No. 4, p. 392-402
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/11/08
    Journal, free access

    Autoregressive probability estimation of data sequences is a fundamental task in deep neural networks and has been widely used in applications such as data compression and generation. Because causality makes it a sequential iterative process, it is slow. One way to achieve high throughput is multiplexing on a GPU. To maximize the throughput of inference processing within the limited resources of the GPU, it is necessary to avoid the increase in computational complexity associated with deeper layers and to reduce the memory consumption required at higher multiplexing. In this paper, we propose Scale Causal Blocks (SCBs), basic components of deep neural networks that aim to significantly reduce computational and memory costs compared to conventional techniques. Evaluation results show that the proposed method is one order of magnitude faster than a conventional computationally optimized Transformer-based method while maintaining comparable accuracy, and that it also shows better learning convergence.

  • Tomoki MIYAMOTO
    Article type: LETTER
    Subject area: Human-computer Interaction
    2025, Vol. E108.D, No. 4, p. 403-405
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/10/23
    Journal, free access

    When a smart speaker encounters an error, it frequently prompts the user to re-enter the input information. This study examines the psychological impact of adopting a politeness strategy, involving linguistic considerations, for re-enter requests with a smart speaker, particularly focusing on its effect on trust. Specifically, a video-based impression evaluation experiment was conducted to assess the impact of politeness in re-enter requests on improving trust in situations where the smart speaker failed to deliver the expected output for the user.

  • Yingying LU, Cheng LU, Yuan ZONG, Feng ZHOU, Chuangao TANG
    Article type: LETTER
    Subject area: Pattern Recognition
    2025, Vol. E108.D, No. 4, p. 406-410
    Published: 2025/04/01
    Released: 2025/04/01
    [Advance publication] Released: 2024/11/01
    Journal, free access

    This letter addresses the challenge of cross-stimulus speech-based depression detection (SDD), where training (source) and testing (target) speech samples stem from different stimulus methods, such as interview responses and reading texts. This discrepancy may create a mismatch in feature distributions between the source and target speech samples, leading to a notable deterioration in the performance of existing SDD methods. To tackle this issue, we propose a novel domain adaptation approach called Joint Distribution-aligned Dual-sparse Linear Regression (JDDLR). The fundamental idea of JDDLR is straightforward: extending simple linear regression (LR) to a version that is both depression-discriminative and stimulus-invariant. To achieve this, we initially equip JDDLR with depression-discriminative capability by constructing a dual-sparse linear regression (DLR) model. Unlike conventional linear regression models, DLR employs a meticulous coarse-to-fine feature selection mechanism to seek the depression-discriminative features from the acoustic feature set used to describe speech signals. Subsequently, we introduce a regularization term, which borrows the idea of joint distribution adaptation, thereby giving rise to JDDLR. This regularization term serves to alleviate the incongruities in feature distributions between the selected high-quality features of source and target samples. To evaluate JDDLR, extensive cross-stimulus SDD experiments are conducted on the MODMA dataset. The results underscore the promising performance of JDDLR in effectively addressing cross-stimulus SDD challenges.
