Special Section on Intelligent Information Processing Technology to be Integrated into Society
-
Kiyota HASHIMOTO
2025 Volume E108.D Issue 7 Pages
645-646
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
FREE ACCESS
-
Tomoaki YAMAZAKI, Seiya ITO, Kouzou OHARA
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
647-658
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: December 20, 2024
JOURNAL
FREE ACCESS
A word sense is an essential element for understanding what a sentence means and can be interpreted as a concept in its own right. To realize this cognition in computational linguistics, embedding methods have been proposed that map words to dense vectors. Among them, sense embeddings assign multiple vectors to each word to represent its distinct meanings; their distinguishing feature is that the boundary between the meanings of each word exists explicitly. However, their quality is evaluated with an approach designed for conventional word embeddings, which addresses meaning only implicitly. More precisely, these evaluations adopt datasets composed of word pairs and similarity scores between the two words, in which the number of meanings evaluated is limited compared to the number of words. Moreover, their evaluation metrics reflect only part of the relationships between multi-sense words. To overcome these problems, in this paper we propose a novel evaluation method for sense embeddings that covers rich meanings and addresses the combinations arising from polysemy, such as the uniqueness and redundancy of vectors. Our key idea is that a vector that appropriately represents a meaning has neighbors in the vector space that can be considered similar words. Based on this idea, we automatically construct an evaluation dataset with similar words for each meaning by combining information from two reliable concept hierarchies: one manually managed, the other automatically created and then manually managed. Based on the constructed dataset, we devise three evaluation metrics that associate the vectors of a multi-sense word with its meanings in the dataset in different ways. Through an experiment, we empirically show that the proposed evaluation method reflects the quality of sense embeddings more adequately than the conventional method.
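The key idea above, that a well-placed sense vector should have the meaning's similar words as its nearest neighbors, can be sketched as a toy metric. Everything below (the 2-D vectors, the two hypothetical senses of "bank", and the gold similar-word sets) is invented for illustration and is not the paper's dataset or metric definition:

```python
import numpy as np

def neighbor_score(sense_vec, vocab_vecs, vocab_words, gold_similar, k=3):
    """Fraction of the sense vector's k nearest cosine neighbors that fall
    in the gold set of similar words for that meaning."""
    sims = vocab_vecs @ sense_vec / (
        np.linalg.norm(vocab_vecs, axis=1) * np.linalg.norm(sense_vec))
    top = [vocab_words[i] for i in np.argsort(-sims)[:k]]
    return len(set(top) & set(gold_similar)) / k

words = ["river", "stream", "money", "cash", "tree"]
vecs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [0.6, 0.6]])
bank_river = np.array([1.0, 0.0])   # hypothetical sense vector of "bank" (shore)
bank_money = np.array([0.0, 1.0])   # hypothetical sense vector of "bank" (finance)
print(neighbor_score(bank_river, vecs, words, {"river", "stream"}, k=2))  # 1.0
print(neighbor_score(bank_money, vecs, words, {"money", "cash"}, k=2))    # 1.0
```

A score of 1.0 means every retrieved neighbor belongs to the meaning's gold set; a redundant or misplaced sense vector would pull in neighbors of the wrong meaning and score lower.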
-
Zhizhong WANG, Wen GU, Zhaoxing LI, Koichi OTA, Shinobu HASEGAWA
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
659-666
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: December 20, 2024
JOURNAL
FREE ACCESS
To understand how an online discussion develops and to engage effectively, it is vital for both individual participants and facilitators to grasp the content on which the discussion group is focusing, i.e., the spotlight contents. However, keeping up with the spotlight contents in text-based consensus decision-making online forums (TCDOF) becomes extremely challenging as the numbers of participants and posts grow. In this paper, we address this challenge with a novel framework that leverages topics derived from post contents and the inter-post structure to extract spotlight contents from TCDOF. In addition, the extracted spotlight contents are presented as succinct natural-language sentences, enhancing accessibility and comprehension. Furthermore, we devise a time-based spotlight contents extraction (TSCE) algorithm to extract spotlight contents from a temporal perspective. The effectiveness of the proposed approach is demonstrated in experiments on real-world online discussions.
-
Meihua XUE, Kazuki SUGITA, Koichi OTA, Wen GU, Shinobu HASEGAWA
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
667-674
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 23, 2025
JOURNAL
FREE ACCESS
This research proposes a system that supports Japanese vocabulary learning for L2 learners of Japanese by integrating object recognition technology with a thesaurus database. Vocabulary learning is the foundation of L2 learning, yet traditional translation-based learning remains the mainstream. The proposed method is based on the hypothesis that associating visuals with synonyms is effective for vocabulary learning. The system, called PICSU (PICture-based Synonyms Understanding), combines YOLOv7 with Japanese WordNet to provide a unique and context-rich vocabulary learning experience. Preliminary experiments with international graduate students as participants suggested improved retention and engagement compared to traditional flashcard-based learning. This article outlines the proposed approach and highlights the potential of integrating intelligent information processing technology into vocabulary learning practice.
-
Lorenzo MAMELONA, TingHuai MA, Li JIA, Bright BEDIAKO-KYEREMEH, Benjam ...
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
675-684
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: November 19, 2024
JOURNAL
FREE ACCESS
The widespread implementation of social distancing measures and remote work during the COVID-19 pandemic significantly altered societal dynamics, leading to an increased reliance on social media platforms for expressing sentiment. However, existing sentiment analysis models face challenges in comprehending the complexities of English tweets and the nuances of social media conversations. To address this, our study proposes an ensemble framework for sentiment analysis on social media, integrating TinyBERT, a lightweight variant of BERT, into a dynamic bootstrap-aggregation and stacking ensemble with extreme gradient boosting as the meta-learner. This framework aims to improve sentiment analysis efficiency while managing computational costs effectively. Our experiments demonstrate promising results, achieving an accuracy, precision, recall, and F1-score of 96.34%, 96.39%, 96.34%, and 96.35%, respectively. These findings advance sentiment analysis tailored to the dynamic landscape of social media, enabling the identification of key pandemic discourse sentiments and informing public health interventions. The study underscores the importance of AI in extracting insights from COVID-19 tweets, contributing to a deeper understanding of societal impacts and highlighting its role in addressing global health challenges.
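The bagging-plus-stacking shape of such a framework can be sketched with stand-ins, not the paper's implementation: scikit-learn is assumed available, TF-IDF with linear models replaces TinyBERT features, `BaggingClassifier` supplies the bootstrap aggregation, and `GradientBoostingClassifier` replaces XGBoost as the meta-learner; the tiny "tweet" corpus is synthetic:

```python
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny synthetic corpus, invented for the sketch.
pos = [f"great vaccine news today {i}" for i in range(20)]
neg = [f"terrible lockdown situation again {i}" for i in range(20)]
texts, labels = pos + neg, [1] * 20 + [0] * 20

stack = make_pipeline(
    TfidfVectorizer(),
    StackingClassifier(
        estimators=[
            # bootstrap-aggregated base learner + a second base learner
            ("bag_lr", BaggingClassifier(LogisticRegression(), n_estimators=5,
                                         random_state=0)),
            ("nb", MultinomialNB()),
        ],
        final_estimator=GradientBoostingClassifier(random_state=0),  # meta-learner
        cv=5,
    ),
)
stack.fit(texts, labels)
preds = stack.predict(["great news", "terrible situation"])
print(preds)  # [1 0]
```

The stacking layer trains the meta-learner on cross-validated predictions of the base learners, which is what lets a boosted meta-model correct their individual mistakes.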
-
Chenbo SHI, Wenxin SUN, Jie ZHANG, Junsheng ZHANG, Chun ZHANG, Changsh ...
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
685-696
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: November 12, 2024
JOURNAL
FREE ACCESS
Flexible paper answer sheets are widely employed in examinations because they are cost-effective. However, optical marks on flexible paper often present irregular shapes, non-uniform arrangement, deformation, and scanning noise, making automatic optical mark recognition (OMR) a formidable task. This paper introduces a multi-layer feature energy model based on Bayesian global optimization. The model integrates the localization of individual marks with a division model to locate and segment optical marks of varying shapes and arrangements, even in the presence of deformation and diverse noise disturbances. Furthermore, the model uses the pixel occupancy ratio to recognize each mark. A comprehensive dataset comprising 31,940 optical marks with diverse shapes and arrangements was created, on which the method achieved a single-mark localization accuracy of 97.07% and a recognition accuracy of 97.80%. These results underscore the proposed method's flexibility and noise resilience in solving multiple-choice recognition problems.
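The pixel-occupancy decision at the end of such a pipeline is simple to illustrate. The block sizes, the synthetic "mark", and the 0.3 threshold below are invented for the example; the paper's actual threshold and preprocessing are more elaborate:

```python
import numpy as np

def mark_filled(block, threshold=0.3):
    """Decide whether a segmented optical mark is filled via its pixel
    occupancy ratio. `block` is binarized (1 = ink, 0 = paper); the 0.3
    threshold is illustrative, not the paper's tuned value."""
    occupancy = float(block.mean())
    return occupancy >= threshold, occupancy

empty = np.zeros((12, 24), dtype=int)
empty[5, 2:22] = 1                    # printed outline / stray line only
filled = empty.copy()
filled[3:9, 4:20] = 1                 # pencil fill over the mark area

print(mark_filled(empty)[0], mark_filled(filled)[0])  # False True
```

The hard part the paper addresses is everything before this step: locating and segmenting deformed, noisily scanned blocks so that the occupancy ratio is computed over the right pixels.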
-
Jiakun LI, Jiajian LI, Yanjun SHI, Hui LIAN, Haifan WU
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
697-708
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: September 24, 2024
JOURNAL
FREE ACCESS
In future 6G Vehicle-to-Everything (V2X) networks, task offloading in mobile edge computing (MEC) systems will face complex challenges in highly mobile, dynamic environments. We propose a Multi-Agent Deep Reinforcement Learning (MADRL) algorithm with cloud-edge-vehicle collaboration to address these challenges. First, we model the task offloading problem in the cloud-edge-vehicle system, meeting low-latency, low-energy computing requirements by coordinating the computational resources of connected vehicles and MEC servers. Then, we reformulate the problem as a Markov decision process and propose a digital-twin-assisted MADRL algorithm to solve it. The algorithm treats each connected vehicle as an agent, whose observation is defined as the current local environmental state together with global digital twin information. The action space of the agents comprises discrete task offloading targets and continuous resource allocations. The objective of the algorithm is to improve overall system performance while taking collaborative learning among the agents into account. Experimental results show that the MADRL algorithm outperforms other strategies in computational efficiency and energy consumption.
-
Zhengyu LU, Pengfei XU
Article type: PAPER
2025 Volume E108.D Issue 7 Pages
709-717
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 17, 2025
JOURNAL
FREE ACCESS
Hail, a severe convective weather phenomenon, is highly destructive, so accurate identification is crucial to minimize economic damage and safeguard lives. The primary challenges in detecting hail are the scarcity of valid hail samples and their imbalance in high-resolution datasets. In response, this paper introduces HAM-Unet, a hail identification framework that leverages multi-source data and environmental factors. The model combines the FEM-Unet semantic segmentation architecture with data fusion techniques. By integrating radar reflectivity, FY-4B satellite imagery, ERA5 climate parameters, and topographical data, HAM-Unet improves both precision and resilience. Extensive training and validation show that HAM-Unet achieves strong scores in Probability of Detection (POD), False Alarm Rate (FAR), and Critical Success Index (CSI). The model not only shows potential for improving the accuracy and reliability of hail identification but also provides new ideas and methods for improving hail monitoring and warning systems.
-
Masateru TSUNODA, Takuto KUDO, Akito MONDEN, Amjed TAHIR, Kwabena Ebo ...
Article type: LETTER
2025 Volume E108.D Issue 7 Pages
718-722
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: November 11, 2024
JOURNAL
FREE ACCESS
Various clone detection methods have been proposed, and their results vary depending on the combination of methods and hyperparameters used (i.e., configurations). To help select a suitable clone detection configuration, we propose two bandit algorithm (BA) based methods that evaluate the configurations dynamically while the detection methods are in use. Our analysis showed that the two proposed methods, a naïve method and BANC (BA considering Negative Cases), identified the best configuration among the four code clone detection methods used with high probability.
-
Masateru TSUNODA, Ryoto SHIMA, Amjed TAHIR, Kwabena Ebo BENNIN, Akito ...
Article type: LETTER
2025 Volume E108.D Issue 7 Pages
723-726
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: November 11, 2024
JOURNAL
FREE ACCESS
Background: Code generation tools such as GitHub Copilot have received attention for their performance in generating code. Generally, prior analysis of their performance is needed to select a new code-generation tool from a list of candidates. Without such analysis, there is a higher risk of selecting an ineffective tool, which would hurt software development productivity. Additionally, prior analysis of new code generation tools is often time-consuming. Aim: To use a new code generation tool without prior analysis but with low risk, we propose evaluating new tools during software development (i.e., online optimization). Method: We apply the bandit algorithm (BA) approach to select the best code suggestion or generation tool from a list of candidates. Developers evaluate whether each tool's result is correct. As code generation and evaluation are repeated, the evaluation results are stored, and we use the stored results to select the best tool based on the BA approach. In a preliminary analysis, we evaluated five tools on 164 code-generation cases using BA. Result: The BA approach selected ChatGPT as the best tool as the evaluation proceeded, and during the evaluation, the average accuracy of the BA approach outperformed that of the second-best-performing tool. Our results show the feasibility and effectiveness of BA in selecting the best-performing code suggestion or generation tools.
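The evaluate-while-using loop described here amounts to a multi-armed bandit over tools. A minimal epsilon-greedy sketch, not the letter's exact algorithm: the tool names and latent accuracies are simulated, and the reward stands in for the developer's correct/incorrect judgment:

```python
import random

class ToolBandit:
    """Epsilon-greedy bandit over code-generation tools (illustrative sketch;
    not the letter's exact algorithm). Reward is 1 when the developer judges
    the generated code correct, 0 otherwise; tool names are hypothetical."""

    def __init__(self, tools, epsilon=0.1, seed=42):
        self.tools = list(tools)
        self.epsilon = epsilon
        self.counts = {t: 0 for t in self.tools}    # times each tool was used
        self.values = {t: 0.0 for t in self.tools}  # running mean reward
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise exploit the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tools)
        return max(self.tools, key=lambda t: self.values[t])

    def update(self, tool, reward):
        # Incremental update of the tool's estimated accuracy.
        self.counts[tool] += 1
        self.values[tool] += (reward - self.values[tool]) / self.counts[tool]

# Simulated developer feedback: each tool has a latent accuracy.
accuracy = {"tool_A": 0.5, "tool_B": 0.8, "tool_C": 0.3}
bandit = ToolBandit(accuracy)
for _ in range(1000):
    tool = bandit.select()
    reward = 1 if bandit.rng.random() < accuracy[tool] else 0
    bandit.update(tool, reward)
best = max(bandit.values, key=bandit.values.get)
print(best)
```

As the loop runs, usage concentrates on the tool with the highest estimated accuracy, which is exactly how online evaluation avoids a costly up-front comparison.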
-
Qingqing YU, Rong JIN
Article type: PAPER
Subject area: Fundamentals of Information Systems
2025 Volume E108.D Issue 7 Pages
727-733
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 15, 2025
JOURNAL
FREE ACCESS
This paper presents an improved Quantum Approximate Optimization Algorithm (QAOA) variant based on Conditional Value-at-Risk (CVaR) for portfolio optimization. Portfolio optimization is an NP-hard combinatorial problem that aims to select an optimal set of assets and their quantities so as to balance risk against expected return. The proposed approach uses QAOA to find the asset combination that maximizes return while minimizing risk, with a focus on the tail of the loss distribution. An enhanced QAOA ansatz is introduced that balances optimization quality against circuit depth, leading to faster convergence and higher probabilities of obtaining optimal solutions. Experiments are conducted on historical stock data from Nasdaq, optimizing portfolios with varying numbers of stocks. Our method outperforms the original QAOA and CVaR-QAOA, particularly as the problem size increases: whether the portfolio involves 10, 12, 14, or 16 stocks, the improved CVaR-QAOA consistently converges within 100 iterations or fewer, whereas the standard QAOA consistently requires 450 iterations or more.
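The CVaR ingredient can be shown concretely: instead of averaging the energies of all measured bitstrings, the optimizer averages only the best (lowest-energy) alpha fraction, which focuses the classical optimization on the tail of the distribution. A sketch with made-up sample energies (the alpha value is illustrative only):

```python
import numpy as np

def cvar_objective(energies, alpha=0.25):
    """CVaR aggregation used in CVaR-QAOA-style optimizers: average only
    the best (lowest-energy) alpha fraction of measured samples."""
    energies = np.sort(np.asarray(energies, dtype=float))
    k = max(1, int(np.ceil(alpha * len(energies))))
    return float(energies[:k].mean())

samples = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0]   # made-up sample energies
print(cvar_objective(samples, alpha=0.25))  # 1.25  (mean of best 2 of 8)
print(cvar_objective(samples, alpha=1.0))   # 3.9375 (plain expectation)
```

Setting alpha = 1 recovers the plain expectation used by standard QAOA, so the parameter interpolates between the two objectives.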
-
Koji KAMMA, Toshikazu WADA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 7 Pages
734-743
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: December 27, 2024
JOURNAL
FREE ACCESS
Deep Neural Networks (DNNs) are dominant in the field of Computer Vision (CV). Although DNN models show state-of-the-art performance in various CV tasks, using such models on resource-limited equipment (mobile phones, in-vehicle cameras, and so on) is challenging. Therefore, techniques for compressing DNN models without significant accuracy loss are desired. Pruning is one such technique; it removes redundant neurons (or channels). In this paper, we present Pruning with Output Error Minimization (POEM). POEM has two steps: pruning and reconstruction. In the pruning step, the importance of each neuron is evaluated, and the unimportant neurons are selected and removed. In the reconstruction step, the weights of the remaining neurons are tuned to compensate for the error caused by pruning so that model accuracy is well preserved. The advantage of POEM over previous methods is that both neuron selection and reconstruction are performed based on the output error of the activation functions, whereas the previous methods minimize the error before the activation functions. Experiments on well-known DNN models (VGG, ResNet, and MobileNet) and image recognition datasets (ImageNet, CUB-200-2011, and CIFAR-10) were conducted. The results show that POEM significantly outperforms the previous methods in maintaining the accuracy of the compressed models.
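The distinction POEM draws, tuning the remaining weights against the error after the activation function rather than before it, can be illustrated on a toy ReLU layer. The sizes, the random data, and the plain subgradient-descent solver are all invented for the sketch and are not the paper's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))           # activations feeding a toy layer
W_full = rng.normal(size=(8, 4))
target = np.maximum(X @ W_full, 0.0)    # post-ReLU outputs to preserve

keep = [0, 1, 2, 4, 6]                  # neurons surviving pruning
Xk = X[:, keep]

# Pre-activation least-squares fit (what the earlier methods minimize).
W0 = np.linalg.lstsq(Xk, X @ W_full, rcond=None)[0]

# Subgradient descent on the POST-activation error (POEM's objective).
W = W0.copy()
for _ in range(500):
    pre = Xk @ W
    grad = Xk.T @ ((np.maximum(pre, 0.0) - target) * (pre > 0)) / len(X)
    W -= 1e-3 * grad

mse = lambda A: float(np.mean((np.maximum(Xk @ A, 0.0) - target) ** 2))
err_before, err_after = mse(W0), mse(W)
print(err_after < err_before)  # True: tuning against the post-activation error helps
```

Starting from the pre-activation least-squares solution and descending the post-activation loss reduces the output error that actually propagates to later layers, which is the intuition behind POEM's reconstruction step.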
-
Qianhang DU, Zhipeng LIU, Yaotong SONG, Ningning WANG, Zeyuan JU, Shan ...
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 7 Pages
744-751
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 10, 2025
JOURNAL
FREE ACCESS
ShuffleNetV2 is a lightweight deep learning architecture designed to achieve efficient neural network performance in resource-constrained environments. Through its channel shuffle operation and building units, the model promotes effective information exchange between channels, enhancing feature representation and computational efficiency. However, because of its lightweight architecture, further improvements are needed in accuracy, stability, and generalization ability in classification tasks. Dendritic neurons are basic neurons in the nervous system with multiple dendrites that receive input signals from other neurons. Inspired by the information processing capacity of dendritic neurons, researchers have proposed a new dendritic neuron model and applied it to various traditional deep learning models, achieving outstanding performance on different tasks. Motivated by this, this paper proposes Dendritic ShuffleNetV2 (DShuffleNetV2), which combines the efficient feature extraction of ShuffleNetV2 with dendritic neuron features, thereby improving performance on medical image classification tasks. To evaluate the model, image classification experiments are conducted on three different types of medical image datasets. The experimental results demonstrate that, by leveraging the nonlinear features of dendrites and synapses, DShuffleNetV2 significantly outperforms the other comparison models in accuracy, precision, recall, and F1 score.
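The channel shuffle operation the model builds on is a simple reshape-transpose-reshape; a numpy sketch on a tiny (N, C, H, W) tensor:

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle on an (N, C, H, W) tensor:
    split C into groups, transpose the group axes, flatten back, so
    information mixes across grouped convolutions."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(n, c, h, w)

x = np.arange(8).reshape(1, 8, 1, 1)
print(channel_shuffle(x, 2).ravel())  # [0 4 1 5 2 6 3 7]
```

With two groups, channels 0-3 and 4-7 are interleaved, so the next grouped convolution sees features from both groups; this is the information-exchange mechanism the abstract refers to.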
-
Dingjie PENG, Wataru KAMEYAMA
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 7 Pages
752-759
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: December 25, 2024
JOURNAL
FREE ACCESS
Weakly Supervised Semantic Segmentation (WSSS) aims to train models to identify and delineate objects within an image using limited training data such as image-level labels. While recent works mainly focus on exploring class-specific knowledge to improve the quality of class activation maps, we contend that relying solely on this approach within a non-hierarchical architecture fails to adequately capture the structural relationships within images. Drawing inspiration from fully supervised semantic segmentation designs, which use hierarchical multi-scale feature maps for predicting the dense masks, we propose a novel architecture that integrates a Structural Relation Multi-class Token Transformer (SR-MCT) with WSSS. This model employs multi-scale structural tokens, generated by a Spatial Prior Module (SPM), which interact not only with patch tokens to encode structural relations, but also with multi-class tokens to integrate class-specific knowledge into complex structural embeddings. The proposed Structural Relation Multi-class Token Attention effectively builds long-range dependencies among structural tokens, patch tokens, and multi-class tokens simultaneously. Experimental results and ablation studies on PASCAL VOC 2012 and MS COCO 2014 demonstrate that our proposed SR-MCT can enhance baseline performance and outperform other state-of-the-art methods.
-
Jinyong SUN, Zhiwei DONG, Zhigang SUN, Guoyong CAI, Xiang ZHAO
Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 7 Pages
760-775
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 20, 2025
JOURNAL
FREE ACCESS
Graph classification has gained significant attention in recent years due to its wide applications in domains such as cheminformatics, bioinformatics, and social networks. Graph neural networks have proven to be an effective solution for graph classification because of their powerful ability to learn graph node features. However, existing spatial graph convolutional neural networks for node-labeled graph classification initialize node features with one-hot encoding or graph kernel methods, so they cannot capture semantic dependencies among graph nodes, which decreases graph classification accuracy. In this paper, we propose a Node Semantic-based Spatial Graph Convolutional Network (NSSGCN) for graph classification, which integrates multi-scale node semantics into a graph neural network via word embedding. Specifically, we construct multiple corpora of different granularity for a graph dataset and then leverage the PV-DBOW model to extract multi-scale node semantic information from the built corpora. Next, we normalize the non-Euclidean graph data into 3D tensor data through node ordering and receptive field construction, during which we propose a node importance measure that considers both node semantics and topology. After that, we design a channel-attention-based spatial graph convolutional neural network to effectively learn graph feature vectors from these 3D tensors. Finally, we apply a dense layer followed by a softmax layer to the learned graph feature vectors to classify graphs. Experimental results show that our proposed method achieves superior graph classification accuracy compared with classical graph kernel methods and state-of-the-art spatial graph neural networks on six benchmark graph datasets, with an average accuracy improvement of 4.12%.
-
Rong HUANG, Zewen QIAN, Hao MA, Zhezhe HAN, Yue XIE
Article type: PAPER
Subject area: Educational Technology
2025 Volume E108.D Issue 7 Pages
776-783
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 07, 2025
JOURNAL
FREE ACCESS
High-precision prediction of college students' sports performance is important for formulating scientific training mechanisms and well-designed physical fitness courses. Traditional sports performance prediction relies primarily on subjective experience, which suffers from limitations such as individual variation and low reliability. To overcome these shortcomings, an ensemble learning algorithm is proposed in this study. First, a historical dataset of college students is established, including physical characteristics (such as age, height, weight, and lung capacity) and sports performance (such as the 50-meter dash, 1000-meter run, and standing long jump). Then, three forecasting engines, support vector regression, extreme learning machine, and decision tree, make preliminary predictions from the preprocessed physical characteristics. The three preliminary predictions are then combined nonlinearly by Gaussian process regression to produce a final probabilistic prediction. The feasibility of the established ensemble model is evaluated on data collected from college students. Experiments confirm that the proposed model closes the performance gaps of the individual forecasting engines, effectively improving prediction accuracy. In addition, the proposed method provides not only point predictions but also confidence interval information, which greatly enhances prediction reliability.
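The combination step, fusing three preliminary predictions through Gaussian process regression to obtain both a point prediction and confidence information, can be sketched with numpy only. The data, the engine biases, and the kernel hyperparameters below are invented; the three noisy arrays merely stand in for the SVR, ELM, and decision-tree outputs:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(6.0, 10.0, size=300)          # true 50 m dash times (s), synthetic
p1 = y + 0.3 + 0.2 * rng.normal(size=300)     # biased engine (SVR stand-in)
p2 = y - 0.2 + 0.3 * rng.normal(size=300)     # biased engine (ELM stand-in)
p3 = y + 0.4 * rng.normal(size=300)           # noisy engine (decision-tree stand-in)
Z = np.column_stack([p1, p2, p3])             # stacked preliminary predictions

def gp_fit_predict(Xtr, ytr, Xte, length=1.0, noise=0.05):
    """RBF-kernel Gaussian process regression: predictive mean and std."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length ** 2)
    mu = ytr.mean()                            # center targets on their mean
    K = k(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = k(Xte, Xtr)
    mean = mu + Ks @ np.linalg.solve(K, ytr - mu)
    var = 1.0 + noise - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.maximum(var, 0.0))

tr, te = slice(0, 200), slice(200, 300)
mean, std = gp_fit_predict(Z[tr], y[tr], Z[te])
rmse = lambda p: float(np.sqrt(np.mean((p - y[te]) ** 2)))
best_single = min(rmse(p1[te]), rmse(p2[te]), rmse(p3[te]))
print(rmse(mean) < best_single)  # True: the nonlinear fusion beats each engine
```

The GP learns to cancel the engines' biases and weight them by reliability, and its predictive standard deviation is what yields the confidence intervals mentioned in the abstract.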
-
Yusuke HIROTA, Yuta NAKASHIMA, Noa GARCIA
Article type: PAPER
Subject area: Multimedia Pattern Processing
2025 Volume E108.D Issue 7 Pages
784-794
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 20, 2025
JOURNAL
FREE ACCESS
We study societal bias amplification in image captioning. Image captioning models have been shown to perpetuate gender and racial biases; however, metrics to measure, quantify, and evaluate societal bias in captions are not yet standardized. We provide a comprehensive study of the strengths and limitations of existing metrics and propose LIC, a metric to study captioning bias amplification. We argue that, for image captioning, it is not enough to focus on the correct prediction of the protected attribute: the whole context should be taken into account. We conduct an extensive evaluation of traditional and state-of-the-art image captioning models and, surprisingly, find that by focusing only on protected attribute prediction, bias mitigation models unexpectedly amplify bias.
-
Yusuke HIROTA, Yuta NAKASHIMA, Noa GARCIA
Article type: PAPER
Subject area: Multimedia Pattern Processing
2025 Volume E108.D Issue 7 Pages
795-807
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 20, 2025
JOURNAL
FREE ACCESS
Image captioning models are known to perpetuate and amplify harmful societal bias in the training set. In this work, we aim to mitigate such gender bias in image captioning models. While prior work has addressed this problem by forcing models to focus on people to reduce gender misclassification, it conversely generates gender-stereotypical words at the expense of predicting the correct gender. From this observation, we hypothesize that there are two types of gender bias affecting image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender. To mitigate both types of gender biases, we propose a framework, called LIBRA, that learns from synthetically biased samples to decrease both types of biases, correcting gender misclassification and changing gender-stereotypical words to more neutral ones.
-
Binggang ZHUO, Ryota HONDA, Masaki MURATA
Article type: PAPER
Subject area: Natural Language Processing
2025 Volume E108.D Issue 7 Pages
808-819
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 16, 2025
JOURNAL
FREE ACCESS
Transformer models are a significant achievement in natural language processing. By introducing a denoising autoencoding pretraining task and pretraining on massive amounts of text data, transformer models achieve excellent results on a wide range of downstream natural language understanding tasks. This study focuses on the Japanese document emphasis task, and we propose a simple and effective method to enhance the performance of transformer models on this task by utilizing title information. Experimental results demonstrate that the proposed model achieves an average F1-score of 0.437, an improvement of 0.038 over the best-performing baseline (F1-score: 0.399) and of 0.124 over a method based on conditional random fields (F1-score: 0.313). Two-sided Wilcoxon signed-rank tests confirm the statistical significance of the proposed model's improvements over the compared baselines. An extensive set of additional investigations highlights the importance of title information for automatic Japanese document emphasis. In addition, to further validate the effectiveness of the proposed methodology, experiments were conducted on BBC News Summary, an English extractive summarization dataset. The results show that the proposed method, BERTSUM + All, significantly improves performance over the primary baseline BERTSUM (from a ROUGE-1 score of 0.708 to 0.933).
-
Kosetsu TSUKUDA, Tomoyasu NAKANO, Masahiro HAMASAKI, Masataka GOTO
Article type: PAPER
Subject area: Music Information Processing
2025 Volume E108.D Issue 7 Pages
820-829
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 17, 2025
JOURNAL
FREE ACCESS
When a user listens to a song for the first time, what musical factors (e.g., melody, tempo, and lyrics) influence the user’s decision to like or dislike the song? An answer to this question would enable researchers to more deeply understand how people interact with music. Thus, in this paper, we report the results of an online survey involving 302 participants to investigate the influence of 10 musical factors. We also evaluate how a user’s personal characteristics (i.e., personality traits and musical sophistication) relate to the importance of each factor for the user. Moreover, we propose and evaluate three factor-based functions that would enable more effectively browsing songs on a music streaming service. The user survey results provide several reusable insights, including the following: (1) for most participants, the melody and singing voice are considered important factors in judging whether they like a song on first listen; (2) personal characteristics do influence the important factors (e.g., participants who have high openness and are sensitive to beat deviations emphasize melody); and (3) the proposed functions each have a certain level of demand because they enable users to easily find music that fits their tastes. We have released part of the survey results as publicly available data so that other researchers can reproduce the results and analyze the data from their own viewpoints.
-
Shaobao WU, Zhihua WU, Meixuan HUANG
Article type: PAPER
Subject area: Image Processing and Video Processing
2025 Volume E108.D Issue 7 Pages
830-840
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 06, 2025
JOURNAL
FREE ACCESS
Copy-move forgery detection is a crucial task in digital image forensics, aiming to identify duplicated regions within an image. Traditional methods often locate the forged regions but struggle to distinguish the original area from the copied one. We introduce a new method that addresses this issue. First, the method extracts the low-frequency components of the image using the contourlet transform. These components are divided into overlapping blocks, and singular value features are extracted from each block. The feature vectors are sorted lexicographically and combined with the offset information of the image blocks to identify suspicious regions. To refine the detection, a double-quantization-effect feature is then computed for blocks within these suspicious regions; when a block's double-quantization feature value exceeds a threshold, the block is classified as part of the copied region. Experimental results demonstrate that the proposed method not only effectively detects and localizes copy-move forgeries but also accurately identifies the original and copied regions. Moreover, the method remains effective on natural images containing multiple similar-but-genuine objects, reducing the false alarm rate.
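The block-matching core (overlapping blocks, singular-value features, lexicographic sorting, offset voting) can be sketched as follows. This simplified version works on raw pixel blocks and omits the contourlet transform and the double-quantization stage, so it locates the duplicated pair but does not tell original from copy:

```python
import numpy as np

def copy_move_candidates(img, b=8):
    """Simplified block-matching core: SVD features per overlapping block,
    lexicographic sort, then voting on the offsets of matching pairs."""
    h, w = img.shape
    feats = []
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            s = np.linalg.svd(img[y:y + b, x:x + b], compute_uv=False)
            feats.append((tuple(np.round(s, 4)), (y, x)))
    feats.sort(key=lambda f: f[0])             # lexicographic feature sort
    offsets = {}
    for (f1, p1), (f2, p2) in zip(feats, feats[1:]):
        if f1 == f2:                           # identical (rounded) features
            off = (p2[0] - p1[0], p2[1] - p1[1])
            offsets[off] = offsets.get(off, 0) + 1
    # The most frequent offset points at the duplicated region, if any.
    return max(offsets, key=offsets.get) if offsets else None

rng = np.random.default_rng(3)
img = rng.random((24, 24))
img[12:22, 12:22] = img[2:12, 2:12]            # forge: copy a patch by (+10, +10)
print(copy_move_candidates(img))               # (10, 10)
```

Sorting makes identical feature vectors adjacent, so duplicated blocks vote for a common offset; the paper's double-quantization feature then decides which side of that offset is the copy.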
-
Huawei TAO, Ziyi HU, Sixian LI, Chunhua ZHU, Peng LI, Yue XIE
Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2025 Volume E108.D Issue 7 Pages
841-844
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 10, 2025
JOURNAL
FREE ACCESS
Speech emotion recognition (SER) plays a pivotal role in human-computer interaction, yet its performance is often hindered by the nonlinear entanglement of emotional and speaker features. This paper proposes an interpretable multi-level feature disentanglement algorithm for SER, aiming to effectively separate emotion features from speaker-specific characteristics. The algorithm first constructs a novel hybrid auto-encoder network that separates static and dynamic emotional features from the representations extracted by the self-supervised network emotion2vec, thereby obtaining multi-level, time-varying emotional feature representations. Additionally, we implement a multi-layer perceptual classifier based on Kolmogorov-Arnold Networks (KAN), which is adept at capturing complex nonlinear relationships in the data and further promotes feature disentanglement. Experimental results on the IEMOCAP database show that our proposed algorithm achieves a WA of 73.2%, surpassing the current state-of-the-art.
-
Rong HUANG, Yue XIE
Article type: LETTER
Subject area: Speech and Hearing
2025 Volume E108.D Issue 7 Pages
845-848
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 23, 2025
JOURNAL
FREE ACCESS
Recent studies in deep learning have shown great advantages in acoustic echo cancellation (AEC) owing to its strong capability for non-linear fitting; however, most AEC models are based on the convolutional recurrent network (CRN) architecture, using stacked convolution layers as the encoder to extract latent representations, without considering the misalignment between the reference and echo signals. Furthermore, the masking-based filtering method disregards inter-spectral correlation patterns and harmonic characteristics. In this paper, we propose an AEC approach called the multi-scale dual-path convolution recurrent network with a deep filtering block (DPDF-AEC). We propose a multi-scale encoder to capture complex patterns and time dependencies between the reference and microphone signals. After the masking stage, a post deep-filtering block is introduced that incorporates spectral patterns to further reduce residual echo. We conduct comprehensive ablation experiments to validate the effectiveness of each component of DPDF-AEC, and the results indicate that our model outperforms the AEC Challenge baseline in terms of the Echo-MOS metrics.
-
So KOIDE, Yoshiaki TAKATA, Hiroyuki SEKI
Article type: LETTER
Subject area: Fundamentals of Information Systems
2025 Volume E108.D Issue 7 Pages
849-852
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
Advance online publication: January 09, 2025
JOURNAL
FREE ACCESS
We study the decidability and complexity of the non-cooperative rational synthesis problem (abbreviated as NCRSP) for several classes of probabilistic strategies. We show that NCRSP for stationary strategies and Muller objectives is in 3-EXPTIME, and that if we restrict the strategies of the environment players to be positional, NCRSP becomes NEXPSPACE solvable. On the other hand, NCRSP>, a variant of NCRSP, is shown to be undecidable even for pure finite-state strategies and terminal reachability objectives. Finally, we show that NCRSP becomes EXPTIME solvable if we restrict the memory of a strategy to the most recently visited t vertices, where t is linear in the size of the game.