IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Advance Publication Articles
Showing 1-50 of 88 advance publication articles
  • Mingyang XU, Ao ZHAN, Chengyu WU, Zhengqiang WANG
    Article type: LETTER
    Article ID: 2024EDL8094
    Publication date: 2025
    [Advance publication] Release date: 2025/04/02
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Recognizing fatigued drivers is essential for improving road safety. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been applied to identify the state of drivers. However, these models frequently encounter various challenges, including a vast number of parameters and low detection effectiveness. To address these challenges, we propose the Dual-Lightweight-Swin-Transformer (DLS) for driver drowsiness detection. We also propose the Spatial-Temporal Fusion Model (STFM) and the Global Saliency Fusion Model (GSFM), where STFM fuses spatial-temporal features and GSFM fuses features from different layers of STFM to enhance detection efficiency. Simulation results show that DLS increases accuracy by 0.33% and reduces computational complexity by 49.3%. The running time per test epoch of DLS is reduced by 33.1%.

  • Zhe ZHANG, Yiding WANG, Jiali CUI, Han ZHENG
    Article type: PAPER
    Article ID: 2024EDP7161
    Publication date: 2025
    [Advance publication] Release date: 2025/04/02
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Multimodal Emotion Recognition (MER) is a critical task in sentiment analysis. Current methods primarily focus on multimodal fusion and representation of emotions, but they fail to effectively capture the collaborative interaction across modalities. In this study, we propose an MER model with intra-modal enhancement and inter-modal interaction (IEII). Firstly, the model extracts emotion information through RoBERTa, openSMILE, and DenseNet architectures from the text, audio, and video modalities, respectively. The model designs the Large Enhanced Kernel Attention (LEKA) module, which utilizes a simplified attention mechanism with large convolutional kernels to enhance intra-modal emotional information and align modalities effectively. Then a multimodal representation space is constructed with transformer encoders to explore inter-modal interactions. Finally, the model designs a Dual-Branch Multimodal Attention Fusion (DMAF) module based on grouped query attention and rapid attention mechanisms. The DMAF module integrates multimodal emotion representations and realizes MER. The experimental results indicate that the model achieves superior overall accuracy and F1-scores on the IEMOCAP and MELD datasets compared to existing methods. This demonstrates that the proposed model effectively enhances intra-modal emotional information and captures inter-modal interactions.

  • Kazuhiro WADA, Masaya TSUNOKAKE, Shigeki MATSUBARA
    Article type: PAPER
    Article ID: 2024EDP7149
    Publication date: 2025
    [Advance publication] Release date: 2025/03/28
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Citations using URLs (URL citations) that appear in scholarly papers can be used as an information source for research resource search engines. In particular, information about the types of cited resources and the reasons for their citation is crucial to describe the resources and their relations in such search services. To obtain this information, previous studies proposed methods for classifying URL citations. However, their methods trained the model using a simple fine-tuning strategy and exhibited insufficient performance. We propose a classification method using a novel intermediate task. Our method trains the model on our intermediate task of identifying whether sample pairs belong to the same class before being fine-tuned on the target task. In the experiment, our method outperformed previous methods based on the simple fine-tuning strategy, achieving higher macro F-scores across different model sizes and architectures. Our analysis results indicate that the model learns the class boundaries of the target task by training on our intermediate task. Our intermediate task also demonstrated higher performance and computational efficiency than an alternative intermediate task using triplet loss. Finally, we applied our method to other text classification tasks and confirmed its effectiveness in cases where a simple fine-tuning strategy does not work stably.
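
    A minimal sketch of the kind of intermediate task described above, assuming a generic text encoder: the model is first trained to judge whether two labeled examples share a class, then fine-tuned on the target URL-citation classification. All names below are hypothetical and are not the authors' code.

        # Sketch only: pairwise "same class?" pretraining before target fine-tuning.
        import random
        import torch
        import torch.nn as nn

        class PairClassifier(nn.Module):
            def __init__(self, encoder: nn.Module, hidden_dim: int):
                super().__init__()
                self.encoder = encoder                    # any text encoder returning a vector
                self.head = nn.Linear(2 * hidden_dim, 2)  # same-class vs. different-class

            def forward(self, x_a, x_b):
                h_a, h_b = self.encoder(x_a), self.encoder(x_b)
                return self.head(torch.cat([h_a, h_b], dim=-1))

        def sample_pairs(examples, labels, n_pairs):
            """Build (example_a, example_b, same?) training pairs for the intermediate task."""
            pairs = []
            for _ in range(n_pairs):
                i, j = random.randrange(len(labels)), random.randrange(len(labels))
                pairs.append((examples[i], examples[j], int(labels[i] == labels[j])))
            return pairs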

  • Kei KOGAI, Yoshikazu UEDA
    Article type: PAPER
    Article ID: 2024EDP7192
    Publication date: 2025
    [Advance publication] Release date: 2025/03/28
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Information and control systems operated in the field of social infrastructure are required to enhance their quality in terms of safety and reliability, and model checking is an effective technique to validate their behavior in the design phase. Model checking generates a state transition diagram from a model of system behavior and verifies that the model satisfies a system requirement by exploring the state space. However, as the number of model attributes and attribute value combinations increases, the state space expands, leading to a state explosion that makes completing the search within a realistic time impossible. To solve this problem, methods that reduce the state space by dividing the model are commonly applied, although these methods require human judgment based on knowledge of the system and the designer's experience. The purpose of this paper is to propose a method for partitioning behavioral models of information and control systems (ICS) without relying on such judgment. The structures of an ICS are represented by attributes, and the behaviors are described by rules using these attributes. The description includes attributes that are characteristic of an ICS. The method extracts dependency relationships between rules from their attribute references and generates a dependency graph. The graph is partitioned by clustering into clusters corresponding to the rules, thus reducing the state space. Clustering partitions the model at points where relationships between clusters, such as rule dependencies, are sufficiently low. Modularity is used as a measure to ensure that the total number of states after partitioning is less than before. We confirm the effectiveness of this method with an ICS example by showing how the system is partitioned, comparing the number of states in the behavior models generated from the partitioned system, and presenting the results of model checking using these behavior models.
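
    The rule-partitioning step can be illustrated, under assumptions about the data layout, with a standard modularity-based community detection; the dependency criterion and function below are illustrative, not the authors' tool.

        # Sketch: build a rule-dependency graph from attribute references and
        # partition it with modularity-based clustering (networkx).
        import networkx as nx
        from networkx.algorithms import community

        def partition_rules(rules):
            """rules: dict mapping rule name -> (set of read attrs, set of written attrs)."""
            g = nx.Graph()
            g.add_nodes_from(rules)
            names = list(rules)
            for i, a in enumerate(names):
                for b in names[i + 1:]:
                    reads_a, writes_a = rules[a]
                    reads_b, writes_b = rules[b]
                    # rules depend on each other when one reads an attribute the other writes
                    if (reads_a & writes_b) or (reads_b & writes_a):
                        g.add_edge(a, b)
            clusters = community.greedy_modularity_communities(g)
            print("modularity:", community.modularity(g, clusters))
            return [set(c) for c in clusters]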

  • Keigo WAKAYAMA, Takafumi KANAMORI
    Article type: PAPER
    Article ID: 2024EDP7245
    Publication date: 2025
    [Advance publication] Release date: 2025/03/28
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Neural architecture search (NAS) is very useful for automating the design of DNN architectures. In recent years, a number of training-free NAS methods have been proposed, and their reduced search cost has raised expectations for real-world applications. However, in NASI, a state-of-the-art (SOTA) training-free NAS method with a theoretical foundation, the proxy for estimating the test performance of candidate architectures is based on the training error rather than the generalization error. In this research, we propose a NAS method based on a proxy theoretically derived from the bias-variance decomposition of the normalized generalization error, called NAS-NGE (NAS based on normalized generalization error). Specifically, we propose a surrogate of the normalized second-order moment of the Neural Tangent Kernel (NTK) and use it together with the normalized bias to construct NAS-NGE. We use NAS benchmarks and the DARTS search space to demonstrate the effectiveness of the proposed method by comparing it to SOTA training-free NAS methods under short search times.

  • Jiajun LI, Qiang LI, Kui ZHENG, JinZheng LU, Lijuan WEI, Qiang XIANG
    Article type: PAPER
    Article ID: 2024EDP7280
    Publication date: 2025
    [Advance publication] Release date: 2025/03/21
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    For landslides, a serious natural disaster, accurately locating the affected area is crucial for disaster mitigation and relief work. Given the complex conditions of landslides and the difficulty traditional methods have in quickly and accurately determining where landslides occur, this paper proposes a multi-scale feature recognition algorithm for landslide images (MF-L-UNet++) based on an analysis of landslide characteristics and common semantic segmentation networks. MF-L-UNet++ is based on UNet++ with the following modifications. First, the Dual Large Feature Fusion Selective Kernel Attention (DLFFSKA) module is employed to eliminate background interference in model recognition and enhance the accuracy of landslide location capture. Second, the Same Scale Lightweight Kernel Prediction (SSLKP) module is designed to achieve a significant reduction in the number of parameters while reducing the loss of convolutional feature information and position offset. Third, the Large Kernel Content Aware Recombination Upsample (LKCARU) module is presented to enhance the model's capacity to delineate the boundaries and details of the landslide, thereby facilitating more precise segmentation outcomes. Finally, Atrous Spatial Pyramid Pooling (ASPP) is introduced to address the inadequate coverage and fusion of multi-scale information after the preceding modules, enabling the model to fully integrate global context information. The experimental results showed that on the expanded Bijie Landslide Dataset, the proposed algorithm improved IoU, Precision, and F1-score by 3.68%, 1.29%, and 1.59%, respectively, compared to the UNet++ algorithm, while the parameter count and loss decreased by 0.86M and 0.05, respectively. Compared to other commonly used segmentation methods, the proposed model achieves the best detection performance.
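
    For reference, a standard ASPP block (as popularized by DeepLab) looks like the sketch below; the dilation rates and channel sizes used in MF-L-UNet++ are not specified here and are assumptions.

        # Sketch of a generic ASPP block: parallel atrous convolutions at several
        # dilation rates capture multi-scale context, then a 1x1 conv fuses them.
        import torch
        import torch.nn as nn

        class ASPP(nn.Module):
            def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Conv2d(in_ch, out_ch,
                              kernel_size=1 if r == 1 else 3,
                              padding=0 if r == 1 else r,
                              dilation=r, bias=False)
                    for r in rates
                ])
                self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

            def forward(self, x):
                return self.project(torch.cat([b(x) for b in self.branches], dim=1))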

  • Huansha Wang, Qinrang Liu, Ruiyang Huang, Jianpeng Zhang, Hongji Liu
    Article type: PAPER
    Article ID: 2024EDP7173
    Publication date: 2025
    [Advance publication] Release date: 2025/03/19
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Multi-modal entity alignment (MMEA) endeavors to ascertain whether two multi-modal entities originating from distinct knowledge graphs refer to the same real-world object. This alignment is a pivotal technique in knowledge graph fusion, which aims to enhance the overall richness and comprehensiveness of the knowledge base. Existing mainstream MMEA models predominantly leverage graph convolutional networks and pre-trained visual models to extract the structural and visual features of entities, subsequently proceeding to integrate these features and conduct similarity comparisons. However, given the often suboptimal quality of multi-modal information in knowledge graphs, reliance solely on traditional visual feature extraction methods and on visual and structural features alone may result in insufficient semantic information within the generated multi-modal joint embeddings of entities. This limitation could potentially hinder the accuracy and effectiveness of multi-modal entity alignment. To address the above issues, we propose MSEEA, a Multi-modal Entity Alignment method based on Multidimensional Semantic Extraction. First, MSEEA fine-tunes a large language model using preprocessed entity relationship triples, thereby enhancing its capacity to analyze latent semantic information embedded in structural triples and generate contextually rich entity descriptions. Second, MSEEA employs a combination of multiple advanced models and systems to extract multidimensional semantic information from the visual modality, thereby circumventing the feature quality degradation that can occur with reliance solely on pre-trained visual models. Finally, MSEEA integrates different modal embeddings of entities to generate multi-modal representations and compares their similarities. We designed and executed experiments on FB15K-DB15K/YAGO15K, and the outcomes demonstrate that MSEEA outperforms traditional approaches, achieving state-of-the-art results.

  • Zhifu TIAN, Tao HU, Chaoyang NIU, Di WU, Shu WANG
    Article type: PAPER
    Article ID: 2024EDP7266
    Publication date: 2025
    [Advance publication] Release date: 2025/03/19
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    The deep unfolding network (DUN) for image compressive sensing (ICS) integrates a traditional optimization algorithm with a neural network, providing clear interpretability and demonstrating exceptional performance. Nevertheless, the inherent paradigm of the DUN lies in the independent proximal mapping between iterations and the limited information flux, potentially constraining the mapping capability of the deep unfolding method. This paper introduces a Feature-Domain FISTA-Inspired Deep Unfolding Network (FDFI-DUN) for ICS. FDFI-DUN comprises a Feature-Domain Nesterov Momentum Module (FNMM), a Feature-Domain Gradient Descent Module (FGDM), and a Two-level Multiscale Proximal Mapping Module (TMPMM). Specifically, the Nesterov momentum term and gradient descent term in FISTA are tailored to the feature domain, enhancing the information flux of the entire DUN and augmenting the feature information within and between iterations while maintaining clear interpretability. Furthermore, the TMPMM, encompassing intra-stage and inter-stage components, is designed to further augment the information flux and effectively utilize multiscale feature information for reconstructing image details. Extensive experimental results demonstrate that the proposed FDFI-DUN surpasses state-of-the-art methods in both quantitative and visual quality. Our codes are available at: https://github.com/giant-pandada/FDFI-DUN.
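
    For context, the classical image-domain FISTA iteration that such unfolding networks are inspired by is shown below (with measurement matrix Φ, step size ρ, and a sparsifying proximal operator; the feature-domain variants in FDFI-DUN replace these terms with learned modules):

        \begin{aligned}
        \mathbf{r}^{(k)} &= \mathbf{z}^{(k)} - \rho\,\boldsymbol{\Phi}^{\top}\bigl(\boldsymbol{\Phi}\mathbf{z}^{(k)} - \mathbf{y}\bigr), \\
        \mathbf{x}^{(k)} &= \operatorname{prox}_{\lambda\|\cdot\|_{1}}\bigl(\mathbf{r}^{(k)}\bigr), \\
        t^{(k+1)} &= \tfrac{1}{2}\Bigl(1 + \sqrt{1 + 4\,\bigl(t^{(k)}\bigr)^{2}}\Bigr), \\
        \mathbf{z}^{(k+1)} &= \mathbf{x}^{(k)} + \frac{t^{(k)} - 1}{t^{(k+1)}}\bigl(\mathbf{x}^{(k)} - \mathbf{x}^{(k-1)}\bigr).
        \end{aligned}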

  • Yongfei WU, Daisuke KATAYAMA, Tetsushi KOIDE, Toru TAMAKI, Shigeto YOS ...
    Article type: PAPER
    Article ID: 2024EDP7283
    Publication date: 2025
    [Advance publication] Release date: 2025/03/19
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    In this paper, we propose an automatic segmentation method for detecting lesion areas from full-screen Narrow Band Imaging (NBI) endoscopic image frames using deep learning for real-time diagnosis support in endoscopy. In existing diagnosis support systems, doctors need to actively align lesion areas to accurately classify lesions. Therefore, we aim to develop a real-time diagnosis support system combining an automatic lesion segmentation algorithm that can identify lesions in full-screen endoscopic images. We created a dataset of over 8000 images and verified the detection performance of multiple existing segmentation model structures. We found a serious problem of missed detections for images with small lesions. We analyzed the possible reasons and proposed a method that uses a convolutional backbone network for downsampling to retain effective information, and conducted experiments with a model structure using Dense Blocks and U-Net. The experimental results showed that our structure was superior to other models in detecting small lesions. At the same time, CutMix, a data augmentation method added to the training procedure to further improve detection performance, was proven to be effective. The detection performance reached 0.8603 ± 0.006 when evaluated using the F-measure. In addition, our model showed the fastest processing speed in the experimental tests, which will be advantageous in the subsequent development of a processing system for real-time clinical videos.
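
    CutMix itself is a well-known augmentation; a minimal sketch adapted to image/mask pairs is shown below (the exact variant used in the paper may differ).

        # Sketch of CutMix for segmentation pairs: paste a random rectangle from a
        # shuffled batch into each sample, and paste the same region of the masks.
        import numpy as np
        import torch

        def cutmix(images, masks, alpha=1.0):
            """images: (B, C, H, W) tensor; masks: (B, 1, H, W) label maps."""
            lam = np.random.beta(alpha, alpha)
            perm = torch.randperm(images.size(0))
            h, w = images.shape[-2:]
            cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
            cy, cx = np.random.randint(h), np.random.randint(w)
            y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
            x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
            images[:, :, y1:y2, x1:x2] = images[perm][:, :, y1:y2, x1:x2]
            masks[:, :, y1:y2, x1:x2] = masks[perm][:, :, y1:y2, x1:x2]
            return images, masks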

  • Jifeng GUO, Yongjie WANG, Jingtan GUO, Shiwei WEI, Xian SHI
    Article type: PAPER
    Article ID: 2024EDP7135
    Publication date: 2025
    [Advance publication] Release date: 2025/03/11
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    The purpose of unsupervised person re-identification (Re-ID) is to improve the recognition performance of the model without using any labeled Re-ID datasets. Recently, camera differences and noisy labels have emerged as critical factors hindering the improvement of unsupervised Re-ID performance. To address these issues, we propose a camera style alignment (CSA) method. In CSA, we first devise the feature mean clustering (FM-clustering) algorithm, which clusters based on averaged features to mitigate the impact of camera differences on the clustering results. Subsequently, we design dual-cluster consistency refinement (DCR), which assesses the reliability of pseudo-labels from the perspective of clustering consistency, thereby reducing the influence of noisy labels. In addition, we introduce a style-aware invariance loss and a camera-aware invariance loss to achieve camera style-invariant learning from different aspects. The style-aware invariance loss improves the similarity between samples and their style-transferred counterparts, and the camera-aware invariance loss improves the similarity between positive samples from different cameras. The experimental results on the Market-1501 and MSMT17 datasets show that the performance of CSA exceeds that of existing fully unsupervised Re-ID and unsupervised domain adaptation Re-ID methods.

  • Xinglong PEI, Yuxiang HU, Yongji DONG, Dan LI
    Article type: LETTER
    Article ID: 2024EDL8095
    Publication date: 2025
    [Advance publication] Release date: 2025/03/10
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    We propose a task scheduling method using resource interleaving and Reinforcement Learning (RL) for edge network systems. We use resource interleaving to schedule task forwarding among edge nodes, reducing the waiting delay of tasks on resources after they are forwarded. We formulate a task scheduling optimization problem and use RL to obtain a real-time scheduling policy. Simulations verify the proposed method's effectiveness in task scheduling.

  • Hee-Suk PANG, Jun-seok LIM, Seokjin LEE
    Article type: LETTER
    Article ID: 2024EDL8086
    Publication date: 2025
    [Advance publication] Release date: 2025/03/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Whereas vibrato is one of the most frequently used techniques to enrich vocal and musical instrument sounds, the performance of fine frequency estimation methods has not been studied much for vibrato tones. We present three models of synthetic vibrato tones and compare three DFT-based fine frequency estimation methods using the models, which are phase difference estimation (PDE), zero-padding method (ZPM), and corrected quadratically interpolated fast Fourier transform (CQIFFT). Experimental results show that CQIFFT and ZPM with a large number of padded zeroes are effective in the fine frequency estimation of vibrato tones. We also show an example of applying each method to a flute vibrato tone. We expect that the results will be helpful in choosing a fine frequency estimation method for DFT-based methods to analyze the frequencies of vibrato tones.
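
    As background, plain quadratically interpolated FFT peak estimation (the uncorrected form of CQIFFT) fits a parabola to the log-magnitude spectrum around the peak bin; a minimal sketch follows, not tied to the vibrato models in the letter.

        # Sketch of parabolic (quadratic) interpolation of an FFT magnitude peak.
        import numpy as np

        def qifft_frequency(x, fs):
            n = len(x)
            spec = np.abs(np.fft.rfft(x * np.hanning(n)))
            k = int(np.argmax(spec[1:-1])) + 1            # peak bin, away from the edges
            a, b, c = np.log(spec[k - 1]), np.log(spec[k]), np.log(spec[k + 1])
            delta = 0.5 * (a - c) / (a - 2 * b + c)       # fractional-bin offset of the peak
            return (k + delta) * fs / n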

  • Hui Li, Xiaofeng Yang, Zebin Zheng, Jinyi Li, Shengli Lu
    Article type: LETTER
    Article ID: 2024EDL8089
    Publication date: 2025
    [Advance publication] Release date: 2025/03/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Hardware accelerators using fixed-point quantization efficiently run object detection neural networks, but high-bit quantization demands substantial hardware and power, while low-bit quantization sacrifices accuracy. To address this, we introduce an 8-bit quantization scheme, ASPoT8, which uses add/shift operations to replace INT8 multiplications, minimizing hardware area and power consumption without compromising accuracy. ASPoT8 adjusts the quantized value distribution to match INT8's accuracy. Tests on YOLOV3 Tiny and MobileNetV2 SSDlite show minimal mAP drops of 0.5% and 0.2%, respectively, with significant reductions in power (76.31%), delay (29.46%), and area (58.40%) over INT8, based on SMIC 40nm.
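
    The add/shift idea is illustrated below under the assumption that each weight is quantized to a signed sum of two powers of two (the general additive power-of-two scheme); the actual ASPoT8 codebook and hardware mapping are not shown.

        # Sketch: multiplying by a weight of the form 2**p + s * 2**q needs only two shifts and an add.
        def shift_add_multiply(x: int, p: int, q: int, s: int = 1) -> int:
            """Compute x * (2**p + s * 2**q) without a multiplier."""
            return (x << p) + s * (x << q)

        # Example: multiplying by 10 = 2**3 + 2**1.
        assert shift_add_multiply(7, 3, 1) == 7 * 10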

  • Lei ZHOU, Ryohei SASANO, Koichi TAKEDA
    Article type: PAPER
    Article ID: 2024EDP7126
    Publication date: 2025
    [Advance publication] Release date: 2025/03/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    In the Autonomous Driving (AD) scenario, accurate, informative, and understandable descriptions of the traffic conditions and the ego-vehicle motions can increase the interpretability of an autonomous driving system to the vehicle user. End-to-end free-form video captioning is a straightforward vision-to-text task to address such needs. However, insufficient real-world driving scene descriptive data hinders the performance of caption generation under a simple supervised training paradigm. Recently, large-scale Vision-Language Pre-training (VLP) foundation models have attracted much attention from the community. Tuning large foundation models on task-specific datasets has become a prevailing paradigm for caption generation. However, for the application in autonomous driving, we often encounter large gaps between the training data of VLP foundation models and real-world driving scene captioning data, which impedes the immense potential of VLP foundation models. In this paper, we tackle this problem via a unified framework for cross-lingual, cross-domain vision-language tuning empowered by Machine Translation (MT) techniques. We aim to obtain a captioning system for driving scene caption generation in Japanese from a domain-general and English-centric VLP model. The framework comprises two core components: (i) bidirectional knowledge distillation by MT teachers; (ii) fusing objectives for cross-lingual fine-tuning. Moreover, we introduce three schedulers to operate the vision-language tuning process with fusing objectives. Based on GIT [1], we implement our framework and verify its effectiveness on real-world driving scenes with natural caption texts annotated by experienced vehicle users. The caption generation performance with our framework reveals a significant advantage over the baseline settings.

  • Boago OKGETHENG, Koichi TAKEUCHI
    Article type: PAPER
    Article ID: 2024EDP7189
    Publication date: 2025
    [Advance publication] Release date: 2025/03/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Automatic Essay Scoring is a crucial task aimed at alleviating the workload of essay graders. Most previous studies have focused on English essays, primarily due to the availability of extensive scored essay datasets. Thus, it remains uncertain whether the models developed for English are applicable to smaller-scale Japanese essay datasets. Recent studies have demonstrated the successful application of BERT-based regression and ranking models. However, downloadable Japanese GPT models, which are larger than BERT, have become available, and it is unclear which types of modeling are appropriate for Japanese essay scoring. In this paper, we explore various aspects of modeling using GPTs, including the type of model (i.e., classification or regression), the size of the GPT models, and the approach to training (e.g., learning from scratch versus conducting continual pre-training). In experiments conducted with Japanese essay datasets, we demonstrate that classification models combined with soft labels are more effective for scoring Japanese essays than simple classification models. Regarding the size of GPT models, we show that smaller models can produce better results depending on the model, type of prompt, and theme.
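
    One common way to realize such soft labels, shown here purely as an assumption about the general technique rather than the authors' exact recipe, is to spread each gold score over neighboring score classes with a Gaussian and train against the resulting distribution.

        # Sketch: Gaussian-smoothed score labels and a KL-divergence training loss.
        import torch
        import torch.nn.functional as F

        def soft_label(score: int, num_classes: int, sigma: float = 1.0) -> torch.Tensor:
            grid = torch.arange(num_classes, dtype=torch.float32)
            weights = torch.exp(-0.5 * ((grid - score) / sigma) ** 2)
            return weights / weights.sum()

        def soft_label_loss(logits: torch.Tensor, scores, num_classes: int) -> torch.Tensor:
            targets = torch.stack([soft_label(int(s), num_classes) for s in scores])
            return F.kl_div(F.log_softmax(logits, dim=-1), targets, reduction="batchmean")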

  • Masatoshi YAMADA, Ryosuke TAKATA
    Article type: PAPER
    Article ID: 2024HCP0004
    Publication date: 2025
    [Advance publication] Release date: 2025/03/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    There have been many reports from the perspective of cognitive science that words are related to perceptions associated with body motion, and it is equally clear that conscious word processing affects the body motion of skills. On the other hand, the effect of word usage during motion on the perception associated with the physical motion of skills has not been fully clarified. The purpose of this research is to empirically verify the effects of discriminating perceptual objects by cognitive speech acts on the reaction time of body motion, based on a perceptual reaction test and on brain activity in the prefrontal cortex measured using near-infrared light. Specifically, the control task (CT) was defined as a subject saying "yes (hai)" regardless of whether a red or blue circle was displayed, whereas the target task (TT) was defined as a subject saying "red (aka)" when a red circle was displayed and "blue (ao)" when a blue circle was displayed. Under this setting, 30 able-bodied subjects were instructed to press the space key only when a red circle was displayed, in order to verify the difference in their reaction times between the two tasks. In addition, using a Near-Infrared Spectroscopy (NIS) system to measure brain activity, the brain activities of the subjects were compared between the two tasks based on changes in cerebral blood flow in the prefrontal cortex. Results showed that the reaction times of all subjects were significantly slower in TT than in CT (t(1735) = 6.57, p < .05) and that 16 out of 30 subjects had statistically slower reaction times in TT compared to CT, supporting the hypothesis. In addition, right-hemisphere activity tended to be more activated in CT than in TT. The discussion suggests that TT, compared to CT, involved an additional judgment to discriminate perceptual objects by cognitive speech and therefore produced slower reaction times due to the additional burden on the subjects' cognitive resources.

  • Zezhong LI, Jianjun MA, Fuji REN
    Article type: LETTER
    Article ID: 2024EDL8062
    Publication date: 2025
    [Advance publication] Release date: 2025/03/04
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    The past decade has witnessed the rapid development of Neural Machine Translation (NMT). However, NMT approaches tend to generate fluent but sometimes unfaithful translations of the source sentences. In response to this problem, we propose a new framework to incorporate bilingual phrase knowledge into the encoder-decoder architecture, which allows the system to make full use of the phrase knowledge flexibly without the need to design a complicated search algorithm. A significant difference from existing work is that we obtain all the target phrases aligned to any part of the source sentence and learn representations for them before decoding starts, which alleviates the problem that future context is invisible to the standard autoregressive decoder, so that the generated target words can be decided more accurately with a global understanding. Extensive experiments on a Japanese-Chinese translation task show that the proposed approach significantly outperforms multiple strong baselines in terms of BLEU scores and verify the effectiveness of exploiting bilingual phrase knowledge for NMT.

  • Chong-Hui Lee, Lin-Hao Huang, Fang-Bin Qi, Wei-Juan Wang, Xian-Ji Zhan ...
    Article type: LETTER
    Article ID: 2024EDL8087
    Publication date: 2025
    [Advance publication] Release date: 2025/03/04
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    In recent years, environmental sustainability and the reduction of CO2 emissions have become significant research topics. To effectively reduce CO2 emissions, recent studies have used deep learning models to provide precise estimates, but these models often lack interpretability. In light of this, our study employs an explainable neural network to learn fuel consumption, which is then converted to CO2 emissions. The explainable neural network includes an explainable layer that can explain the importance of each input variable. Through this layer, the study can elucidate the impact of different speeds on fuel consumption and CO2 emissions. Validated with real fleet data, our study demonstrates an impressive mean absolute percentage error (MAPE) of only 3.3%, outperforming recent research methods.

  • Yuxin HUANG, Jiushun MA, Tianxu LI, Zhengtao YU, Yantuan XIAN, Yan XIA ...
    Article type: PAPER
    Article ID: 2024EDP7274
    Publication date: 2025
    [Advance publication] Release date: 2025/03/04
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Cross-lingual summarization (CLS) simplifies obtaining information across languages by generating summaries in the target language from source documents in another. State-of-the-art neural summarization models typically rely on training or fine-tuning with extensive corpora. Nonetheless, applying these approaches in practical industrial scenarios poses challenges due to the scarcity of annotated data. Recent research utilizes large language models (LLMs) to generate superior summaries by extracting fine-grained elements (entities, dates, events, and results) from source documents based on the Chain of Thought (CoT). Such an approach inevitably leads to the loss of fact-relationships across elements in the original document, thus hurting the performance of summary generation. In this paper, we not only substantiate the importance of fact-relationships across elements for summary generation on the element-aware test sets CNN/DailyMail and BBC XSum but also propose a novel Cross-Lingual Summarization method based on Element Fact-relationship Generation (EFGCLS). Specifically, we break down the CLS task into three simple subtasks: first, element fact-relationship generation extracts fine-grained elements from source documents and the fact-relationships across them; afterwards, monolingual document summarization leverages the fact-relationships and source documents to generate the monolingual summary; ultimately, cross-lingual summarization via Cross-lingual Prompting (CLP) enhances the alignment between source-language summaries and target-language summaries. Experimental results on the element-aware datasets show that our method outperforms state-of-the-art fine-tuned PLMs and zero-shot LLMs by +6.28/+1.22 in ROUGE-L, respectively.

  • He GONG, Qingfa REN, Zhijie YIN, Quanyuan LIU, Jing WANG, Yuwei LIU, D ...
    Article type: PAPER
    Article ID: 2025EDP7004
    Publication date: 2025
    [Advance publication] Release date: 2025/03/04
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    With the development of neuroscience and psychology, the cerebellar role in higher-order functions has been increasingly recognized. Premature birth has an impact on cerebellar development and increases the risk of neurodevelopmental disorders. This study aimed to evaluate the development and alterations of glutamate levels and volumes in cerebellar subregions in preterm infants and to investigate the relationship between glutamate and volume. Seventy preterm infants and 22 full-term infants underwent glutamate chemical exchange saturation transfer (GluCEST) imaging and sampling perfection with application optimized contrasts using different flip angle evolutions (SPACE). Custom-written MATLAB scripts were used to process the GluCEST images to obtain glutamate levels, and volumes were obtained with ITK-SNAP. Both glutamate levels and volumes in cerebellar subregions in preterm infants were positively correlated with postmenstrual age. Furthermore, when compared to full-term infants, the glutamate levels of preterm infants at term-equivalent age were higher. No correlation was found between glutamate and volume. The metabolites and structures of cerebellar subregions in preterm infants were altered even in the absence of significant brain structural damage. These findings may help probe the pattern of brain maturation and identify potential neurodevelopmental disorders in preterm infants.

  • Shun KAWAKAMI, Savong BOU, Toshiyuki AMAGASA
    Article type: PAPER
    Article ID: 2024DAT0001
    Publication date: 2025
    [Advance publication] Release date: 2025/02/25
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Stream processing engines need to process multiple queries over streams simultaneously, and continuous window aggregation plays a critical role in various applications as part of data analysis pipelines. However, such systems suffer from scalability issues when dealing with massive numbers of queries with different window and slide sizes over data streams with high input rates. To address this problem, we propose LSiX (longest-shortest-window-based indexing) to aggregate multiple queries over data streams efficiently. More precisely, we employ two arrays based on the longest and shortest windows among all registered queries, and all query results are computed using the shared partial aggregations in the two arrays with at most two operations per query, enabling efficient aggregation computation. We have conducted extensive experiments, and the results show that LSiX can be at least 3 times faster than comparative methods, including the state-of-the-art method, MCQA.

  • Fei MO, Fei QIAO, Lingyu LIANG
    Article type: LETTER
    Article ID: 2024EDL8064
    Publication date: 2025
    [Advance publication] Release date: 2025/02/25
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Ratings of products serve as a crucial indicator for assessing the impact of products in the retail market. Existing product rating estimation methods primarily use single-label machine learning, whose predictions may fail to represent all properties of a product. This paper explores a challenging task, product rating distribution estimation (RDE), which predicts the distribution of product ratings instead of a single rating. Specifically, we focus on RDE for follower-brand products, which provide relatively objective artifacts and for which data are easier to collect. We formulate the RDE task within a label distribution learning (LDL) framework, which uses a maximum entropy model as the output component of LDL and generates a probability distribution over the rating categories. However, one of the main challenges of conducting the RDE task within the LDL framework is that the large number of labels leads to an exponentially growing output space, which increases model complexity and reduces performance. To address this problem, we propose a new model, called RDE-LDL, with an adaptive manifold learning module. The RDE-LDL method uses uniform manifold approximation and projection (UMAP) to represent the label distribution manifold via fuzzy simplicial sets, which encodes label correlation information and allows the maximum entropy model's output to be regularized based on label correlation. Quantitative and qualitative experiments conducted on a marketing dataset demonstrate the effectiveness of the RDE-LDL method with the UMAP-based module.

  • Takuya KISHIDA, Toru NAKASHIKA
    Article type: PAPER
    Article ID: 2024EDP7206
    Publication date: 2025
    [Advance publication] Release date: 2025/02/25
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    In this paper, we propose a fast and lightweight non-parallel voice conversion method based on minimizing the free energy of a restricted Boltzmann machine (RBM). The proposed method employs an RBM that learns the generative probability of acoustic features conditioned on a target speaker and iteratively updates the input acoustic features until their free energy reaches a local minimum, resulting in converted features. Due to the RBM framework, only a few hyperparameters need to be set, and the number of training parameters is minimal, ensuring stable training. When determining the step size of the update formula using the Newton-Raphson method, we found that the Hessian matrix of the free energy can be approximated by a diagonal matrix. This allows for efficient updates with minimal computational costs. In objective evaluation experiments, the proposed method demonstrated approximately 4.5 times faster conversion speed compared with StarGAN-VC and also outperformed StarGAN-VC in terms of Mel-cepstrum distortion. In subjective evaluation experiments, the performance of the proposed method was comparable to that of StarGAN-VC in similarity mean opinion score.
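
    For reference, the free energy of a standard binary RBM has the closed form below; the speaker-conditioned, acoustic-feature RBM in the paper differs in its visible-unit term, and the update shown is the generic Newton-style step with the diagonal Hessian approximation described in the abstract.

        F(\mathbf{v}) = -\mathbf{a}^{\top}\mathbf{v} - \sum_{j}\log\Bigl(1 + \exp\bigl(b_{j} + \mathbf{W}_{j}^{\top}\mathbf{v}\bigr)\Bigr),
        \qquad
        v_{i} \leftarrow v_{i} - \eta\,\frac{\partial F/\partial v_{i}}{\partial^{2} F/\partial v_{i}^{2}}.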

  • Seiya SATOH
    Article type: PAPER
    Article ID: 2024EDP7235
    Publication date: 2025
    [Advance publication] Release date: 2025/02/20
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Mean Variance Estimation networks are models capable of predicting not only the mean but also the variance of a distribution. A recent study demonstrated that using separate subnetworks for predicting the mean and variance, and training the subnetwork for predicting the mean first (a process called a warm-up) before training the subnetwork for predicting the variance, is more effective than using a single network. However, that study only utilized the Adam optimizer for training and explored neither quasi-Newton methods nor varied subnetwork structures. In this study, we introduce the Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton method for training a Mean Variance Estimation network and examine how the selection of subnetwork structures affects performance. We conducted an experiment using a synthetic dataset and 11 experiments using real-world datasets to compare the performance of Adam, BFGS, and three other learning methods, including AdaHessian. Out of the 11 experiments using real-world datasets, BFGS outperformed Adam and AdaHessian in seven cases. The results also reveal that BFGS tended to perform better on datasets with a larger number of data points. While underfitting was a problem with learning methods other than BFGS, overfitting was a concern with BFGS when it did not achieve the best performance. This overfitting issue can be mitigated with techniques such as early stopping and regularization. Additionally, BFGS required more hidden units for the subnetwork predicting the mean than for the subnetwork predicting the variance, and even 0 hidden units were selected as the optimal number for the subnetwork predicting the variance. It was also observed that, for the subnetwork predicting the variance, BFGS tended to select more compact models compared to other methods.
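
    For context, a Mean Variance Estimation network is typically trained by minimizing the Gaussian negative log-likelihood, with the mean and variance produced by the two subnetworks:

        \mathcal{L}(\theta) = \frac{1}{N}\sum_{n=1}^{N}\left[\frac{1}{2}\log \sigma_{\theta}^{2}(\mathbf{x}_{n}) + \frac{\bigl(y_{n} - \mu_{\theta}(\mathbf{x}_{n})\bigr)^{2}}{2\,\sigma_{\theta}^{2}(\mathbf{x}_{n})}\right] + \mathrm{const.}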

  • Ningning WANG, Qianhang DU, Zijing YUAN, Yu GAO, Rong-Long WANG, Shang ...
    Article type: LETTER
    Article ID: 2024EDL8049
    Publication date: 2025
    [Advance publication] Release date: 2025/02/19
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    The diagnosis of meningioma through magnetic resonance imaging (MRI) holds significant importance in clinical medicine. To enhance the accuracy of meningioma diagnosis, a more effective image classification method is required to comprehensively capture subtle features within MRI images. Although deep learning networks have been successfully applied to this problem, conventional neural networks based on McCulloch-Pitts neurons suffer from low performance and insufficient feature extraction, due to their simplistic structure and neglect of the nonlinear effects of synapses. Therefore, we propose a novel dendritic learning-based ResNeXt model, named DResNeXt. It utilizes the residual structure of ResNeXt and the cardinality method to adequately extract features of MRI images. Then, we innovatively introduce a dendritic neural model to improve the nonlinear information processing of biological neurons for comprehensively handling the extracted features. Experimental results demonstrate the outstanding performance of the proposed DResNeXt model in the classification task on a meningioma MRI dataset, surpassing the ResNeXt model in preventing overfitting. Additionally, compared to other deep learning models, it exhibits higher accuracy and superior image classification performance.

  • Qinghua SUN, Jia CUI, Zhenyu GU
    Article type: PAPER
    Article ID: 2024EDP7262
    Publication date: 2025
    [Advance publication] Release date: 2025/02/18
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Fonts play a crucial role in graphic design, conveying both text and information. However, selecting a proper font can be challenging due to the overwhelming variety and the need for semantic consistency between text and font shapes. While previous research has focused on word-level font retrieval, real-world design tasks often require selecting fonts for text sequences, such as titles or slogans. This study addresses these challenges by: (1) Proposing S2Font, a model using contrastive learning to create a multimodal embedding space for texts and fonts. (2) Developing a retrieval strategy based on font frequency weighting to handle similarity in retrieval results and the Pareto principle of font usage. (3) Introducing S2Font@Topic, a topic-based extension allowing identical text to return different fonts based on the topic. The methods offer several advantages: (1) Aligning sentence-level text input with real design tasks. (2) Leveraging existing text-font pairs from the Internet without manual annotations. (3) Achieving scalability by encoding new font candidates with the trained font encoder. Experiments demonstrated the methods' effectiveness. The top 3 retrieved fonts outperformed baseline models, and S2Font's top choice rivaled those of expert designers. Designers rated S2Font@Topic highly for usefulness (4.67/5) and interest (4.83/5) in design tasks.
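
    A minimal sketch of the symmetric contrastive (CLIP-style InfoNCE) objective that such text-font embedding models typically use; S2Font's actual encoders, temperature, and weighting are not reproduced here.

        # Sketch: pull matched text-font pairs together, push mismatched pairs apart.
        import torch
        import torch.nn.functional as F

        def contrastive_loss(text_emb, font_emb, temperature=0.07):
            text_emb = F.normalize(text_emb, dim=-1)
            font_emb = F.normalize(font_emb, dim=-1)
            logits = text_emb @ font_emb.t() / temperature          # pairwise similarities
            targets = torch.arange(text_emb.size(0), device=text_emb.device)
            return 0.5 * (F.cross_entropy(logits, targets) +
                          F.cross_entropy(logits.t(), targets))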

  • Hongbin WANG, Kunqiang ZHANG, Yantuan XIAN
    Article type: PAPER
    Article ID: 2024EDP7303
    Publication date: 2025
    [Advance publication] Release date: 2025/02/18
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Stance detection is a key task in natural language processing (NLP) that involves identifying the opinions and attitudes expressed in a text. Cross-target stance detection further extends this task, requiring models to distinguish the stance toward different targets within a text. However, achieving cross-target stance detection remains challenging due to issues such as short and informal text as well as implicit stance expressions. To address this challenge, this paper proposes a multi-level information fusion model for cross-target stance detection. The model first constructs single-target GCN graphs and multi-target GCN graphs, providing each word with a comprehensive semantic framework. Through cross-convolution techniques, the model can obtain weighted information for each word in different contexts, capturing subtle semantic differences of key terms. Then, by utilizing the deep semantic analysis capability of BERT, combined with contrastive learning, the model further refines sentence-level information and enhances its cross-target transferability through adversarial learning. Finally, the overall features are obtained through feature concatenation, enabling effective cross-target stance detection. This approach, which integrates word-level and sentence-level information for cross-target stance detection, not only deeply explores the text's deep semantics and rich contextual information but also precisely captures the subtle semantic differences at the word level. The proposed method demonstrates excellent performance on the SEM16 and WT-WT datasets, with an average F1 score 1.7% higher than the best traditional methods, proving its effectiveness and feasibility.

  • Yoshinori DOBASHI, Syuhei SATO
    Article type: LETTER
    Article ID: 2024EDL8078
    Publication date: 2025
    [Advance publication] Release date: 2025/02/17
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    This article presents a method to efficiently synthesize high-resolution 3D smoke by using 2D turbulence transfer on cross-sections of the velocity distribution, converting it into a stream function to preserve mass conservation. The goal is to create realistic, high-quality smoke animations with reduced computational cost and time.
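
    The mass-conservation property rests on the standard 2D stream-function identity (a general fact rather than a detail specific to this letter): a velocity field derived from a stream function ψ is automatically divergence-free,

        \mathbf{u} = \Bigl(\frac{\partial \psi}{\partial y},\; -\frac{\partial \psi}{\partial x}\Bigr),
        \qquad
        \nabla \cdot \mathbf{u} = \frac{\partial^{2} \psi}{\partial x\,\partial y} - \frac{\partial^{2} \psi}{\partial y\,\partial x} = 0.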

  • Chenchi LIU, Ao ZHAN, Chengyu WU, Zhengqiang WANG
    Article type: LETTER
    Article ID: 2024EDL8090
    Publication date: 2025
    [Advance publication] Release date: 2025/02/13
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Conventional Multipath QUIC (MPQUIC) schedulers struggle in dynamic networks with multiple clients, significantly hindering their potential. In this letter, a Multi-Agent Reinforcement Learning-based MPQUIC scheduler is designed to optimize communication transmission in dynamic networks for the multi-client scenario. The proposed scheduler is implemented on the server side with a Deep Q-Network (DQN) agent for each client; each agent observes the state of all network flows and adjusts scheduling strategies to enhance the Quality of Service (QoS) in dynamic networks. The simulation results demonstrate that the scheduler significantly outperforms existing schedulers by reducing latency and increasing throughput for all clients, thus adeptly satisfying the QoS requirements of multiple clients.

  • Toshifusa SEKIZAWA, Naoaki YONEZAWA, Kozo OKANO, Keitaro NARUSE
    Article type: LETTER
    Article ID: 2025EDL8001
    Publication date: 2025
    [Advance publication] Release date: 2025/02/12
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    This study presents an approach to a one-dimensional multi-robot tracking problem using probabilistic model checking. Two of the three robots have probabilistic velocity changes that cause unsteady movements and potential collisions. The experimental results show qualitative and quantitative validation indicating the applicability of model checking to designing dependable robot control.

  • Masayuki HIRAYABU, Yoshiaki SHIRAISHI
    Article type: PAPER
    Article ID: 2024DAK0001
    Publication date: 2025
    [Advance publication] Release date: 2025/02/07
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Given the finite nature of an organization's security resources, effectively countering all risks can be quite challenging. Threat hunting involves gathering information to make informed decisions about the allocation of security resources. Part of this responsibility for security personnel includes investigating the attack methods made possible by existing vulnerabilities, identifying potential attackers, and understanding their attack strategies. This study aims to support threat hunting efforts, ultimately aiding in the optimal distribution of security resources. To achieve this goal, we propose a system that combines data from NVD (National Vulnerability Database) and MITRE ATT&CK (Adversarial Tactics, Techniques, and Common Knowledge). This system enables us to identify the attack methods that could be executed by exploiting specific vulnerabilities and the potential attackers who may leverage these methods. Through several examples, we have verified that the insights provided by our system align with information available from other sources. By leveraging the proposed system, investigations into attack methods and potential attackers can be conducted more efficiently, requiring fewer steps compared to manual investigations.

  • Shota FUJII, Shohei KAKEI, Masanori HIROTOMO, Makoto TAKITA, Yoshiaki ...
    Article type: PAPER
    Article ID: 2024DAK0002
    Publication date: 2025
    [Advance publication] Release date: 2025/02/06
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Content management systems (CMS) simplify website creation, allowing people without specialized skills, such as designers and corporate public relations departments, to publish their web services. Although the Internet has become more convenient to use, published web services are at risk of various attacks. To realize secure web services, it is essential to incorporate security functions such as user authentication and authorization as well as the detection and blocking of malicious HTTP requests. However, it is difficult to understand and implement appropriate security measures when creating website content. Therefore, this study proposes A+Block, a reverse-proxy-based web security add-on service that provides authentication, authorization, and web application firewall functions for web services. A+Block allows web-service developers to implement these security features by simply pointing to their website uniform resource locators, without the need to modify their websites. By separating the core web-service functionality from security features and offering proxy configuration templates, A+Block simplifies the security implementation for websites and minimizes the configuration burden on web-service operators. We conducted an availability assessment of A+Block and a difficulty assessment of the adoption of WAF, authentication, and authorization in existing web security products. To evaluate the impact of A+Block on web-service availability, we conducted tests on 30 webpages created using the top 30 most frequently used WordPress plugins. Moreover, to evaluate the ease of adoption of A+Block in comparison with existing products, we analyzed the implementation documentation provided by Amazon Web Services (AWS) and Cloudflare. The results confirmed that the solution allows for simple implementation of security functions for web services without compromising their availability.

  • Haoran LUO, Tengfei SHAO, Tomoji KISHI, Shenglei LI
    Article type: PAPER
    Article ID: 2024EDP7095
    Publication date: 2025
    [Advance publication] Release date: 2025/01/31
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Amidst the COVID-19 pandemic, medical protective masks emerged as essential protective gear for the public. This paper aims to construct a nuanced, portable aspect-level sentiment analysis method, designed to unearth insightful information about attitudes toward such masks. The method is built upon three pivotal functional layers: sentiment intensity prediction, classification, and sentiment score calculation, collaboratively revealing consumer sentiments. For predicting sentiment intensity, we employ the Locally Weighted Linear Regression (LWLR) method, enhancing the Chinese VA sentiment lexicon while considering elements like foreign culture and value variations. Additionally, a context-adaptive modifier learning model adjusts word sentiment intensity. Sentiment classification leverages a dynamic XLNet mechanism and utilizes a Bi-LSTM model with stacked residuals for precise results. The sentiment score is astutely calculated by amalgamating sentiment classification and intensity prediction outcomes through the economically-recognized SRC index method. Through a case study using “User Preferences for Mask Attributes” as an example, the method demonstrated exceptional performance across numerous evaluation metrics. Furthermore, a qualitative analysis of the data elucidates the rationale behind varied sentiments concerning medical protective masks and epidemic prevention products.
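
    For reference, the generic LWLR estimator takes the weighted least-squares form below, with weights given by a kernel centered on the query point; the lexicon-specific weighting used in the paper is not shown.

        \hat{\boldsymbol{\theta}}(\mathbf{x}_{0}) = \bigl(\mathbf{X}^{\top}\mathbf{W}(\mathbf{x}_{0})\,\mathbf{X}\bigr)^{-1}\mathbf{X}^{\top}\mathbf{W}(\mathbf{x}_{0})\,\mathbf{y},
        \qquad
        w_{i}(\mathbf{x}_{0}) = \exp\!\left(-\frac{\|\mathbf{x}_{i} - \mathbf{x}_{0}\|^{2}}{2\tau^{2}}\right).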

  • Chee Siang LEOW, Tomoki KITAGAWA, Hideaki YAJIMA, Hiromitsu NISHIZAKI
    Article type: PAPER
    Article ID: 2024EDP7201
    Publication date: 2025
    [Advance publication] Release date: 2025/01/31
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    This study introduces data augmentation techniques to enhance training datasets for a Japanese handwritten character classification model, addressing the high cost of collecting extensive handwritten character data. A novel method is proposed to automatically generate a large-scale dataset of handwritten characters from a smaller dataset, utilizing a style transformation approach, particularly Adaptive Instance Normalization (AdaIN). Additionally, the study presents an innovative technique to improve character structural information by integrating features from the Contrastive Language-Image Pre-training (CLIP) text encoder. This approach enables the creation of diverse handwritten character images, including Kanji, by merging content and style elements. The effectiveness of our approach is demonstrated by evaluating a handwritten character classification model using an expanded dataset, which includes Japanese hiragana, katakana, and Kanji from the ETL Character Database. The character classification model's macro F1 score improved from 0.9733 with the original dataset to 0.9861 using the dataset augmented by the proposed approach. This result indicates that our proposed character generation model was able to generate new character images that were not included in the original dataset and that they effectively contributed to training the handwritten character classification model.
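
    The AdaIN operation referred to above follows the standard formulation: the content feature x is renormalized with the channel-wise statistics of the style feature y,

        \operatorname{AdaIN}(x, y) = \sigma(y)\,\frac{x - \mu(x)}{\sigma(x)} + \mu(y),

    where μ(·) and σ(·) denote the per-channel mean and standard deviation.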

  • Dengtian YANG, Lan CHEN, Xiaoran HAO
    Article type: LETTER
    Article ID: 2024EDL8088
    Publication date: 2025
    [Advance publication] Release date: 2025/01/27
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Unmanned Aerial Vehicle (UAV) object detection is impeded by the difficulty of accurately identifying small, densely packed targets. Despite the computational and real-time constraints of UAV platforms, point-based detection methods are favored for their efficiency. However, these methods encounter point competitions due to the dense distribution of targets, resulting in low precision and recall on UAV datasets. This study proposes label reassignment (LR) to mitigate the competitions arising from the label assignment process, focusing on intra-group competitions (IGC) and invasion competitions (IVC). By introducing extended points, our approach enhances the accuracy of detectors. Label reassignment also overcomes the secondary competitions (SC) that arise after introducing the extended points. Experimental results demonstrate the effectiveness of our strategy in reducing competitions and improving model accuracy.

  • Rong HUANG, Yue XIE
    Article type: LETTER
    Article ID: 2024EDL8084
    Publication date: 2025
    [Advance publication] Release date: 2025/01/23
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Recent studies in deep learning have shown great advantages in acoustic echo cancellation (AEC) due to its strong capability for non-linear fitting; however, most AEC models are based on the convolution recurrent network (CRN) architecture, using stacked convolution layers as the encoder to extract latent representations, without considering the misalignment between the reference and echo signals. Furthermore, the masking-based filtering method disregards inter-spectral correlation patterns and harmonic characteristics. In this paper, we propose an AEC approach called the multi-scale dual-path convolution recurrent network with a deep filtering block (DPDF-AEC). We propose a multi-scale encoder to capture complex patterns and time dependencies between the reference and microphone signals. After the masking stage, a post deep-filtering block is introduced, incorporating spectrum patterns to further reduce residual echo. We conduct comprehensive ablation experiments to validate the effectiveness of each component in DPDF-AEC, and the results indicate that our model outperforms the AEC Challenge baseline in terms of the Echo-MOS metric.

  • Toshiki ONISHI, Asahi OGUSHI, Ryo ISHII, Akihiro MIYATA
    Article type: PAPER
    Article ID: 2024HCP0008
    Publication date: 2025
    [Advance publication] Release date: 2025/01/23
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Praising behavior is an important part of human communication. However, people who are unfamiliar with praising often have difficulty improving their praising skills. To solve this problem, we aim to construct a system for evaluating praising skills. So far, we have attempted to predict the degree of praising skill from verbal and nonverbal behaviors. However, our previous studies focused on scenes in which the praiser was actually praising, and we have not dealt with scenes in which the praiser was not praising. In this paper, we attempt to detect whether the praiser is actually praising the receiver by including scenes in which the praiser is not praising the receiver. First, we extract features related to the verbal and nonverbal behaviors of the praiser and receiver. Second, we construct machine learning models that utilize these features to detect whether or not the praiser is actually praising the receiver. Our results show that the machine learning model utilizing the acoustic and embedding-based linguistic behaviors of the praiser and the visual and acoustic behaviors of the receiver has the best detection performance.

  • Meihua XUE, Kazuki SUGITA, Koichi OTA, Wen GU, Shinobu HASEGAWA
    Article type: PAPER
    Article ID: 2024IIP0008
    Publication date: 2025
    [Advance publication] Release date: 2025/01/23
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    This research proposes a system to support Japanese vocabulary learning for L2 learners of Japanese by integrating object recognition technology and a thesaurus database. Vocabulary learning is the foundation of L2 learning, but traditional translation-based learning is still the mainstream. The proposed method is based on the hypothesis that associating visuals and synonyms is effective for vocabulary learning. The system, called PICSU (PICture-based Synonyms Understanding), combines YOLOv7 and WordNet Japanese to provide a unique and context-rich vocabulary learning experience. Preliminary experimental results with international graduate students as participants suggested improved retention and engagement compared to traditional flashcard-based learning. This article outlines the proposed approach and highlights the potential for integrating intelligent information processing technology with vocabulary learning practice.

  • Jinyong SUN, Zhiwei DONG, Zhigang SUN, Guoyong CAI, Xiang ZHAO
    Article type: PAPER
    Article ID: 2024EDP7086
    Publication date: 2025
    [Advance publication] Release date: 2025/01/20
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    Graph classification has gained significant attention in recent years due to its wide applications in many domains such as cheminformatics, bioinformatics, and social networks. Graph neural networks have proven to be an effective solution for graph classification because of their powerful ability to learn graph node features. However, existing spatial graph convolutional neural networks for node-labeled graph classification utilize one-hot encoding or graph kernel methods to initialize node features, leading to their inability to capture semantic dependencies among graph nodes and thus to a decrease in graph classification accuracy. In this paper, we propose a Node Semantic-based Spatial Graph Convolutional Network (NSSGCN) for graph classification, which integrates multi-scale node semantics into a graph neural network with word embeddings. Specifically, we construct multiple corpora of different granularity for a graph dataset, and then leverage the PV-DBOW model to extract multi-scale node semantic information from the built corpora. Then, we normalize non-Euclidean graph data into 3D tensor data by node ordering and receptive field construction, during which we propose a node importance measure that considers both node semantics and topology. After that, we design a channel-attention-based spatial graph convolutional neural network to effectively learn graph feature vectors from these 3D tensor data. Finally, we apply a Dense layer followed by a softmax layer to the learned graph feature vectors to classify graphs. Experimental results show that our proposed method achieves superior graph classification accuracy compared with classical graph kernel methods and state-of-the-art spatial graph neural networks on six benchmark graph datasets. On average, our method achieves a remarkable accuracy improvement of 4.12% in graph classification.

  • Yusuke HIROTA, Yuta NAKASHIMA, Noa GARCIA
    Article type: PAPER
    Article ID: 2024EDP7116
    Publication date: 2025
    [Advance publication] Release date: 2025/01/20
    JOURNAL FREE ACCESS ADVANCE PUBLICATION

    We study societal bias amplification in image captioning. Image captioning models have been shown to perpetuate gender and racial biases; however, metrics to measure, quantify, and evaluate societal bias in captions are not yet standardized. We provide a comprehensive study of the strengths and limitations of existing metrics and propose LIC, a metric for studying bias amplification in captioning. We argue that, for image captioning, it is not enough to focus on the correct prediction of the protected attribute; the whole context should be taken into account. We conduct an extensive evaluation of traditional and state-of-the-art image captioning models and, surprisingly, find that by focusing only on protected attribute prediction, bias mitigation models are unexpectedly amplifying bias.

  • Yusuke HIROTA, Yuta NAKASHIMA, Noa GARCIA
    原稿種別: PAPER
    論文ID: 2024EDP7164
    発行日: 2025年
    [早期公開] 公開日: 2025/01/20
    ジャーナル フリー 早期公開

    Image captioning models are known to perpetuate and amplify harmful societal bias in the training set. In this work, we aim to mitigate such gender bias in image captioning models. While prior work has addressed this problem by forcing models to focus on people to reduce gender misclassification, it conversely generates gender-stereotypical words at the expense of predicting the correct gender. From this observation, we hypothesize that there are two types of gender bias affecting image captioning models: 1) bias that exploits context to predict gender, and 2) bias in the probability of generating certain (often stereotypical) words because of gender. To mitigate both types of gender biases, we propose a framework, called LIBRA, that learns from synthetically biased samples to decrease both types of biases, correcting gender misclassification and changing gender-stereotypical words to more neutral ones.

  • Kosetsu TSUKUDA, Tomoyasu NAKANO, Masahiro HAMASAKI, Masataka GOTO
    原稿種別: PAPER
    論文ID: 2024EDP7136
    発行日: 2025年
    [早期公開] 公開日: 2025/01/17
    ジャーナル フリー 早期公開

    When a user listens to a song for the first time, what musical factors (e.g., melody, tempo, and lyrics) influence the user's decision to like or dislike the song? An answer to this question would enable researchers to more deeply understand how people interact with music. Thus, in this paper, we report the results of an online survey involving 302 participants to investigate the influence of 10 musical factors. We also evaluate how a user's personal characteristics (i.e., personality traits and musical sophistication) relate to the importance of each factor for that user. Moreover, we propose and evaluate three factor-based functions that would enable more effective browsing of songs on a music streaming service. The survey results provide several reusable insights, including the following: (1) for most participants, the melody and singing voice are important factors in judging whether they like a song on first listen; (2) personal characteristics do influence the important factors (e.g., participants who have high openness and are sensitive to beat deviations emphasize melody); and (3) each of the proposed functions has a certain level of demand because they enable users to easily find music that fits their tastes. We have released part of the survey results as publicly available data so that other researchers can reproduce the results and analyze the data from their own viewpoints.

  • ZhengYu LU, PengFei XU
    原稿種別: PAPER
    論文ID: 2024IIP0011
    発行日: 2025年
    [早期公開] 公開日: 2025/01/17
    ジャーナル フリー 早期公開

    Hail, recognized as a severe convective weather phenomenon, carries significant destructive potential, and accurate identification is crucial to minimize economic damage and safeguard lives. The primary challenges in detecting hail are the scarcity of valid hail samples and the imbalance of these samples in high-resolution datasets. In response, this paper introduces HAM-Unet, a hail identification framework that leverages multi-source data and environmental factors. The model combines the FEM-Unet semantic segmentation architecture with data fusion techniques. By integrating radar reflectivity, FY-4B satellite imagery, ERA5 climatic parameters, and topographical data, HAM-Unet improves both its precision and resilience. Extensive training and validation show that HAM-Unet achieves strong scores in Probability of Detection (POD), False Alarm Rate (FAR), and Critical Success Index (CSI). The model not only shows potential for improving the accuracy and reliability of hail identification but also provides new ideas and methods for improving hail monitoring and warning systems.
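
    For reference, the categorical verification scores named in the abstract can be computed from a binary contingency table; the formulas below are the textbook definitions (FAR here is the common false-alarms-over-predicted-positives form), and the counts are made-up, not results from HAM-Unet.

    def verification_scores(hits, misses, false_alarms):
        pod = hits / (hits + misses)                # Probability of Detection
        far = false_alarms / (hits + false_alarms)  # False Alarm Rate/Ratio
        csi = hits / (hits + misses + false_alarms) # Critical Success Index
        return pod, far, csi

    # Example counts from a hypothetical pixel-wise hail mask comparison.
    print(verification_scores(hits=180, misses=40, false_alarms=60))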

  • Binggang ZHUO, Ryota HONDA, Masaki MURATA
    原稿種別: PAPER
    論文ID: 2024EDP7109
    発行日: 2025年
    [早期公開] 公開日: 2025/01/16
    ジャーナル フリー 早期公開

    Transformer is a significant achievement in the natural language processing field. By introducing a denoising autoencoding pretraining task and performing pretraining on a massive amount of text data, transformer models can achieve excellent results on a wide range of downstream natural language understanding tasks. This study focuses on the Japanese document emphasis task, and we propose a simple and effective method to enhance the performance of transformer models on the target task by utilizing title information. Experimental results demonstrate that the proposed model achieves an average F1-score of 0.437, an improvement of 0.038 over the best-performing baseline (F1-score: 0.399) and 0.124 over a method based on conditional random fields (F1-score: 0.313). The results of the two-sided Wilcoxon signed-rank test highlight the statistical significance of the proposed model relative to the compared baselines. An extensive set of additional investigations was conducted to highlight the importance of title information for automatic Japanese document emphasis. In addition, to further validate the effectiveness of the proposed method, experiments were conducted on the BBC News Summary, an English extractive summarization dataset. The results demonstrate that the proposed method, BERTSUM + All, significantly improves performance compared to the primary baseline BERTSUM (from a ROUGE-1 score of 0.708 to 0.933).
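
    A small sketch of the significance test mentioned above: SciPy's two-sided Wilcoxon signed-rank test applied to paired per-document scores. The score values are fabricated for illustration; only the scipy.stats.wilcoxon call itself is standard.

    from scipy.stats import wilcoxon

    proposed = [0.46, 0.44, 0.41, 0.48, 0.43, 0.45, 0.42, 0.47]
    baseline = [0.41, 0.40, 0.38, 0.42, 0.39, 0.41, 0.37, 0.43]

    stat, p_value = wilcoxon(proposed, baseline, alternative="two-sided")
    print(f"W={stat:.1f}, p={p_value:.4f}")  # a small p suggests a significant difference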

  • Qingqing YU, Rong JIN
    原稿種別: PAPER
    論文ID: 2024EDP7254
    発行日: 2025年
    [早期公開] 公開日: 2025/01/15
    ジャーナル フリー 早期公開

    This paper presents an improved Quantum Approximate Optimization Algorithm (QAOA) variant based on Conditional Value-at-Risk (CVaR) for addressing portfolio optimization problems. Portfolio optimization is an NP-hard combinatorial problem that aims to select an optimal set of assets and their quantities to balance risk against expected return. The proposed approach uses QAOA to find the asset combination that maximizes returns while minimizing risk, with a focus on the tail end of the loss distribution. An enhanced QAOA ansatz is introduced that balances optimization quality against circuit depth, leading to faster convergence and higher probabilities of obtaining optimal solutions. Experiments are conducted using historical stock data from Nasdaq, optimizing portfolios with varying numbers of stocks. Our method outperforms the original QAOA and CVaR-QAOA, particularly as the problem size increases. Whether the scenario involves 10, 12, 14, or 16 stocks, the improved CVaR-QAOA consistently converges within 100 iterations or fewer, whereas the standard QAOA consistently requires 450 iterations or more.
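
    A minimal sketch of the CVaR aggregation idea underlying CVaR-QAOA objectives: instead of averaging all sampled portfolio costs, average only the best alpha-fraction. The samples below are synthetic; the actual circuit, cost Hamiltonian, and ansatz are not shown.

    import numpy as np

    def cvar(costs, alpha=0.25):
        """Average of the lowest alpha-fraction of sampled cost values."""
        costs = np.sort(np.asarray(costs))
        k = max(1, int(np.ceil(alpha * len(costs))))
        return costs[:k].mean()

    samples = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1024)
    print("mean objective:", samples.mean())
    print("CVaR(0.25) objective:", cvar(samples, alpha=0.25))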

  • Huawei TAO, Ziyi HU, Sixian LI, Chunhua ZHU, Peng LI, Yue XIE
    原稿種別: LETTER
    論文ID: 2024EDL8083
    発行日: 2025年
    [早期公開] 公開日: 2025/01/10
    ジャーナル フリー 早期公開

    Speech Emotion Recognition (SER) plays a pivotal role in human-computer interaction, yet its performance is often hindered by the nonlinear entanglement of emotional and speaker features. This paper proposes an interpretable multi-level feature disentanglement algorithm for speech emotion recognition, aiming to effectively separate emotional features from speaker-specific information in speech. The algorithm first constructs a novel hybrid auto-encoder network that separates static and dynamic emotional features from the features extracted by the self-supervised network emotion2vec, thereby obtaining multi-level, time-varying emotional feature representations. Additionally, we implement a multi-layer perceptual classifier based on Kolmogorov-Arnold Networks (KAN), which is adept at capturing complex nonlinear relationships in the data and further promotes feature disentanglement. Experimental results on the IEMOCAP database show that the proposed algorithm achieves a WA of 73.2%, surpassing the current state-of-the-art.
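
    A generic sketch of splitting a frame-wise feature sequence into a static (utterance-level) code and a dynamic (frame-level) code with an auto-encoder; the layer choices, dimensions, and the 768-dimensional "emotion2vec-like" input are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    class StaticDynamicAE(nn.Module):
        def __init__(self, feat_dim=768, static_dim=64, dynamic_dim=64):
            super().__init__()
            self.static_enc = nn.Linear(feat_dim, static_dim)
            self.dynamic_enc = nn.GRU(feat_dim, dynamic_dim, batch_first=True)
            self.decoder = nn.Linear(static_dim + dynamic_dim, feat_dim)

        def forward(self, x):                        # x: (batch, frames, feat_dim)
            static = self.static_enc(x.mean(dim=1))  # one code per utterance
            dynamic, _ = self.dynamic_enc(x)         # one code per frame
            static_rep = static.unsqueeze(1).expand(-1, x.size(1), -1)
            recon = self.decoder(torch.cat([static_rep, dynamic], dim=-1))
            return recon, static, dynamic

    x = torch.randn(2, 100, 768)   # stand-in for self-supervised frame features
    recon, static, dynamic = StaticDynamicAE()(x)
    print(recon.shape, static.shape, dynamic.shape)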

  • Qianhang DU, Zhipeng LIU, Yaotong SONG, Ningning WANG, Zeyuan JU, Shan ...
    原稿種別: PAPER
    論文ID: 2024EDP7059
    発行日: 2025年
    [早期公開] 公開日: 2025/01/10
    ジャーナル フリー 早期公開

    ShuffleNetV2 is a lightweight deep learning architecture designed to achieve efficient neural network performance in resource-constrained environments. Through its channel shuffle operation and ShuffleNetV2 units, the model promotes effective information exchange between channels, thereby enhancing feature representation and computational efficiency. However, due to its lightweight architecture, further improvements are needed in accuracy, stability, and generalization ability on classification tasks. Dendritic neurons are basic neurons in the nervous system whose multiple dendrites receive input signals from other neurons. Inspired by the information processing capacity of dendritic neurons, researchers have proposed a dendritic neuron model and applied it to various deep learning models, achieving outstanding performance on different tasks. Motivated by this, this paper proposes Dendritic ShuffleNetV2 (DShuffleNetV2), which combines the efficient feature extraction of ShuffleNetV2 with dendritic neuron features, thereby improving performance on medical image classification tasks. To evaluate the model, image classification experiments are conducted on three different types of medical image datasets. The experimental results demonstrate that, by leveraging the nonlinear features of dendrites and synapses, DShuffleNetV2 significantly outperforms the comparison models in accuracy, precision, recall, and F1 score.
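
    For intuition, a sketch of one common dendritic neuron formulation (synaptic layer, multiplicative dendritic branches, membrane summation, soma); the parameter shapes and constants are illustrative assumptions, and how DShuffleNetV2 integrates this into ShuffleNetV2 is not shown.

    import numpy as np

    def sigmoid(z, k=5.0):
        return 1.0 / (1.0 + np.exp(-k * z))

    def dendritic_neuron(x, w, q, theta=0.5):
        """x: (n_inputs,); w, q: (n_dendrites, n_inputs) synaptic weights/thresholds."""
        synapse = sigmoid(w * x - q)      # synaptic layer, per dendrite and input
        dendrite = synapse.prod(axis=1)   # multiplicative dendritic branches
        membrane = dendrite.sum()         # membrane layer accumulates the branches
        return sigmoid(membrane - theta)  # soma output

    rng = np.random.default_rng(0)
    x = rng.random(4)
    print(dendritic_neuron(x, w=rng.normal(size=(3, 4)), q=rng.normal(size=(3, 4))))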

  • Ryota TOMODA, Hisashi KOGA
    原稿種別: PAPER
    論文ID: 2024DAP0004
    発行日: 2025年
    [早期公開] 公開日: 2025/01/09
    ジャーナル フリー 早期公開

    Dynamic Time Warping (DTW) is a well-known similarity measure between time series data. Although DTW can calculate the similarity between time series of different lengths, it is computationally expensive. Therefore, fast algorithms that approximate DTW have been desired. SSH (Sketch, Shingle & Hash) is a representative hash-based approximation algorithm. It extracts a set of quantized subsequences from a given time series and finds similar time series by means of Min-Hash, a hash-based set similarity search. However, Min-Hash ignores the location of the set elements (i.e., quantized subsequences) in the time series, so hash collisions correlate only weakly with DTW. In this paper, to strengthen the correlation between hash collisions and DTW, we propose a new method termed Section Min-Hash that couples the hash values with the positions of quantized subsequences. After quantizing subsequences in a time series based on Euclidean distance, Section Min-Hash explicitly specifies multiple sections within the time series and generates the hash values from all the sections.
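
    An illustrative sketch of the section-wise idea, not the authors' code: the time series is split into sections, a set of quantized subsequences ("shingles") is built per section, and each section is Min-Hashed separately so collisions also reflect where the shingles occur. Quantization here is simple rounding rather than the Euclidean-distance-based quantization used in the paper.

    import hashlib

    def minhash(shingles, num_hashes=16):
        """Min-Hash signature of a set of hashable shingles."""
        return [min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
                    for s in shingles)
                for seed in range(num_hashes)]

    def section_minhash(series, window=3, sections=3, num_hashes=16):
        """Concatenate per-section Min-Hash signatures of quantized subsequences."""
        bounds = [round(i * len(series) / sections) for i in range(sections + 1)]
        signature = []
        for a, b in zip(bounds, bounds[1:]):
            part = series[a:b + window - 1]  # small overlap across the boundary
            shingles = {tuple(round(v) for v in part[i:i + window])
                        for i in range(max(1, len(part) - window + 1))}
            signature.extend(minhash(shingles, num_hashes))
        return signature

    series = [0.1, 0.9, 2.2, 3.1, 2.8, 1.7, 0.4, 0.2, 1.1, 2.5]
    print(section_minhash(series)[:8])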

  • Reina SASAKI, Atsuko TAKEFUSA, Hidemoto NAKADA, Masato OGUCHI
    原稿種別: PAPER
    論文ID: 2024DAP0005
    発行日: 2025年
    [早期公開] 公開日: 2025/01/09
    ジャーナル フリー 早期公開

    The data collected by Internet of Things (IoT) devices equipped with sensors enable smart home services such as monitoring of the elderly, pets, and the indoor environment. Building an IoT system that collects data from individual households in the cloud requires measures to reduce communication latency and the amount of transferred data and to protect privacy. Installing sensors in multiple indoor locations is necessary when collecting diverse data in an indoor environment, but installing numerous sensors increases costs and makes it difficult to relocate them and obtain the necessary information. In this study, we show the effectiveness of an IoT system that uses a wheeled mobile robot implemented with the Robot Operating System (ROS). We demonstrate the effectiveness of robot-based sensor data collection by developing a prototype system that collects indoor environment information and performs analysis in a cloud via an edge server for an application that monitors indoor carbon dioxide concentration. We also investigate the performance characteristics of ROS and ROS 2 communication between the sensor robot and the edge server and of IoT communication between the edge server and the cloud server to identify technical issues in a smart home.
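
    A hedged sketch of the kind of sensor-side node such a system might run: a ROS 2 (rclpy) publisher that periodically emits a CO2 reading. The topic name, message type, timer period, and the read_co2_ppm() stub are assumptions for illustration, not the study's implementation.

    import random
    import rclpy
    from rclpy.node import Node
    from std_msgs.msg import Float32

    class Co2Publisher(Node):
        def __init__(self):
            super().__init__("co2_publisher")
            self.pub = self.create_publisher(Float32, "indoor/co2_ppm", 10)
            self.timer = self.create_timer(5.0, self.publish_reading)

        def read_co2_ppm(self):
            # Placeholder for the actual sensor driver on the mobile robot.
            return 400.0 + random.random() * 600.0

        def publish_reading(self):
            msg = Float32()
            msg.data = self.read_co2_ppm()
            self.pub.publish(msg)
            self.get_logger().info(f"CO2: {msg.data:.1f} ppm")

    def main():
        rclpy.init()
        node = Co2Publisher()
        try:
            rclpy.spin(node)
        finally:
            node.destroy_node()
            rclpy.shutdown()

    if __name__ == "__main__":
        main()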

  • So KOIDE, Yoshiaki TAKATA, Hiroyuki SEKI
    原稿種別: LETTER
    論文ID: 2024EDL8081
    発行日: 2025年
    [早期公開] 公開日: 2025/01/09
    ジャーナル フリー 早期公開

    We study the decidability and complexity of non-cooperative rational synthesis problem (abbreviated as NCRSP) for some classes of probabilistic strategies. We show that NCRSP for stationary strategies and Muller objectives is in 3-EXPTIME, and if we restrict the strategies of environment players to be positional, NCRSP becomes NEXPSPACE solvable. On the other hand, NCRSP>, which is a variant of NCRSP, is shown to be undecidable even for pure finite-state strategies and terminal reachability objectives. Finally, we show that NCRSP becomes EXPTIME solvable if we restrict the memory of a strategy to be the most recently visited t vertices where t is linear in the size of the game.
