IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Advance Publication Articles
Showing 1-50 of 87 advance publication articles
  • Yizhe LI, Zhenyu LU, Zhongfeng CHEN, Zhuang LI
    Article type: PAPER
    Article ID: 2025EDP7042
    Publication date: 2025
    [Advance publication] Release date: 2025/08/01
    Journal, free access, advance publication

    Precipitation is a crucial component of the natural water cycle, and inadequate timeliness and precision in precipitation prediction can result in agricultural losses, traffic disruptions, flood catastrophes, and even threats to human life. Consequently, precipitation prediction is a key problem in the domain of meteorology. However, current methodologies pay close attention to the explicit spatial connections of precipitation regions while neglecting the implicit spatial connections over time. These implicit connections are often challenging for traditional convolutional neural networks and graph neural networks to capture, leading to inaccurate spatial regions and poor timeliness of model predictions. To resolve this problem, we propose a Dynamic spatial-temporal graph prediction model for short-term precipitation (Dst-pred), which dynamically explores implicit connections among meteorological stations in the target region through graph neural networks and constructs dynamic spatial-temporal graphs to predict precipitation in the region. We have verified our Dst-pred model on our proprietary precipitation dataset from Guangxi Province, China, and the ERA5-Land dataset, and it can extract the implicit spatial connections between individual stations from the precipitation data of meteorological stations. By capturing the precipitation process, our model enhances the timeliness and accuracy of precipitation nowcasting, achieving the best performance.

  • Yanchen LI, Fumihiko INO
    Article type: PAPER
    Article ID: 2024EDP7220
    Publication date: 2025
    [Advance publication] Release date: 2025/07/23
    Journal, free access, advance publication

    Deep neural network (DNN) pruning is a popular method for accelerating computations in DNNs by removing unimportant parameters. Among pruning methods, tile-wise pruning (TWP) achieves significant acceleration with minimal pruning loss. However, TWP suffers from load imbalance when important weight elements in the matrices of the DNN are unevenly distributed. To address this issue, we propose adaptive tile pruning (ATP), an integrative solver for building sparse DNNs with controllably balanced workloads. ATP comprises three components: hierarchical tile pruning (HTP), split-tiled sparse matrix multiplication (STSpMM), and adaptive pattern selection (APS). HTP constructs sparse matrices with evenly distributable workloads while preserving DNN model accuracy. STSpMM efficiently handles HTP-generated sparse matrices on GPUs by splitting and redistributing large workloads. APS dynamically selects pruning patterns for HTP and grid sizes for STSpMM based on the problem sizes in the targeted DNN. We evaluated our approach on pruned ResNet-18 and ResNet-34 models using ImageNet, and BERT-Small on the question-answering natural language inference (QNLI) task. Results demonstrate that models accelerated by ATP achieve greater acceleration than previous methods while maintaining accuracy for inference.

  • Yang XU, Yueyi ZHANG, Hanting ZHOU
    Article type: PAPER
    Article ID: 2025EDP7030
    Publication date: 2025
    [Advance publication] Release date: 2025/07/23
    Journal, free access, advance publication

    Owing to the inherent sparsity of the user-item interaction matrix, the majority of existing collaborative filtering-based recommendation algorithms predominantly focus on the explicit interactions between users and items, thereby neglecting the complex interdependencies among items and users. This oversight results in a suboptimal representation of user and item characteristics, ultimately leading to a diminished quality of recommendations. To address this limitation, we propose a novel recommendation algorithm, the Dual Co-occurrence Convolutional Neural Network (DCoCNN). DCoCNN integrates three pivotal components: user-item interactions, user-user co-occurrences, and item co-occurrences, leveraging the feature extraction capabilities of CNNs to train and refine latent features. Since items or users often emerge in pairs, DCoCNN thoroughly explores the intrinsic relationships among items or users, compensates for the lack of item-user interaction behaviors, and enables the trained latent features to contain more effective co-occurrence information, thereby enhancing model performance. The experimental results show that DCoCNN can capture useful information between items or users, mitigate the deficiencies of non-co-occurrence and single co-occurrence models, and improve recommendation quality.

  • Taehoon KIM, Jaechun NO, Sehoon KWON, Sungsoon PARK
    Article type: PAPER
    Article ID: 2024EDP7188
    Publication date: 2025
    [Advance publication] Release date: 2025/07/17
    Journal, free access, advance publication

    Docker container-based virtualization is becoming mainstream in cloud computing due to its potential benefits, such as a lightweight resource footprint and quick deployment. However, Docker containers can suffer from a lack of stability in provisioning per-container storage space, which can lead to undesirable consequences, such as abrupt application termination requiring execution restart or substantial data loss. In this paper, we propose an I/O storage scheme (mSEM) that enhances the storage reliability and stability of Docker containers by dynamically enlarging the data reservoir for each container through effective data path redirection to our extensible storage space. We measure the performance of our method and compare it with baseline Docker and Kubernetes using three benchmarks: filebench, postmark, and vdbench. The results show that our method produces three times higher I/O bandwidth than both baselines under storage shortage conditions. More importantly, even when there is no available storage space left and the baseline stops execution, our method can continue application execution with no severe performance degradation.

  • Takahiro KASAMA, Ryoichi ISAWA, Ryo KAMINO, Yuichi HAGIWARA
    Article type: PAPER
    Article ID: 2024ICP0007
    Publication date: 2025
    [Advance publication] Release date: 2025/07/14
    Journal, free access, advance publication

    With the increasing prevalence of mobile devices, wireless LANs, which allow network access without physical connections, have become widely used. Wi-Fi is particularly prevalent among wireless LANs, with a household Wi-Fi router adoption rate of approximately 89% and over 90% adoption in hospitals and schools in Japan. While Wi-Fi routers offer security features such as encryption and authentication, improperly configured or managed routers pose risks of eavesdropping and misuse by malicious actors. Previous studies have highlighted the risks of using vulnerable encryption protocols, such as WEP, and free public Wi-Fi services. However, the risks associated with default SSIDs and passwords on Wi-Fi routers remain largely unexplored. This study investigated the guessability of default Wi-Fi passwords across 44 consumer-grade Wi-Fi routers from 11 vendors commonly distributed in Japan. Our findings revealed that in 30 models from six vendors, default Wi-Fi passwords were generated using specific algorithms, making them vulnerable to being guessed by malicious actors. Based on the findings, we summarize the common pitfalls that product vendors often encounter when generating default Wi-Fi passwords. Additionally, we conducted a field survey across five locations in Tokyo, Japan to assess the prevalence and risk of Wi-Fi routers still operating with default settings.

  • Hibiki NAKANISHI, Kento HASEGAWA, Seira HIDANO, Kazuhide FUKUSHIMA, Ka ...
    Article type: PAPER
    Article ID: 2024EDP7305
    Publication date: 2025
    [Advance publication] Release date: 2025/07/11
    Journal, free access, advance publication

    In recent years, security measures for IoT devices have become more important. Fuzzing of IoT devices is an effective way to find unknown vulnerabilities. In IoT device fuzzing, a large number of test cases are generated from a set of initial seeds and sent to a target device while its behavior is monitored; a device crash means that a vulnerability has been discovered. However, generating a set of initial seeds is difficult because it requires technical knowledge of security and adaptation to various IoT devices. In this paper, we propose a method that uses a large language model (LLM) to generate initial seeds for IoT device fuzzing effectively. The proposed method efficiently generates initial seeds for fuzzing the target IoT device by inputting only the type of IoT device, communication logs, and the name of the vulnerability to be inspected into an LLM, with no specific technical knowledge of security. Experimental results of applying the proposed method to two types of IoT devices show that the proposed method detected the first crash in 0.40 seconds and 0.47 seconds from the start of fuzzing, respectively, and that after 24 hours of fuzzing, it detected all the crashes due to null pointer exceptions and buffer overflows that could not be detected by fuzzing with manually generated initial seeds.

  • Masanori HIROTOMO, Atsushi MARUI, Yoshiaki SHIRAISHI
    Article type: PAPER
    Article ID: 2024OFP0010
    Publication date: 2025
    [Advance publication] Release date: 2025/07/11
    Journal, free access, advance publication

    Many researchers have proposed variants of the visual secret sharing scheme (VSSS). In these schemes, the secret image can be recovered simply by stacking share images. In this paper, we propose a new VSSS with an embedded decoding condition, which we call an adaptively decodable VSSS on background color. In the proposed scheme, the secret image cannot be visually recovered when the share images are stacked on a white background, but it can be recovered when the shares are stacked on a black background. Furthermore, we propose a systematic method to construct a (k, n)-threshold adaptively decodable VSSS for any integers k, n (k ≤ n).
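
To make "recovered simply by stacking share images" concrete, the following is a minimal sketch of the classic (2, 2) visual secret sharing construction with two subpixels per pixel. It is not the adaptively decodable scheme proposed in the paper: the background-color condition is not modeled, and the function names (`make_shares`, `stack`, `decode`) are hypothetical names chosen for illustration.

```python
import random

# Subpixel patterns: 1 = black (opaque), 0 = white (transparent).
PATTERNS = [(0, 1), (1, 0)]


def make_shares(secret, rng):
    """Split a binary image (1 = black) into two shares, 2 subpixels per pixel."""
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)
            # White pixel: both shares get the same pattern.
            # Black pixel: the second share gets the complementary pattern.
            q = p if pixel == 0 else tuple(1 - s for s in p)
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(a, b):
    """Stack two transparencies: a subpixel is black if black on either share."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def decode(stacked):
    """Read a pixel as black only when both of its subpixels are black."""
    return [
        [1 if row[i] and row[i + 1] else 0 for i in range(0, len(row), 2)]
        for row in stacked
    ]


secret = [[1, 0, 1], [0, 0, 1]]
s1, s2 = make_shares(secret, random.Random(0))
recovered = decode(stack(s1, s2))
```

Each share on its own shows exactly one black subpixel per pixel regardless of the secret, so a single share reveals nothing; only the stacked result distinguishes black pixels (two black subpixels) from white ones (one black subpixel).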

  • Cong GUAN, Yuya IEIRI, Osamu YOSHIE
    Article type: PAPER
    Article ID: 2024EDP7318
    Publication date: 2025
    [Advance publication] Release date: 2025/07/07
    Journal, free access, advance publication

    Precisely detecting obstacles on the track is critical to the safety of railway transportation. However, existing track obstacle detection methods suffer from low accuracy, slow speed, and high complexity, which makes them unsuitable for real-time demands and low-resource constraints. This paper proposes a novel Railway Obstacle Detection (ROD) method named ROD-YOLO, striking a good trade-off between performance and efficiency. Firstly, we design a multi-scale Feature Enhancement Module (FEM), utilizing convolutions with different dilation rates to extract fine-grained features from different layers. Secondly, to improve detection speed, we propose the SPPCSPC-F spatial pyramid pooling module, which reduces the number of convolution units, the size of pooling operations, and the dimensions of feature concatenation. Additionally, we incorporate Large Selective Kernel (LSK) Attention to filter out interfering information and focus on important local features. Comprehensive experiments are conducted on a real-world dataset consisting of 12,270 images, aiming to verify the feasibility of object detection methods in complex railway environments. Results show that ROD-YOLO outperforms state-of-the-art one-stage and two-stage object detection methods, achieving 96.3% precision, 91.4% recall, and 96.6% mAP at an IoU threshold of 0.5. Compared to the most lightweight baseline (YOLOv8n), our method improves the mAP50 and inference speed by 7.93% and 72.42%, respectively, with only a 36.19% growth in parameter size. Moreover, ROD-YOLO shows strong generalization ability on four cross-domain datasets, including a remote sensing image dataset and a traffic sign dataset. In conclusion, the proposed ROD-YOLO algorithm demonstrates remarkable performance in detecting track obstacles and provides valuable practice for the deployment of object detection models in resource-constrained and security-critical systems.

  • Shu CHEN, Yingyi SUI, Qisheng PAN, Yiran WANG, Fei WU
    Article type: LETTER
    Article ID: 2025EDL8002
    Publication date: 2025
    [Advance publication] Release date: 2025/07/02
    Journal, free access, advance publication

    With the development of society, people increasingly get their news from online media. Under such circumstances, fake news has become a major social problem. Most existing fake news detection works focus on the extraction of identification information. However, how to deal with the domain shift problem is still a challenge. In this paper, we propose an approach called Joint Domain-specific and Domain-shared Learning (JDDL) for multi-domain fake news detection. It mainly consists of three modules: (1) the multi-domain feature extraction module, which extracts domain-specific features and domain-shared features, respectively; (2) the feature fusion module, which employs a Graph Attention Network (GAT) to further extract features and then fuses the output features; (3) the domain adversarial discrimination module, which designs the domain discrimination loss to confuse the classifier and make it unable to distinguish which domain the news belongs to. Experiments on an English dataset show that JDDL outperforms state-of-the-art methods.

  • Hitoshi NISHIMURA, Haruhisa KATO, Kei KAWAMURA
    Article type: PAPER
    Article ID: 2024EDP7256
    Publication date: 2025
    [Advance publication] Release date: 2025/07/01
    Journal, free access, advance publication

    Dynamic meshes reasonably represent time-varying 3D objects, but compression is required due to the large amount of data involved. One efficient framework decomposes a dynamic mesh into a base mesh and displacements using decimation and subdivision. The displacements are converted to levels by wavelet transforms and quantization, and they are coded by arithmetic coding. The levels of the current frame are predicted from the reference frame, and only the residuals are coded. However, the residual tends to be large since the coefficients of each frame are quantized before performing inter prediction. In this paper, we propose a method of quantizing the residuals obtained after applying inter prediction in order to reduce the amount of required data. The experimental results show that the proposed method improves coding efficiency (BD-Rate: -0.3 %) and that the reconstructed mesh has no quality degradations.
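
The idea of quantizing the residual after inter prediction can be sketched as follows. This is a simplified illustration under stated assumptions, not the codec described in the paper: it uses a plain uniform scalar quantizer, omits the wavelet transform and arithmetic coding, and the names `quantize`, `dequantize`, and `code_frame` are hypothetical.

```python
def quantize(value, step):
    """Uniform scalar quantizer: map a value to an integer level."""
    return round(value / step)


def dequantize(level, step):
    """Inverse of the uniform quantizer: map a level back to a value."""
    return level * step


def code_frame(coeffs, ref_recon, step):
    """Predict from the reconstructed reference, then quantize the residual.

    Returns the integer levels to be coded and the decoder-side reconstruction.
    """
    levels = [quantize(c - r, step) for c, r in zip(coeffs, ref_recon)]
    recon = [r + dequantize(l, step) for r, l in zip(ref_recon, levels)]
    return levels, recon


step = 4
frame0 = [10, -7, 3]
frame1 = [11, -6, 5]

# The first frame has no reference, so it is predicted from zeros.
levels0, recon0 = code_frame(frame0, [0, 0, 0], step)
# The second frame is predicted from the reconstruction of the first.
levels1, recon1 = code_frame(frame1, recon0, step)
```

Because the prediction uses the reconstructed reference, the decoder can repeat the same steps, and the reconstruction error of each coefficient stays within half a quantization step. In this toy input the second frame's levels are mostly zero, which is the kind of small residual an entropy coder benefits from.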

  • Yoon Hak KIM
    Article type: PAPER
    Article ID: 2025EDP7024
    Publication date: 2025
    [Advance publication] Release date: 2025/07/01
    Journal, free access, advance publication

    For linear least squares estimation of parameters in wireless sensor networks, we focus on constructing the best subset of sensors that minimizes the estimation error, which requires computing the inverse of large matrices. We manipulate the estimation error based on the LU factorization, resulting in factored triangular matrices, the inverse of which can be obtained iteratively without large matrix inversion. We then derive an analytic selection rule in a greedy manner, which facilitates a fast selection process. We also discuss the complexity of different selection methods, with an emphasis on the reasonable complexity of the proposed method. We finally validate the merit of the proposed algorithm through numerical experiments in terms of estimation performance and complexity as compared with previous methods.
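
For orientation, a naive greedy baseline for this selection problem might look like the sketch below. It assumes a measurement model y = Hθ + noise with unit noise variance and uses trace((H_S^T H_S)^{-1}) as the estimation error. It re-inverts the matrix at every step, which is exactly the cost the paper's LU-based rule is designed to avoid; that rule is not reproduced here. The small regularizer `eps` and all function names are assumptions made for illustration.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def estimation_error(rows, eps=1e-6):
    """Trace of (H_S^T H_S + eps*I)^-1 for the selected sensor rows.

    eps is a hypothetical regularizer that keeps the matrix invertible
    while fewer sensors than parameters have been selected.
    """
    p = len(rows[0])
    gram = [[eps if i == j else 0.0 for j in range(p)] for i in range(p)]
    for h in rows:
        for i in range(p):
            for j in range(p):
                gram[i][j] += h[i] * h[j]
    inv = invert(gram)
    return sum(inv[i][i] for i in range(p))


def greedy_select(H, k):
    """Pick k sensors, each time adding the one that minimizes the error."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(H)) if i not in selected]
        best = min(
            candidates,
            key=lambda i: estimation_error([H[j] for j in selected] + [H[i]]),
        )
        selected.append(best)
    return selected


# Four sensors observing two parameters; each row is one sensor's observation vector.
H = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.1, 0.1]]
chosen = greedy_select(H, 2)
```

With this toy matrix the baseline first picks the sensor with the largest gain and then the one that best covers the remaining parameter direction.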

  • Maoke ZHOU, Xiaoke QI, Wei BAO, Xiaobing ZHAO
    Article type: PAPER
    Article ID: 2025EDP7098
    Publication date: 2025
    [Advance publication] Release date: 2025/07/01
    Journal, free access, advance publication

    Tibetan text recognition plays a key role in preserving the Tibetan language, religion, and traditions. While text recognition has made progress for high-resource languages, handwritten Tibetan character recognition remains difficult due to limited data and the lack of public large language models. Most existing datasets focus on printed or historical documents, as well as online handwriting data, but there are still few large offline handwritten Tibetan datasets. To solve this problem, we construct TibHCR, a large-scale offline handwritten character recognition dataset for the Tibetan language. To increase the diversity of linguistic and font styles, more character categories and participants from 5 provinces in China are included. To collect and label the data efficiently, we introduce a grid sheet design, reducing manual annotation to just 1% of the samples. This design then allows for automatic data processing to extract each character sample and its corresponding label. The resulting TibHCR dataset contains 141,698 samples from 235 Tibetan writers, covering 47 character classes. We evaluate TibHCR using two recognition models: a convolutional recurrent neural network (CRNN) and a cross-lingual fine-tuning method that adapts a Chinese pretrained model based on the PP-OCRv4 architecture to Tibetan data. The results show that both models can recognize handwritten Tibetan characters efficiently, with an accuracy of 99.48% for CRNN and 99.70% for the fine-tuning method. The TibHCR dataset is publicly available at https://huggingface.co/datasets/qixiaoke/TibHCR.

  • Toshihiro SHIMIZU, Yasuhiro WATANABE
    Article type: PAPER
    Article ID: 2025PAP0005
    Publication date: 2025
    [Advance publication] Release date: 2025/07/01
    Journal, free access, advance publication

    The coarse-grained reconfigurable architecture (CGRA) has been attracting significant attention as an energy-efficient accelerator. Recently, many applications require significant computational power, and CGRAs are expected to meet this demand. In such fields, CGRAs are utilized to execute computationally intensive programs in the innermost loop body, often called a “kernel”. They generally consist of a two-dimensional array of processing elements (PEs) interconnected in a configurable manner, and the data transfer between PEs is configured accordingly. Running a kernel on a CGRA requires a mapping process that generates CGRA configurations to match the kernel program. This mapping is time-consuming and can hinder developer productivity. We therefore propose a fast mapping method that leverages the architecture's characteristics, namely, its routing capabilities, to reduce mapping time. We define a heuristic cost function for routing that guides the mapper toward better mapping results. We demonstrate that our mapper is fast enough for practical software development and can provide sufficiently robust mapping results.

  • Qianying ZHANG, Dongxu JI, Shijun ZHAO, Zhiping SHI, Yong GUAN
    Article type: PAPER
    Article ID: 2024ICP0004
    Publication date: 2025
    [Advance publication] Release date: 2025/06/26
    Journal, free access, advance publication

    ARM TrustZone technology is widely used to provide Trusted Execution Environments (TEEs) for sensitive applications. However, most TEE OSes are implemented as monolithic kernels. In such designs, all components run in the kernel, which leads to a large trusted computing base (TCB). It is difficult to guarantee that all components of the kernel have no security vulnerabilities. The functions of trusted computing, such as integrity measurement and data sealing, provide further security guarantees. This paper presents MicroTEE, a TEE OS with rich trusted computing primitives based on the microkernel architecture. In MicroTEE, the microkernel provides strong isolation for services and applications. The kernel is only responsible for providing core services such as address space management, thread management, and inter-process communication. Other fundamental services, such as the trusted service, are implemented as applications at the user layer. Trusted computing primitives provide security features for trusted applications (TAs), including integrity measurement, data sealing, and remote attestation. Our design avoids the compromise of the whole TEE OS if some kernel service is vulnerable. A monitor has also been added to perform the switch between the secure world and the normal world. Finally, we implemented a MicroTEE prototype on the Freescale i.MX6Q Sabre Lite development board and tested its performance. Evaluation results show that MicroTEE only introduces some necessary and acceptable overhead.

  • Nhu NGUYEN, Hideaki TAKEDA
    Article type: PAPER
    Article ID: 2024EDP7258
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    Wikipedia stands out as a globally utilized linguistic resource available in over 330 languages, attracting contributions from a diverse group of editors on a global scale. Despite its widespread use, significant disparities persist among language publications, including variations in the number of articles, the spectrum of topics covered, and even the number of contributing community editors. In this paper, we aim to alleviate this gap in the coverage of low-resource languages. Although previous work has focused on multilingual interoperability efforts, the potential of hyperlinks has not been fully realized. Therefore, this study introduces a novel approach focused on hyperlinks, specifically emphasizing hyperlink types derived from Wikidata. We extract and analyze patterns related to these hyperlink types across different languages, using them as recommended solutions to connect the topics of various languages, particularly low-resource languages. Collaborative filtering experiments suggest that using combined languages leads to good overall results while preserving the uniqueness of each language.

  • Yikang WANG, Xingming WANG, Chee Siang LEOW, Qishan ZHANG, Ming LI, Hi ...
    Article type: PAPER
    Article ID: 2025EDP7044
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    Currently, research in deepfake speech detection focuses on the generalization of detection systems towards different spoofing methods, mainly for noise-free clean speech. However, speech anti-spoofing countermeasure (CM) systems often do not perform well in more complicated scenarios, such as those involving noise and reverberation. To enhance the robustness of CM systems, we propose a transfer learning-based hybrid approach with Speech Enhancement front-end and Counter Measure back-end Joint optimization (SECM-Joint), and investigate its effectiveness in improving robustness against noise and reverberation. Experimental results show that our SECM-Joint method reduces EER by 19.11% to 64.05% relatively in most noisy conditions and 23.23% to 30.67% relatively in reverberant environments compared to a Conformer-based CM baseline system without pre-training. Additionally, our dual-path U-Net (DUMENet) further enhances the robustness for real-world applications. These results demonstrate that the proposed method effectively enhances the robustness of CM systems in noisy and reverberant conditions. Codes and experimental data supporting this work are publicly available at: https://github.com/ikou-austin/SECM-Joint

  • Reo UENO, Akihiro FUJIWARA
    Article type: LETTER
    Article ID: 2025PAL0001
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    In membrane computing, most of the proposed algorithms for computationally hard problems use an exponential number of membranes, and a reduction in the number of membranes must be considered in order to make membrane computing a more realistic model.

    In the present paper, we propose an asynchronous P system using improved branch and bound to solve the minimum Steiner tree problem. The experimental results show the validity and efficiency of the proposed P system.

  • Takashi YOKOTA, Kanemitsu OOTSU
    Article type: LETTER
    Article ID: 2025PAL0002
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    Interconnection networks are indispensable in parallel computers. The effectiveness of parallel execution is largely affected by the communication performance of the interconnection network. Collective communication is especially important since it is frequently executed in parallel programs. One promising method for improving the performance of collective communication is packet scheduling. This paper addresses a lazy method for packet scheduling. The proposed method is based on an evolutionary idea to find promising candidates for injection delays and improvement methods. Preliminary evaluation results reveal that the proposed method outperforms the existing method.

  • Cheng XU, Yirong KAN, Renyuan ZHANG, Yasuhiko NAKASHIMA
    Article type: PAPER
    Article ID: 2025PAP0003
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    This paper proposes a Field-Programmable Gate Array (FPGA) accelerator for Vision Transformers (ViTs) with quantization and look-up-table (LUT) based operations. First, two improved quantization methods are proposed, achieving comparable performance at lower bit-widths. Furthermore, designs for linear and nonlinear units are proposed to support diverse operations in ViT models. Finally, the LUT-based accelerator design is implemented and evaluated. Experimental results on the ImageNet dataset demonstrate that our proposed quantization method achieves an accuracy of 80.74% at 2-bit width, outperforming state-of-the-art Vision Transformer quantization methods by 0.1% to 0.5%. The proposed FPGA accelerator demonstrates higher energy efficiency, achieving a peak energy efficiency of 7.06 FPS/W and 246 GOPS/W.

  • Aoi KIDA, Hideyuki KAWASHIMA
    Article type: PAPER
    Article ID: 2025PAP0004
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    State Machine Replication (SMR) is a fundamental technique for building fault-tolerant distributed systems with strong consistency. Rabia is an SMR protocol that simplifies implementation design through a randomized consensus algorithm. Our analysis reveals a design limitation of the Rabia protocol: under partial network partitioning, replicas can develop inconsistent queue states, leading to a livelock state. We present Qsync, which enhances Rabia's fault tolerance through queue state synchronization mechanisms while preserving its implementation simplicity. Experimental evaluation shows that Qsync maintains stable performance under partial network partitions where the original Rabia throughput drops to zero.

  • Toshiyuki ICHIBA, Yasuhiro WATANABE, Takahide YOSHIKAWA
    Article type: PAPER
    Article ID: 2025PAP0007
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    Driven by the strong demand for enhanced performance in High-Performance Computing (HPC), Coarse-Grained Reconfigurable Architectures (CGRAs) are promising technologies that offer high performance even under power consumption constraints. Performance on CGRAs is significantly influenced by loop unrolling, a technique that increases computational parallelism by utilizing more processing elements in CGRAs. Determining the optimal loop unrolling factor is challenging in applications with multiple loops. This paper presents a case study demonstrating the determination of optimal loop unrolling factors for an application based on the Lattice Boltzmann Method (LBM). Because the application's process exceeds the capacity of a single CGRA, this paper proposes a method for partitioning the process to fit the CGRA's resources using integer linear programming (ILP). Finally, this paper provides a performance estimation of the CGRA runtime and demonstrates the effectiveness of CGRAs for HPC.

  • Sho SATO, Shinobu MIWA, Hiroki HONDA, Hayato YAMAKI
    Article type: PAPER
    Article ID: 2025PAP0008
    Publication date: 2025
    [Advance publication] Release date: 2025/06/24
    Journal, free access, advance publication

    In recent years, it has become increasingly important for Internet Service Providers (ISPs), where link installation costs are high, to utilize all network links more effectively in order to avoid traffic congestion. As a promising approach to address this issue, multipath routing, which distributes traffic across multiple reachable paths to the destination, has been attracting attention. In multipath routing, even if a path is congested, congestion can be avoided by using other paths and balancing path loads. Conventionally, realizing load-aware multipath routing has required both the collection of load metrics to track dynamically changing path loads and the distribution of traffic at an appropriate ratio with fine-grained traffic units such as flowlets. However, in ISP networks, existing methods may fail to balance path loads due to the large path delay and the variation in flow bit rates. In this paper, we propose a novel traffic balancing method suitable for ISP networks. In the proposed method, we first derive a target bandwidth for each path to equalize the congestion levels of all paths in the multipath set, and then decide the distribution ratio by feedback control. In addition, the proposed method adopts modified flow-level traffic distribution, which makes flows reselect their paths at certain time intervals. These approaches enable traffic to be balanced more evenly in ISP networks than with conventional methods. Through network simulations using network topologies that model ISP networks, including SINET6, we demonstrated that the proposed method can reduce the average flow completion time (FCT) by 16.0%, 44.5%, and 58.4% compared to ECMP, which performs naive traffic distribution, and CONGA and W-ECMP, which achieve advanced traffic distribution.

  • Takuya FUTAGAMI, Noboru HAYASAKA
    Article type: PAPER
    Article ID: 2025PCP0006
    Publication date: 2025
    [Advance publication] Release date: 2025/06/19
    Journal, free access, advance publication

    This study proposes a knowledge-based handcrafted building region extraction algorithm that can accurately separate a building from its background in a street image at the pixel level. The proposed algorithm leverages a customized patch-based graph cut inspired by human visual perception mechanisms. In the patch-based graph cut, the similarity of patches is measured by cutting-edge deep neural networks (DNNs). The graph settings are based on the knowledge that buildings are captured at the center of the image because they are the main subject. Our experiment, which employed 300 images from a well-known open dataset, demonstrated that the proposed method employing GrabCut for pixel-level segmentation significantly increased the comprehensive accuracy of building region extraction, measured by intersection over union (IoU), by 12.29% or more compared with the conventional knowledge-based method using color segmentation. This stems from the fact that the building and background candidates presented by the proposed method are more accurate by 8.57% or more. In addition, the GrabCut-based proposed method achieved accuracy similar to that of state-of-the-art DNN-based semantic segmentation based on a transformer architecture. Further comparisons and discussions are provided in this paper to clarify the effectiveness of the proposed method.

  • Onhi KATO, Akira KUBOTA
    Article type: PAPER
    Article ID: 2025PCP0007
    Publication date: 2025
    [Advance publication] Release date: 2025/06/19
    Journal, free access, advance publication

    In recent years, zero-shot learning-based haze removal methods using a single image have been proposed and have gained attention for their effectiveness. However, methods that fuse near-infrared (NIR) and color images have not been sufficiently studied. This paper presents a haze removal method based on zero-shot learning that fuses NIR and color images. The proposed method consists of two steps: haze removal and edge fusion. In the first step, the atmospheric scattering model is adapted to remove haze from NIR and color images. This step restores colors in the color image and enhances edges in the NIR image. In the second step, a new method is introduced to fuse haze-removed NIR and color images. This method preserves the natural color and the luminance of the color image and effectively uses the edges of the NIR image. Specifically, a weight map is generated to adjust for luminance changes and is added to the NIR image. The adjusted NIR image is then multiplied by the lightness image to restore the edges. This process allows for a natural fusion of NIR and lightness images and an effective fusion of detailed edges. Our qualitative and quantitative evaluations demonstrated that our method can restore color and edges more naturally than the conventional methods. Furthermore, it was shown to be effective even for strong haze images.
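
The atmospheric scattering model mentioned in the first step is commonly written as I(x) = J(x) t(x) + A (1 - t(x)), where I is the observed hazy intensity, J the haze-free intensity, t the transmission, and A the atmospheric light. The sketch below inverts that model per pixel, assuming t and A are already known. In the paper they are obtained through zero-shot learning, which is not reproduced here; the lower bound `t_min` and the function names are assumptions made for illustration.

```python
def add_haze(J, t, A):
    """Forward model: hazy intensity I = J*t + A*(1 - t), per pixel."""
    return [j * ti + A * (1.0 - ti) for j, ti in zip(J, t)]


def remove_haze(I, t, A, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A, per pixel.

    t_min is a hypothetical lower bound that avoids dividing by a
    near-zero transmission in very dense haze.
    """
    return [(i - A) / max(ti, t_min) + A for i, ti in zip(I, t)]


clear = [0.2, 0.5, 0.9]          # haze-free intensities in [0, 1]
transmission = [0.8, 0.5, 0.3]   # per-pixel transmission
airlight = 0.95                  # atmospheric light

hazy = add_haze(clear, transmission, airlight)
restored = remove_haze(hazy, transmission, airlight)
```

When the transmission is above `t_min`, the inversion recovers the haze-free intensities up to floating-point error; pixels clamped by `t_min` are only partially restored.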

  • Yan XIANG, Di WU, Yunjia CAI, Yantuan XIAN
    Article type: PAPER
    Article ID: 2024EDP7313
    Published: 2025
    [Advance online publication] Released: 2025/06/18
    Journal, free access, advance online publication

    Joint multimodal aspect-based sentiment analysis (JMABSA) aims to extract aspects from multimodal inputs and determine their sentiment polarity. Existing research often struggles to align aspect features effectively across images and text. To address this, we propose an entity knowledge-guided image-text alignment network that integrates alignment across both modalities, enabling the model to more accurately capture jointly expressed aspect and sentiment information in images and text. Specifically, we introduce an entity class embedding to guide the model in learning entity-related features from text. Additionally, we utilize scene and aspect descriptions in images as entity knowledge, helping the model learn entity-relevant features from visual input. The alignment between entity knowledge in images and the initial text further supports the model in learning consistent aspect and sentiment expressions across modalities. Experimental results demonstrate that our method achieves state-of-the-art performance on two public benchmark datasets.

  • Anlin HU, Wenjiang FENG, Xudong ZHU, Junjie WANG, Shaolong LI
    Article type: LETTER
    Article ID: 2025EDL8015
    Published: 2025
    [Advance online publication] Released: 2025/06/18
    Journal, free access, advance online publication

    Deep Learning-based Fault Localization (DLFL) uses metamorphic testing to locate faults in the absence of test oracles. However, these approaches face the class imbalance problem: the violated data (the minority class) are far fewer than the non-violated data (the majority class). To address this issue, we propose MDAug: Metamorphic Diffusion-based Augmentation for improving DLFL without test oracles. MDAug combines metamorphic testing and a diffusion model to generate minority-class data and obtain class-balanced data. We apply MDAug to three state-of-the-art DLFL baselines without test oracles, and the results show that MDAug significantly outperforms all the baselines in the absence of test oracles.

  • Yi LIU, QiaoXing LI, Lu XIAO, Sen ZHANG
    Article type: PAPER
    Article ID: 2025EDP7088
    Published: 2025
    [Advance online publication] Released: 2025/06/18
    Journal, free access, advance online publication

    Driver distraction is a primary cause of traffic accidents, and the real-time and effective detection of such behaviors can significantly reduce traffic-related injuries and fatalities. In this paper, we enhance the lightweight YOLOv10n model by integrating the BiFPN structure to bolster its multi-scale feature extraction capabilities. Additionally, we design a CASSA module that combines channel attention, spatial attention, and channel shuffle to strengthen the model's ability to capture long-range dependencies. The model was tested on the CBTDDD dataset, established in this study, which includes data on driver distraction across multiple scenarios involving sedans, passenger buses, and trucks. Compared to the original YOLOv10n model, the proposed model demonstrates a 2.0% improvement in mAP@0.5 and achieves an FPS of 115.3 f/s. These results indicate that the YOLOv10n-BC model developed in this paper is capable of performing real-time and efficient monitoring of driver distraction.

  • Xuemin Huang, Xiaoliang Zhuang, Fangyuan Tian, Zheng Niu, Lin Peng, Qi ...
    Article type: LETTER
    Article ID: 2025EDL8016
    Published: 2025
    [Advance online publication] Released: 2025/06/10
    Journal, free access, advance online publication

    An FPGA-based fire detection system using a back propagation (BP) neural network was designed for early fire detection in key equipment in converter stations. An 8-5-1 BP network structure was trained, achieving a recognition accuracy of 94.08%. Fixed-point data quantization and pipelining were employed to reduce computational complexity, lowering resource consumption and enhancing speed. The FPGA system used 683 LUTs, achieved a 94.6% detection rate, consumed only 1.342 W of power, and completed a single detection in 3.25 μs, a significant improvement compared to the 8.56 ms detection time on MATLAB. This system demonstrates excellent reliability, real-time performance, and promising application potential for early fire detection in key equipment in converter stations.
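
    To illustrate the idea of fixed-point quantization for an 8-5-1 network, the sketch below compares a floating-point forward pass with one whose inputs, weights, and hidden activations are rounded to integers. Everything specific here is an assumption: the weights are random placeholders rather than the trained network, the 8 fractional bits and the sigmoid activation are not taken from the abstract, and the sigmoid is still evaluated in floating point (hardware would typically use a lookup table or approximation).

```python
import math
import random

FRAC_BITS = 8            # assumed number of fractional bits
SCALE = 1 << FRAC_BITS   # 256: real value v is stored as round(v * SCALE)


def to_fixed(value):
    """Quantize a real number to an integer with FRAC_BITS fractional bits."""
    return int(round(value * SCALE))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_demo_network(seed=0):
    """Random placeholder weights for an 8-5-1 network (not the trained model)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
    b1 = [rng.uniform(-1, 1) for _ in range(5)]
    w2 = [rng.uniform(-1, 1) for _ in range(5)]
    b2 = rng.uniform(-1, 1)
    return w1, b1, w2, b2


def forward_float(x, w1, b1, w2, b2):
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def forward_fixed(x, w1, b1, w2, b2):
    """Same network, but multiply-accumulate runs on quantized integers."""
    xq = [to_fixed(v) for v in x]
    hidden_q = []
    for row, b in zip(w1, b1):
        # Products of two scaled integers carry a factor of SCALE * SCALE.
        acc = sum(to_fixed(w) * v for w, v in zip(row, xq)) + to_fixed(b) * SCALE
        hidden_q.append(to_fixed(sigmoid(acc / (SCALE * SCALE))))
    acc = sum(to_fixed(w) * h for w, h in zip(w2, hidden_q)) + to_fixed(b2) * SCALE
    return sigmoid(acc / (SCALE * SCALE))


if __name__ == "__main__":
    net = make_demo_network()
    sample = [0.1, 0.9, 0.3, 0.5, 0.7, 0.2, 0.8, 0.4]
    print(forward_float(sample, *net), forward_fixed(sample, *net))
```

    With inputs in [0, 1] and weights in [-1, 1], the two outputs differ only by a small quantization error, which is the trade-off that makes integer arithmetic attractive on an FPGA.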

  • Zeyou LIAO, Junguo LIAO
    Article type: PAPER
    Article ID: 2025EDP7001
    Published: 2025
    [Advance online publication] Released: 2025/06/10
    Journal, free access, advance online publication

    Object detection in drone-captured scenarios presents significant challenges due to factors such as varying object scales, motion blur, and dense object clusters. Although existing methods, including attention blocks and feature fusion networks, have improved detection accuracy, they often come with high computational costs, which hinder real-time performance. In this paper, we propose IFN-YOLOv8, an enhanced version of YOLOv8, designed to address these challenges. By integrating the P2 feature scale, IFN-YOLOv8 enhances small object detection through higher-resolution feature maps. Additionally, we introduce a novel convolutional block, RHAConv, to replace traditional convolution layers, improving feature representation in scenes with dense object clusters. A new Information Fusion Module is also proposed to refine object features, reducing both missed and false detections. Experimental results on the VisDrone and DOTA datasets demonstrate that IFN-YOLOv8 outperforms mainstream methods, achieving an mAP@50 of 45.7% and 68.5%, respectively, while maintaining low resource consumption and high detection speed.

  • Zhiwei YU, Weixiang XU, Qianhang DU, Rong-Long WANG, Shangce GAO
    Article type: LETTER
    Article ID: 2024EDL8097
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    Glaucoma is one of the leading causes of irreversible blindness worldwide. Deep learning methods have made significant strides in predicting glaucoma in recent years. However, existing models continue to encounter challenges in extracting complex and subtle pathological features from fundus images associated with glaucoma. To address this limitation, we propose a novel DMNet model, which aims to enhance the integration of input signals by simulating the dendritic neuron model. This approach can improve the capture of fine details within glaucoma images and significantly boost classification performance. Experimental results indicate that DMNet outperforms traditional deep learning models on the glaucoma fundus image dataset, demonstrating its substantial performance advantages.

  • Hanaki YACHI, Wenzhu GU, Zhenyu LEI, Masaaki OMURA, Shangce GAO
    Article type: PAPER
    Article ID: 2024EDP7320
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    Deep learning has revolutionized complex tasks such as classification, approximation, and prediction, drawing inspiration from mathematical models of the human brain. Among recent breakthroughs, Google's Transformer architecture has established itself as a leading framework in natural language processing. Its adaptation to computer vision, known as the Vision Transformer (ViT), has set new benchmarks for image-based tasks. In this study, we introduce a novel neural network model that integrates the input layer of the ViT with the dendritic neuron model (DNM). This hybrid architecture combines the advanced feature extraction capabilities of ViT with the adaptability and robustness of DNM to enhance performance. The proposed model is applied to the diagnosis of diabetic retinopathy, effectively identifying critical features associated with the condition. The results underscore its potential to improve the accuracy and reliability of medical image analysis, paving the way for advancements in healthcare diagnostics.

  • Yuka IKEGAMI, Kento HASEGAWA, Seira HIDANO, Kazuhide FUKUSHIMA, Kazuo ...
    Article type: PAPER
    Article ID: 2024EDP7325
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    With the rapid increase in demand for IoT devices, malicious attacks targeting vulnerabilities in IoT devices have become frequent in recent years. Vulnerability assessment is expected to help remove these vulnerabilities. However, the wide variety of IoT devices is not standardized, and it is difficult to set up vulnerability assessment items mechanically for those devices, which poses a major obstacle to automating vulnerability assessment for IoT devices. In this paper, we propose a method to prioritize vulnerability assessment items for each IoT device by effectively utilizing large language models (LLMs). The proposed method uses an LLM with Retrieval Augmented Generation (RAG) to generate answers that take into account the specifications of individual IoT devices, and determines how suitable each vulnerability assessment item is for each IoT device by calculating suitability with semantic entropy. In doing so, the proposed method introduces hybrid search with reranking as the search method for related chunks in RAG. Through binary classification of vulnerability assessment items, an average area under the curve (AUC) of 0.753 was achieved for five IoT devices. We confirmed that the proposed method is more effective in evaluating the suitability of the items to the target device specifications than methods using keyword search, vector search, and hybrid search with RRF (Reciprocal Rank Fusion).
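
    One of the baselines named above is hybrid search with RRF (Reciprocal Rank Fusion). As a minimal sketch of what that baseline computes, each document receives the score sum over rankings of 1 / (k + rank). The constant k = 60 is a commonly used default and is an assumption here, as are the function name and the chunk IDs; the abstract does not give the authors' settings.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant (60 is a common default; assumed, not from the paper).
    Returns document IDs sorted by descending fused score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    keyword_hits = ["chunk_a", "chunk_b", "chunk_c"]   # hypothetical keyword search result
    vector_hits = ["chunk_b", "chunk_c", "chunk_a"]    # hypothetical vector search result
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

    A chunk that ranks well in both lists rises to the top even if it is not first in either, which is the usual reason for fusing keyword and vector results this way.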

  • Shigeaki Tanimoto, Yoshinori Fujihira, Toru Kobayashi, Takeshi Yamauch ...
    Article type: LETTER
    Article ID: 2024OFL0001
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    We propose “bio-inspired UX,” a new method based on the defense mechanisms of ecosystems, for preventing intentional internal fraud within organizations. The proposed method features a function for sharing UX information within groups, inspired by the signal transmission mechanism between plants.

  • Taishin TAKAHATA, Mitsuharu MATSUMOTO
    Article type: LETTER
    Article ID: 2025EDL8026
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    Disaster relief robots have been studied extensively as a promising approach to lifesaving and goods transportation without the need for manpower. Most disaster relief robots are designed to search for and find a person in need of rescue. However, it is not always easy for a robot to find such a person at a disaster site, and the person may not even notice a robot approaching very close by. In this study, we therefore investigate the effectiveness of smell as a method of communicating the presence of a robot. We conducted a search experiment with and without smell to evaluate whether the sense of smell is useful for search. The results of the experiment confirmed that smell is highly effective for search.

  • Kosuke SHIMIZU, Taizo SUZUKI
    Article type: PAPER
    Article ID: 2025PCP0005
    Published: 2025
    [Advance online publication] Released: 2025/06/09
    Journal, free access, advance online publication

    We propose a JPEG format-compliant encryption method in the quantized discrete cosine transform (QDCT) domain for texture protection, called Prediction Error-Propagated Encryption with Modulo Operator (PEPE-MO = WPE-MO, by pronouncing ‘W’ as ‘double’). In the QDCT domain, both the direct current (DC) coefficients, which contain structure information, and alternating current (AC) coefficients, which contain texture information, are encrypted with newly placed prediction, encryption, and reconstruction modules. The resulting propagated prediction error reinforces texture protection. To ensure JPEG compatibility, WPE-MO incorporates a modulo operator into the prediction and reconstruction modules, circulating coefficients within the JPEG-encodable value range. Additionally, to balance attack resilience and coding efficiency, two adjustable parameters are introduced: random length interval (RLI) and random step size (RSS). Experiments on JPEG image encryption demonstrate that WPE-MO exhibits high attack resilience with minimal degradation in coding efficiency. In particular, WPE-MO resists ciphertext-only attacks, including brute-force and replacement attacks, with approximately 19.55% degradation in coding efficiency, as measured by the Bjøntegaard-delta rate, through careful selection of RLI and RSS.
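
    The role of a modulo operator in keeping encrypted coefficients inside an encodable range can be illustrated generically. The sketch below is not the WPE-MO scheme: it only shows how wrapping a shifted value back into a fixed integer range stays reversible. The range [-1023, 1023] is assumed here as the AC coefficient range of 8-bit baseline JPEG, and the function names and key offsets are hypothetical.

```python
def wrap(value, lo, hi):
    """Map any integer into [lo, hi] by circulating around the range."""
    span = hi - lo + 1
    return (value - lo) % span + lo


def shift_encrypt(coeff, key, lo=-1023, hi=1023):
    """Add a secret offset, then wrap so the result stays encodable."""
    return wrap(coeff + key, lo, hi)


def shift_decrypt(cipher, key, lo=-1023, hi=1023):
    """Subtract the same offset and wrap again to recover the coefficient."""
    return wrap(cipher - key, lo, hi)


if __name__ == "__main__":
    c = shift_encrypt(1023, 1)         # would overflow without wrapping
    print(c, shift_decrypt(c, 1))
```

    Because both directions work modulo the size of the range, every in-range coefficient maps to an in-range ciphertext and back, regardless of how large the offset is.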

  • Lintang Matahari Hasani, Kasiyah Junus, Lia Sadita, Ayano Ohsaki, Tsuk ...
    Article type: LETTER
    Article ID: 2024EDL8025
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Learners need to progress through certain inquiry stages to experience a good online discussion. This study analyzes the discussions of two classes that received different preparation: Kit-build concept mapping (KBCM) and summary writing. Epistemic network analysis showed that the KBCM class had close to ideal connectivity between the inquiry stages.

  • Ying Liu, Yong Li, Ming Wen, Xiangwei Xu
    Article type: PAPER
    Article ID: 2024EDP7299
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Federated learning (FL) enables multiple organizations to collaboratively train machine learning models without revealing raw data. As a new learning paradigm, FL suffers from statistical challenges on cross-organizational non-IID data, which limit the global model's ability to perform well on each client task. In this paper, we propose a personalized federated meta-learning algorithm (EPer-FedMeta) for heterogeneous clients. It uses q-FedAvg as the model aggregation strategy, which helps the global model fairly optimize a reasonable representation across multiple personalized client models, and it introduces a contrast loss in local training to bring the meta-learner representations closer together. Also noteworthy is the potential cold-start problem for new tasks in personalized federated learning (PFL), for which EPer-FedMeta uses CondConv to make lightweight modifications to the CNN for more robust migration of model personalization. Our extensive empirical evaluation on the LEAF dataset and a real production dataset shows that EPer-FedMeta further mitigates the impact of non-IID data on FL communication costs and model accuracy. In terms of performance and optimization, EPer-FedMeta achieves the best model performance with faster convergence and lower communication overhead than the leading optimization algorithms in FL.

  • Makoto NAKATSUJI, Yasuhiro FUJIWARA
    Article type: PAPER
    Article ID: 2024OFP0009
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Developing personalized chatbots is crucial in the field of AI, particularly when aiming for dynamic adaptability similar to that of human communication. Traditional methods often overlook the importance of both the speaker's and the responder's personalities and their interaction histories, resulting in lower predictive accuracy. Our solution, INTPChat (Interactive Persona Chat), addresses this limitation. INTPChat builds implicit profiles from extensive utterance histories of both speakers and responders and updates these profiles dynamically to reflect current conversational contexts. By employing a co-attention encoding mechanism, INTPChat aligns current contexts with responses while considering historical interactions. This approach effectively mitigates data sparsity issues by iteratively shifting each context backward in time, allowing for a more granular analysis of long-term interactions. Evaluations on long-term Reddit datasets demonstrate that INTPChat significantly enhances response accuracy and surpasses the performance of state-of-the-art persona chat models.

  • Qian Zewen, HAN Zhezhe, Jiang Haoran, Zhang Ziyi, Zhang Mohan, Ma Hao, ...
    Article type: LETTER
    Article ID: 2025EDL8003
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Identifying the combustion conditions in power-plant furnaces is crucial for optimizing combustion efficiency and reducing pollutant emissions. Traditional image-processing methods rely heavily on prior empirical knowledge, limiting their ability to comprehensively extract features from flame images. To address these deficiencies, this study proposes a novel approach for combustion condition identification through flame imaging and a convolutional autoencoder (CAE). In this approach, the flame images are first preprocessed, then the CAE is established to extract the deep features of the flame images, and finally a Softmax classifier is employed to determine the combustion conditions. Experiments are carried out on a 600 MW opposed wall boiler, and the effectiveness of the proposed method is evaluated using captured flame images. Results demonstrate that the proposed CAE-Softmax model achieves an identification accuracy of 98.2% under the investigated combustion conditions, significantly outperforming traditional models. These findings demonstrate the feasibility of the method, offering an intelligent and efficient solution for enhancing the operational performance of power-plant boilers.

  • Jialong LI, Shogo MORITA, Wei WANG, Yan ZHANG, Takuto YAMAUCHI, Kenji ...
    Article type: LETTER
    Article ID: 2025EDL8017
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Human-robot collaboration has become increasingly complex and dynamic, highlighting the need for effective and intuitive communication. Two communication strategies for robots have been explored: (i) global-perspective strategy to share an overview of task progress, aimed at achieving consensus on completed and upcoming tasks; and (ii) local-perspective strategy to share the robot's intent, aimed at conveying the robot's immediate intentions and next actions. However, existing studies merely rely on the distinct focus to differentiate between the use of different strategies, lacking a deeper exploration of how these strategies affect user perceptions and responses in practice. For example, a possible concern could be which strategy is more likely to inspire human effort in collaboration. To this end, this paper conducts a user experiment (N=15) within a collaborative cooking scenario, and provides design insights into the strengths and weaknesses of each strategy from three dimensions to inform the design of human-sensitive communication.

  • Ziyue WANG, Yanchao LIU, Xina CHENG, Takeshi IKENAGA
    Article type: PAPER
    Article ID: 2025PCP0002
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    Automatically reconstructing structured 3D models of real-world indoor scenes is an essential and challenging task in applications such as indoor navigation, evacuation planning, and wireless signal simulation. Despite the increasing demand for up-to-date indoor models, indoor reconstruction from monocular videos is still at an early stage compared with the reconstruction of outdoor scenes. Specific challenges relate to complex building layouts, which require long-term video recording, and to the many elements, such as pieces of furniture, that cause clutter and occlusions. To accurately reconstruct large-scale indoor scenes with multiple rooms, this paper designs a large-scale indoor multiple-room 3D reconstruction pipeline that explores the topological relations between different rooms from long-term monocular videos. First, video segmentation based on semantic door detection is proposed to separate the rooms in a video for individual reconstruction, avoiding global mismatching noise, and a 3D temporal trajectory is proposed to connect the rooms in the spatial domain. Second, the 3D Hough transform and principal component analysis are used to refine the room boundaries from the reconstructed point clouds, which improves accuracy. Further, an original long-term video dataset for large-scale indoor multiple-room reconstruction is constructed, which contains 12 real-world videos and 4 virtual videos with 30 rooms. Extensive experiments demonstrate that the proposed method reaches the highest performance, with a 3D IoU of 0.70, room distance accuracy of 0.87, and connectivity accuracy of 0.67, which is around 39% better on average than various state-of-the-art models.

  • Kosuke KURIHARA, Yoshihiro MAEDA, Daisuke SUGIMURA, Takayuki HAMAMOTO
    Article type: PAPER
    Article ID: 2025PCP0004
    Published: 2025
    [Advance online publication] Released: 2025/06/02
    Journal, free access, advance online publication

    We propose a non-contact heart rate (HR) estimation method that models weak physiological blood volume pulse (BVP) signals and strong noise signals caused by background illumination. Our method integrates BVP signal extraction based on a physiological model and a flexible RGB/NIR integration scheme based on an illumination model in a unified manner. This unified framework enables accurate extraction of the BVP signal while suppressing noise derived from ambient light, and thus improves HR estimation performance. We demonstrate the effectiveness of our method through experiments using several datasets, including various illumination scenes. Our code will be available at https://github.com/kosuke-kurihara/PhysIllumHR.

  • Zhiyao SUN, Peng WANG
    Article type: PAPER
    Article ID: 2024EDP7289
    Published: 2025
    [Advance online publication] Released: 2025/05/28
    Journal, free access, advance online publication

    Mobile edge computing (MEC) faces severe challenges in achieving efficient and timely task offloading in heterogeneous network environments. While existing contract-based approaches address incentive compatibility and resource coordination, many either ignore the constraints of age of information (AoI) or suffer from high computational complexity. This paper presents an AoI-guaranteed Optimal Contract (AOC) mechanism that jointly considers information freshness and asymmetric information in MEC systems. We design a three-tier heterogeneous network architecture with non-orthogonal multiple access to enable cooperative task offloading across multiple cells and enhance spectral efficiency. Instead of a model that requires extensive training and is difficult to analyze, our proposed AOC framework uses a lightweight block coordinate descent (BCD) algorithm to solve closed-form contract solutions while ensuring incentive compatibility and individual rationality. Simulation results show that the AOC mechanism significantly improves the utility and AoI performance of the MEC server compared with existing incentive-based methods. In addition, the analysis confirms the robustness and practical deployability of the proposed framework under different system conditions.

  • Qingxia YANG, Deng PAN, Wanlin HUANG, Erkang CHEN, Bin HUANG, Sentao W ...
    Article type: PAPER
    Article ID: 2024EDP7316
    Published: 2025
    [Advance online publication] Released: 2025/05/23
    Journal, free access, advance online publication

    Ship detection in maritime monitoring is crucial for ensuring public safety in marine environments. However, maritime surveillance faces significant challenges due to weak targets (small, low-contrast objects) caused by complex environments and long distances. To address these challenges, we propose YOLO-MSD, a maritime surveillance detection model based on YOLOv8. In YOLO-MSD, Receptive-Field Attention Convolution (RFAConv) replaces standard convolution, learning attention maps via receptive-field interaction to enhance detail extraction and reduce information loss. The C2f module in the neck integrates Omni-Dimensional Dynamic Convolution (ODConv), which dynamically adjusts convolution kernel parameters to effectively capture contextual information, thereby achieving superior multi-scale feature fusion. We introduce a dedicated detection head for small objects to enhance detection accuracy. Furthermore, to address detection box quality imbalance, we employ Wise-IoU for bounding box regression loss, enhancing multi-scale target localization and accelerating convergence. The model achieves precision, recall, and mean average precision (mAP50) rates of 93.0%, 90.05%, and 95.0%, respectively, on the self-constructed Maritime Vessel Surveillance Dataset (MVSD), effectively meeting the requirements for maritime target detection. We further conduct comparative experiments on the public McShips dataset, demonstrating YOLO-MSD's broad applicability in ship detection.

  • Mitsuhiro WATANABE, Go HASEGAWA
    Article type: PAPER
    Article ID: 2025EDP7014
    Published: 2025
    [Advance online publication] Released: 2025/05/23
    Journal, free access, advance online publication

    As the Internet grows in scale and diversity, traditional end-to-end (E2E) congestion control faces various problems, such as low throughput on long-delay networks and unfairness among flows under different network conditions. In this paper, we propose a novel congestion control architecture, called in-network congestion control (NCC). Specifically, by introducing one or more nodes (NCC nodes) on an E2E network path, we divide the network path into multiple sub-paths and maintain a congestion-control feedback loop on each sub-path. In each sub-path, a specialized congestion control algorithm can be applied according to its network characteristics. This architecture can provide various advantages over traditional E2E congestion control, such as higher data transmission throughput, better per-flow fairness, and incremental deployability. In this paper, we describe NCC's advantages and challenges, and clarify its potential performance through evaluation results. We reveal that the E2E throughput improves by as much as 159% just by introducing NCC nodes. Furthermore, increasing the number of NCC nodes improves the E2E throughput and fairness among flows by up to 258% and 151%, respectively.

  • Guanghui CAI, Junguo ZHU
    Article type: PAPER
    Article ID: 2024EDP7292
    Published: 2025
    [Advance online publication] Released: 2025/05/15
    Journal, free access, advance online publication

    Deep learning has transformed Neural Machine Translation (NMT), but the complexity of these models makes them hard to interpret, thereby limiting improvements in translation quality. This study explores the widely used Transformer model, utilizing linguistic features to clarify its inner workings. By incorporating three linguistic features—part-of-speech, dependency relations, and syntax trees—we demonstrate how the model's attention mechanism interacts with these features during translation. Additionally, we improved translation quality by masking nodes that were identified to have negative effects. Our approach bridges the complex nature of NMT with clear linguistic knowledge, offering a more intuitive understanding of the model's translation process.

  • Shuhei YAMAMOTO, Yasunori AKAGI, Tomu TOMINAGA, Takeshi KURASHIMA
    Article type: PAPER
    Article ID: 2024EDP7248
    Published: 2025
    [Advance online publication] Released: 2025/05/14
    Journal, free access, advance online publication

    Present bias, the cognitive bias that prioritizes immediate rewards over future ones, is considered one of the factors that can hinder goal achievement. Estimating present bias enables the development of effective intervention strategies for behavioral change. This paper proposes a novel method for estimating present bias from behavior history captured by wearable devices. We employ a Transformer because of its proficiency in learning relationships within sequential data such as behavior history, which includes continuous data (e.g., heart rate) and event data (e.g., sleep onset). To allow the Transformer to capture behavior patterns affected by present bias from behavior history, we introduce two novel architectures for effectively processing the timestamp information of continuous and event data in behavior history: temporal and event encoders (TE and EE). TE discerns the periodic characteristics of continuous data, while EE examines temporal interdependencies in the event data. These encoders enable our proposed model to capture temporally (ir)regular behavioral patterns associated with present bias. Our experiments using the behavior history logs of 257 subjects collected over 28 days demonstrated that our method estimates the subjects' present bias accurately.

  • Shrey SINGH, Prateek KESERWANI, Katsufumi INOUE, MASAKAZU IWAMURA, Par ...
    Article type: PAPER
    Article ID: 2024EDP7297
    Published: 2025
    [Advance online publication] Released: 2025/05/14
    Journal, free access, advance online publication

    Sign language recognition (SLR) from video is a challenging problem. For SLR, the I3D network, which was originally proposed for action recognition, is the best performing model. However, action recognition and SLR are inherently different problems. Therefore, there is room to adapt I3D to SLR and achieve better performance by considering the task-specific features of SLR. In this work, we revisit the I3D model to improve its performance in three essential design aspects. They include a better inception module named the dilated inception module (DIM) and an attention mechanism-based temporal attention module (TAM) to identify the essential features of signs. In addition, we propose to eliminate a loss function that deteriorates the performance. The proposed method has been extensively validated on the WLASL and MS-ASL public datasets. It outperformed the state-of-the-art approaches on the WLASL dataset and produced competitive results on the MS-ASL dataset, though the MS-ASL results are only indicative because the original data are unavailable. The Top-1 accuracies of the proposed method on WLASL100 and MS-ASL100 were 79.08% and 82.78%, respectively.

  • Olivier NOURRY, Masanari KONDO, Shinobu SAITO, Yukako IIMURA, Naoyasu ...
    Article type: LETTER
    Article ID: 2025EDL8005
    Published: 2025
    [Advance online publication] Released: 2025/05/14
    Journal, free access, advance online publication

    [Background] Throughout their lifetime, open-source software systems will naturally attract new contributors and lose existing contributors. Not all OSS contributors are equal, however, as some contributors within a project possess significant knowledge and expertise of the codebase (i.e., core developers). When investigating a project's ability to attract new contributors and how often a project loses contributors, it is therefore important to take into account the expertise of the contributors. [Goal] Since core developers are vital to a project's longevity, we therefore aim to find out: can OSS projects attract new core developers and how often do OSS projects lose core developers? [Results] To investigate core developer contribution patterns, we calculate the truck factor (or bus factor) of over 36,000 OSS projects to investigate how often TF developers join or abandon OSS projects. We find that 89% of our studied projects have experienced losing their core development team at least once. Our results also show that in 70% of cases, this project abandonment happens within the first three years of a project's life. We also find that most OSS projects rely on a single core developer to maintain development activities. Finally, we find that only 27% of projects that were abandoned were able to attract at least one new TF developer.
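
    The truck factor mentioned above can be illustrated with a small sketch. It follows a commonly used greedy heuristic: repeatedly remove the developer who authors the most files until more than half of the files have no remaining author. The abstract does not say which truck-factor algorithm or authorship measure the study uses, so the 50% threshold, the function name, and the file-to-author mapping below are all assumptions.

```python
def truck_factor_developers(file_authors, threshold=0.5):
    """Greedy truck-factor estimate.

    file_authors: dict mapping a file path to the set of developers treated
    as its authors (how authorship is decided is outside this sketch).
    Returns the developers whose removal orphans more than `threshold`
    of the files; the truck factor is the length of that list.
    """
    remaining = {path: set(devs) for path, devs in file_authors.items()}
    total = len(remaining)
    removed = []
    while total:
        orphaned = sum(1 for devs in remaining.values() if not devs)
        if orphaned / total > threshold:
            break
        # Count how many files each remaining developer still covers.
        coverage = {}
        for devs in remaining.values():
            for dev in devs:
                coverage[dev] = coverage.get(dev, 0) + 1
        if not coverage:
            break
        # Pick the developer covering the most files (alphabetical tie-break).
        top = max(sorted(coverage), key=coverage.get)
        removed.append(top)
        for devs in remaining.values():
            devs.discard(top)
    return removed


if __name__ == "__main__":
    project = {
        "a.py": {"alice"},
        "b.py": {"alice"},
        "c.py": {"alice"},
        "d.py": {"bob"},
    }
    print(truck_factor_developers(project))
```

    In this hypothetical project a single developer authors three of the four files, so losing that one person orphans most of the codebase and the truck factor is 1.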

  • Xingxin WAN, Peng SONG, Siqi FU, Changjia WANG
    Article type: LETTER
    Article ID: 2025EDL8020
    Published: 2025
    [Advance online publication] Released: 2025/05/14
    Journal, free access, advance online publication

    In ideal facial expression recognition (FER) tasks, the training and test data are assumed to share the same distribution. In reality, however, they often come from different domains that follow different feature distributions, which seriously impairs recognition performance. In this letter, we present a novel Dynamic Graph-Guided Domain-Invariant Feature Representation (DG-DIFR) method, which addresses the issue of distribution shifts across different domains. First, we learn a robust common subspace to minimize the data distribution differences, facilitating the extraction of invariant feature representations. Concurrently, retargeted linear regression is employed to enhance the discriminative ability of the proposed model. Furthermore, a maximum entropy-based dynamic graph is introduced to preserve the topological structure information in the low-dimensional subspace. Finally, extensive experiments conducted on four benchmark datasets confirm the superiority of the proposed method over state-of-the-art methods.
