-
Sirui SUN, Tianxiang YANG, Tengfei SHAO, Masayuki GOTO
Session ID: 3S6-GS-2-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ryuken UDA, Yusuke IIDA
Session ID: 3Win5-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Deep learning models are known to be robust to label noise in the early phase of training. Recent studies have shown that learning dynamics in this phase are dominated by clean-label samples, but this observation alone is insufficient to explain the mechanism of label noise robustness. In this study, we aimed to elucidate the mechanism by examining the differences in learning dynamics between clean and noisy labels. First, we confirmed that the weight-update vectors for clean and noisy labels are parallel. We then visualized the shape of the loss function after training on clean and noisy datasets and compared the convergence locations. Even when the label noise is as large as 70%, the model converges to the same local minimum or plateau. This suggests that the flatness of the local minimum is important for label noise robustness in the early phase.
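As a rough illustration of the first observation, the sketch below compares the weight-update directions induced by a clean batch and a 70%-noisy batch via cosine similarity. It is a minimal stand-in, not the authors' code; the model, data, and noise rate are placeholders.

```python
# Sketch: compare weight-update directions for clean vs. noisy labels.
# Not the authors' code; the model, data, and noise rate are placeholders.
import torch
import torch.nn as nn

def update_vector(model, loss_fn, x, y):
    """Return the flattened gradient (negative update direction) for one batch."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.reshape(-1) for p in model.parameters()])

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(256, 1, 28, 28)            # stand-in for a real image batch
y_clean = torch.randint(0, 10, (256,))     # clean labels
noise = torch.rand(256) < 0.7              # 70% symmetric label noise
y_noisy = torch.where(noise, torch.randint(0, 10, (256,)), y_clean)

g_clean = update_vector(model, loss_fn, x, y_clean)
g_noisy = update_vector(model, loss_fn, x, y_noisy)
cos = torch.nn.functional.cosine_similarity(g_clean, g_noisy, dim=0)
print(f"cosine similarity of update vectors: {cos.item():.3f}")
```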
-
Masahiro EBE, Atsushi AOYAMA
Session ID: 3Win5-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
We introduce a reinforcement learning approach that utilizes back-translation to numerical data for Data-to-Text generation with large language models (LLMs). Numerical data can have multiple possible interpretations, making it difficult to predefine their meaning and the key points to be explained before conducting an analysis. In this study, we focus on information recoverability in explaining numerical data and propose a reinforcement learning approach based on Proximal Policy Optimization (PPO). This approach does not require prior reference definitions and uses the error in back-translation to numerical data as a reward signal. Our experiments demonstrate that the proposed method significantly improves explanatory performance after training. Furthermore, the explanatory performance achieved with our method is significantly higher than that obtained using Direct Preference Optimization (DPO), a training method that does not require the design of a reward function. These results highlight the effectiveness of using back-translation error as a reward for enhancing explanatory performance.
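A minimal sketch of the kind of reward signal described here, assuming a hypothetical back_translate function that recovers numbers from the generated explanation; it is not the authors' implementation.

```python
# Sketch: back-translation error as a PPO reward signal for data-to-text.
# Not the authors' implementation; `back_translate` stands in for a model that
# extracts numbers back out of the generated explanation.
import numpy as np

def back_translation_reward(source_values, generated_text, back_translate):
    """Reward = negative relative error between the source numbers and the
    numbers recovered from the generated explanation."""
    recovered = back_translate(generated_text)          # list of floats
    src = np.asarray(source_values, dtype=float)
    rec = np.asarray(recovered, dtype=float)
    if rec.shape != src.shape:                          # unrecoverable output
        return -1.0
    rel_err = np.abs(rec - src) / (np.abs(src) + 1e-8)
    return -float(rel_err.mean())

# Toy usage with a dummy back-translator that "recovers" slightly noisy values.
dummy_back_translate = lambda text: [10.2, 19.7, 30.5]
print(back_translation_reward([10, 20, 30], "sales rose from 10 to 30", dummy_back_translate))
```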
-
YING LUO, Ichiro KOBAYASHI
Session ID: 3Win5-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Brain encoding commonly relies on regression analysis to predict neural responses from external stimuli. However, preserving the complex relationships within the data remains a critical challenge. To address this, we propose a method that applies a Lipschitz constraint to enhance ridge regression, significantly improving the accuracy of cortical response predictions from text embeddings. A comparison across seven state-of-the-art deep learning models reveals the superior performance of the proposed approach. The Lipschitz constraint effectively preserves the structural integrity of the data and improves prediction correlation. Additionally, information-theoretic analysis is employed to further investigate cortical response patterns. The results demonstrate that Lipschitz-enhanced ridge regression outperforms conventional methods in both prediction correlation and structural preservation of the data. Specifically, the Pearson correlation coefficients improved substantially, with increases ranging from 111% to over 175% across multiple models. Moreover, previously underperforming metrics now exhibit more intuitive and pronounced enhancements.
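The exact form of the constraint is not given in the abstract; one simple way to bound the Lipschitz constant of a linear encoder is to rescale the ridge solution's largest singular value, as in this illustrative sketch.

```python
# Sketch: ridge regression with a Lipschitz (spectral-norm) constraint on the
# weight matrix. The paper's exact constraint is not specified here; rescaling
# the largest singular value is one simple way to bound the Lipschitz constant.
import numpy as np

def lipschitz_ridge(X, Y, alpha=1.0, lipschitz_bound=1.0):
    """X: (n_samples, d_in) text embeddings; Y: (n_samples, d_out) voxel responses."""
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)   # ridge solution
    spec_norm = np.linalg.norm(W, ord=2)                        # largest singular value
    if spec_norm > lipschitz_bound:                             # project onto the constraint
        W *= lipschitz_bound / spec_norm
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
Y = X @ rng.normal(size=(50, 10)) * 0.1 + rng.normal(size=(200, 10))
W = lipschitz_ridge(X, Y, alpha=10.0, lipschitz_bound=1.0)
print("spectral norm of W:", np.linalg.norm(W, ord=2))
```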
-
Shori MUTO, Takumi SUZUKI, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The performance of generative models cannot be evaluated solely by error or accuracy, because output diversity, that is, the diversity of images or language expressions conditioned on latent variables, is also essential. As generative models have evolved, so have diversity metrics. While generative models have been trained on various modalities, progress in generative models for policies, that is, action functions conditioned on states that represent real-world interactions, has been limited. Recently, models that embed action intentions as latent variables to generate policies have been proposed, but there are no metrics to evaluate their diversity. Applying diversity metrics from other modalities is challenging because these models generate policies that map state inputs to action outputs, which prevents traditional methods from being applied straightforwardly. To address this, we propose a method that indirectly evaluates diversity using the state trajectories generated by interactions between policies and the environment. Using this method, we evaluate the diversity of policies generated in a toy task and compare the performance of different policy generative models under varying parameters and architectures.
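The paper's concrete metric is not specified in the abstract; a simple stand-in for trajectory-based diversity is the mean pairwise distance between state trajectories rolled out from the generated policies, as sketched below for a toy 2-D environment.

```python
# Sketch: evaluating policy diversity indirectly through state trajectories.
# The paper's exact metric is not specified here; mean pairwise distance
# between rolled-out trajectories is one simple stand-in.
import numpy as np

def rollout(policy, env_step, state0, horizon=50):
    """Roll out one policy and return the visited states as an array."""
    states, s = [state0], state0
    for _ in range(horizon):
        s = env_step(s, policy(s))
        states.append(s)
    return np.array(states)

def trajectory_diversity(policies, env_step, state0, horizon=50):
    trajs = [rollout(p, env_step, state0, horizon) for p in policies]
    dists = [np.linalg.norm(trajs[i] - trajs[j], axis=1).mean()
             for i in range(len(trajs)) for j in range(i + 1, len(trajs))]
    return float(np.mean(dists))

# Toy 2-D point-mass environment and two latent-conditioned "policies".
env_step = lambda s, a: s + 0.1 * a
make_policy = lambda z: (lambda s: z - s)            # head toward latent goal z
policies = [make_policy(np.array([1.0, 0.0])), make_policy(np.array([0.0, 1.0]))]
print("diversity:", trajectory_diversity(policies, env_step, np.zeros(2)))
```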
-
Sota AOKI, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Reinforcement learning (RL) refines policies through trial-and-error learning based on return estimates anchored to states. Because RL targets not only episodic tasks but also continuing tasks with no fixed episode length, a discount factor is necessary to prevent the return from diverging. However, a return estimated from a specific state may sometimes be inconsistent with overall task performance. To address this, we focus on the stationary visitation distribution that emerges when a policy is executed indefinitely. By taking the expectation of the reward function under this distribution, we can accurately evaluate the current policy in a way applicable to both episodic and continuing tasks. We propose using a Transformer, which excels in sequence generation, to estimate the task-wide expected return under the stationary distribution from partial trajectories. Conceptual merits aside, the practical advantage of this approach lies in its alignment with natural reinforcement learning, where the goal is not strict optimization but selecting sufficiently good outcomes. By evaluating the task as a whole, we can clearly determine whether a policy is superior in a broader context. In this study, we integrate the estimation of task-wide expected return from partial trajectories into natural RL and compare it with traditional methods.
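The quantity targeted here, the expected reward under the stationary visitation distribution, can be illustrated with a plain Monte Carlo time average on a toy MDP; the Transformer that predicts it from partial trajectories is not reproduced.

```python
# Sketch: the expected reward under the policy's stationary visitation
# distribution, estimated by a long rollout. The paper's Transformer instead
# predicts this quantity from partial trajectories, which is not shown here.
import numpy as np

def stationary_expected_reward(step_fn, reward_fn, policy, s0, n_steps=100_000, burn_in=1_000):
    """Time-average reward of a policy, i.e. E_{s ~ d_pi}[r(s, pi(s))]."""
    s, total = s0, 0.0
    for t in range(n_steps):
        a = policy(s)
        if t >= burn_in:                      # discard the transient, keep the stationary part
            total += reward_fn(s, a)
        s = step_fn(s, a)
    return total / (n_steps - burn_in)

# Toy 5-state ring MDP: only state 4 yields reward.
step_fn = lambda s, a: (s + a) % 5
reward_fn = lambda s, a: float(s == 4)
policy = lambda s: 1                          # always move right
print(stationary_expected_reward(step_fn, reward_fn, policy, s0=0))  # ~0.2
```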
-
Yuna KIKUCHI, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-06
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Humans tend to aim at achieving a certain goal level rather than single-mindedly maximizing gains. This tendency is clearly distinct from optimization and corresponds to the concept of “satisficing,” which is to select a sufficiently good option drawing on a limited amount of information. Although Risk-sensitive Satisficing (RS) is a discrete-action algorithm, it is applicable to deep reinforcement learning, and when the target level is clear it outperforms optimization algorithms. However, a comparison of the performance of RS with that of other satisficing algorithms within the framework of deep reinforcement learning has not been conducted. Since satisficing itself is a very simple concept, there could be countless ways to implement it. In this study, we adopted a widely used satisficing algorithm as a baseline for exploration in a deep reinforcement learning toy task, compared its performance with that of RS, and showed that the latter outperforms the former. In turn, this result reinforces the validity of the notion of subjective regret, a concept of RS that is simple but effective in multiple ways.
-
Wataru NAKAMURA, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-07
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
When humans begin a new endeavor, they initially focus on acquiring basic skills and progressively advance to intermediate and advanced levels. In essence, the focus is on achieving a goal rather than optimizing from the outset. Based on this idea, we decompose reinforcement learning into two processes: goal-oriented exploration and stepwise goal adjustment. Our algorithm, Risk-sensitive Satisficing (RS), quickly achieves satisficing by minimizing a subjective regret defined by the goal. RS also dynamically optimizes the goal in bandit problems, matching the performance of Thompson Sampling without requiring prior knowledge. While this demonstrates the usefulness of decomposing reinforcement learning into these two elements, current RS goal adjustment methods remain limited to bandit problems. In this study, we propose a general goal adjustment algorithm based on reinforcement learning for motor control. By integrating two simple reinforcement learning processes, rapid goal attainment and one-dimensional goal optimization, we successfully operationalize the concept of a goal.
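For illustration only, the sketch below uses a commonly described bandit form of RS, selecting the arm that maximizes rho_i * (E_i - aleph) with aspiration (goal) level aleph; whether this matches the exact formulation used here is an assumption.

```python
# Sketch of satisficing value-based action selection in a bandit setting,
# assuming the commonly described RS form rho_i * (E_i - aleph), where aleph is
# the aspiration (goal) level; this is an illustration, not the authors' code.
import numpy as np

def rs_select(counts, value_estimates, aleph):
    """Pick the arm maximizing (n_i / N) * (E_i - aleph)."""
    counts = np.asarray(counts, dtype=float)
    rho = counts / counts.sum()
    rs_values = rho * (np.asarray(value_estimates) - aleph)
    return int(np.argmax(rs_values))

# With all estimates below the goal, rarely tried arms are preferred (exploration);
# once an arm meets the goal, the well-tried satisfying arm is kept (exploitation).
print(rs_select(counts=[50, 5], value_estimates=[0.4, 0.5], aleph=0.6))  # -> 1
print(rs_select(counts=[50, 5], value_estimates=[0.7, 0.5], aleph=0.6))  # -> 0
```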
-
Momoka YAJIMA, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-08
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Sanae MURAMATSU, Akiko MASAKI, Takeharu EDA
Session ID: 3Win5-09
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kohei NISHIKAWA, Koki SHIMIZU, Hiroki HASHIGUCHI
Session ID: 3Win5-10
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ken KURATA, Gen SATO, Izumi TSUNOKUNI, Yusuke IKEDA
Session ID: 3Win5-100
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
A room impulse response (RIR) is a fundamental measurement for obtaining information about sound propagation between a loudspeaker and a microphone. Measuring RIRs over a large area or with high spatial density requires measurements at numerous points, which is challenging in practice. Recently, physics-informed neural networks (PINNs) have been applied to the problem of estimating early RIRs from a small number of microphones. Generally, PINNs use two types of loss functions: one corresponding to physical laws and the other to data errors. A challenge with PINNs is that when the gradients associated with these two loss functions conflict, learning does not progress properly. In this study, we aimed to improve the estimation accuracy of 3D RIRs by applying PINNs with the Dynamic Pulling Method. In simulation experiments in a 3D sound field, the proposed method achieved higher accuracy in estimating RIRs under noisy conditions than the conventional method.
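The two loss terms mentioned above can be sketched as follows for a 3D sound field, with the wave-equation residual as the physics term; the Dynamic Pulling Method itself (rebalancing the two gradients when they conflict) is not reproduced here.

```python
# Sketch: the two PINN loss terms, i.e. a data term at microphone positions and
# a physics term from the wave-equation residual p_tt - c^2 * (p_xx + p_yy + p_zz) = 0.
import torch

net = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

def pinn_losses(net, mic_xyzt, mic_pressure, colloc_xyzt, c=343.0):
    data_loss = torch.mean((net(mic_xyzt) - mic_pressure) ** 2)

    xyzt = colloc_xyzt.clone().requires_grad_(True)
    p = net(xyzt)
    grad = torch.autograd.grad(p.sum(), xyzt, create_graph=True)[0]       # (N, 4)
    second = [torch.autograd.grad(grad[:, i].sum(), xyzt, create_graph=True)[0][:, i]
              for i in range(4)]                                          # p_xx, p_yy, p_zz, p_tt
    residual = second[3] - c ** 2 * (second[0] + second[1] + second[2])
    physics_loss = torch.mean(residual ** 2)
    return data_loss, physics_loss

mics = torch.rand(8, 4)                      # a few (x, y, z, t) measurement points
pressures = torch.rand(8, 1)
collocation = torch.rand(256, 4)             # points where only the physics is enforced
print([float(l) for l in pinn_losses(net, mics, pressures, collocation)])
```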
-
Shogo YAMAUCHI, Yosuke YAMANO, Hideaki TAMORI
Session ID: 3Win5-101
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yui OBARA, Mai TAKAHASHI, Hatsuho NAKAZAWA, Yuka AKINOBU, Toshiyuki KU ...
Session ID: 3Win5-102
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Shuhei TARASHIMA, Muhammad Abdul HAQ, Yushan WANG, Norio TAGAWA
Session ID: 3Win5-103
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this work, we aim to enhance the sports ball detection and tracking baseline introduced in our previous study. Specifically, we replace the standard convolution operations with depth-wise separable convolutions to reduce model parameters and computational costs. Additionally, we adjust the number of input and output frames and incorporate multi-dataset pretraining to boost detection and tracking performance. Experimental results demonstrate that our approach successfully decreases model parameters and computational costs while achieving superior performance compared to existing methods, especially on the Tennis dataset.
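A minimal sketch of the depth-wise separable replacement and its parameter saving; the channel sizes are illustrative, not the detector's actual configuration.

```python
# Sketch: replacing a standard convolution with a depth-wise separable one to
# cut parameters and FLOPs. Channel sizes are illustrative only.
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, kernel_size=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2, groups=in_ch),  # depth-wise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                                       # point-wise
    )

standard = nn.Conv2d(64, 128, 3, padding=1)
separable = depthwise_separable(64, 128)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), "->", count(separable))   # 73,856 -> 8,960 parameters
```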
-
Qingwen CHEN, Kaito FUKUI, Hiroaki SANTO, Takeshi YAMADA, Kazuhiko NAK ...
Session ID: 3Win5-104
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Small molecule drug discovery is a widely adopted therapeutic approach for treating a diverse range of diseases. This study presents and compares three methods for utilizing graph convolutional neural networks (GCNNs) to predict hits targeting CAG repeat DNA, by representing compounds as graph-based structures and incorporating molecular descriptors. The results demonstrate that combining graph structural information with molecular descriptors enhances predictive accuracy, even in relatively small datasets. We hope this report serves as a reference for improving the predictive accuracy of small molecule activity using GCNNs.
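One way to realize the combination described above is to fuse a graph-convolutional readout with a molecular-descriptor vector before the prediction head; the sketch below is illustrative and not taken from the paper.

```python
# Sketch: combining graph-convolutional features of a compound with global
# molecular descriptors for hit prediction. Layer sizes and the fusion point
# are illustrative.
import torch
import torch.nn as nn

class GCNWithDescriptors(nn.Module):
    def __init__(self, atom_dim, desc_dim, hidden=64):
        super().__init__()
        self.gc1 = nn.Linear(atom_dim, hidden)
        self.gc2 = nn.Linear(hidden, hidden)
        self.head = nn.Sequential(nn.Linear(hidden + desc_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, adj, atom_feats, descriptors):
        # adj: (n_atoms, n_atoms) normalized adjacency; atom_feats: (n_atoms, atom_dim)
        h = torch.relu(self.gc1(adj @ atom_feats))       # neighborhood aggregation
        h = torch.relu(self.gc2(adj @ h))
        graph_emb = h.mean(dim=0)                        # readout over atoms
        return self.head(torch.cat([graph_emb, descriptors]))  # fuse with descriptors

model = GCNWithDescriptors(atom_dim=16, desc_dim=8)
adj = torch.eye(20) + torch.rand(20, 20).round()         # toy adjacency with self-loops
logit = model(adj / adj.sum(1, keepdim=True), torch.rand(20, 16), torch.rand(8))
print(torch.sigmoid(logit))                              # predicted hit probability
```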
-
Takafumi SAKURA, Ryo SOGA, Hideyuki KANUKA
Session ID: 3Win5-105
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In tasks such as code generation and bug fixing using large language models (LLMs), it is crucial for the models to accurately understand and verify the generated code. However, most existing evaluation methods rely on limited inputs and execution sequences prepared by humans, and therefore fail to measure a model's ability to comprehensively design input conditions. In this study, we propose an evaluation method that leverages decision tables, which are widely used in software development, to assess LLMs' control-flow understanding and input-condition coverage. Experimental results show that, while the LLM demonstrates high accuracy for small-scale functions, its outputs for larger-scale functions contain omissions and errors, revealing limitations in the model's capabilities. Future work will involve applying this approach to a broader set of programs to identify the limiting factors of LLMs and explore guidelines for their improvement.
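For readers unfamiliar with decision tables, the sketch below shows how a table of condition combinations can serve as a compact specification against which an LLM-produced table (or the code itself) is checked; the fee-calculation example is made up for illustration.

```python
# Sketch: a decision table as a compact specification of input conditions and
# expected actions. Full condition coverage means every combination of the
# conditions appears exactly once.
import itertools

def shipping_fee(member: bool, total: int) -> int:
    if member or total >= 5000:
        return 0
    return 500

# Rows: (is_member, total >= 5000) -> expected fee.
decision_table = {
    (True,  True):  0,
    (True,  False): 0,
    (False, True):  0,
    (False, False): 500,
}

for member, big_order in itertools.product([True, False], repeat=2):
    total = 6000 if big_order else 1000
    assert shipping_fee(member, total) == decision_table[(member, big_order)]
print("all", len(decision_table), "condition combinations covered and consistent")
```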
-
Yongmin KIM, Takeshi KOJIMA, Yusuke IWASAWA, Yutaka MATSUO
Session ID: 3Win5-11
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In machine learning, curriculum learning is one of the methods that enables models to learn more effectively. To implement curriculum learning, it is necessary to determine the learning sequence. However, deciding this sequence requires annotating the dependencies between data, and performing this task manually is costly. This study investigates whether Large Vision-Language Models (LVLMs) can create learning curricula in place of humans. Specifically, we evaluated whether LVLMs can generate curricula based on individual units from the middle school mathematics curriculum guidelines, in a manner similar to humans. Both humans and LVLMs were presented with explanations and example problems from two different units of the curriculum guidelines and were asked to identify the learning dependencies, based on which a curriculum was constructed. The results showed that humans tended to identify relationships between tasks more effectively than LVLMs. These findings suggest that using existing LVLMs to examine task dependencies and create specific curricula requires further enhancing the alignment of LVLMs with humans.
-
Taihei SHIOTANI, Masahiro KANEKO, Ayana NIWA, Yuki MARUYAMA, Daisuke O ...
Session ID: 3Win5-12
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In bias evaluation of large language models (LLMs), non-English-speaking regions often rely on translated English datasets. However, such translated datasets are based on Western cultural norms and fail to fully reflect the ethical values and social norms of different cultural contexts. In this study, we construct an adversarial benchmark, JUBAKU, designed to evaluate bias specific to Japanese culture. We manually create dialogue data to elicit biases in LLMs and assess nine Japanese LLMs using JUBAKU. The results show that all models performed worse than the random baseline, revealing their vulnerability to biases unique to Japanese culture.
-
Shingo AYABE, Hiroshi KERA, Kazuhiko KAWAMOTO
Session ID: 3Win5-13
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Offline reinforcement learning (RL) enables policy learning from pre-collected datasets without environmental interaction. This approach reduces the cost of data collection and mitigates safety risks in robotic control. However, real-world deployment requires robustness to control failures, which remains challenging due to the lack of exploration during training. To address this issue, we propose an offline-to-online RL method that improves robustness with minimal online fine-tuning. During fine-tuning, perturbations simulating control component failures, including random and adversarial perturbations, are applied to joint torque signals. We conduct experiments using legged robot models in OpenAI Gym. The results demonstrate that offline RL alone does not improve robustness and remains highly vulnerable to perturbations. In contrast, our method significantly improves robustness against these perturbations.
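A minimal sketch of the random-perturbation part of such a fine-tuning setup, wrapping the environment's action channel; Pendulum-v1 stands in for the legged-robot tasks, and adversarial perturbations would replace the random noise.

```python
# Sketch: injecting random torque perturbations during online fine-tuning by
# wrapping the environment's action channel. Pendulum-v1 is only a stand-in.
import numpy as np
import gymnasium as gym

class TorquePerturbation(gym.ActionWrapper):
    def __init__(self, env, noise_scale=0.3):
        super().__init__(env)
        self.noise_scale = noise_scale

    def action(self, action):
        noisy = action + self.noise_scale * np.random.randn(*np.shape(action))
        return np.clip(noisy, self.action_space.low, self.action_space.high)

env = TorquePerturbation(gym.make("Pendulum-v1"), noise_scale=0.3)
obs, _ = env.reset(seed=0)
obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
print(reward)
```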
-
Tsuyoshi TAKANO, Hiroshi KERA, Kazuhiko KAWAMOTO
Session ID: 3Win5-14
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In online reinforcement learning (RL), image processing-based data augmentation improves data diversity and sample efficiency by exposing the model to varied observations. In contrast, offline RL relies on a fixed dataset, making dataset diversity a crucial factor in performance. Whether image augmentation is beneficial in offline RL remains an open question. In this study, we apply image processing techniques, such as rotation and translation, to augment the training data for Decision Transformer. By comparing an augmented dataset with a clean dataset in the Atari Breakout environment, we find that some image processing methods significantly reduce scores. This result suggests that increasing diversity while preserving critical game contexts and maintaining consistency with the original data distribution is crucial for effective data augmentation in offline RL.
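The kind of augmentation compared in this study can be sketched as a shared rotation and translation applied to an Atari observation stack; the parameters below are illustrative.

```python
# Sketch: rotation / translation augmentation of a stack of Atari frames before
# feeding it to Decision Transformer. Augmentation parameters are illustrative.
import numpy as np
from scipy.ndimage import rotate, shift

def augment_frames(frames, max_angle=10.0, max_shift=4):
    """frames: (T, H, W) grayscale observation stack; same transform for all frames."""
    angle = np.random.uniform(-max_angle, max_angle)
    dx, dy = np.random.randint(-max_shift, max_shift + 1, size=2)
    out = rotate(frames, angle, axes=(1, 2), reshape=False, order=1)
    out = shift(out, shift=(0, dy, dx), order=0)      # translate within the image plane
    return out.astype(frames.dtype)

frames = np.random.randint(0, 256, size=(4, 84, 84), dtype=np.uint8)
print(augment_frames(frames).shape)                   # (4, 84, 84)
```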
-
Shun ARAKAWA, Kazuhiko KAWAMOTO, Hiroshi KERA
Session ID: 3Win5-15
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Takumi SUZUKI, Tatsuji TAKAHASHI, Yu KONO
Session ID: 3Win5-16
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Noriko NAKANO, Yuha NISHIGATA, Waka ITO, Nao SOUMA, Miyu SATO, Kimio K ...
Session ID: 3Win5-17
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ryo SHIBAZAKI, Hiroshi KERA, Kazuhiko KAWAMOTO
Session ID: 3Win5-18
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Deep learning–based classification models are known to be vulnerable to adversarial attacks, wherein the addition of imperceptible perturbations to an original sample can lead to misclassification. In this study, we address the problem of constructing a robust classifier under the Positive-Unlabeled (PU) learning setting for binary classification. A straightforward approach in PU learning is to treat all unlabeled data (U) as pseudo-negative examples and minimize the classification risk. However, when this PU learning strategy is combined with conventional adversarial defense methods, the model can overfit to adversarial robustness, thereby substantially degrading its accuracy on unperturbed images (i.e., standard accuracy). To mitigate this issue, we propose a novel method that applies a defense mechanism capable of controlling the trade-off between robustness and standard accuracy within the PU learning framework. Experiments on a benchmark dataset demonstrate that the proposed method can maintain standard accuracy while simultaneously achieving improved robustness.
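A rough sketch of the two ingredients described above, a PU risk that treats unlabeled data as pseudo-negatives plus a weighted robustness term on perturbed inputs; the paper's concrete defense and trade-off mechanism are not reproduced, and the one-step perturbation here is only a stand-in.

```python
# Sketch: (i) a PU risk with unlabeled data as pseudo-negatives and (ii) a
# weighted robustness term on adversarially perturbed inputs, where beta
# controls the robustness / standard-accuracy trade-off.
import torch
import torch.nn.functional as F

def bce(logits, target_value):
    return F.binary_cross_entropy_with_logits(logits, torch.full_like(logits, target_value))

def pu_adv_loss(model, x_pos, x_unl, beta=1.0, eps=8 / 255):
    # (i) classification risk with unlabeled data treated as pseudo-negatives
    clean_loss = bce(model(x_pos), 1.0) + bce(model(x_unl), 0.0)

    # (ii) one-step (FGSM-like) perturbation of the unlabeled batch
    x_adv = x_unl.clone().requires_grad_(True)
    grad = torch.autograd.grad(bce(model(x_adv), 0.0), x_adv)[0]
    x_adv = (x_unl + eps * grad.sign()).detach()
    robust_loss = bce(model(x_adv), 0.0)

    return clean_loss + beta * robust_loss

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 1))
loss = pu_adv_loss(model, torch.rand(16, 1, 28, 28), torch.rand(64, 1, 28, 28))
print(loss.item())
```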
-
Saki KATAYAMA, Mariko NIO, Yuya MATSUDA
Session ID: 3Win5-19
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Our study was conducted using a Japanese hospital claims database provided by Medical Data Vision Co. (MDV, Tokyo), with the aim of supporting early diagnosis and comprehensive treatment planning for endometriosis (EMS). We analyzed diagnoses recorded before and after a new EMS diagnosis, examining patterns of increase and decrease in diagnoses. Medication prescriptions and medical procedures were also analyzed descriptively. Analysis using clustering methods suggested that diagnoses of conditions potentially misdiagnosed as EMS increased prior to the EMS diagnosis. Additionally, performing ultrasound (echo) examinations when painkiller prescriptions increase may be useful for EMS diagnosis. The study also revealed an increase in diagnoses of mental health conditions alongside EMS-related symptoms, indicating the need for comprehensive treatment addressing both physical and mental aspects. This research provides insights into improving early diagnosis and holistic care for EMS patients.
-
AI-Driven Journal Entries Compliant with Accounting Standards for National Universities
Yasuhiro SHIRAKI
Session ID: 3Win5-20
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study explored AI's potential to streamline accounting at Shinshu University, addressing labor shortages due to Japan's demographic shift. Using accounting data from April 2020 to December 2023 (with some exclusions), a BERT-based AI model (UiPath AI Center) was employed. Data from April 2020 to December 2022 were used for training, and data from January to December 2023 served as evaluation data. We evaluated different fine-tuning patterns, including whether product names and vendors were taken into account. Evaluation metrics were the AI-human agreement rate (count-based and amount-based) and AI confidence. The count-based agreement rate exceeded the 85.29% benchmark from previous research, and limiting transactions to under ¥100,000 achieved the same rate for the amount-based agreement. Confidence and the count-based agreement rate were proportionally correlated; this relationship was less clear for the amount-based agreement rate but appeared when transactions were limited to under ¥100,000. AI can thus streamline accounting for transactions under ¥100,000.
-
Hiroshi KIYOTA
Session ID: 3Win5-21
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Koutarou TAMURA
Session ID: 3Win5-22
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study analyzes the investment relationships between startups and institutional investors using data from SPEEDA Startup Insights (formerly INITIAL). By constructing an investment network, we examine its structural properties, focusing on degree distribution and preferential attachment. Our findings indicate that the network obeys a power-law degree distribution with an exponent of approximately -2.9 and a sub-linear preferential attachment rate with an exponent of 0.65. These results suggest that while highly connected startups are more likely to receive new investments, the effect is weaker than in classical preferential attachment models.
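For reference, a degree exponent of the kind reported above can be estimated with a continuous maximum-likelihood (Hill-type) estimator; the degree sample in the sketch below is synthetic.

```python
# Sketch: a continuous maximum-likelihood (Hill-type) estimate of the power-law
# exponent from investor/startup degrees. The degree sample is synthetic.
import numpy as np

def powerlaw_exponent(degrees, k_min=1):
    """alpha_hat = 1 + n / sum(ln(k_i / k_min)) for k_i >= k_min."""
    k = np.asarray([d for d in degrees if d >= k_min], dtype=float)
    return 1.0 + len(k) / np.sum(np.log(k / k_min))

rng = np.random.default_rng(0)
# synthetic degrees drawn from a Pareto-like distribution with exponent ~2.9
degrees = np.floor(rng.pareto(1.9, size=5000) + 1).astype(int)
print(f"estimated exponent: -{powerlaw_exponent(degrees):.2f}")
```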
-
Ryuichi WATANABE
Session ID: 3Win5-23
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this paper, we aim to clarify how generative models, including GPT-2, understand and process Japanese polite and non-polite sentences by analyzing “keigo neurons,” which respond strongly to each of these forms. Specifically, after identifying the keigo neurons that correspond to polite and non-polite expressions, we evaluate their performance as binary classifiers that distinguish between polite and non-polite sentences, and investigate their behavior when input sentences are fed into the model. Additionally, we conduct supplementary experiments in which we manipulate the activation values of these keigo neurons before generating text with the model. Our findings provide insights into how the model conceptualizes polite and non-polite language and offer suggestions for improving language-specialized models.
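One simple way to shortlist candidate keigo neurons, scoring hidden units by how strongly their mean activation separates polite from plain sentences, is sketched below with placeholder activations; it is not the authors' identification procedure.

```python
# Sketch: shortlist "keigo neuron" candidates, i.e. hidden units whose mean
# activation separates polite from non-polite sentences. The activations below
# are random placeholders for real GPT-2 hidden states.
import numpy as np

def candidate_neurons(acts_polite, acts_plain, top_k=10):
    """acts_*: (n_sentences, n_neurons) activations for each sentence set."""
    diff = acts_polite.mean(axis=0) - acts_plain.mean(axis=0)
    pooled_std = np.sqrt(acts_polite.var(axis=0) + acts_plain.var(axis=0)) + 1e-8
    score = diff / pooled_std                          # signed, variance-normalized gap
    return np.argsort(-np.abs(score))[:top_k], score

rng = np.random.default_rng(0)
acts_polite = rng.normal(size=(100, 3072))
acts_plain = rng.normal(size=(100, 3072))
acts_polite[:, 42] += 2.0                              # plant one fake polite-selective unit
idx, score = candidate_neurons(acts_polite, acts_plain)
print(idx[0], round(score[idx[0]], 2))                 # neuron 42 surfaces first
```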
-
Kenta KIKUCHI, Koji TABATA, Tamiki KOMATSUZAKI
Session ID: 3Win5-24
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Takumu FUJIOKA, Gouhei TANAKA
Session ID: 3Win5-25
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The fonts used in modern publications, websites, and digital media adopt a vector format, which allows for scaling without loss of image quality. However, most deep learning methods for tasks such as font generation, transformation, and classification have focused on bitmap representations, and deep learning for vector fonts remains relatively underexplored. The shape of a vector font character is represented as a sequence of drawing commands, and existing methods treat each drawing command as an individual token. In this study, we propose a method that uses patch embeddings for Transformer-based vector font classification, analogous to tokenization in language models or patch partitioning in bitmap-based image classification models. We demonstrate through numerical experiments that patch embeddings enhance performance and stabilize training.
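A minimal sketch of patch embedding over a drawing-command sequence, assuming commands are already encoded as fixed-length vectors; the patch size and model width are illustrative, not the paper's settings.

```python
# Sketch: grouping consecutive drawing-command tokens into patches before a
# Transformer encoder, by analogy with ViT patch partitioning.
import torch
import torch.nn as nn

class CommandPatchEmbedding(nn.Module):
    def __init__(self, cmd_dim=8, patch_size=4, d_model=128):
        super().__init__()
        self.patch_size = patch_size
        self.proj = nn.Linear(cmd_dim * patch_size, d_model)

    def forward(self, commands):
        # commands: (batch, seq_len, cmd_dim); seq_len must be a multiple of patch_size
        b, n, d = commands.shape
        patches = commands.reshape(b, n // self.patch_size, self.patch_size * d)
        return self.proj(patches)                       # (batch, n_patches, d_model)

embed = CommandPatchEmbedding()
encoder = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True), num_layers=2)
commands = torch.rand(2, 64, 8)                         # 64 drawing commands per glyph
features = encoder(embed(commands))
print(features.shape)                                   # torch.Size([2, 16, 128])
```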
-
Koki UENO, Kohei MAKINO, Yutaka SASAKI
Session ID: 3Win5-26
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Diffusion models have made remarkable advancements in various data generation tasks; however, their performance in time series forecasting, the task of forecasting future series from past series, remains inferior to other existing approaches. To address this, this paper focuses on enhancing the performance of diffusion model-based time series forecasting. Our preliminary experiments revealed that diffusion model outputs tend to deviate from the ground-truth series. To mitigate this issue, it is crucial to generate forecast series that remain close to the ground-truth series. Motivated by the uncertainty sampling technique in active learning, we propose EUGID (Empirically estimated Uncertainty-Guided Iterative Decoding), a novel decoding strategy for diffusion-model forecasting based on an estimated uncertainty distribution. Specifically, it iteratively reproduces the uncertainty distribution through a forward diffusion process and generates forecast series via a reverse diffusion process. EUGID significantly improves forecasting performance, outperforming existing diffusion-based methods on seven out of eight benchmark datasets.
-
Keigo NOGAMI, Gouhei TANAKA
Session ID: 3Win5-27
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Federated learning is a decentralized machine learning method that does not aggregate raw data, making it a promising approach for preserving data privacy. While many methods for building deep learning models with federated learning have been proposed, IoT applications face limitations such as the communication bandwidth between servers and clients and the restricted computational resources of client devices. To address these challenges, this study focuses on federated learning based on reservoir computing models to improve efficiency in terms of computation and communication. A previous method for time-series anomaly detection assumes that clients simultaneously transmit local updates to the server and that the server updates the global model once. This study aims to enhance the efficiency of that method by proposing an approach that uses principal component analysis (PCA) to update the global model efficiently online. We clarify the trade-off between computational performance and efficiency of the proposed approach through numerical experiments, toward achieving both efficient global model updates and reduced data communication costs.
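The online PCA update can be sketched with scikit-learn's IncrementalPCA, applying partial_fit as client updates arrive; the reservoir-state features below are placeholders, and this is only an illustration of the online-update idea, not the paper's protocol.

```python
# Sketch: updating a shared projection online with incremental PCA as client
# updates arrive, instead of recomputing it from all clients at once.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
global_pca = IncrementalPCA(n_components=16)

for client_id in range(5):                         # clients arrive one at a time
    reservoir_states = rng.normal(size=(200, 100)) # (time steps, reservoir units) per client
    global_pca.partial_fit(reservoir_states)       # online update of the global model

print(global_pca.components_.shape)                # (16, 100) shared low-dim projection
print(global_pca.explained_variance_ratio_[:3])
```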
-
Hiromu FUKUMOTO, Toshiaki OMORI
Session ID: 3Win5-28
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a music diffusion model (MusicDiffusion) based on a discrete diffusion process. To realize music generation with substantial temporal structure, we employ a discrete latent space model and integrate the extracted latent space with diffusion modeling. By compressing music signals into compact latent representations, the proposed method reduces dimensionality while preserving essential musical characteristics.
-
Shotaro KURACHI, Toshihiko ITOH
Session ID: 3Win5-29
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
CAO THAN DAT, NGUYEN BAO LAM, PHI THAN DAT, ATSUSHI KOBAYASHI
Session ID: 3Win5-30
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kentaro UEDA, Takehiro TAKAYANAGI
Session ID: 3Win5-31
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Remi OGUNI, Daichi MOCHIHASHI, Ichiro KOBAYASHI
Session ID: 3Win5-32
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yuki FUJIEDA, Ryuichiro HIGASHINAKA
Session ID: 3Win5-33
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Naoyuki TERASHITA, Yusuke TOZAKI, Hideaki OMOTE, Nguyen CONG ...
Session ID: 3Win5-34
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
To improve the accuracy of text generation from documents containing diagrams, precise recognition of diagram images is essential. Specialized documents in particular often contain many visual representations of graph information, such as flowcharts, electrical circuit diagrams, and UML diagrams. However, recent research has suggested that the image encoders widely used in vision-language models (VLMs) fail to accurately recognize edges within diagrams. In this study, we evaluated the contribution of training data to the ability of image encoders to recognize diagram attributes such as nodes and arrows. Through contrastive learning using artificially generated diagram images and text descriptions of the graph information written in Mermaid notation, we confirmed that the recognition performance of image encoders for nodes and edges improved across multiple metrics.
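The contrastive setup can be sketched as a CLIP-style symmetric InfoNCE objective over (diagram image, Mermaid description) pairs; the tiny encoders below are placeholders for the actual image and text encoders.

```python
# Sketch: CLIP-style contrastive training on (diagram image, Mermaid text) pairs.
# Only the symmetric InfoNCE objective is shown; encoders are tiny placeholders.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature        # (B, B) pairwise similarities
    targets = torch.arange(len(img_emb))                 # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

image_encoder = torch.nn.Linear(3 * 64 * 64, 256)        # placeholder encoders
text_encoder = torch.nn.EmbeddingBag(1000, 256)

images = torch.rand(8, 3 * 64 * 64)                      # rendered flowchart images (flattened)
mermaid_tokens = torch.randint(0, 1000, (8, 32))          # tokenized "graph TD; A-->B; ..." text
loss = contrastive_loss(image_encoder(images), text_encoder(mermaid_tokens))
print(loss.item())
```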
-
Junya TAKAYAMA, Masaya OHAGI, Tomoya MIZUMOTO, Katsumasa YOSHIKAWA
Session ID: 3Win5-35
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Large language models (LLMs) often generate inconsistent responses to questions that share the same intent but differ in linguistic expression. This phenomenon can lead to lower task completion rates and excessive agreement with users. In this study, we investigate how variations in the linguistic representation of questions influence response tendencies across multiple models, using a question-answering dataset where responses are limited to "yes" or "no." Specifically, we construct paraphrased versions of questions through various transformations, such as replacing words with synonymous or antonymous expressions and modifying modality markers. We then compare the output probabilities of "Yes" and "No" before and after paraphrasing. Our results show that, in many models, the addition of modality markers and substitution with antonymous expressions each tend to reduce response consistency. Furthermore, we demonstrate that this tendency is already present at the pre-training stage and that it can be mitigated through few-shot prompting.
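A minimal sketch of the probing setup, comparing the renormalized Yes/No probabilities before and after a paraphrase; GPT-2 is only a stand-in for the models studied, and the prompts and the modality-marker paraphrase are illustrative.

```python
# Sketch: compare the model's Yes/No preference before and after paraphrasing a
# question. GPT-2 is a stand-in; prompts and paraphrase are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
yes_id = tok(" Yes")["input_ids"][0]
no_id = tok(" No")["input_ids"][0]

def yes_no_probs(question):
    prompt = f"Question: {question}\nAnswer:"
    logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
    p = torch.softmax(logits[[yes_id, no_id]], dim=0)   # renormalize over {Yes, No}
    return p[0].item(), p[1].item()

original = "Is the Shinkansen faster than a local train?"
paraphrased = "Surely the Shinkansen is faster than a local train, isn't it?"  # modality marker added
print(yes_no_probs(original))
print(yes_no_probs(paraphrased))
```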
-
Takumi YOSHIMOTO, Masaru ISONUMA, Junichiro MORI, Ichiro SAKATA
Session ID: 3Win5-36
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Riku TAKAHASHI, Hitomi YANAKA, Hiroya TAKAMURA
Session ID: 3Win5-37
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kei ITO, Yimeng SUN, Takao NAKAGUCHI, Masaharu IMAI
Session ID: 3Win5-38
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Real-time communication between individuals with hearing impairments and hearing individuals who have not mastered sign language remains challenging. Machine translation of sign language is essential for promoting social inclusion for people with hearing impairments. Since the introduction of Convolutional Neural Networks (CNNs), the accuracy of sign language translation has improved significantly, and alternative approaches leveraging Transformer models are also being explored. The Video Vision Transformer, an extension of the Transformer designed for video recognition, allows video data to be input directly; however, preprocessing of the input data is required to improve accuracy. In this study, we fine-tuned a Video Vision Transformer pretrained on the Kinetics-400 video dataset and evaluated its performance in word-level sign language recognition using two widely recognized sign language datasets (LSA64 and WLASL100). As a result, we achieved accuracy comparable to previous studies without the need for data preprocessing.
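A minimal fine-tuning sketch, assuming the Hugging Face ViViT checkpoint google/vivit-b-16x2-kinetics400 (32 frames at 224x224) is used; dataset loading for LSA64/WLASL100 and the training loop details are omitted, and the label count is illustrative.

```python
# Sketch: fine-tuning a Kinetics-400 pretrained Video Vision Transformer for
# word-level sign language recognition, assuming the Hugging Face ViViT
# checkpoint "google/vivit-b-16x2-kinetics400" (32 frames, 224x224 input).
import torch
from transformers import VivitForVideoClassification

model = VivitForVideoClassification.from_pretrained(
    "google/vivit-b-16x2-kinetics400",
    num_labels=64,                      # e.g. 64 sign classes for LSA64
    ignore_mismatched_sizes=True,       # replace the Kinetics-400 head
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

videos = torch.rand(2, 32, 3, 224, 224)          # (batch, frames, channels, H, W)
labels = torch.tensor([3, 41])
outputs = model(pixel_values=videos, labels=labels)
outputs.loss.backward()                           # one fine-tuning step
optimizer.step()
```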
-
Moriyuki KAMOTO, Marie KATSURAI
Session ID: 3Win5-39
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yukina HIRAKURI, Takao NAKAGUCHI, Miki UENO, Yimeng SUN, Masaharu IMAI
Session ID: 3Win5-40
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Bullying is a serious issue, yet many victims hesitate to seek help. This study develops a GPT-4o-based chatbot to lower barriers to consultation and reduce negative emotions. Integrated with a cloud service, it features two character settings and incorporates counseling techniques. Bullying is difficult to address because victims often remain silent. Due to a lack of respondents with such concerns, counseling effectiveness could not be assessed; increasing the number of survey responses is needed to establish a more solid evaluation.
-
Kazuhiro YAMAUCHI, Marie KATSURAI
Session ID: 3Win5-41
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Sequential sentence classification (SSC) of research paper abstracts has attracted attention as a fundamental technology for information retrieval or extractive summarization. However, previous studies have only utilized English abstracts in constructing training datasets, making it difficult to apply SSC to Japanese research paper abstracts. Therefore, we created a new SSC dataset comprising abstracts from Japanese medical research papers. We trained a hierarchical bidirectional LSTM-based architecture using this dataset. Furthermore, we proposed methods to utilize existing English datasets, including data augmentation using large language models and directly using both English and Japanese data in training. Additionally, we introduced a method to enhance recognition of expressions specific to research papers. As a result, we achieved approximately 92% accuracy and 88% macro-F1 score in SSC for Japanese research papers.
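The hierarchical architecture can be sketched as a word-level BiLSTM per sentence followed by a sentence-level BiLSTM over the abstract; dimensions and the label set below are illustrative, not the paper's configuration.

```python
# Sketch: a hierarchical BiLSTM for sequential sentence classification, i.e. a
# word-level BiLSTM per sentence followed by a sentence-level BiLSTM over the
# abstract, with a per-sentence label classifier.
import torch
import torch.nn as nn

class HierarchicalBiLSTM(nn.Module):
    def __init__(self, vocab=8000, emb=128, hid=128, n_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.word_lstm = nn.LSTM(emb, hid, bidirectional=True, batch_first=True)
        self.sent_lstm = nn.LSTM(2 * hid, hid, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hid, n_labels)

    def forward(self, token_ids):
        # token_ids: (n_sentences, max_words) for one abstract
        words, _ = self.word_lstm(self.embed(token_ids))   # (S, W, 2*hid)
        sent_vecs = words.mean(dim=1).unsqueeze(0)         # (1, S, 2*hid) sentence vectors
        context, _ = self.sent_lstm(sent_vecs)             # contextualize across the abstract
        return self.classifier(context.squeeze(0))         # (S, n_labels) per-sentence logits

model = HierarchicalBiLSTM()
abstract = torch.randint(0, 8000, (7, 30))                 # 7 sentences, 30 tokens each
print(model(abstract).shape)                                # torch.Size([7, 5])
```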
-
Yuka NAGASE, Hitoshi SAKANO, Katsuhito SUDOH
Session ID: 3Win5-42
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study examines the impact of proficiency-level-based lexical constraints on machine translation using large language models. The results indicate that translation under such lexical constraints had difficulty generating meaningful sentences because the constrained vocabulary was insufficient, whereas unconstrained translation generated fluent and natural sentences. We also observed serious under-translation in some sentences. Despite these omissions, automatic quantitative evaluation showed that COMET scores for the constrained translations exceeded those for the unconstrained translations.
-
Shunsuke NAITO, Hirotoshi TAIRA
Session ID: 3Win5-43
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS