Proceedings of the Annual Conference of JSAI

Using Keyword Networks to Define Indicators of Meeting Dynamics

A Lightweight Approach for Analyzing Discussion Dynamics and Topic Structures

Yingting CHEN, Taro KANNO, Satori HACHISUKA, Yuta YOSHINO, Shuhei WATA ...

Session ID: 3Q1-OS-5-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q1OS503

CONFERENCE PROCEEDINGS FREE ACCESS

This study presents a lightweight method for evaluating creative meeting dynamics using keyword co-occurrence networks. Traditional approaches such as idea fluency or expert evaluations often fail to capture the nuances of discussions and are challenging to implement consistently, while computational methods such as Large Language Models (LLMs) are resource-intensive. Using data from 53 real-world business meetings (average duration: 1.5 hours), we analyzed accumulated and non-accumulated metrics from keyword networks to define key features through Principal Component Analysis (PCA). Three main indicators (Discussion Expansion, Local Intensity, and Topic Variety) were developed to evaluate meeting quality. These indicators effectively differentiate high-performing meetings, capturing both cumulative trends and moment-to-moment dynamics. This framework offers a scalable and cost-efficient alternative for assessing meeting creativity and productivity, with potential applications in improving team performance and collaboration. By leveraging lightweight methods, the study bridges the gap between practical usability and analytical rigor in meeting evaluation.
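
As a rough illustration of what a keyword co-occurrence network looks like in code, the sketch below counts keyword pairs that share a time window and summarizes the resulting graph. It is not the authors' pipeline: the window contents, the function names, the choice of density as a metric, and the reading of "accumulated" as "all windows so far" versus "non-accumulated" as "one window at a time" are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(windows):
    """Count how often each unordered keyword pair appears in the same window."""
    edges = Counter()
    for keywords in windows:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def network_summary(edges):
    """Return node count, edge count, and density of the co-occurrence graph."""
    nodes = {keyword for pair in edges for keyword in pair}
    n = len(nodes)
    possible = n * (n - 1) // 2
    density = len(edges) / possible if possible else 0.0
    return {"nodes": n, "edges": len(edges), "density": density}


# Hypothetical keyword lists, one per time window of a meeting transcript.
windows = [
    ["budget", "schedule"],
    ["budget", "design", "schedule"],
    ["design", "prototype"],
]

edges = cooccurrence_edges(windows)
accumulated = network_summary(edges)  # all windows merged
per_window = [network_summary(cooccurrence_edges([w])) for w in windows]  # one window each
```

Feature vectors built from such summaries could then be passed to a PCA routine; which metrics the paper actually feeds into PCA is not stated in the abstract.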

Creativity Analysis Using General Data on Workers and Causal Discovery

Shigeki GOCHO, Masato KUSANAGI, Suckgeun LEE, Koyo YASUTA, Yuki MATSUO ...

Session ID: 3Q1-OS-5-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q1OS504

CONFERENCE PROCEEDINGS FREE ACCESS

Creativity among working professionals is essential for organizational success. Previous research has examined specific factors through experiments and studies, but this approach may limit the applicability of findings to real-world settings due to its focus on particular groups and conditions. To overcome this limitation, we aim for a broader and more practical approach to creativity research. In this study, we analyze creativity using general-purpose data, rather than data designed for specific factors. We collect workplace behavioral and personality assessment data and apply causal discovery techniques to explore their relationships with creativity. Our analysis reveals that some personality indicators are linked to creativity, whereas we do not find strong causal connections between workplace behavior and creativity. We believe our approach broadens the scope of creativity research and yields actionable insights for organizations to foster employee creativity.

Derivation of Facilitation Skills Based on Facial Expression Analysis and Investigation of Effects on Creativity

Yunjie WANG, Satori HACHISUKA, Taro KANNO, Yuta YOSHINO, Shuhei WATANA ...

Session ID: 3Q1-OS-5-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q1OS505

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Effects of a Robot Teaching Driving Based on the GROW Model on the Elderly

Felix JIMENEZ, Syo SUGITA, Masayoshi KANOH, Tomohiro YOSHIKAWA, Mitsuh ...

Session ID: 3Q4-GS-8-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q4GS801

CONFERENCE PROCEEDINGS FREE ACCESS

In recent years, educational robots that support learning have attracted attention. This study focuses on robots that teach driving skills. Previous studies have shown that collaborative learning with a robot using the GROW model (GROW robot), which teaches situations and actions at each stage, is effective in teaching driving behavior to university students. However, in today's society, there is a high demand for teaching driving behavior to the elderly due to the increasing number of accidents caused by elderly drivers. Previous studies have not verified the effectiveness of this system for the elderly. In this paper, we investigate the effect of joint learning with the GROW robot on the elderly. The experiment compared three groups: collaborative learning with a GROW robot, collaborative learning with a robot that does not use the GROW model (conventional robot), and learning with a learning system. The experiments showed that teaching by the GROW robot tended to be more memorable for the elderly than teaching by the conventional robot or the learning system.

Active Loop Closing Using Object Goal Navigation

Daiki IWATA, Kanji TANAKA, Shoya MIYAZAKI, Kouki TERASHIMA, Tomoe HIRO ...

Session ID: 3Q4-GS-8-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q4GS802

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Two-Stage Reinforcement Learning with Residual Value Functions for Autonomous Forklift Control

Toru NAGAMURA, Koshi OISHI, Teruki KATO, Seigo ITO

Session ID: 3Q4-GS-8-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q4GS803

CONFERENCE PROCEEDINGS FREE ACCESS

This study proposes a two-stage reinforcement learning method with residual value functions for autonomous forklifts. Automation in diverse environments is essential for forklifts due to their versatility and widespread use. However, learning from scratch in diverse environments is highly costly. To improve efficiency, we divide the forklift control task into common and environment-specific components. The common components are learned in the first stage, while the environment-specific components are efficiently learned in the second stage using residual reinforcement learning. This task division enables reuse of the learning outcomes from the common components. The evaluation experiment demonstrates that our method outperforms conventional methods in terms of success rate.

Knowledge Transfer from RGB Vision to Thermal Vision in Multiple Person Tracking

Tatsuro SAKAI, Kanji TANAKA, Tomoe HIROKI

Session ID: 3Q4-GS-8-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q4GS804

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Probabilistic Safety Guarantees for Control Using Neural CBF and Conformal Prediction

Koki HIRANO, Naoya TAKEISHI, Takehisa YAIRI

Session ID: 3Q4-GS-8-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q4GS805

CONFERENCE PROCEEDINGS FREE ACCESS

We propose a novel method to ensure safety in controlling nonlinear discrete-time systems. This approach utilizes Control Barrier Functions (CBFs) learned through Neural Networks (NNs) to determine whether the system can maintain a safe state. Additionally, by leveraging Conformal Prediction (CP), we assess the error between the predicted and true values of the CBFs, providing probabilistic safety guarantees. We applied this method to a robot performing path planning using the potential field method and demonstrated its applicability to hazardous states, such as deadlocks, which are challenging to formalize. Furthermore, CP enabled us to quantify the probability of the system maintaining a safe state. By combining data-driven control design with probabilistic safety assurances, this method contributes to enhancing the safety of control systems.
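
To make the role of conformal prediction concrete, here is a minimal sketch of standard split conformal calibration applied to CBF prediction errors. It is not the procedure from the paper, whose details the abstract does not give: the calibration errors are made-up numbers, the function names are hypothetical, and the sign convention (a state is treated as safe when the CBF value is non-negative) is an assumption.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Under exchangeability, a fresh score is at most this value with
    probability at least 1 - alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def certified_safe(predicted_cbf, margin):
    """Treat a state as safe only if the prediction stays non-negative after subtracting the margin."""
    return predicted_cbf - margin >= 0.0


# Hypothetical calibration errors |predicted CBF - true CBF| on held-out states.
calibration_errors = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04, 0.06, 0.07, 0.09]
margin = conformal_quantile(calibration_errors, alpha=0.25)

safe_far_from_boundary = certified_safe(0.10, margin)
safe_near_boundary = certified_safe(0.05, margin)
```

With nine calibration points and alpha = 0.25, the margin is the eighth smallest error, so a prediction of 0.10 is certified while 0.05 is not.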

A Preliminary Study on Weighted Average Hierarchical Model in Bilateral Control-based Imitation Learning

Yu-Han SHU, Koki INAMI, Koki YAMANE, Sho SAKAINO, Toshiaki TSUJI

Session ID: 3Q5-GS-8-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q5GS801

CONFERENCE PROCEEDINGS FREE ACCESS

Imitation learning has gained attention as a method for enabling robots to efficiently acquire human operational skills and replicate human motion techniques. However, training imitation learning models typically requires a large amount of teaching data, and re-teaching is necessary for each new task, which presents significant challenges. This study proposes an extended approach to bilateral control-based imitation learning by introducing a hierarchical model to achieve complex motions through the weighted averaging of action primitives. The proposed model is structured into two layers: the lower layer consists of multiple models that learn individual action primitives, while the upper layer is responsible for long-term motion planning and determining the optimal proportions for combining these action primitives. This method enables transfer learning across different tasks. In the validation experiment for the pick-and-place task, the upper layer was re-trained to perform a motion moving from right to left using the action primitives previously learned for the motion from left to right. As a result, autonomous motion was successfully achieved with a small amount of teaching data, thereby demonstrating the effectiveness of the proposed method.
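
The two-layer idea can be sketched as a weighted average of the actions proposed by each primitive model. The snippet below is only an illustration under assumptions: the abstract does not say how the upper layer normalizes its proportions, so the softmax here, the function names, and the toy two-dimensional actions are all hypothetical.

```python
import math


def softmax(logits):
    """Turn unnormalized scores into non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def blend_primitives(primitive_actions, weights):
    """Weighted average of the action vectors proposed by the lower-layer models."""
    dim = len(primitive_actions[0])
    return [
        sum(w * action[i] for w, action in zip(weights, primitive_actions))
        for i in range(dim)
    ]


# Hypothetical outputs of two lower-layer primitive models at one time step.
primitive_actions = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical upper-layer scores; equal scores give equal proportions.
weights = softmax([0.0, 0.0])
command = blend_primitives(primitive_actions, weights)
```

In this framing, re-training for a new task would touch only the component that produces the scores, while the primitive models stay fixed.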

High school students' impressions of behaviors of child-like robots that can be trained to take intelligence tests

Ikue ISHIKAWA, Felix JIMENEZ, Mayu MITANI, Takahiro NAKAJIMA, Shoko YO ...

Session ID: 3Q5-GS-8-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q5GS802

CONFERENCE PROCEEDINGS FREE ACCESS

In recent years, the demand for clinical psychologists has increased as the proportion of developmentally disabled children in mainstream classes has increased. Training in the administration of intelligence tests is provided through demonstrations with children. However, clinical psychologists cannot provide sufficient training because it is difficult to find subjects for training. In this study, we develop a child-like robot (proposed robot) that receives intelligence tests from clinical psychologists. The proposed robot is equipped with speech and behavior similar to that of a child undergoing intelligence testing. In this paper, we investigate whether the proposed robot's speech and movements are like those of the child being tested. In particular, the size of the robot has a strong influence on the child’s behavior. For this reason, comparative experiments with two types of robots of different sizes were conducted with high school students, who are younger than clinical psychologists. The results showed that the robot with the smaller body made a more childlike impression on the intelligence test than the robot with the larger body.

Feasibility of Autonomous Learning Support Provided by Teacher-Type Robots Using Spatio-Temporal Perplexion Estimation Method

Koki SATO, Felix JIMENEZ, Shuichi AKIZUKI, Tomohiro YOSHIKAWA

Session ID: 3Q5-GS-8-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q5GS803

CONFERENCE PROCEEDINGS FREE ACCESS

In recent years, educational support robots have attracted attention. This study focuses on teacher-type robots that provide learning support for solving problems. In a previous study of teacher-type robots, a perplexion estimation method was proposed to estimate the state of the learners' perplexion (i.e., the state of being unable to solve a problem) from the learners' facial expression. However, the estimation accuracy is 61.0%, which is not practical. In this study, we proposed a spatio-temporal perplexion estimation method that uses time-series data of learners' facial expression changes and achieved an accuracy of 88.1%. In this paper, to investigate whether the robot equipped with the spatio-temporal perplexion estimation method can autonomously provide learning support, we conducted an experiment in which the robot and university students learned together. The results suggest that the proposed method can provide the same level of learning support as the conventional button-operated learning support.

Are Language Instructions Effective for Imitation Learning of Long-Horizon Tasks?

Yuki TANAKA, Hitomi YANAKA, Hiroya TAKAMURA, Seiichiro KATSURA

Session ID: 3Q5-GS-8-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q5GS804

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Extraction of Scene-Specific Co-Occurrence Information Using Large Language Models and Its Application to Robot Scene Understanding

Kenta Gunji GUNJI, Kazunori OHNO, Shuhei KURITA, Ken SAKURADA, Satoshi ...

Session ID: 3Q5-GS-8-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q5GS805

CONFERENCE PROCEEDINGS FREE ACCESS

To enable a robot to act appropriately in its operational space, it is crucial to understand the relationships between objects specific to a given context. This is because the arrangement and associations of objects determine their functionality and purpose within a scene. By accurately capturing these relationships, a robot can comprehend the intent of a scene and effectively plan and execute tasks. This study proposes a method for extracting scene-specific co-occurrence information from large language models (LLMs). While LLMs provide extensive co-occurrence knowledge, their accuracy declines in specific contexts, necessitating additional fine-tuning for real-world applications. Our approach extracts scene-specific co-occurrence information based on object placement by incorporating the surrounding objects’ information when generating co-occurrence data for objects A and B. This method generates contextually appropriate co-occurrence information without additional training, making it suitable for specific scenes and environments. By emphasizing the functional relationships formed by object groups, we demonstrate its high effectiveness in applications such as scene understanding.

RoboHara (Robot Harassment)

Masahiro SHIOMI, Kurima SAKAI, Tomo FUNAYAMA, Takashi MINATO, Hiroshi ...

Session ID: 3Q6-GS-8-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q6GS801

CONFERENCE PROCEEDINGS FREE ACCESS

As robots become increasingly integrated into society, the concept of robot harassment (Robohara) is emerging. The term encompasses a wide range of harassment behaviors and applies to multiple types of perpetrators and victims, making it a polysemous or even contronymic concept. Previous studies and news reports have primarily focused on cases where robots themselves act as agents of harassment, e.g., industrial robots interfering with human adaptability in production environments or AI-driven robots generating inappropriate speech. In contrast, during a long-term experiment we observed scenes in which humans were the perpetrators of harassment against robots, a perspective that has received less attention. In this study, we first survey related concepts, such as robot bullying and robot abuse, in order to deepen understanding of negative interactions with robots. We then report negative interactions observed in our long-term experiment, e.g., harassment behaviors directed at human-like robots by people. Finally, we describe implications for human-robot interaction based on the observed Robohara.

Development of a Teleoperation System Enabling Operator-Robot Dialogue for Reducing Operator Boredom during Long-Duration Tasks

Manato UETAKE, Tomonori KUBOTA, Masaya IWASAKI, Shota MOCHIZUKI, Sanae ...

Session ID: 3Q6-GS-8-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q6GS802

CONFERENCE PROCEEDINGS FREE ACCESS

Teleoperated social robots have been attracting attention for their ability to improve the efficiency of dialogue tasks. However, during long-duration operations, operators experience boredom due to monotony and idle time, leading to decreased task motivation. In this study, we proposed that enabling operators to engage in dialogue with the robot they are operating could reduce their sense of boredom. The objectives of this study were to: (1) implement a teleoperation system with generative dialogue capabilities using large language models that enable flexible conversations suitable for extended use, and (2) verify the effectiveness of operator-robot dialogue in reducing operator boredom. A 14-day field experiment, in which each participant operated a robot for 5 hours, demonstrated that our implemented system significantly reduced operator boredom. The results suggest that operator-robot dialogue mitigates operator boredom during long-duration operations and may contribute to maintaining task motivation. This paper provides insights for improving operator comfort in practical uses of teleoperated social robots.

Can a dialogue system reduce the burden of user consent?

Mutsumi TAMAI

Session ID: 3Q6-GS-8-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q6GS803

CONFERENCE PROCEEDINGS FREE ACCESS

To establish and update a common ground of consent between data subjects and businesses utilizing personal data, I consider common grounds through a dialogue system that acts as an intermediary between the data subjects and businesses. This paper assumes that comprehensive consent given in advance is realized through traditional documents, and envisions a dialogue system that supplements the understanding and choices of data subjects regarding consent content, which was insufficient with just prior comprehensive consent, through natural dialogue. I hypothesize that such a dialogue system will reduce the burden on data subjects in understanding and choosing consent content, leading to better privacy considerations.

LMD-PGN

Riku UEMURA, Kanji TANAKA, Kenta TSUKAHARA, Daiki IWATA, Tomoe HIROKI

Session ID: 3Q6-GS-8-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q6GS804

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Deployment of Control Barrier Functions for Hi-performance System Design with Safety in Reinforcement Learning Approach

Kento NAGATA, Sachiyo ARAI

Session ID: 3Q6-GS-8-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3Q6GS805

CONFERENCE PROCEEDINGS FREE ACCESS

Deep reinforcement learning is expected to be used not only for Go and Shogi, but also for controlling robots that operate in real space and in coexistence with humans, such as self-driving cars. In recent years, research has been conducted on safety-oriented algorithms such as collision avoidance with obstacles and humans for real-world applications of reinforcement learning involving trial-and-error. However, the emphasis on safety has led to methods that suppress optimality and other aspects of task processing. In contrast, this study aims to obtain safe and high-performance control laws by combining deep reinforcement learning (DRL) and control barrier functions (CBFs) using environmental models. In the proposed method, the CBF uses a nominal model based on physical laws to identify regions that guarantee the safety of the DRL controller's operation. It also uses a Gaussian process model derived from measurement data to allow handling of uncertainties in the unknown environment that cannot be represented by the nominal model.

Proposal for an Interactive Dialogue System with LLM-Integrated 3D Avatar in AR Spaces

Shunsuke LIU, Satoshi KURIHARA

Session ID: 3R1-OS-45-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R1OS4501

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Construction of a Persona Dialogue Dataset Collected from a Self-publishing Novel Website

Ryuichi UEHARA, Michimasa INABA

Session ID: 3R1-OS-45-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R1OS4502

CONFERENCE PROCEEDINGS FREE ACCESS

In recent years, research on character role-playing using large language models (LLMs) has been actively conducted. One approach for evaluating role-playing ability is to verify whether an LLM can respond in line with a given persona, which consists of information about the character. In order to accurately assess the role-playing ability, it is important to have the LLM perform role-playing for characters it has not been trained on before. However, many of the datasets for role-playing tasks proposed so far contain characters from well-known works, which may have appeared frequently in the LLM's pre-training data. As a result, there is a risk that the LLM's ability to utilize the persona may not be evaluated accurately. To address this, we have constructed a persona-based dialogue dataset by collecting dialogue from 608 characters across 96 online novels, including lesser-known works. The experimental results show that fine-tuning is important for improving the role-playing ability of LLMs using personas. On the other hand, we found challenges in the generalization performance of role-playing abilities for characters not included in the training data. This suggests that the dataset could be useful for exploring learning methods to improve generalization performance.

Developing an interactive agent that builds long-term relationships

Gihyang LEE, Tomoki MIYAMOTO, Daisuke KATAGAMI

Session ID: 3R1-OS-45-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R1OS4503

CONFERENCE PROCEEDINGS FREE ACCESS

This study aims to establish long-term relationships between dialogue agents and humans. To achieve this, we developed a dialogue system that enables users to feel a sense of rapport based on the rapport-building process observed in human interactions. In the experiment, 30 participants were recruited via crowdsourcing and assigned to one of three conditions: rapport-building process condition, self-disclosure condition, and user-adaptive condition. Each participant interacted with the assigned dialogue system condition for three days. The results suggest that the self-disclosure condition exhibited statistically significant differences in certain aspects. Furthermore, when the system changed topics too frequently or infrequently, the participants’ evaluation of rapport and overall impression of the system declined.

Fine-tuning Large Language Models for Motivational Interviewing using Counseling Corpus

Shoma MORITA, Jie ZENG, Tatsuya SAKATO, Yukiko NAKANO

Session ID: 3R1-OS-45-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R1OS4504

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Joint Optimization of Post-Processing for All Modules in Task-Oriented Dialogue Systems

Atsumoto OHASHI, Ryuichiro HIGASHINAKA

Session ID: 3R1-OS-45-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R1OS4505

CONFERENCE PROCEEDINGS FREE ACCESS

Post-Processing Networks (PPNs) serve as components that modify outputs from modules in task-oriented dialogue systems, with the goal of enhancing the system's overall task completion capabilities. Traditional PPN approaches, however, have been restricted to handling only a subset of modules within a system, which has significantly constrained potential performance improvements. In this paper, we introduce a method for simultaneously optimizing the post-processing of the outputs of all modules using Universal Post-Processing Networks (UniPPNs). The UniPPN utilizes a single language model capable of processing outputs from any module as a sequence transformation task. We provide a detailed explanation of the UniPPN reinforcement learning algorithm and demonstrate, through both simulation experiments using the MultiWOZ dataset and human evaluation studies, that our approach achieves superior performance compared to conventional PPN approaches.

Multi-Scale GNN based Ocean Prediction Models for 10-day Global Forecasting

Yuta HIRABAYASHI, Daisuke MATSUOKA, Konobu KIMURA

Session ID: 3R4-OS-27-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R4OS2701

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Improving the post-processing for weather forecasts using machine learning

Tomoyuki TAKENAWA, Kazuma IWASE

Session ID: 3R4-OS-27-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R4OS2702

CONFERENCE PROCEEDINGS FREE ACCESS

To improve the accuracy of weather forecasts, we developed and enhanced post-processing models based on machine learning for numerical calculation results and weather forecast services. For the numerical calculation results, post-processing was applied to the output of the Japan Meteorological Agency's mesoscale model (MSM) for 18 locations across Japan, including plains, mountainous regions, and islands, focusing on precipitation, temperature, and wind speed. Meteorological variables obtained from grid points around the forecast locations were used as input features, and feature selection based on correlation analysis was applied. As a result, the model using LightGBM outperformed neural networks, CNN models, and the MSM post-processing model, MSMG (MSM Guidance), in terms of prediction accuracy. Additionally, for the weather forecast services, we focused on correcting forecasts in mountainous areas where prediction errors are particularly large, and the model using LightGBM achieved higher accuracy than the forecast services.

Performance evaluation of machine learning-based sea state estimation using ship motion

Kazuma IWASE, Ulrik Dam NIELSEN, Raphaël Emile Gilbert MOUNET, Gaute S ...

Session ID: 3R4-OS-27-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R4OS2703

CONFERENCE PROCEEDINGS FREE ACCESS

This study aims to evaluate the performance of three machine learning-based methods for sea state estimation using ship motion data. The proposed methods treat ship motion as an analog to wave buoys, leveraging the measured responses to estimate sea states. All three methods rely on machine learning but provide different outputs: Method 1 outputs wave parameters, Method 2 provides point wave spectrum data and wave direction, and Method 3 generates the directional wave spectrum in a non-parametric form. While these methods demonstrate good performance using data derived from physical simulations, this study also employs observation data from wave radar collected aboard an in-service container ship for training and testing. The results show that all three methods perform well on the observation data.

Llamarine: Open-source Maritime Industry-specific Large Language Model

William NGUYEN, An PHAN, Konobu KIMURA, Hitoshi MAENO, Mika TANAKA, Qu ...

Session ID: 3R4-OS-27-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R4OS2704

CONFERENCE PROCEEDINGS FREE ACCESS

Large Language Models (LLMs) have shown promise in complex reasoning but often fall short in specialized domains like maritime navigation. To address this, we introduce Llamarine 1.0, the first open-source LLM designed specifically for maritime applications. Developed through continued pretraining and fine-tuning on a curated corpus of maritime textbooks, research papers, and Wikipedia content, Llamarine possesses expert-level knowledge in navigation, collision avoidance, route optimization, and regulatory compliance. Our key contributions include (a) curating a high-quality maritime dataset from authoritative sources, (b) developing a domain-specific model that outperforms general-purpose LLMs in maritime reasoning, and (c) establishing a benchmark for evaluating performance in navigation-related decision-making. Experimental results demonstrate that Llamarine surpasses both general and commercial LLMs in several maritime-critical tasks. By providing a specialized, open-source foundation model, Llamarine paves the way for AI-driven advancements in maritime safety, operational efficiency, and autonomous navigation.

A physics-informed neural networks approach to calculating the fluid dynamics of vessels using surrogate models as an alternative to computational fluid dynamics

Konobu KIMURA, Akito UESHINA, Shin OKAMOTO, Yuto MIYAKE, Hiroyuki HATA ...

Session ID: 3R4-OS-27-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R4OS2705

CONFERENCE PROCEEDINGS FREE ACCESS

Computational fluid dynamics (CFD) methods are subject to a trade-off between accuracy and computational complexity. Specifically, fluid dynamics calculations used for ships require high levels of accuracy, which increases the amount of computational resources needed. In recent years, deep learning-based surrogate models have been proposed as a faster alternative to CFD. In general, supervised learning requires a large amount of CFD calculation results to be prepared in advance as training data in order to obtain sufficient accuracy, and it is practically impossible to collect data covering a wide variety of ship types. In this study, the concept of the Physics-Informed Neural Network (PINN) is introduced. For preparing the training dataset, mathematical virtual hulls (Wigley models) are created with varying deformations, as well as spheroid and cylindrical shapes. Further, the neural network was trained in accordance with the physical boundary conditions as well as the hydrodynamic equations. Consequently, our model was constructed several hundred times faster than conventional CFD analysis while maintaining fluid behavior. Further validation will be conducted using a model ship as well as a full-scale ship in order to develop an AI system that will allow us to achieve efficient fluid dynamics analysis even in environments with limited computing capacity.

Combining IMPALA and Demonstrations to Solve Problems with Hard Exploration and Large State-Action Spaces

Application examples in Yu-Gi-Oh! MASTER DUEL

Sora SATAKE, Soichiro HATTORI, Naoya KIHARA

Session ID: 3R5-OS-31-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R5OS3102

CONFERENCE PROCEEDINGS FREE ACCESS

When applying deep reinforcement learning to current digital games, the difficulty of exploration and the size of the state-action space are often obstacles. If many play logs are available, these difficulties can be mitigated through imitation learning. However, when sufficient logs are difficult to collect, such as during game development or for events with different regulations, imitation learning may not be feasible. In this study, we propose a method for efficient deep reinforcement learning with the IMPALA architecture that guides exploration using small-scale demonstrations that developers can create manually. We show that hard exploration problems can be learned quickly by modifying the correction calculation in V-trace. We also trained an AI with the proposed method in a current competitive digital game; the resulting AI is as strong as the existing rule-based AI.
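
For reference, the standard V-trace correction used by IMPALA can be sketched as follows. This is the baseline calculation as commonly published, not the modified correction that the abstract proposes (its details are not given here), and the function and parameter names are illustrative assumptions.

```python
from typing import List


def vtrace_targets(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    ratios: List[float],
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> List[float]:
    """Return V-trace value targets v_s for one trajectory.

    ratios[t] is the importance ratio pi(a_t | x_t) / mu(a_t | x_t) between
    the learner policy pi and the behaviour policy mu.
    """
    n = len(rewards)
    next_values = values[1:] + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}) while sweeping backwards
    for s in reversed(range(n)):
        rho = min(rho_bar, ratios[s])  # truncated weight on the TD error
        c = min(c_bar, ratios[s])      # truncated weight on the trace
        delta = rho * (rewards[s] + gamma * next_values[s] - values[s])
        acc = delta + gamma * c * acc
        targets[s] = values[s] + acc
    return targets


on_policy = vtrace_targets([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                           [1.0, 1.0, 1.0], gamma=0.9)
```

With all ratios equal to 1 the targets reduce to ordinary n-step returns, and with ratios of 0 the targets fall back to the current value estimates. Demonstration data comes from a policy other than the learner's, which is presumably why the correction term is where the proposed method intervenes.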

HTN-RL: Designable Game Character AI Using Hierarchical Task Network and Reinforcement Learning

Ray Ito ITO, Youichiro MIYAKE

Session ID: 3R5-OS-31-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R5OS3103

CONFERENCE PROCEEDINGS FREE ACCESS

In the game industry, artificial intelligence (AI) is widely used to control non-player characters (NPCs) as opponents or allies. While deterministic AI methods such as Behavior Trees and Hierarchical Task Networks (HTN) provide developers with precise control over character behaviors, they require extensive manual design, leading to high development costs. Reinforcement learning (RL), which is a nondeterministic method, allows for autonomous behavior generation but offers limited direct control for designers. This study explores the potential of combining HTN with deep reinforcement learning (HTN-RL) to balance designability and adaptability. By applying this approach to a Unity-based MOBA game, we examined its feasibility and effectiveness through observations from 18 international participants. The results suggest that HTN-RL may reduce manual tuning efforts while maintaining design flexibility. Furthermore, we investigated the transferability of learned behavior nodes between different HTNs, highlighting the potential for reusability in various game contexts.

Scenario translation using large-scale language models and GraphRAG

Takayuki SHIMOTOMAI, Jukiya NODA, Takushi KODANI, Tomoki TAKEKAWA, Ryu ...

Session ID: 3R5-OS-31-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R5OS3104

CONFERENCE PROCEEDINGS FREE ACCESS

Expanding games overseas is important but costly, and language translation is a key issue. Stories and scenarios are generally long, so maintaining consistency is a major challenge in automatic translation. Maintaining a worldview that includes character and cultural background is also important. In this study, we propose a scenario translation method that uses GraphRAG and a large-scale language model. We evaluate character consistency to demonstrate the effectiveness of the proposed method.

Environment recognition by agents using neural networks as a knowledge representation

Knowledge Representation in Neural Network Format

Youichiro MIYAKE

Session ID: 3R5-OS-31-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3R5OS3105

CONFERENCE PROCEEDINGS FREE ACCESS

Knowledge representation is often interpreted as symbolic representation, but in this paper, a method that uses a neural network as knowledge representation is proposed. That is, a pre-trained neural network is assigned to objects in the environment, and when the agent recognizes the object, the neural network is set in the agent architecture to realize appropriate behavior for that object.

Investigating the Impact of Supernet Pretraining in Multi-Task One-Shot NAS

Yotaro YAMAGUCHI, Yuki TANIGAKI

Session ID: 3S1-GS-2-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S1GS201

CONFERENCE PROCEEDINGS FREE ACCESS

One-Shot Neural Architecture Search (NAS) reduced computational cost by pre-training a supernet and sharing its weights across subnetworks during the search process. However, its adaptation to multi-task learning remained challenging. In this study, we investigated the impact of supernet pre-training on multi-task search performance. Specifically, we examined the effectiveness of using a supernet trained on one task as a warm start for architecture search on a different task. Furthermore, we analyzed the effect of updating the supernet weights by training it on the current search task after the warm start. Our experiments aimed to determine whether a pre-trained supernet improved search efficiency and whether additional adaptation during the search process enhanced architecture performance. The results demonstrated that transferring a supernet across tasks improved search performance. Additionally, further training on the current task significantly enhanced search effectiveness.

Neural Architecture Search of Sample Re-Weighting Networks for Distribution Shift

Keisuke SUGAWARA, Kento UCHIDA, Shinichi SHIRAKAWA

Session ID: 3S1-GS-2-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S1GS202

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Utilization of NAVIGATOR for Efficient Construction of Architecture Embedding Spaces

Kazuki HEMMI, Yuki TANIGAKI, Masaki ONISHI

Session ID: 3S1-GS-2-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S1GS203

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Estimating Geometric Quantities of the Encoder's Latent Spaces

Analyzing CNNs and Transformers with Information Geometry

Ikumi AKATSUKA, Noboru MURATA

Session ID: 3S1-GS-2-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S1GS204

CONFERENCE PROCEEDINGS FREE ACCESS

Encoders such as CNNs and Transformers can embed high-dimensional objects (e.g., images) into low-dimensional vectors via an object embedding operation, and many previous studies treat the latent space formed by these embedding vectors as a Euclidean space. In this study, we aim to capture the geometric structure of the latent space that may be overlooked under a purely Euclidean assumption. To this end, we propose a method that associates the encoder’s intermediate representations with probability distributions, thereby defining an information-geometric manifold on which we can estimate geometric quantities such as metrics and curvature. The set of distributions obtained by inputting an image dataset into the encoder forms an information-geometric manifold with the α-divergence as its distance, and its expectation coordinates coincide with the embedding vectors. Through experiments estimating the metric and curvature of the MNIST dataset learned by a CNN, we found that the latent space exhibits positive curvature in many regions, indicating that it is not necessarily flat.

Analyzing the Complexity and Hierarchy of Latent Representations in LLM

Masato SEKIGUCHI, Ryoma ISHIGAKI, Eisaku MAEDA

Session ID: 3S1-GS-2-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S1GS205

CONFERENCE PROCEEDINGS FREE ACCESS

Large Language Models (LLMs) have been rapidly evolving and are being utilized in various practical applications. However, many aspects of their operational principles remain unclear. In this study, we analyze the distribution of latent representations in LLMs to gain a deeper understanding of their reasoning process. As analytical methods, we employ intrinsic dimension, which captures the essential dimensionality of a distribution, and δ-hyperbolicity, which measures the hierarchical structure of the distribution. Our experimental results provide insights into the complexity and hierarchy within the reasoning process of LLMs, shedding light on how they internally handle the semantics of natural language. This study contributes not only to improving the interpretability of LLMs but also to enhancing their architectural design.
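
To make the δ-hyperbolicity measure mentioned above concrete, the sketch below evaluates Gromov's four-point condition on a finite metric space given as a distance matrix. It is a brute-force illustration under assumed names (`gromov_product`, `delta_hyperbolicity`), not the estimation procedure used in the study, and its O(n^4) cost makes it practical only for small samples of representations.

```python
from itertools import product
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def gromov_product(d: Matrix, x: int, y: int, w: int) -> float:
    """Gromov product (x|y)_w = (d(x,w) + d(y,w) - d(x,y)) / 2."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def delta_hyperbolicity(d: Matrix) -> float:
    """Smallest delta with (x|z)_w >= min((x|y)_w, (y|z)_w) - delta for all points."""
    n = len(d)
    delta = 0.0
    for w, x, y, z in product(range(n), repeat=4):
        gap = min(gromov_product(d, x, y, w), gromov_product(d, y, z, w))
        gap -= gromov_product(d, x, z, w)
        delta = max(delta, gap)
    return delta


# Star-shaped tree: centre 0 joined to leaves 1, 2, 3 by unit-length edges.
tree = [
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 2.0, 2.0],
    [1.0, 2.0, 0.0, 2.0],
    [1.0, 2.0, 2.0, 0.0],
]
# Cycle of four points with unit-length edges.
cycle = [
    [0.0, 1.0, 2.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 2.0, 1.0, 0.0],
]
tree_delta = delta_hyperbolicity(tree)
cycle_delta = delta_hyperbolicity(cycle)
```

A tree metric yields δ = 0, while the cycle yields a strictly positive δ, so smaller values indicate a more tree-like, hierarchical arrangement of points.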

Interpolation of HDD parameters with neural network model

Improving the linear interpolation accuracy with a differentiable linear interpolation layer

Katsuya SUGAWARA, Yousuke ISOWAKI, Kenichiro YAMADA, Naoki TAGAMI, Tak ...

Session ID: 3S4-GS-2-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S4GS201

CONFERENCE PROCEEDINGS FREE ACCESS

In low-cost embedded systems and in fields that require real-time processing, linear interpolation is still used because of resource or processing-time restrictions. An HDD is one such embedded system that requires real-time processing, and it therefore uses linear interpolation for its parameters. To improve linear interpolation accuracy, we developed a small neural network model with a differentiable linear interpolation layer. The neural network was optimized through the output of the attached differentiable linear interpolation layer to obtain better fits under linear interpolation. The neural network outputs are tuned to be better inputs for linear interpolation, so they can replace the existing inputs while the linear interpolation implemented in embedded systems is used as it is. We tested our model with RRO (Repeatable Run Out) correction data, an HDD parameter, and obtained a 10% improvement in linear interpolation accuracy or a 7.5% reduction in sampling rate, which leads to reduced inspection time.
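
The idea of optimizing through the interpolation output can be sketched in a few lines. The example below is a simplification under stated assumptions: there is no neural network, the knot values themselves are the free parameters, the grid is uniform on [0, 1], and `interp_weights`, `interpolate`, `mse`, and `fit_knots` are hypothetical names. Because linear interpolation is a weighted sum of two knot values, its gradient with respect to those values is simply the pair of weights.

```python
from typing import Callable, List, Tuple


def interp_weights(x: float, n_knots: int) -> Tuple[int, float]:
    """Left knot index and fractional position of x on a uniform grid over [0, 1]."""
    pos = min(max(x, 0.0), 1.0) * (n_knots - 1)
    i = min(int(pos), n_knots - 2)
    return i, pos - i


def interpolate(knots: List[float], x: float) -> float:
    i, t = interp_weights(x, len(knots))
    return (1.0 - t) * knots[i] + t * knots[i + 1]


def mse(knots: List[float], xs: List[float], target: Callable[[float], float]) -> float:
    return sum((interpolate(knots, x) - target(x)) ** 2 for x in xs) / len(xs)


def fit_knots(target: Callable[[float], float], n_knots: int, xs: List[float],
              lr: float = 0.5, steps: int = 500) -> List[float]:
    """Gradient descent on knot values, starting from plain sampling of the target."""
    knots = [target(k / (n_knots - 1)) for k in range(n_knots)]
    for _ in range(steps):
        grad = [0.0] * n_knots
        for x in xs:
            i, t = interp_weights(x, n_knots)
            err = (1.0 - t) * knots[i] + t * knots[i + 1] - target(x)
            # d(output)/d(knots[i]) = 1 - t and d(output)/d(knots[i + 1]) = t
            grad[i] += 2.0 * err * (1.0 - t) / len(xs)
            grad[i + 1] += 2.0 * err * t / len(xs)
        knots = [k - lr * g for k, g in zip(knots, grad)]
    return knots


def square(x: float) -> float:
    return x * x


samples = [j / 100 for j in range(101)]
naive = [square(k / 2) for k in range(3)]  # knots sampled directly from the curve
fitted = fit_knots(square, 3, samples)
```

The fitted knots no longer lie on the curve, yet the interpolated output has a lower mean squared error than the naively sampled knots, and the embedded interpolation routine itself is unchanged.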

Measuring Intrinsic Dimension of Token Embeddings

Takuya KATAIWA, Hakaze CHO, Tetsushi OHKI

Session ID: 3S4-GS-2-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S4GS202

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Evaluation of Design Variable Interpolation in Evolutionary Model Merging

Rio AKIZUKI, Nozomu YOSHINARI, Yuya KUDO, Yoichi HIROSE, Toshiyuki NIS ...

Session ID: 3S4-GS-2-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S4GS203

CONFERENCE PROCEEDINGS FREE ACCESS

Model merging is a technique for combining deep learning models without additional training and enables us to integrate the abilities of multiple large language models (LLMs) into a single LLM. Evolutionary model merging optimizes merging parameters using evolutionary computation and can reduce manual trial and error. However, it is still computationally expensive due to the repeated evaluation of merged LLMs. In particular, it is difficult to ensure a sufficient number of evaluations when optimizing merging parameters for each layer. This study experimentally evaluates design variable interpolation in evolutionary model merging using Japanese and Chinese math tasks and a newly introduced surrogate benchmark. In design variable interpolation, only the merging parameters for several layers are used as design variables, and the merging parameters for other layers are computed using interpolation methods. The experimental results show that design variable interpolation could improve the merged model performance and accelerate the search process.
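
A minimal sketch of design variable interpolation follows. It assumes piecewise-linear interpolation between anchor layers and a simple layer-wise weighted average as the merge rule; the abstract specifies neither, and `expand_layer_coefficients` and `merge_layers` are hypothetical names.

```python
from bisect import bisect_right
from typing import Dict, List


def expand_layer_coefficients(anchors: Dict[int, float], n_layers: int) -> List[float]:
    """Fill in per-layer merging coefficients from a few anchor layers.

    Only the anchor values are design variables for the optimizer; the
    remaining layers are interpolated linearly (assumed interpolation rule).
    """
    idx = sorted(anchors)
    coeffs = []
    for layer in range(n_layers):
        if layer <= idx[0]:
            coeffs.append(anchors[idx[0]])
        elif layer >= idx[-1]:
            coeffs.append(anchors[idx[-1]])
        else:
            hi = bisect_right(idx, layer)
            left, right = idx[hi - 1], idx[hi]
            t = (layer - left) / (right - left)
            coeffs.append((1.0 - t) * anchors[left] + t * anchors[right])
    return coeffs


def merge_layers(model_a: List[List[float]], model_b: List[List[float]],
                 coeffs: List[float]) -> List[List[float]]:
    """Layer-wise weighted average: coeff * A + (1 - coeff) * B (assumed merge rule)."""
    return [
        [c * a + (1.0 - c) * b for a, b in zip(layer_a, layer_b)]
        for layer_a, layer_b, c in zip(model_a, model_b, coeffs)
    ]


coeffs = expand_layer_coefficients({0: 0.0, 4: 1.0}, n_layers=5)
merged = merge_layers([[1.0]] * 5, [[0.0]] * 5, coeffs)
```

In this toy case the optimizer searches over two values instead of five, which is the kind of reduction in design variables the abstract describes.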

An Analysis of Diversity Regularization in Deep Ensemble Learning

Tatsuhito HASEGAWA, Shunsuke SAKAI

Session ID: 3S4-GS-2-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S4GS204

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Impact Analysis of Food Intake Ratios to Salt Intake Using the Explainable AI (XAI) Framework

Daiki ARAKI, Kenta OFUJI

Session ID: 3S5-GS-2-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S5GS201

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Evaluation of TypiClust for Federated Active Learning in Low-Budget Regimes

Yuta ONO, Hiroshi NAKAMURA, Hideki TAKASE

Session ID: 3S5-GS-2-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S5GS202

CONFERENCE PROCEEDINGS FREE ACCESS

Federated Active Learning (FAL) seeks to reduce the burden of annotation under the realistic constraints of Federated Learning by leveraging Active Learning (AL). Federated active learning settings make it more expensive to obtain ground truth labels, so FAL strategies that work well in low-budget regimes are needed. In this work, we investigate the effectiveness of TypiClust, a successful low-budget AL strategy, in FAL settings. Our empirical results show that TypiClust also works well in FAL settings, although these settings present additional challenges, such as data heterogeneity, compared to AL. In addition, our sensitivity analysis of TypiClust to feature extraction methods suggests a way to perform FAL even in limited data situations.
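
The selection rule behind TypiClust can be sketched roughly as follows: pick the cluster with the fewest labeled points (preferring larger clusters on ties) and query its most typical unlabeled point, where typicality is the inverse of the mean distance to the nearest neighbours. This is a simplified single-machine illustration, not the federated setup evaluated in the paper. The cluster assignments are taken as given, neighbours are restricted to the same cluster, and `typicality` and `select_queries` are hypothetical names; all of these are assumptions.

```python
import math
from typing import Dict, List, Sequence, Set

Point = Sequence[float]


def typicality(points: Sequence[Point], members: List[int], i: int, k: int) -> float:
    """Inverse of the mean distance from point i to its k nearest cluster members."""
    dists = sorted(math.dist(points[i], points[j]) for j in members if j != i)
    nearest = dists[:k]
    if not nearest:
        return 0.0  # singleton cluster: no neighbours to compare against
    mean = sum(nearest) / len(nearest)
    return math.inf if mean == 0 else 1.0 / mean


def select_queries(points: Sequence[Point], cluster_ids: Sequence[int],
                   labeled: Set[int], budget: int, k: int = 2) -> List[int]:
    """Choose up to `budget` indices to label, one cluster at a time."""
    clusters: Dict[int, List[int]] = {}
    for idx, c in enumerate(cluster_ids):
        clusters.setdefault(c, []).append(idx)
    covered = set(labeled)
    chosen: List[int] = []
    for _ in range(budget):
        open_clusters = [c for c, m in clusters.items()
                         if any(i not in covered for i in m)]
        if not open_clusters:
            break
        # Fewest labeled points first; larger clusters win ties.
        target = min(open_clusters,
                     key=lambda c: (sum(i in covered for i in clusters[c]),
                                    -len(clusters[c])))
        members = clusters[target]
        pick = max((i for i in members if i not in covered),
                   key=lambda i: typicality(points, members, i, k))
        chosen.append(pick)
        covered.add(pick)
    return chosen


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (100.0, 0.0), (101.0, 0.0)]
cluster_ids = [0, 0, 0, 0, 1, 1]
queries = select_queries(points, cluster_ids, labeled=set(), budget=2)
```

The outlier at (10, 0) is never chosen first, which reflects why typicality-based selection suits low-budget regimes: the few labels go to representative points rather than to uncertain or atypical ones.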

Improving machine learning accuracy in plants and industrial equipment through dynamic selection of training data

Yusuke MOTEGI, Yoshitaka SUZUKI, Yukihiro KAWANO

Session ID: 3S5-GS-2-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S5GS203

CONFERENCE PROCEEDINGS FREE ACCESS

IHI is actively working on utilizing operational data from equipment. Our products, such as plants and production equipment, are used in various industries and social activities. Therefore, providing stable and highly accurate analysis results is crucial for enhancing customer value and addressing social issues. However, in cases where the internal state changes dynamically due to shifts in operational modes or the influence of external temperatures, traditional machine learning methods may result in false detections or reduced analysis accuracy. To address this, we have developed a machine learning method that applies technology for dynamically updating learning data. This allows for high-precision analysis by dynamically selecting learning data sets suited to the changing operational environment of the equipment. This method is effective for a wide range of applications, including anomaly diagnosis and improving the operational efficiency of plants and production equipment. This paper introduces an overview of this technology and presents the results of its validation using open data.

Interagent knowledge transfer for continuing learners of object-goal navigation

Kouki TERASHIMA, Daiki IWATA, Kanji TANAKA, Tomoe HIROKI

Session ID: 3S5-GS-2-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S5GS204

CONFERENCE PROCEEDINGS FREE ACCESS

To enhance object-goal navigation (ON) of robots in unknown environments, the possibility of short knowledge transfer (KT) between agents is explored. Using the analogy of a human traveller who efficiently finds a destination (e.g. accommodation) in an unknown location by acquiring local knowledge from locals, we propose a framework where a traveller robot (student) communicates with a local robot (teacher) to acquire ON knowledge with minimal interaction. Unlike traditional zero-shot ON approaches (large-scale language models), learning-based ON methods that utilise frontier-driven object-function maps and neural state action maps involve complex challenges to be solved in data-free KT. In this study, a lightweight plug-and-play KT module was designed for a non-cooperative black-box teacher in an open-world environment. By assuming the vision and mobility capabilities of the teacher robot and using its state-action history as a knowledge base, we developed a query-based occupancy map that dynamically represents the location of the target object. Through experiments in a Habitat environment, we have demonstrated the effectiveness of this approach.

Simulating task performances of Behavioral Inattention Test (BIT) of patients with hemineglect

Shinichi ASAKAWA, Muto JUN

Session ID: 3S5-GS-2-05
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S5GS205

CONFERENCE PROCEEDINGS FREE ACCESS

We propose a neural network model to mimic the task performance of patients with hemi-spatial neglect. The proposed model is capable of performing several tasks of the BIT (Behavioral Inattention Test) used in clinical practice. The model applies domain adaptation to the test tasks through transfer learning from a pre-trained ResNet model. It also draws on attention models from computational neuroscience and cognitive science. These conventional models have assumed the allocation of attention across both left and right visual fields: a dorsal ‘where’ pathway for processing object locations in the visual field, a ventral ‘what’ pathway to deal with object recognition, and circuits between the parietal and frontal lobes to integrate this kind of information. The proposed model can mimic the patient's task performance resulting from lesions in the right parietal lobe. Compared to the conventional models, our model can be easily compared to the patient's performance, and the model's parameters are easily interpreted. This makes it possible to consider the relationship with the attention model in natural language processing models.

A study of deep embedding clustering method for tabular data considering the meaning of qualitative variables

Rin MIYAZAKI, Tianxiang YANG, Masayuki GOTO

Session ID: 3S6-GS-2-01
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S6GS201

CONFERENCE PROCEEDINGS FREE ACCESS

[in Japanese]

Adaptive Learning Method to Achieve Both High Accuracy and Improved Recall for Critical Classes

Kazuki IIJIMA, Takashi WASHIO

Session ID: 3S6-GS-2-02
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S6GS202

CONFERENCE PROCEEDINGS FREE ACCESS

Deep learning-based image classification models are widely utilized in various fields such as anomaly detection, object recognition, and medical image analysis. In these application domains, classification models and learning algorithms typically aim to maximize overall classification accuracy by treating all classes equally. However, in real-world scenarios, there is often a demand for enhancing the recall of specific critical classes while maintaining overall classification accuracy, as seen in the detection of specific diseases in medical diagnostic systems or pedestrian recognition in autonomous driving systems. Such adjustments, however, often involve a trade-off, where improvements in the recall of critical classes lead to a decline in overall model accuracy. To address this challenge, this study proposes a novel learning method that enhances the recall of critical classes while preserving overall accuracy. The method incorporates a weighting mechanism into the widely used cross-entropy loss function, dynamically adjusting the importance of classes based on the output probabilities generated during the training process. This adaptive approach aims to improve the recall of critical classes without compromising the overall performance of the model.

Continuous Japanese Pre-Training for Qwen2.5-32B/7B

Shinya OTANI, Kyo HATTORI, Keisuke FUJIMOTO, Kentaro NAKANISHI, Tomoki ...

Session ID: 3S6-GS-2-03
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S6GS203

CONFERENCE PROCEEDINGS FREE ACCESS

In this study, we conducted continuous pre-training focused on Japanese for the Qwen model series developed by Alibaba Cloud, namely “Qwen2.5-32B-Instruct” and “Qwen2.5-7B-Instruct,” and evaluated their effectiveness on Japanese tasks. To balance high performance with feasible parameter sizes for real-world applications, we conducted continuous pre-training using a mixed dataset of roughly 100 billion tokens in both Japanese and English. We further applied a merging approach via ChatVector to enhance instruction-following capabilities. Evaluations using MT-Bench-Japanese and ELYZA-tasks-100 showed that the 32B model achieved scores of 8.294 and 4.37, respectively, demonstrating competitiveness comparable to closed large language models. The combined benchmark results even surpassed those of Qwen2.5-72B-Instruct, confirming the benefits of Japanese-focused continuous pre-training. On the other hand, some outputs still contain Chinese text, suggesting potential influence from ChatVector or the training data of the base model. Future work will include removing mixed-language data and applying both domain- and task-specific post-training to further improve performance and address these issues.
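
The merging step can be illustrated with the usual chat-vector arithmetic: the weight difference between an instruction-tuned model and its base model is added to the continually pre-trained weights. The sketch below assumes this standard formulation and represents weights as plain dictionaries of float lists. Which checkpoints the authors actually combined is not stated in the abstract, and `apply_chat_vector` and the `scale` parameter are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def apply_chat_vector(base: Weights, instruct: Weights, continued: Weights,
                      scale: float = 1.0) -> Weights:
    """Return continued + scale * (instruct - base), parameter by parameter."""
    merged: Weights = {}
    for name, w_cp in continued.items():
        merged[name] = [
            cp + scale * (ins - b)
            for cp, ins, b in zip(w_cp, instruct[name], base[name])
        ]
    return merged


base = {"w": [1.0, 2.0]}
instruct = {"w": [1.5, 2.0]}   # base after instruction tuning
continued = {"w": [3.0, 4.0]}  # base after continued pre-training
merged = apply_chat_vector(base, instruct, continued)
```

Because the difference vector carries whatever the instruction-tuned model learned, including any language bias, this arithmetic is one plausible route for the residual Chinese output that the abstract mentions.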

∞-MoE: Generalizing Mixture of Experts to Infinite Experts

Shota TAKASHIRO, Takeshi KOJIMA, Shohei TANIGUCHI, Yusuke IWASAWA, Yut ...

Session ID: 3S6-GS-2-04
Published: 2025
Released on J-STAGE: July 01, 2025

DOI: https://doi.org/10.11517/pjsai.JSAI2025.0_3S6GS204

CONFERENCE PROCEEDINGS FREE ACCESS

We propose a novel framework called Infinite Mixture of Experts ($\infty$-MoE) that generalizes Mixture of Experts (MoE) from a finite number of experts to a (theoretically) infinite continuum. Traditional MoE architectures have achieved significant expressiveness by combining multiple discrete experts, but they face inherent limitations as the number and structure of experts are predetermined. To address this, our $\infty$-MoE represents experts in a continuous space, allowing the model to dynamically sample and activate an unbounded set of experts for each input. Experimental results on GPT-2 Small/Medium models demonstrate that $\infty$-MoE outperforms Dense, Switch Transformer, and standard MoE (Top-2 gating). We discuss the potential of $\infty$-MoE for more expressive model architectures and outline possible extensions to larger-scale models and multimodal tasks.
