-
Kentaro FUJII, Shingo MURATA
Session ID: 1B3-OS-41a-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Deep learning-based robots are expected to achieve various goals in real-world environments. To realize this, it is essential to handle environmental uncertainties using both goal-directed and exploratory actions. Deep active inference (DAIf) offers a promising approach but suffers from high computational costs and requires strong representational capabilities for modeling environmental dynamics. To address these challenges, we propose a novel DAIf framework. The framework comprises a hierarchical world model, an abstract world model, and an action model. The hierarchical world model learns environmental dynamics by introducing a temporal hierarchy, enhancing its representational capability. The action model learns latent states of action sequences as abstract actions. The abstract world model learns the relationship between the hierarchical world model's representation and the abstract actions, reducing computational costs. Robotic object manipulation experiments in uncertain environments demonstrated that the framework reduced computational costs compared to conventional approaches, achieved diverse goals, and generated exploratory actions to address environmental uncertainties.
-
Daiki TAKAHASHI, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 1B3-OS-41a-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we analyzed multimodal prompts in a robot manipulation task, focusing on the interaction between textual and visual inputs. Using the VIMA benchmark, we evaluated the effects of modality dependence and the input order of observation tokens on the task success rate. The results revealed an overdependence on specific modalities and input order, indicating important issues in achieving robust multimodal learning. Our findings contribute to improving the generalizability of models in robot tasks.
-
Mai TERASHIMA, Katsuyoshi MAEYAMA, Pedro Miguel Uriguen ELJURI, Yuanyu ...
Session ID: 1B3-OS-41a-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a method for learning a latent space representing 6D poses and performing 6D control using NewtonianVAE. NewtonianVAE, as a type of world model, learns the dynamics of the environment as a latent space from observational data and performs proportional control based on the estimated position. By using NewtonianVAE, position estimation can be achieved based on the internal dynamics of the environment rather than an external coordinate system. While previous studies have applied NewtonianVAE to translational control, 6D control has not been investigated. To address this, we propose 6D Multi-View NewtonianVAE (6D-MNVAE), which extends the latent space by incorporating a rotation vector. In our experiments, we evaluated whether 6D-MNVAE can estimate 6D poses in the latent space and perform 6D control towards a target pose. Experimental results showed that 6D-MNVAE achieved 6D control with an accuracy within 7 mm and 0.02 rad. Furthermore, our method does not require feature engineering or annotation and enables 6D control using only RGB image information.
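The proportional control mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the gain, the time step, and the pose layout `[x, y, z, rx, ry, rz]` are all assumptions made for illustration, and the rotation-vector error is taken as a plain component-wise difference, which is only a reasonable approximation for small rotations.

```python
def proportional_command(pose, goal, gain=1.0):
    """Return a command proportional to the latent-space pose error."""
    return [gain * (g - p) for p, g in zip(pose, goal)]


def rollout(start, goal, gain=1.0, dt=0.1, steps=100):
    """Apply the proportional command repeatedly, treating it as a velocity."""
    pose = list(start)
    for _ in range(steps):
        command = proportional_command(pose, goal, gain)
        pose = [p + dt * c for p, c in zip(pose, command)]
    return pose


# Hypothetical 6D latent pose: translation (x, y, z) plus rotation vector (rx, ry, rz).
start = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
goal = [0.10, -0.05, 0.20, 0.0, 0.3, -0.1]
final = rollout(start, goal)
max_error = max(abs(g - p) for g, p in zip(goal, final))
```

With these assumed settings the error shrinks by a factor of 0.9 per step, so the toy rollout converges towards the goal pose.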
-
Yuta NOMURA, Shingo MURATA
Session ID: 1B3-OS-41a-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The development of generalist robots capable of performing diverse tasks in various environments is highly anticipated. While imitation learning and reinforcement learning are effective approaches, they present a trade-off between generalization ability and data efficiency, often requiring large amounts of data to achieve high generalization. To address this challenge, we introduce play data, collected through human teleoperation driven by curiosity. This data serves as expert demonstrations with high generalization potential but may require additional data for out-of-distribution tasks. To overcome this limitation, we propose a play-based action generation framework that augments play data within a world model. By learning from both real and synthetically generated play data, the framework enables robots to generate actions toward various goal states. Additionally, autonomous data collection within the world model reduces reliance on real-world data collection. Experiments in both simulated and real-world robotic environments demonstrate that the proposed framework improves generalization ability and data efficiency by facilitating novel data collection within the world model.
-
Kai YAMASHITA, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 1B3-OS-41a-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Recent advances in reinforcement learning for multi-agent environments have underscored the importance of Opponent-Modeling, where agents infer internal states or strategies of opponents. Recent studies have explored AutoEncoder-based latent representations that limit access to opponent information during execution for Opponent-Modeling in partially observable environments. In reinforcement learning, the state input to the policy and value function in a Markov decision process (MDP) must satisfy the Markov property and serve as a sufficient statistic for future reward prediction. However, under partial observability, many opponent modeling approaches focus solely on reconstructing opponent information in the latent representation, without ensuring that it retains Markovian or reward-predictive properties. To overcome this limitation, we propose a representation learning method that models not only the opponent but also the agent itself. We validated our method through experiments, demonstrating its effectiveness in improving reinforcement learning performance.
-
Yosuke NISHIMOTO, Takashi MATSUBARA
Session ID: 1B4-OS-41b-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Tomoshi IIYAMA, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 1B4-OS-41b-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Complex real-world tasks are often long-horizon, making world models that can accurately predict far into the future crucial for AI agents. Hierarchical state-space models, which incorporate temporal hierarchies in latent states, have shown promise for long-term prediction by segmenting time series into subsequences and learning temporal abstraction. However, existing methods relying on rigid subsequence length assumptions or significant changes in observation often perform poorly in environments where optimal subsequence lengths vary or environmental changes occur gradually. This study proposes a method for learning hierarchical state-space models based on the discovery of frequently occurring, highly reusable patterns, drawing insights from chunking mechanisms in cognitive science. Our method extracts frequent patterns by utilizing changes in surprise and uncertainty in low-level latent states. Leveraging these patterns to learn high-level latent states reduces the complexity of transitions, enabling efficient long-term prediction. Experiments on video prediction tasks show that our method outperforms the baselines, underscoring the effectiveness of hierarchical structures derived from frequent patterns for long-term prediction.
-
Riko YOKOZAWA, Kentaro FUJII, Yuta NOMURA, Shingo MURATA
Session ID: 1B4-OS-41b-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Taisuke TAKAYAMA, Naoto YOSHIDA, Tadahiro TANIGUCHI
Session ID: 1B4-OS-41b-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Akihiro NAKANO, Masahiro SUZUKI, Yutaka MATSUO
Session ID: 1B4-OS-41b-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Model-based reinforcement learning (RL) is a promising approach to learning to control agents in a sample-efficient manner, but it often struggles with generalization beyond the tasks it was trained on. While previous work has explored using pretrained visual representations (PVR) to improve generalization, these approaches have not outperformed representations learned from scratch in out-of-distribution (OOD) settings. In this work, we propose to incorporate object-centric representations, which have demonstrated strong OOD generalization capabilities by learning compositional representations, into model-based RL with PVR. We investigate whether this object-centric inductive bias improves both sample efficiency and task performance across in-distribution and OOD environments.
-
Satoi YAMAGUCHI, Yuna MASTUNAGA, Takayasu IKEDA, Masahiro SUZUKI, Yuta ...
Session ID: 1B5-OS-41c-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This research explores a novel approach to task completion that does not rely on extensive pre-training in the rapidly evolving field of Web agents. A major challenge existing agents face is the automation of tasks that involve image recognition. However, previous methods show limited compatibility between image recognition performance and approaches that do not require pre-training. To address this limitation, we propose an approach that strategically integrates several expert methods using a meta-prompt within an LLM, achieving advanced environmental analysis that enables performance improvement. We evaluate the proposed approach using MiniWob++. Additionally, we compare existing agents with the proposed approach to assess task success rate. This paper offers insight into the potential of meta-prompt integration with LLMs to improve task completion rate, suggesting that the extensive data collection and training required by current agents may become less necessary.
-
Yusei KOEN, Yuji FUJIMA, Yasuhiro TAKEDA, Makoto KAWANO, Yutaka MATSUO
Session ID: 1B5-OS-41c-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Recent studies have demonstrated that offline data, such as text, can significantly enhance the efficiency of task learning through the pretraining of world models. In particular, Dynalang has demonstrated its effectiveness in leveraging task instructions and environmental dynamics to enhance performance. However, its application has been primarily limited to the Messenger task, leaving its generalizability to other tasks and the impact of text type and quality in pretraining insufficiently explored. In this study, we extend Dynalang's approach to the simpler HomeGrid task to evaluate its generalizability. We also explore the use of large language models (LLMs) to generate and expand domain-specific text, aiming to further improve initial task performance and sample efficiency. Additionally, we propose and assess a two-stage pretraining strategy: general text is first used to develop fundamental language understanding, followed by domain-specific text to strengthen task-specific capabilities. Our findings highlight the potential of expanding the applicability of text-based pretraining strategies.
-
Daiki GOTO, Hayato IDEI, Yuji SHIOZUKA, Tetsuya OGATA
Session ID: 1B5-OS-41c-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
TAISEI OZAKI, Takumi MATSUSHITA, Tsuyoshi MIURA, Shohei TANIGUCHI, Yut ...
Session ID: 1B5-OS-41c-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Recent studies suggest that Large Language Models (LLMs) demonstrate capabilities beyond simple next-token prediction, leading to discussions about their potential acquisition of world models. This paper introduces Basic-JDERW, a deductive reasoning benchmark dataset that requires fundamental world understanding. The dataset comprises 103 QA tasks that necessitate the application of basic world models, ranging from physical phenomena comprehension to common sense reasoning and action planning, categorized into six types: causal reasoning, temporal reasoning, spatial reasoning, abstract concept reasoning, common sense reasoning, and planning. Through evaluation experiments with eight LLMs, we analyzed model performance across categories and examined correlations with existing benchmarks. Notably, llama3.3-70B-instruct demonstrated superior performance in categories requiring physical understanding, such as temporal and spatial reasoning. This research offers new perspectives on evaluating basic world understanding capabilities glimpsed through LLMs' reasoning abilities and aims to contribute to understanding the relationship between linguistic reasoning and world comprehension capabilities.
-
Is a large-scale language model a collective world model?
TADAHIRO TANIGUCHI, Ryo UEDA, Tomoaki NAKAMURA, Masahiro SUZUKI, Akira ...
Session ID: 1B5-OS-41c-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study proposes a unified theoretical framework called "generative emergent communication" (generative EmCom) that connects emergent communication, world models, and large language models from the perspective of collective predictive coding. This framework not only allows us to understand symbol emergence as (decentralized) representation learning of external symbols within a machine learning framework but also enables us to interpret large language models as collective world models that integrate experiences from multiple agents. This research provides a novel perspective linking world models and symbol emergence, while discussing future research directions.
-
Riho HOSHUYAMA, Masaki INOUE
Session ID: 1D3-OS-24a-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this work, we focus on the utilization of EV storage batteries in energy management systems. EV storage batteries are expected to complement the electricity demand and supply from renewable energy sources. However, the most important role of EVs is as a means of transportation. To collect electricity in this context, EV users who plan to go out are given incentives to change when they use their EVs for mobility. We modeled this behavioral change and devised an energy management system that compensates for power shortages at the lowest possible cost.
-
Yuumi GOTO, Shuhei YAMAMOTO
Session ID: 1D3-OS-24a-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Hikaru UMADA, Shuhei YAMAMOTO
Session ID: 1D3-OS-24a-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
OSUKE HAYASHI, Shuhei YAMAMOTO
Session ID: 1D3-OS-24a-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ren TOMIYAMA, Yoshiaki TAKIMOTO, Takeshi KURASHIMA, Hiroyuki TODA
Session ID: 1D3-OS-24a-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Hayato YASUDA, Akiko TAKAHASHI, Yusuke FUKAZAWA
Session ID: 1D4-OS-24b-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we propose a model that treats the sheet music of classical composers as image data and uses image recognition techniques for composer classification. Furthermore, we predict the relevance of classical composers' styles in the sheet music of modern composers, quantifying which classical composer's style is related to modern composers.
-
Zhelin XU, Shuhei YAMAMOTO, Hideo JOHO
Session ID: 1D4-OS-24b-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
With the rapid growth of scientific publications, researchers need to spend more time searching for papers that align with their research interests. To address this challenge, paper recommendation systems have been developed to help researchers effectively identify relevant papers. One of the leading approaches to paper recommendation is content-based filtering, which recommends papers based on the overall similarity of papers. However, studies on user information-seeking behaviors indicate that, in addition to evaluating the overall similarity, researchers also pay attention to specific sections of a paper to assess their relevance to their interests. For instance, users may check the method section to determine whether a candidate paper utilizes a method they are interested in. In this paper, we propose a content-based filtering recommendation method that takes this information-seeking behavior into account, aiming to provide users with more relevant papers. Specifically, in addition to considering the overall content of a paper, our approach also considers three specific sections (background, method, and results) and assigns weights to them to better reflect user preferences. We conduct offline evaluations on the DBLP dataset, and the results demonstrate that the proposed method outperforms six baseline methods in terms of precision@5, recall@5, MRR, and MAP.
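The idea of combining overall similarity with weighted section-level similarity can be sketched as follows. This is a toy illustration, not the authors' method: the two-dimensional vectors, the weight values, and the names `cosine` and `weighted_score` are hypothetical, and a real system would derive the vectors from text embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_score(user, paper, weights):
    """Weighted sum of per-part similarities (overall text plus sections)."""
    return sum(w * cosine(user[part], paper[part]) for part, w in weights.items())


# Hypothetical toy vectors for the user's interests and two candidate papers.
user = {"overall": [1, 0], "background": [1, 0], "method": [0, 1], "results": [1, 1]}
papers = {
    "paper_a": {"overall": [1, 0], "background": [1, 0], "method": [1, 0], "results": [1, 1]},
    "paper_b": {"overall": [1, 1], "background": [1, 0], "method": [0, 1], "results": [1, 1]},
}

# Assumed weights; the abstract does not state the actual values.
section_weights = {"overall": 0.4, "background": 0.1, "method": 0.4, "results": 0.1}
overall_only = {"overall": 1.0}

ranked = sorted(papers, key=lambda p: weighted_score(user, papers[p], section_weights), reverse=True)
ranked_overall = sorted(papers, key=lambda p: weighted_score(user, papers[p], overall_only), reverse=True)
```

In this toy data, `paper_a` is closest on overall similarity alone, while `paper_b` ranks first once the method section is weighted, which is the kind of behavior the section-aware scoring is meant to capture.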
-
Makoto TAKEUCHI
Session ID: 1D4-OS-24b-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The content recommendation problem for live-streaming platforms presents several unique challenges not present in other media platforms. First, the recommended content is ad-hoc CGM content generated by the live streamer. Because it is dynamic, the audience is always restricted to choosing from the content being streamed when they visit the platform. In addition, the content is diverse and changes in real time, influenced by the communication between the audience and the live streamer, so there is also the problem of obtaining the streamed content's features. Furthermore, it has been pointed out that audience behavior patterns on live-streaming platforms include exploring new streamers and exploiting known streamers. Understanding and appropriately capturing the dynamics of these viewer states is considered essential for live-streaming content recommendation but has not been sufficiently studied. In this study, to investigate appropriate approaches to these problems specific to live-streaming platforms, we analyzed viewing behavior using user logs of a live-streaming platform.
-
Takumi ITO, Tomu TOMINAGA, Takeshi KURASHIMA
Session ID: 1D4-OS-24b-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Algorithmic recourse provides counterfactual action plans (recourse) for users to overturn negative AI decisions. It typically assumes that minimizing an objective function, which measures the distance between a user’s current and desired state, generates acceptable recourse. However, recent studies question this assumption, highlighting the need to revisit the objective function. In this study, we propose a novel objective function that excludes the influence of features irrelevant to AI decisions. These features are identified based on their correlation with decision outcomes, their importance in predictions, or users’ self-reports that they are irrelevant to decision outcomes. The proposed approach ensures such features remain unchanged in recourse. Using experimental data from a user study with a loan application scenario, we confirmed that minimizing the proposed objective function improves recourse acceptability. User self-reports were particularly effective in identifying irrelevant features. Based on these results, we discussed future directions for enhancing user-centered algorithmic recourse generation incorporating users’ prior knowledge.
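One way to picture an objective that keeps irrelevant features unchanged is the sketch below. It is an assumption-laden toy, not the objective function proposed in the paper: the feature names, the unscaled L1 distance, and the rule of assigning infinite cost to any plan that changes a flagged feature are all illustrative choices.

```python
import math


def naive_cost(current, candidate):
    """Plain L1 distance over all features (assumed baseline objective)."""
    return sum(abs(candidate[k] - current[k]) for k in current)


def restricted_cost(current, candidate, irrelevant):
    """L1 distance over relevant features; plans that change a feature
    flagged as irrelevant are rejected with infinite cost."""
    if any(candidate[k] != current[k] for k in irrelevant):
        return math.inf
    return sum(abs(candidate[k] - current[k]) for k in current if k not in irrelevant)


# Hypothetical loan applicant and candidate plans that would each flip the decision.
current = {"income": 40, "debt": 10, "phone_brand": 0}
candidates = {
    "plan_a": {"income": 45, "debt": 10, "phone_brand": 1},
    "plan_b": {"income": 48, "debt": 10, "phone_brand": 0},
    "plan_c": {"income": 50, "debt": 5, "phone_brand": 0},
}
irrelevant = {"phone_brand"}  # e.g. flagged through user self-report

best_naive = min(candidates, key=lambda name: naive_cost(current, candidates[name]))
best_restricted = min(candidates, key=lambda name: restricted_cost(current, candidates[name], irrelevant))
```

Here the naive objective prefers `plan_a`, which asks the user to change a feature they consider irrelevant, while the restricted objective selects `plan_b`.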
-
Yuta NAMBU, Masahiro KOHJIMA, Ryuji YAMAMOTO
Session ID: 1D4-OS-24b-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Heart information, especially ECG, is often used to estimate a person's internal state and behavior. However, accurate ECG measurement requires specialized equipment and the cooperation of medical staff, and although wearable devices that can measure ECG have appeared, they have some limitations. In contrast, since PPG measures blood flow in the wrist, it can only indirectly obtain heart information, but it can be measured more affordably and continuously with smartwatches and other devices. Therefore, this study focuses on generating ECG signals from easily measurable PPGs. The method utilizes Rectified Flow to learn the shortest path between two distributions, enhancing ECG signal quality by incorporating peak position information from both signals. Experiments with a dataset in which ECGs and PPGs were measured simultaneously showed that our approach yields higher-quality ECGs than traditional diffusion models. Additionally, using generated ECGs as training data enhances activity classification performance compared to using PPGs.
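The straight-line path at the core of Rectified Flow can be sketched in a few lines. This is a generic illustration under stated assumptions, not the authors' model: the short lists stand in for paired PPG and ECG segments, the function names are hypothetical, and an oracle velocity replaces the neural network that would normally be trained to regress the target velocity.

```python
def interpolate(x0, x1, t):
    """Point on the straight line between x0 and x1 at time t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity model: the constant direction x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(x0, velocity_fn, steps=10):
    """Generate a sample by integrating dx/dt = velocity_fn(x, t) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical paired segments (source signal and target signal).
ppg_segment = [0.1, 0.4, 0.9, 0.5]
ecg_segment = [0.0, 0.2, 1.5, -0.3]

midpoint = interpolate(ppg_segment, ecg_segment, 0.5)
oracle = target_velocity(ppg_segment, ecg_segment)
generated = euler_integrate(ppg_segment, lambda x, t: oracle)
```

Because the oracle velocity is constant along the path, Euler integration recovers the target segment up to floating-point error; a learned velocity model would only approximate this.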
-
Ryohei OGAWA, Hitoshi SUZUKI, Yusuke FUKAZAWA
Session ID: 1D5-OS-24c-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study constructed a classification model for compensation amounts in civil litigation involving marital infidelity using machine learning. Features were designed based on prior knowledge and extracted from legal case documents using generative AI. SHAP analysis revealed that factors such as the duration of marriage, period of infidelity, and presence of children were significantly related to the compensation amounts.
-
Yasunori AKAGI, Naoki MARUMO, Takeshi KURASHIMA
Session ID: 1D5-OS-24c-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Mimura TOMOHIRO, Ryo YAMADA
Session ID: 1D5-OS-24c-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
HARUTO SUGAWARA, HIROYUKI TODA
Session ID: 1D5-OS-24c-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yukinari HANDA, Taisuke SHIRAISHI
Session ID: 1E3-GS-10-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In the manufacturing industry, an adjustment mechanism is sometimes provided in the design of a product with a large degree of individual difference. The term “individual difference” refers to the variation in the final performance of a product caused by the accumulation of intersecting parts and variations in the assembly process. By providing an adjustment mechanism, it is possible to absorb such individual differences by making adjustments during the final performance test. However, such adjustment processes depend on the skill of the operator, because the initial conditions vary for each individual. Especially when there are multiple adjustment parameters and final performance parameters, the adjustment process becomes very complicated and the adjustment patterns increase exponentially. In this study, we propose a modeling method that combines a regression algorithm, causal inference, mathematical optimization, and measured value feedback to represent individual differences for the purpose of optimizing the adjustment process. The measured value feedback is a process to compensate for the bias by computing the assignment of measured values to the regression equation. This method can learn efficiently from a small number of machine samples, and succeeds in creating a model that can predict the optimal point in the adjustment of a machine with an unknown individual difference.
-
Ryoma NAKAMURA, Masaki MATSUDAIRA, Daisuke OKUYA
Session ID: 1E3-GS-10-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In order to contribute to the reduction of traffic accidents, we are researching a method to capture the signs of a traffic accident and quantify the risk of an accident based on these conditions. Based on the quantified risk of accidents, we believe that this method will be useful in preventing traffic accidents by alerting drivers. Previous research has shown that traffic shockwave propagation caused by multiple vehicles braking in chain reaction is highly associated with the occurrence of traffic accidents. In this paper, we propose a risk quantification method using a marked Hawkes process with the coefficient of variation of speed as a mark, focusing on the fact that the speed in that space changes violently in a time series when traffic shockwave propagation occurs, and report the results of applying the proposed method to actual traffic probe data. As a result of evaluation on real data, we confirmed that our method can appropriately quantify accident risk according to the occurrence of traffic shockwave propagation. We also confirmed that in some cases where accidents actually occurred, the risk increased before the occurrence of the accident by capturing traffic shockwave propagation.
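The marked Hawkes process named in this abstract can be illustrated with a small sketch. The exponential kernel, the linear dependence on the mark, and the parameter values `mu`, `alpha`, and `beta` are assumptions chosen for illustration; the abstract does not specify the kernel or how the model was fitted.

```python
import math
import statistics


def coefficient_of_variation(speeds):
    """Population standard deviation of the speeds divided by their mean."""
    return statistics.pstdev(speeds) / statistics.fmean(speeds)


def intensity(t, events, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a marked Hawkes process with an exponential kernel.

    `events` is a list of (time, mark) pairs; each past event raises the
    intensity in proportion to its mark, and the effect decays at rate beta.
    """
    excitation = sum(
        alpha * mark * math.exp(-beta * (t - t_i))
        for t_i, mark in events
        if t_i < t
    )
    return mu + excitation


# Hypothetical probe speeds (km/h) in one road segment during two time windows.
steady = [60, 61, 59, 60]
unstable = [60, 20, 55, 10]
cv_steady = coefficient_of_variation(steady)
cv_unstable = coefficient_of_variation(unstable)

# One event at t = 1.0 whose mark is the coefficient of variation of speed.
events = [(1.0, cv_unstable)]
risk_before = intensity(0.5, events)
risk_soon_after = intensity(1.5, events)
risk_later = intensity(5.0, events)
```

In this toy setting the risk score sits at the baseline before the event, jumps shortly after it, and decays back towards the baseline over time.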
-
Satoshi KAMEGAI, Tatsuo KITAHASHI, Ryoichi TANAKA
Session ID: 1E3-GS-10-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Shuta TANABE, Kenichi FUKUI, Noriko OTANI, Masayuki NUMAO
Session ID: 1E3-GS-10-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
YO EHARA
Session ID: 1E3-GS-10-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kazuya KIKUTANI, Munehiko SASAJIMA
Session ID: 1E4-OS-3a-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Akane UEDA, Kazushi OKAMOTO, Kei HARADA, Atsushi SHIBATA, Koki KARUBE
Session ID: 1E4-OS-3a-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
User-generated recipes posted on recipe websites often contain expressions that can be interpreted in various ways by readers or that hinder the accurate reproduction of the dish. Such ambiguous expressions contribute to the difficulty of understanding recipes. The aim of this study is to clarify ambiguous expressions specific to cooking recipes and to provide clear alternatives that can be easily understood by readers. Specifically, we identified and categorized ambiguous expressions using a questionnaire, designed prompts for large language models based on the results, and proposed a method for complementing recipes using retrieval-augmented generation. Analysis of the questionnaire results reveals that ambiguous expressions include omissions and modifiers, among others. Furthermore, cooking experiments demonstrated that the readability of recipes complemented by the proposed method is improved, although the overall validity of the complemented recipes remains uncertain.
-
Takumi SHIBATA, Yuichi MIYAMURA
Session ID: 1E4-OS-3a-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
YASUFUMI TAKAMA, Hiroki SHIBATA
Session ID: 1E4-OS-3a-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This paper discusses the applicability of LLM (Large Language Model) for generating virtual user profiles. Recommender systems require users’ personal information such as their tastes and interaction histories, which could raise privacy concerns. To realize a recommendation without collecting users’ personal information, this paper proposes the concept of an explainable recommendation interface using virtual user profiles. By examining what users gave high/low ratings to target items, users can determine whether to accept/reject recommendations. This paper discusses the possibility of this concept based on the profiles generated by LLM and a pilot study with a prototype interface.
-
Madoka HAGIRI, Kazushi OKAMOTO, Kei HARADA, Atsushi SHIBATA, Koki KARU ...
Session ID: 1E4-OS-3a-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Recommender systems are widely used in e-commerce to enhance user convenience. Complementary recommendation is a technology that suggests combinations of products intended to improve convenience when used together. However, complementary relationships can be ambiguous, making it difficult to provide a clear definition. Therefore, we aim to develop a complementary recommendation system based on product usage scenarios using a large language model (LLM). By incorporating product usage scenarios, it is expected that the complementary recommendations will be supported by clear evidence. In this study, we conducted an experiment in which we input only the names of product categories into an LLM (GPT-4o-mini), which then generated usage scenarios for these categories. The generated scenarios were then evaluated manually. The experimental results confirm that approximately 85% of the generated scenarios were considered valid.
-
Kota NAGAO, Wataru SUNAYAMA, Shun HATTORI
Session ID: 1E5-OS-3b-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kaname TAKIOKA, Megumi YASUO, Junjie SHAN, YOKO NISHIHARA
Session ID: 1E5-OS-3b-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Shunsuke ITOH, Kuon TANAKA, Haruka MATSUKURA, Yuji NOZAKI, MAKI SAKAMO ...
Session ID: 1E5-OS-3b-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Atsuya KOMORI, Wataru SUNAYAMA, Shun HATTORI
Session ID: 1E5-OS-3b-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In websites where many people can post comments, such as video-sharing platforms, analyzing the diverse range of submitted opinions can potentially serve as a basis for decision-making. However, as the quantity and variety of comments increase, summarizing these opinions manually becomes impractical. Moreover, the appropriate method of summarization may differ depending on the specific decision at hand. Therefore, this study proposes a method to extract major opinions from a set of comments by using ChatGPT for each perspective deemed necessary for the analysis. Through experiments, we verified that the proposed method can generate different summaries tailored to each perspective.
-
Kuon TANAKA, Shunsuke ITOH, Haruka MATSUKURA, Yuji NOZAKI, Maki SAKAMO ...
Session ID: 1E5-OS-3b-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, VTubers (Virtual YouTubers) have become more popular, with the number of VTubers exceeding 20,000. However, few methods exist for finding favorite VTubers, and understanding the appeal of a new VTuber takes time. This study aimed to facilitate the quick discovery of a VTuber's appeal by extracting self-disclosure from their chatting streams. First, we collected two chatting streams from each of 96 randomly selected VTubers. Next, based on previous research and qualitative analysis by LLMs, we developed 31 self-disclosure items, including "reflection on experience" and "current goal." We then used GPT-4o-mini to classify whether transcriptions of the chat streams contained these self-disclosure items. We compared some of the results with human annotations for validation. As a result, over 80% of VTuber chatting streams contained self-disclosure. Additionally, while items related to “goal” and “VTuber activity” were extracted with high accuracy, items such as “interest” and “personality” had lower accuracy. In conclusion, LLMs have made it possible to analyze VTuber chatting streams.
-
Daisuke KAJI, Heishiro TOYODA, Sawa TAKAMUKU, Kiyosumi KIDONO, Hiroyuk ...
Session ID: 1F3-OS-40a-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In the future society, AI technology will be required not only to pursue convenience and efficiency but also to respond to changes in individual well-being and societal demands. In this paper, we propose three basic requirements that AI systems must have in order to increase the happiness of diverse individuals and realize a rich, people-centered future society: "co-evolution between humans and AI", "continuous improvement activities" built by humans and AI to support co-evolution, and "AI governance/AI alignment" for these efforts to function properly. We then examine the basic requirements for each layer. Furthermore, we focus on "plausibility" as a keyword that plays an important role in each of these layers and improves the acceptability of these activities, and we discuss its meaning and role.
-
Hideki HAYASHI, Hirotaka KAJI, Kazushi IKEDA
Session ID: 1F3-OS-40a-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Today, accurate forecasting of urban transportation demand is becoming increasingly important. Although large volumes of mobility data are required for such forecasting, privacy considerations often mean that only aggregated statistics are available. If the generative model of mobility data can be estimated from these statistics, it becomes possible to generate pseudo mobility data, which can be beneficial for transportation demand forecasting. In this study, we propose a method for estimating model parameters from statistical data under the assumption that mobility data are generated by Latent Dirichlet Allocation (LDA). The effectiveness of the proposed method was demonstrated through experiments using real data from a bike-sharing system.
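The generation of pseudo mobility data from an LDA model can be sketched as follows. This covers only the sampling step, not the parameter estimation proposed in the study, and it rests on assumptions: each user is treated as a "document", each trip destination as a "word", and the values of `alpha`, `phi`, and the station names are made up for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_trips(alpha, phi, locations, n_trips, rng):
    """LDA generative process: draw a per-user topic mixture, then for each trip
    draw a topic from the mixture and a location from that topic's distribution."""
    theta = sample_dirichlet(alpha, rng)
    trips = []
    for _ in range(n_trips):
        topic = rng.choices(range(len(phi)), weights=theta)[0]
        trips.append(rng.choices(locations, weights=phi[topic])[0])
    return trips


# Hypothetical parameters: two latent trip patterns over three bike stations.
locations = ["station_a", "station_b", "station_c"]
alpha = [1.0, 1.0]            # Dirichlet prior over topics
phi = [[0.7, 0.2, 0.1],       # topic 0: mostly station_a
       [0.1, 0.2, 0.7]]       # topic 1: mostly station_c

rng = random.Random(0)        # fixed seed so the sketch is reproducible
pseudo_trips = generate_trips(alpha, phi, locations, n_trips=20, rng=rng)
```

Repeating `generate_trips` for many simulated users would yield a synthetic trip log whose aggregate statistics could then be compared with the published ones.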
-
Wei MANMAN, Onishi MASAKI, Yin YINGJIE
Session ID: 1F3-OS-40a-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Fuyuki ISHIKAWA
Session ID: 1F3-OS-40a-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In software engineering, principles and techniques have been discussed to analyze the requirements that each system must meet, or the risks it should avoid, and to carry out activities such as development, quality assurance, and operation based on those analyses. Similarly, in AI systems, it is essential not only to meet general requirements but also to address perspectives that are critical to the system's stakeholders through activities such as testing and performance tuning. This presentation will introduce efforts in testing and performance tuning for systems such as autonomous driving systems, focusing on an approach called Search-Based Software Engineering, which utilizes metaheuristic optimization.
-
Hirotaka KAJI, Jun KURIBAYASHI, Hikaru OTANI, Arisa EMA
Session ID: 1F3-OS-40a-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
As AI becomes more widespread, the importance of responsible AI and AI governance is increasing in response to various risks. AI developers and users need to acquire the knowledge to understand and practice them. In this study, we focus on serious games, which are a promising way to understand social issues, and we design a card game that allows players to experience the necessity of responsible AI and resilience. As an AI strategy team member in a company or local government, players manage AI-related resources to improve operational efficiency. At the same time, they learn to balance performance and governance by dealing with incidents and crises that occur. We consider the similarities and differences between corporate and local government versions in the design.
-
Toshiya OKUBO, Jati Hiliamsyah HUSEN, Nobukazu YOSHIOKA, Naoyasu UBAYA ...
Session ID: 1F4-OS-40b-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS