-
Masato SUGATA, Nagisa MASUDA, Ikuko Eguchi YAIRI
Session ID: 4K1-IS-2d-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Electroencephalography (EEG)-based emotion recognition offers a noninvasive, cost-effective approach with applications in psychology, healthcare, and education. Accurate recognition of fear emotions is crucial for diagnosing and treating conditions such as phobias and anxiety disorders. This study classifies fear emotions into four levels using the DEAP dataset, leveraging Graph Neural Networks (GNNs) integrated with Long Short-Term Memory (LSTM) networks. Two architectures, GIN-LSTM and ECLGCNN, were evaluated with raw EEG signals and Differential Entropy (DE) features. Performance was assessed using 10-fold and Leave-One-Subject-Out (LOSO) cross-validation, achieving a peak accuracy of 99.23% in 10-fold CV and 36.57% in LOSO CV, both surpassing prior studies. However, the LOSO results reveal limited generalizability to unseen subjects, highlighting the need for further research to enhance adaptability and robustness. This study demonstrates the potential of GNN-LSTM models for fear emotion classification and underscores the importance of addressing inter-subject variability to improve real-world applicability.
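The exact GIN-LSTM and ECLGCNN architectures are not detailed in this abstract; as a rough, hypothetical illustration of the general pattern (a graph layer over EEG channels per time window, followed by an LSTM over windows), the following Python sketch uses placeholder channel counts, feature sizes, and adjacency.

```python
import torch
import torch.nn as nn

class GraphLSTMClassifier(nn.Module):
    """Toy GNN+LSTM classifier: a GIN-style graph layer per time step,
    then an LSTM over the resulting graph embeddings (illustrative only)."""
    def __init__(self, n_channels=32, n_features=5, hidden=64, n_classes=4):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.node_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, adj):
        # x: (batch, time, channels, features), adj: (channels, channels)
        b, t, c, f = x.shape
        x = x.reshape(b * t, c, f)
        agg = torch.einsum("ij,bjf->bif", adj, x)      # neighbor aggregation
        h = self.node_mlp((1 + self.eps) * x + agg)     # GIN-style node update
        g = h.mean(dim=1).reshape(b, t, -1)             # graph-level readout
        out, _ = self.lstm(g)                           # temporal modeling
        return self.head(out[:, -1])                    # 4 fear levels

# Dummy forward pass: 8 samples, 60 windows, 32 channels, 5 DE-band features.
model = GraphLSTMClassifier()
x = torch.randn(8, 60, 32, 5)
adj = (torch.rand(32, 32) > 0.7).float()
print(model(x, adj).shape)  # torch.Size([8, 4])
```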
-
Mai YOSHIKAWA, Yuma ABE, Dimitar KOLEV, Hiroyuki TSUJI, Ikuko Eguchi Y ...
Session ID: 4K1-IS-2d-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In pursuing an intelligent society, the deployment of Beyond 5G/6G is anticipated. A crucial aspect of realizing this vision lies in establishing a robust non-terrestrial network encompassing satellite-based communication systems. However, space-based communication faces challenges from atmospheric disturbances. For instance, Ka-band, a crucial frequency range for satellite communication, is attenuated by rain. Similarly, optical satellite communication links are disrupted by clouds. To ensure reliable and high-quality communication, it is imperative to accurately predict the impact of weather on signal propagation, enabling the selection of ground stations and modulation methods. This research focuses on developing a predictive model for radio wave attenuation during space-to-ground communication, leveraging data from meteorological satellites. The model's core is a deep learning architecture that integrates CNNs, renowned for their proficiency in image feature extraction. The rain attenuation prediction with this model achieved a high coefficient of determination. In addition, to improve the prediction accuracy, we analyzed the complex relationship between the radio wave reception strength and Himawari standard data, a comprehensive dataset acquired from the Himawari meteorological satellite.
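The paper's architecture and the exact Himawari input format are not given in this abstract; the sketch below only illustrates the general setup of regressing a rain-attenuation value from a multi-band satellite image patch. The band count, patch size, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttenuationCNN(nn.Module):
    """Toy CNN regressor: multi-band satellite image patch -> attenuation [dB]."""
    def __init__(self, n_bands=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.regressor = nn.Linear(64, 1)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.regressor(h).squeeze(-1)

model = AttenuationCNN()
patch = torch.randn(4, 16, 32, 32)        # 4 samples, 16 Himawari-like bands
pred = model(patch)                        # predicted attenuation per sample
loss = nn.MSELoss()(pred, torch.rand(4))   # target: measured attenuation
print(pred.shape, loss.item())
```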
-
Ayaka ISHIHARA, Yoji YAMASHITA, Masato SUGATA, Ikuko Eguchi YAIRI
Session ID: 4K1-IS-2d-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Objective sleep deprivation detection can enhance workplace safety and productivity in professions that require long working hours. To address this, we proposed deep learning models for classifying sleep-deprived individuals using EEG data. In this study, we utilized resting-state EEG data collected from both sleep-deprived and well-rested participants and generated five datasets (EyesClosed, EyesOpen-Raw, EyesOpen-AR, EyesClosed+EyesOpen-Raw, and EyesClosed+EyesOpen-AR), then applied them to 1D CNN and 1D CNN-LSTM models. Both models achieved their peak performance with EyesOpen-AR, which slightly outperformed EyesOpen-Raw, while demonstrating comparable performance across all datasets. Applying feature extraction using differential entropy within delta, theta, alpha, and beta bands to the five datasets resulted in decreased performance. The results suggest that artifact-removal from EyesOpen-Raw is not essential for sleep deprivation detection using deep learning models. Additionally, they suggest that 1D CNN may be a more suitable choice for sleep deprivation detection, and non-feature-extracted data is more suitable than feature-extracted data.
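The 1D CNN used in the study is not specified here; as a minimal, hypothetical sketch of classifying raw multi-channel EEG segments into sleep-deprived vs. well-rested, the code below assumes a channel count, segment length, and layer configuration.

```python
import torch
import torch.nn as nn

class SleepDeprivation1DCNN(nn.Module):
    """Toy 1D CNN for binary classification of raw multi-channel EEG segments."""
    def __init__(self, n_channels=19, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, n_classes))

    def forward(self, x):            # x: (batch, channels, samples)
        return self.net(x)

model = SleepDeprivation1DCNN()
segment = torch.randn(8, 19, 1000)   # e.g. 8 segments of ~4 s EEG at 250 Hz
print(model(segment).shape)          # torch.Size([8, 2])
```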
-
Hiroto SASAKI, Masato SUGATA, Ikuko Eguchi YAIRI
Session ID: 4K1-IS-2d-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Graph neural networks, deep learning models designed for non-Euclidean data, have garnered attention in EEG-based emotion recognition. Recent studies explore EEG-based models and investigate multimodal models that incorporate peripheral physiological signals, such as electrooculography and electrocardiography, with ongoing research focused on feature fusion methods. The graphs used in GNNs for emotion recognition are generally constructed based on the spatial distance or the functional connectivity between channels; however, most models rely on only one type. This paper validates the effectiveness of a model that utilizes features from heterogeneous graphs and investigates various fusion methods inspired by multimodal approaches. As a result, the highest accuracy achieved was 93.87%, approximately 2% higher than that obtained using a single graph and comparable to existing methods. Furthermore, when synthesizing heterogeneous graphs, a technique that uses the embedding vector of the entire graph has proven to be more effective than one that considers individual channels.
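The fusion methods compared in the paper are not reproduced here; the sketch below only illustrates the graph-level variant described as most effective, i.e. encoding the spatial-distance graph and the functional-connectivity graph separately and fusing their whole-graph embeddings before classification. All sizes and adjacencies are placeholders.

```python
import torch
import torch.nn as nn

def graph_embed(x, adj, proj):
    """One round of neighbor averaging followed by a graph-level mean readout."""
    deg = adj.sum(dim=-1, keepdim=True).clamp(min=1)
    h = torch.relu(proj((adj @ x) / deg))
    return h.mean(dim=1)                      # (batch, hidden) graph embedding

channels, feat, hidden = 32, 5, 64
proj_dist = nn.Linear(feat, hidden)           # encoder for the spatial-distance graph
proj_conn = nn.Linear(feat, hidden)           # encoder for the functional-connectivity graph
fusion = nn.Linear(2 * hidden, 3)             # classify the fused graph-level embeddings

x = torch.randn(8, channels, feat)            # per-channel EEG features
adj_dist = (torch.rand(channels, channels) > 0.7).float()
adj_conn = (torch.rand(channels, channels) > 0.7).float()

z = torch.cat([graph_embed(x, adj_dist, proj_dist),
               graph_embed(x, adj_conn, proj_conn)], dim=-1)
print(fusion(z).shape)                        # torch.Size([8, 3])
```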
-
Hiroyuki HOCHIGAI, Yutaka YAKUWA, Natsuki OKAMURA, Tianchen ZHOU, Taka ...
Session ID: 4K2-IS-2e-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Designing ICT systems that quickly and flexibly integrate networks and software to provide application services is currently a key method in ICT-based DX. Nonetheless, the large amount of time required for the automatic design of ICT systems has drawn increasing attention. Given the rapidly changing demand for ICT systems across industries, the need to build and monitor systems frequently, and the difficulty of securing engineers due to the declining birthrate and aging population, this design-time problem cannot be ignored, especially when AI is expected to reduce design time. To this end, we study how to reduce the design time of ICT systems with Weaver, a deep reinforcement learning approach. First, we adopt a graph neural network as the learning model so that deep reinforcement learning can be applied to this problem. Second, we design an algorithm that combines Double DQN, a representative reinforcement learning algorithm, with Noisy Networks. We then attempt to shorten the design time by replacing the normal distribution in the noisy layers with a truncated normal distribution centered on the optimal value. Finally, sufficient trials are conducted to verify the proposed method. The results show that the proposed method reduces the learning time required to complete learning of the design.
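Weaver's implementation is not reproduced here; the sketch below only illustrates one ingredient named in the abstract, a NoisyNet-style linear layer whose exploration noise is drawn from a truncated normal distribution instead of a standard normal. Layer sizes and noise parameters are assumptions.

```python
import torch
import torch.nn as nn

class TruncatedNoisyLinear(nn.Module):
    """NoisyNet-style linear layer whose exploration noise is sampled from a
    truncated normal distribution (illustrative sketch, not Weaver itself)."""
    def __init__(self, in_features, out_features, sigma0=0.5, bound=2.0):
        super().__init__()
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features),
                                               sigma0 / in_features ** 0.5))
        self.mu_b = nn.Parameter(torch.zeros(out_features))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma0 / in_features ** 0.5))
        self.bound = bound

    def forward(self, x):
        # Resample truncated-normal noise on every forward pass.
        eps_w = torch.nn.init.trunc_normal_(torch.empty_like(self.mu_w),
                                            a=-self.bound, b=self.bound)
        eps_b = torch.nn.init.trunc_normal_(torch.empty_like(self.mu_b),
                                            a=-self.bound, b=self.bound)
        weight = self.mu_w + self.sigma_w * eps_w
        bias = self.mu_b + self.sigma_b * eps_b
        return torch.nn.functional.linear(x, weight, bias)

layer = TruncatedNoisyLinear(128, 64)
print(layer(torch.randn(4, 128)).shape)   # torch.Size([4, 64])
```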
-
WEN ZHOU, Shuichiro MIWA, Yang LIU, Koji OKAMOTO
Session ID: 4K2-IS-2e-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
A generative AI architecture called bubbly flow generative adversarial networks (BF-GAN) is developed, designed to generate realistic and high-quality bubbly flow images through physically conditioned inputs, jg and jf. Initially, 52 sets of bubbly flow experiments under varying conditions are conducted to collect 140,000 bubbly flow images with physical labels of jg and jf for training data. A multi-scale loss function is then developed, incorporating mismatch loss and pixel loss to further enhance the generative performance of BF-GAN. Regarding the evaluative metrics of generative AI, BF-GAN has surpassed conventional GANs. Physically, key parameters of bubbly flow generated by BF-GAN are extracted and compared with measurement values and empirical correlations, validating BF-GAN's generative performance. The comparative analysis demonstrates that BF-GAN can generate realistic and high-quality bubbly flow images with any given jg and jf within the research scope. BF-GAN offers a generative AI solution for two-phase flow research, substantially lowering the time and cost required to obtain high-quality data. In addition, it can function as a benchmark dataset generator for bubbly flow detection and segmentation algorithms, enhancing overall productivity in this research domain. The BF-GAN model is available online (https://github.com/zhouzhouwen/BF-GAN).
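BF-GAN's actual architecture and multi-scale loss are described in the paper and repository, not here; the following is only a generic sketch of a generator conditioned on (jg, jf), with assumed latent size, image size, and layer widths.

```python
import torch
import torch.nn as nn

class BubblyFlowGenerator(nn.Module):
    """Toy conditional generator: latent vector + (jg, jf) -> grayscale image."""
    def __init__(self, z_dim=100, cond_dim=2):
        super().__init__()
        self.fc = nn.Linear(z_dim + cond_dim, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, z, cond):
        h = self.fc(torch.cat([z, cond], dim=1)).view(-1, 128, 8, 8)
        return self.deconv(h)                     # (batch, 1, 64, 64)

gen = BubblyFlowGenerator()
z = torch.randn(4, 100)
jg_jf = torch.tensor([[0.1, 1.0]] * 4)            # superficial gas/liquid velocities
print(gen(z, jg_jf).shape)                        # torch.Size([4, 1, 64, 64])
```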
-
Norio NISHIOKA, Kenji TANAKA
Session ID: 4K2-IS-2e-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
With the growth of the anime movie market, predicting box office revenue has become increasingly important. Traditional movie revenue prediction research has mainly focused on factors such as news articles and reviews. However, there is a lack of research considering factors specific to anime movies. This study proposes an anime movie box office revenue prediction model that utilizes sales data related to the original manga, along with production information such as the director and original author, as features. Specifically, we construct models using linear regression, multiple regression, and random forest, and compare their prediction accuracy. Using actual anime movie box office revenue data in Japan from 2012 to 2023, our evaluation suggests that the proposed model, particularly the simple linear regression model that uses the box office revenue by director, is effective in predicting the box office revenue of anime movies based on comics.
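As a toy illustration of the best-performing setup described above (a single-feature linear regression on a director-related revenue figure), the sketch below uses made-up numbers; the actual features and data are those of the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: each row is one past film,
# feature = the director's mean past box office (100 million yen),
# target = the film's box office revenue (100 million yen).
director_avg_revenue = np.array([[12.0], [35.0], [8.5], [60.0], [22.0]])
box_office = np.array([15.0, 40.0, 7.0, 72.0, 25.0])

model = LinearRegression().fit(director_avg_revenue, box_office)
new_film = np.array([[30.0]])         # upcoming film by a director averaging 3.0B yen
print(model.predict(new_film))        # predicted box office revenue
```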
-
Bersilin Charles ROBERT, Shotaro NISHIMURA, Ko UCHIDA, Hiroshi HONDA
Session ID: 4K2-IS-2e-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study explores a real-time vehicle motion analysis approach by fine-tuning open-source large language models (LLMs) using numerical vehicle data. Real-time motion analysis can help provide driving feedback, develop training modules for novice drivers, and assess driving performance using signals like speed, acceleration, and steering angle. A wide range of LLMs exists, including large-scale cloud-based models (e.g., GPT-4o, Gemini-2.0) and open-source solutions (e.g., Llama-3.1). Cloud-based models provide high-quality driving feedback but are expensive, slow, and not always available for real-time use. In contrast, open-source models are more accessible and can be deployed locally, but they struggle with understanding complex numerical data. To tackle this, we fine-tuned the LLaMA 3.1 model using data gathered from the Assetto Corsa racing simulator, which captures both typical and extreme driving conditions. Our model achieved 84.07% accuracy in classifying different driving behaviors, such as smooth braking and sudden acceleration, showing that fine-tuned LLMs can effectively interpret vehicle data.
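The exact fine-tuning data format used with LLaMA 3.1 is not stated in this abstract; the snippet below shows one plausible way to serialize a telemetry sample and its behavior label into a chat-style JSONL record for supervised fine-tuning. Field names and units are assumptions.

```python
import json

# Hypothetical telemetry sample from a driving simulator log (units assumed).
sample = {"speed_kmh": 82.4, "accel_ms2": -3.1, "steering_deg": 1.5, "brake": 0.6}

# One supervised fine-tuning record: the numeric signals are serialized into
# the prompt, and the behavior label is the target response.
record = {
    "messages": [
        {"role": "system", "content": "Classify the driving behavior from the telemetry."},
        {"role": "user", "content": json.dumps(sample)},
        {"role": "assistant", "content": "sudden braking"},
    ]
}
print(json.dumps(record, ensure_ascii=False))
```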
-
Haoyu LU, Takafumi KATO, Ken-ichi FUKUI
Session ID: 4K2-IS-2e-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Sleep is vital for physical recovery, brain function, and emotional health. Polysomnography (PSG) is the gold standard for assessing sleep quality, but it is intrusive and impractical for widespread application. Sound data is a non-intrusive alternative, though its complexity makes extracting meaningful information difficult. This study enhances sound-based sleep quality assessment using a Knowledge Distillation (KD) framework. The teacher model integrates PSG features, physical factors, sleep stage data, sound data, and questionnaire factors, using a Gated Variable Selection Neural Network (GVSN) to identify key information from multimodal inputs. The student model uses physical factors and sound features extracted from one night’s sleep events and learns from the teacher via a SoftMax-based KD process. Results show the student model's accuracy improves significantly, demonstrating the potential of KD to improve sound-based sleep quality assessment.
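The abstract mentions a SoftMax-based KD process; the snippet below shows the standard temperature-softened distillation loss as one plausible concrete form, with assumed temperature, mixing weight, and class count (the paper's exact formulation may differ).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Softmax-based knowledge distillation loss: KL divergence between
    temperature-softened distributions plus hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 3, requires_grad=True)   # e.g. 3 sleep-quality classes
teacher = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
print(distillation_loss(student, teacher, labels).item())
```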
-
Ziwei XU, Ryutaro ICHISE
Session ID: 4K3-IS-2f-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Global events can significantly impact various sectors, but detailing the impacts of global events on a specific business remains complex. If these impacts could be represented explicitly, it would enhance event prediction across various domains and help to avoid risk. This paper addresses this challenge within the finance sector by proposing a method that uses causal graphs to link global events with company-specific business events through geographical and taxonomic mappings. We outline a framework encompassing event acquisition, geographic mapping, and event taxonomic mapping, and illustrate its application with an example, demonstrating how geographical and ontological factors reveal hidden influences on business events.
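The paper's framework is only outlined above; as a purely hypothetical illustration of how such an event-to-business causal chain could be represented and traversed, the snippet below builds a small directed graph with networkx. The node names and relations are invented.

```python
import networkx as nx

# Hypothetical example of the event -> geography -> taxonomy -> company chain.
g = nx.DiGraph()
g.add_edge("Port strike in Region X", "Region X logistics disruption",
           relation="geographic mapping")
g.add_edge("Region X logistics disruption", "Electronics supply chain delay",
           relation="taxonomic mapping")
g.add_edge("Electronics supply chain delay", "Company A: shipment delay event",
           relation="business impact")

# Trace every explicit causal path from the global event to the company event.
for path in nx.all_simple_paths(g, "Port strike in Region X",
                                "Company A: shipment delay event"):
    print(" -> ".join(path))
```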
-
Francis SANCO, Clifford BRONI-BEDIAKO, Massayasu ATSUMI
Session ID: 4K3-IS-2f-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Few-shot semantic segmentation enables pre-trained networks to generalize to new data with minimal labelled samples per class, addressing challenges of data scarcity and annotation cost. While few-shot learning methods have shown success, a more practical challenge lies in segmenting both base classes (pre-trained classes) and novel classes (new classes with few examples) in a single task. Generalized Few-Shot Semantic Segmentation (GFSS) was therefore introduced, evaluating models on their ability to handle familiar and unseen classes. Existing approaches use VGG and ResNet backbones but struggle with handling multi-scale features, which is crucial for segmenting objects of varying size. Additionally, Siamese learning has proven effective for few-shot tasks but has not been widely explored in generalized few-shot learning. This paper proposes a novel solution by integrating the Pyramid Vision Transformer (PVT), which introduces multi-scale features into transformers, with a Siamese Transformer Module (STM) for enhanced adaptation of support features to query features. Our approach aims to improve the effectiveness and robustness of GFSS, addressing scale variation challenges and the need for better adaptation to novel classes. Our work aims to show the capabilities of PVT for dense prediction and to extend Siamese networks to GFSS.
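The STM itself is not specified in this abstract; the sketch below shows only the generic prototype-matching idea behind Siamese-style support-to-query adaptation (cosine similarity between query features and per-class support prototypes). Feature dimensions and class counts are assumptions.

```python
import torch
import torch.nn.functional as F

def adapt_query_to_support(query_feat, support_protos, tau=0.1):
    """Toy Siamese-style matching: cosine similarity between each query pixel
    feature and per-class support prototypes, softmax over classes."""
    # query_feat: (B, C, H, W); support_protos: (num_classes, C)
    q = F.normalize(query_feat, dim=1)
    p = F.normalize(support_protos, dim=1)
    sim = torch.einsum("bchw,kc->bkhw", q, p)       # class-similarity maps
    return (sim / tau).softmax(dim=1)               # soft segmentation scores

query = torch.randn(2, 256, 32, 32)                 # backbone features (assumed size)
protos = torch.randn(6, 256)                        # 5 base classes + 1 novel class
print(adapt_query_to_support(query, protos).shape)  # torch.Size([2, 6, 32, 32])
```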
-
Yuefeng XU, Rui ZHONG, Junqi ZHANG, Chao ZHANG, Jun YU
Session ID: 4K3-IS-2f-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
SFE, a prominent optimization algorithm for solving high-dimensional feature selection problems, exhibits strong exploitation capability using a single search agent. However, its ability for global exploration remains relatively weak. This paper introduces a flip-flop mutation mechanism and adaptive acceptance selection into the SFE algorithm and proposes a targeted two-stage improvement to enhance its performance in high-dimensional spaces. Specifically, the following enhancements are made to the original SFE algorithm: (1) the search process is divided into two distinct stages, each with a different focus, (2) a random flip-flop mutation mechanism is incorporated, and (3) an adaptive acceptance selection is introduced to mitigate trapping in local optima. We evaluate the proposed algorithm on 21 high-dimensional datasets and compare its performance with the original SFE algorithm and six state-of-the-art binary evolutionary algorithms. Experimental and statistical results demonstrate that these improvements significantly enhance the global exploration capability of SFE, making it more robust and effective for addressing high-dimensional feature selection challenges.
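The actual SFE operators differ from the toy code below; this is only a minimal sketch of the two generic ingredients named above, a bit-flip mutation on a binary feature mask and an adaptive (annealing-style) acceptance rule, with an invented fitness function.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_mutation(mask, rate=0.02):
    """Randomly flip a small fraction of bits in a binary feature mask."""
    flips = rng.random(mask.size) < rate
    child = mask.copy()
    child[flips] ^= 1
    return child

def accept(new_fit, cur_fit, temperature):
    """Adaptive acceptance: always keep improvements, occasionally accept
    worse solutions to escape local optima (simulated-annealing style)."""
    if new_fit >= cur_fit:
        return True
    return rng.random() < np.exp((new_fit - cur_fit) / max(temperature, 1e-8))

# Toy run on a 1000-dimensional feature mask with a dummy fitness function.
def fitness(m):
    return -abs(m.sum() - 50) / 1000.0    # prefer ~50 selected features

mask = rng.integers(0, 2, 1000)
fit = fitness(mask)
for step in range(200):
    cand = flip_mutation(mask)
    if accept(fitness(cand), fit, temperature=0.01 * (1 - step / 200)):
        mask, fit = cand, fitness(cand)
print(int(mask.sum()), round(fit, 4))
```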
-
Masaya TANIGUCHI, Naoki NEGISHI, Yusaku NISHIMIYA, Keisuke SAKAGUCHI, ...
Session ID: 4K3-IS-2f-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study explores the impact of the presentation order of positive and negative data on grammar acquisition in language models. We specifically focus on a text search problem, with the target grammar represented by a regular language. To conduct the study, we prepare two types of data: positive data, where sentences conforming to the target grammar are embedded within the text, and negative data, where such sentences are absent. Our findings demonstrate that both the sampling strategy for positive and negative data and the order in which these datasets are presented influence the language model's ability to acquire grammatical structures.
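As a minimal, hypothetical illustration of the data setup described above, the snippet below samples positive texts (which embed a sentence of a toy regular language) and negative texts (which contain none); the target language (ab)+c and the alphabet are invented for illustration.

```python
import random
import re

random.seed(0)
target = re.compile(r"(ab)+c")            # hypothetical target regular language
alphabet = "abc"

def random_text(length=20):
    return "".join(random.choice(alphabet) for _ in range(length))

def make_example(positive):
    """Positive: a string of the target language is embedded in the text.
    Negative: the sampled text contains no match of the target grammar."""
    while True:
        text = random_text()
        if positive:
            i = random.randrange(len(text))
            text = text[:i] + "ababc" + text[i:]   # embed a member string
        if bool(target.search(text)) == positive:
            return text

positives = [make_example(True) for _ in range(3)]
negatives = [make_example(False) for _ in range(3)]
print(positives, negatives)
```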
-
HIROAKI SHIMOMA, Takeshi MORITA
Session ID: 4L1-OS-36-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Masahiro OTOMO, Akio KOBAYASHI, Junichi ISHIHARA, Tetsuo KATSURAGI, Ju ...
Session ID: 4L1-OS-36-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Derivation of Initial Posture Candidates Based on Machine Learning Adapted to the Environment
Rei YAMAMOTO, Natsuki MIYATA, Yusuke MAEDA
Session ID: 4L1-OS-36-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study aims to develop a system that quantitatively evaluates and visualizes indoor areas that are both accessible to children and prone to accidents by analyzing the reach postures of a digital human model. The system supports caregivers in creating a safer environment. Since reach postures vary dynamically depending on location—for example, a child may support their body with one hand—these postures exhibit discontinuous mechanical modes. To optimize the final posture using an optimization method, it is crucial to provide an appropriate initial posture. Therefore, this study introduces a machine learning-based approach for estimating initial postures and quantitatively evaluates the preventive effects of existing accident prevention products, thereby verifying the effectiveness of the proposed system.
-
Mikiko OONO, Hanae YOSHINARI, Ayano NOMURA, Yoshifumi NISHIDA, Hisashi ...
Session ID: 4L1-OS-36-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, the development of sensing systems for disease prevention and health management has been expected in the field of health care. Sensing systems can gather biological information such as heart rate, blood pressure, and respiration, as well as daily activities, and analyze the state of physical and mental health to evaluate one's health status. On the other hand, from the viewpoint of consumers, they need to make a comprehensive decision about what they want to know, the type of data to be used for health evaluation, and the risks associated with data collection. In this study, based on the concept of “Willingness to Expose (WTE)” proposed by the authors, we investigated the relationship between WTE and the type of service for health monitoring. The results showed that WTE was higher for those who were concerned about their fall risks, indicating that one's WTE increases when reasonable benefits from sensor technologies are perceived.
-
Yuya KAWABE, Naoki NOZAKI, Syunsuke SASAKI, Mikiko ONO, Kozi KITAMURA, ...
Session ID: 4L1-OS-36-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study proposes a novel risk assessment method that considers daily objects and behaviors in living spaces. The method enables us to assess the risk of injury occurrence in an environment by considering the arrangement of furniture, daily behavior, and a large-scale database of domestic accidents. To verify its feasibility, we developed a prototype system. This digital twin system consists of an accident database, 3D scanning from smartphones or devices, posture data measurement via RGB cameras, and real-time and post-event risk visualization. Validation experiments in a simulated environment confirmed that the system can assess changes in average injury severity due to different movement paths and object arrangements. Additionally, expert interviews were conducted using the prototype to identify potential applications and related challenges.
-
KEIGO OKADA
Session ID: 4L2-GS-10-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
X-ray images are sent to the image server via the imaging system, where image resolution and patient misidentification are checked. Chest X-ray images are usually taken in a standing position in the back-to-front view, but for patients who cannot stand, they are taken in the front-to-back view and the images are flipped. However, if an image is flipped incorrectly before being sent to the imaging system and the examiner is unaware of the inversion, a left-right inverted image may be delivered, causing an incident. We focused on the fact that the left and right lung fields have different shapes, and examined the feasibility of segmenting their contours and constructing a system to prevent left-right inversion. Using the miniJSRT_database, a database of labeled chest X-ray images published by the Japanese Society of Radiological Technology, we constructed a U-Net-based segmentation model to extract lung field regions from chest X-ray images. We then extracted lung field regions from 50 chest X-ray cases collected separately at Showa University Northern Yokohama Hospital using this model. The IoU of the lung field regions between pseudo-inverted images and the original images was a low 0.64, whereas the IoU between normal images was a high 0.94. This indicates that lung field segmentation can be used to accurately detect erroneous inversions in the images.
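As a minimal sketch of the inversion check implied above (not the authors' actual pipeline), the snippet below compares a segmented lung-field mask against an expected mask with IoU and flags a suspected left-right inversion when the overlap is low; the masks, image size, and the 0.8 threshold are invented.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union of two binary lung-field masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def is_suspected_inversion(reference_mask, incoming_mask, threshold=0.8):
    """Flag the image if its lung-field mask disagrees with the expected
    (non-inverted) mask more than the threshold allows."""
    return iou(reference_mask, incoming_mask) < threshold

# Toy masks: flipping left-right lowers the IoU and triggers the flag.
ref = np.zeros((256, 256), dtype=bool)
ref[60:200, 30:110] = True               # "left" lung field, asymmetric by design
ref[70:190, 150:220] = True              # "right" lung field
flipped = ref[:, ::-1]
print(iou(ref, ref), iou(ref, flipped))
print(is_suspected_inversion(ref, flipped))   # True -> warn about possible inversion
```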
-
Soh NISHIMOTO
Session ID: 4L2-GS-10-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In machine learning, it is desirable to work with a database organized under consistent conditions. Currently, we are carrying out a project to collect craniofacial CT images used to diagnose facial bone fractures. The data from each facility are obtained with different CT equipment, and exposure and imaging conditions are not standardized; the slice width also varies. In our previous experience with estimating feature point coordinates in public databases of craniofacial images, accuracy improved when the input region was narrowed down. In this study, the aforementioned data from real clinical scenes were processed to meet certain conditions and specific regions were cropped, using deep learning methods in combination. As a result, the actual clinical data could be formatted consistently and the target region narrowed down almost automatically.
-
Takaaki NIHO, Akishige YUGUTI, Yosio MATSUMOTO, Kiyoko OTANI, Asao OGA ...
Session ID: 4L2-GS-10-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yuma YOKOTA, Kei SUZUKI, Kenichi INOUE, Midori SUGAYA
Session ID: 4L2-GS-10-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kenichi INOUE, Kei SUZUKI, Midori SUGAYA
Session ID: 4L2-GS-10-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Takeshi TESHIMA, Kenta SHINOTSUKA, Yuchi MATSUOKA
Session ID: 4L3-OS-38-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ryuichi OGAWA, Shigeyoshi SHIMA
Session ID: 4L3-OS-38-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Kota MOCHIDA, Teppei NAKANO, Mari WAKABAYASHI, Tomomi SATO, Tetsuji OG ...
Session ID: 4L3-OS-38-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Takeaki SAKABE, Yuko SAKURAI, Satoshi OYAMA
Session ID: 4L3-OS-38-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In this study, we evaluate user perceptions of datasets used in machine learning competitions. Datasets for such competitions are often selected in an unsystematic manner, making it challenging to choose datasets that align with participants' skill levels. To address this issue, we conducted a user study as part of the evaluation of a framework for automated dataset generation. In this study, participants trained and tested models using both existing datasets and those generated by the framework, and we collected test results based on algorithm selection and performance variations. The impact of datasets on participants' performance was examined using objective metrics. The results of our evaluation indicate that the generated datasets exhibit properties more suitable for educational competitions compared to existing datasets.
-
Jiyi LI
Session ID: 4L3-OS-38-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Whether Large Language Models (LLMs) can surpass crowdsourcing in data annotation tasks has gained interest recently. Some works verified this issue with the average performance of individual crowd workers and LLMs on some specific tasks. However, the eventually collected annotations are the aggregated answers, rather than the crowd answers themselves. The scenarios involving crowd answer aggregation need further study. Our studies concentrate on two types of annotations: categorical labels and text answers. On the one hand, we studied the scenario of answer aggregation on crowd categorical labels in classification tasks, with LLMs used as creators of the labels. We propose a Crowd-LLM hybrid aggregation method, finding that adding LLM labels from good LLMs to existing crowdsourcing datasets can enhance the quality of the aggregated labels of the datasets. On the other hand, we also explored text answer aggregation and assessed LLMs as aggregators in close-ended crowd text answer scenarios. We proposed a hybrid aggregation approach within a Creator-Aggregator Multi-Stage (CAMS) framework. Experiments demonstrate that our approach can further improve the answer quality based on the combinations of three resources of workers and answers. These findings have been published at ICASSP 2024 and EMNLP 2024.
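The aggregation methods in the cited papers are more elaborate than this; purely as an illustration of the hybrid idea, the snippet below does a weighted majority vote in which LLM labels are given an assumed higher weight than crowd labels.

```python
from collections import Counter

def hybrid_aggregate(crowd_labels, llm_labels, llm_weight=2):
    """Toy Crowd+LLM aggregation: weighted majority vote, where labels from
    (presumably good) LLMs count more than individual crowd labels."""
    votes = Counter()
    for label in crowd_labels:
        votes[label] += 1
    for label in llm_labels:
        votes[label] += llm_weight
    return votes.most_common(1)[0][0]

# One classification item annotated by five crowd workers and two LLMs.
crowd = ["positive", "negative", "positive", "negative", "negative"]
llms = ["positive", "positive"]
print(hybrid_aggregate(crowd, llms))   # "positive"
```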
-
Sota KANEKO, Seiji YAMADA
Session ID: 4M1-OS-14a-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Takumi TSUJIYAMA, Seiji YAMADA, Takashi ONODA
Session ID: 4M1-OS-14a-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
If trust in AI breaks down due to over-trust or under-trust by humans, achieving high performance in human-AI collaborative decision-making becomes difficult. To address this issue, the human-AI trust relationship needs to be optimized by adaptively calibrating trust. Against this background, prior research proposed a trust calibration AI that automatically detects over-trust or under-trust and encourages humans to calibrate their trust in AI. This AI requires a cognitive/AI performance model to estimate the problem-solving ability of both humans and AI. However, specific methods to create this model do not exist at present. Therefore, this study proposes a method to construct a cognitive/AI performance model using a Support Vector Machine (SVM) classification model. To evaluate the effectiveness of this method, an experiment was conducted using a chest X-ray interpretation task as a case study.
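The actual features and labels used for the cognitive/AI performance model are not given in this abstract; the sketch below only shows the generic shape of such an SVM model, with invented case features and correctness labels.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training data: each row describes one chest X-ray case by simple
# features (e.g. lesion size, contrast, location code), and the label records
# whether the decision-maker answered that case correctly (1) or not (0).
rng = np.random.default_rng(0)
case_features = rng.random((200, 3))
answered_correctly = (case_features[:, 0] + 0.5 * case_features[:, 1] > 0.8).astype(int)

# The SVM-based performance model predicts, for a new case, whether the
# human (or the AI) is likely to solve it; this estimate can drive trust calibration.
model = SVC(kernel="rbf", probability=True).fit(case_features, answered_correctly)
new_case = rng.random((1, 3))
print(model.predict_proba(new_case))   # [P(incorrect), P(correct)]
```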
-
Mibuki TAKAGI, Kazunori TERADA
Session ID: 4M1-OS-14a-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yosuke FUKUCHI, Seiji YAMADA
Session ID: 4M1-OS-14a-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Keito MIYAKE, Kumi OZAKI, Seiji YAMADA
Session ID: 4M1-OS-14a-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
AI technology in healthcare has made remarkable progress, with continuous improvements in diagnostic support accuracy and efficiency. However, medical professionals sometimes prioritize human judgment over AI despite recognizing the high performance of the systems they use, a phenomenon known as "algorithm aversion". In healthcare settings, medical errors remain a serious concern, and algorithm aversion may lead to overlooking human errors that AI support systems could prevent. Therefore, properly addressing algorithm aversion is essential for improving safety where AI assistance is available. This study quantitatively analyzes how psychological factors influence AI output usage rates (reliance rate) in shaping medical professionals' attitudes toward AI systems, focusing on their sense of control and responsibility. The analysis employs questionnaire items to examine correlations with reliance rates through statistical analysis. The findings are expected to guide human-AI interactions in medical settings while contributing to the theoretical foundation for addressing a crucial challenge: the collaboration between humans and AI.
-
Takahiro TSUMURA, Seiji YAMADA
Session ID: 4M2-OS-14b-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
When a person is assisted by another person, he or she may want to give back to that person. In recent years, more and more agents have been collaborating with people, but because of differences in individual abilities, not all agents perform tasks with the same capacity. In this study, we investigated whether empathy and trust toward an agent are promoted when a person and multiple agents perform a common typing task within a time limit and one agent performs the remaining task instead. To investigate whether the principle of reciprocity toward agents is at work, we also prepared agents in four different colors. The results from 392 participants showed that people do not identify agents individually, and that being helped by an agent has a strong impact on empathy and trust. The results indicate that evaluations of agents who help people also increase evaluations of similar agents who did not help people, which may help make agents more acceptable in a society where the use of agents is increasing.
-
Examining the Role of Robot Altruism in Human Prosociality
Chenlin HANG, Masahiro SHIOMI, Seiji YAMADA
Session ID: 4M2-OS-14b-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
This study investigates the impact of robot self-sacrifice on human trust and prosocial behaviors. While existing research in human-robot interaction (HRI) often delves into moral dilemmas, such as the trolley problem, this work shifts focus to practical contexts where robots engage in altruistic actions, like sacrificing their own battery to assist a user. In an experiment involving 30 participants, results revealed that robots exhibiting self-sacrificial behavior significantly encouraged prosocial actions, although perceptions of the robots' social attributes remained unchanged across conditions. These findings suggest that while self-sacrifice does not necessarily enhance how robots are perceived, it can effectively promote cooperative behaviors among humans. This research contributes to the development of socially interactive robots capable of fostering prosocial dynamics in human-robot coexistence.
-
Mari SAITO, Seiji YAMADA
Session ID: 4M2-OS-14b-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Hikaru MATSUZAKI, Tomoyuki MAEKAWA, Kentaro ISHII, Michita IMAI
Session ID: 4M2-OS-14b-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
We propose Interactive-SmartClerk, a robot system for recommending items to a first-time customer by mimicking a store clerk. Interactive-SmartClerk solves the cold-start problem by asking the customer's preferences for a few items. Even if the number of samples of the customer's preferences is small, Interactive-SmartClerk can make recommendations that match the customer's preferences. Even if the customer's preferences are very different from the general preferences, recommendations that differ from current trends can be made by capturing the individuality of the customer's preferences. In the current study, we designed Interactive-SmartClerk to recommend three dresses from a dataset of 100 dress images. We collected 647 people's preferences for the dress images through crowdsourcing. We evaluated the performance of Interactive-SmartClerk and found that it performed better than the baseline system that always recommends commonly preferred dresses. We also found that Interactive-SmartClerk adapted to the preferences of minority users who preferred dresses that were not widely preferred. We carried out a case study using a real robot, and Interactive-SmartClerk recommended appropriate dresses for people with various preferences.
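Interactive-SmartClerk's actual recommendation algorithm is not described in this abstract; the snippet below is only a hypothetical cold-start sketch in which a few elicited answers are matched against a crowd preference matrix and the liked items of the most similar respondents are recommended. All numbers are placeholders (only the 647-person, 100-image scale follows the abstract).

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_items = 647, 100
# Hypothetical crowd preference matrix: 1 = liked the dress image, 0 = did not.
crowd_prefs = (rng.random((n_people, n_items)) > 0.5).astype(float)

def recommend(asked_items, answers, k=3):
    """Cold-start sketch: ask about a few items, find the crowd members whose
    answers on those items are most similar, and recommend the dresses they
    liked most among the items not asked about."""
    sims = 1.0 - np.abs(crowd_prefs[:, asked_items] - answers).mean(axis=1)
    neighbors = np.argsort(sims)[-50:]            # 50 most similar respondents
    scores = crowd_prefs[neighbors].mean(axis=0)
    scores[asked_items] = -1.0                    # do not re-recommend asked items
    return np.argsort(scores)[-k:][::-1]

print(recommend(asked_items=[3, 17, 42], answers=np.array([1.0, 0.0, 1.0])))
```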
-
Ryuki MATSUOKA, Shiro KUMANO, Michita IMAI, Hiromi NARIMATSU
Session ID: 4M2-OS-14b-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Yuki SAKAMOTO, Takahisa UCHIDA, Hiroshi ISHIGURO
Session ID: 4M3-OS-14c-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Motoaki SATO, Sota KITADA, Kazunori TERADA
Session ID: 4M3-OS-14c-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Ryuhei MIYAZTO, Hsin-Tai WU, Kei HARADA, Kazushi OKAMOTO, Atsushi SHIB ...
Session ID: 4M3-OS-14c-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Automatic document summarization is a technique that extracts important elements from an original document and condenses them into short sentences. In this research, we propose an aspect-based summarization method for novels that generates summaries focusing on specific aspects. Specifically, we utilize a Large Language Model (LLM) to extract the relationships between characters and events within the text and identify the aspect of each part of the novel based on these relationships. Subsequently, we collect the parts corresponding to the target aspect and generate a summary sentence for each aspect. To evaluate the summaries, we compare the answer accuracy on question-answer (QA) pairs created from the set of sentences corresponding to each aspect, using either the original document or the generated summary as the reference. The results demonstrate that the proposed method can generate summaries that comprehensively reflect the target aspect, a capability that was difficult to achieve with conventional methods.
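The prompts and LLM used by the authors are not reproduced here; the following is only a structural sketch of the two-step pipeline described above, with a hypothetical ask_llm stand-in that must be replaced by an actual LLM call.

```python
# Minimal pipeline sketch; `ask_llm` is a hypothetical stand-in for whatever
# LLM API is actually used, and the prompts are illustrative only.
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your LLM of choice")

def summarize_by_aspect(chapters, target_aspect):
    relevant_parts = []
    for chapter in chapters:
        # Step 1: let the LLM label which aspects each part of the novel covers,
        # based on the character relationships and events it mentions.
        aspects = ask_llm(
            f"List the aspects (character relationships, events) covered by:\n{chapter}")
        if target_aspect in aspects:
            relevant_parts.append(chapter)
    # Step 2: summarize only the collected parts, focused on the target aspect.
    joined = "\n".join(relevant_parts)
    return ask_llm(
        f"Summarize the following passages focusing only on '{target_aspect}':\n{joined}")
```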
-
MASATO TANAKA, TAKESHI HARA, TSUGUMI MATSUO, SEIJI YAMADA
Session ID: 4M3-OS-14c-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
[Purpose] We furthered the study "Learning and understanding lung structure described by the context of radiological, anatomical, and pathological images," presented at the 38th Annual Conference of the Japanese Society for Artificial Intelligence in 2024, and attempted to generate learning content accurate enough for actual student use by inputting into an LLM (GPT-4o) the individual image data that form the context and the linguistic information that explains them. [Method] LLM learning was performed using prompt learning and fine-tuning. For prompt learning, multimodal data combining image data and linguistic information were used as learning data. The output was educational content in a Q&A format. [Results] When generating educational resources using one-shot learning with prompts, the resulting Q&As were highly logical and promoted understanding of the input learning information. On the other hand, when fine-tuning was used, Q&As containing incorrect information that deviated from the learning data were occasionally found. Based on this experiment, we believe that generating learning resources using prompt engineering is suitable for practical use.
-
Marina MIKAMI, Takuya MATUZAKI
Session ID: 4N1-GS-7-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The purpose of this study is to create an accent dictionary for low-frequency words by automatically detecting the phonetic accent position of words based on TV speech data, in which many low-frequency words such as proper nouns and new words appear. First, the F0 value of each phoneme, the mora pronunciation, the mora position within a word, and the part of speech were extracted as features from the speech data, and a classifier was created using these as input. The model was trained on speech data from the Corpus of Spontaneous Japanese (CSJ) and used to predict the accent positions of the nouns that appear in LaboroTVSpeech. The accuracy of the resulting accent dictionary was only 77-86%.
-
Komei HIRUTA, Yosuke YAMANO, Hideaki TAMORI
Session ID: 4N1-GS-7-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
-
Satoshi KAWAMURA, Kohei YAMAMOTO, Hideaki TAMAI
Session ID: 4N1-GS-7-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
The detection of anomalous sounds is crucial for the efficient operation and maintenance of factory machinery. However, obtaining real-world anomalous data is often impractical, limiting the training of anomaly detection models. To address this issue, classification-based approaches utilizing pseudo-anomalous data have garnered significant interest. Pseudo-anomalies can be generated by treating normal sounds from non-target machines as anomalies or by applying perturbations to normal sounds using neural networks. While the former approach necessitates machine-specific models, the latter may fail to adequately capture the unique characteristics of target machines. This study investigates various sound data augmentation techniques for generating pseudo-anomalies and systematically evaluates their combinations to improve anomaly detection performance. Experimental results demonstrate that certain augmentations substantially enhance detection accuracy. Furthermore, we analyze the individual impact of each method and discuss the influence of augmented sound data on anomaly detection models.
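The augmentation techniques evaluated in this work are not listed in the abstract; as purely hypothetical examples of perturbing normal sounds into pseudo-anomalies, the snippet below applies band-limited noise injection and a dropout-like time mask to a toy waveform.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_band_noise(x, sr=16000, low=2000, high=4000, snr_db=10):
    """Inject band-limited noise into a normal machine sound (FFT masking)."""
    noise = rng.standard_normal(len(x))
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    spec[(freqs < low) | (freqs > high)] = 0.0
    noise = np.fft.irfft(spec, n=len(x))
    gain = np.sqrt((x ** 2).mean() / (10 ** (snr_db / 10) * (noise ** 2).mean() + 1e-12))
    return x + gain * noise

def time_mask(x, max_frac=0.1):
    """Zero out a random short segment, mimicking a dropout-like anomaly."""
    n = int(len(x) * rng.uniform(0.01, max_frac))
    start = rng.integers(0, len(x) - n)
    y = x.copy()
    y[start:start + n] = 0.0
    return y

normal = np.sin(2 * np.pi * 120 * np.arange(16000) / 16000)   # toy "normal" sound
pseudo_anomalies = [add_band_noise(normal), time_mask(normal)]
print([p.shape for p in pseudo_anomalies])
```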
-
Gen SATO, Yusuke IKEDA
Session ID: 4N1-GS-7-04
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
Sound field simulation is used in various applications, such as acoustic design of concert halls and generation of sound fields in virtual spaces. In particular, sound field simulation based on the wave equation can provide highly accurate results; however, its computation time needs to be improved. Therefore, sound field estimation using deep learning has been proposed to reduce the computation time required for the simulation. On the other hand, when deep learning is used, the estimated sound field is typically limited to a fixed grid. Therefore, we propose using Deep Operator Networks (DeepONets) to estimate the sound field at arbitrary locations. In particular, we aim to improve the generalization performance of DeepONet for different room shapes by introducing a convolutional neural network into the branch network of DeepONet. Experiments were conducted in rectangular room sound fields, and the results demonstrated that DeepONet can estimate the sound field with high accuracy, achieving a mean SNR of approximately 24 dB.
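The authors' CNN-branch DeepONet is not specified in detail here; the sketch below only shows the generic DeepONet structure (branch net over the input condition, trunk net over query coordinates, inner-product output) with an assumed room-geometry image input and (x, y, t) queries.

```python
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    """Minimal DeepONet: a branch net encodes the input condition (here, a room
    geometry map), a trunk net encodes a query coordinate, and the output is
    their inner product, so the field can be evaluated at arbitrary points."""
    def __init__(self, latent=64):
        super().__init__()
        self.branch = nn.Sequential(                # CNN branch over a room-shape map
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, latent))
        self.trunk = nn.Sequential(                 # MLP trunk over (x, y, t) queries
            nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, latent), nn.Tanh())

    def forward(self, room_map, query):
        b = self.branch(room_map)                    # (batch, latent)
        t = self.trunk(query)                        # (batch, n_points, latent)
        return torch.einsum("bl,bpl->bp", b, t)      # field value at each query point

net = DeepONet()
room = torch.randn(2, 1, 64, 64)                     # room geometry maps (assumed input)
pts = torch.rand(2, 500, 3)                          # 500 arbitrary (x, y, t) queries
print(net(room, pts).shape)                          # torch.Size([2, 500])
```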
-
Tasuku KITADE, May Phyo KHAING, Masanori TSUJIKAWA, Koji OKABE, Ryo IS ...
Session ID: 4N1-GS-7-05
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
We are investigating a medical documentation assistant system that aims to improve the efficiency of record and report creation by physicians by automatically generating medical documents from speech recognition results. For this system, it is essential that medical terminology is recognized with high accuracy. Since medical data are difficult to obtain, we propose a method that corrects speech recognition errors using word reading information, without using such data. Specifically, we detect speech recognition errors in the recognition results using a Large Language Model (LLM) and obtain the readings of words identified as recognition errors through morphological analysis. Furthermore, we extract words with similar readings and finally select the appropriate word among them using the LLM to correct the recognition errors. Evaluation experiments using simulated medical speech recognition results confirmed that the proposed method achieved a 12.9% reduction in errors on medical terms.
-
Motoki CHIBA, Osamu ITO, Takumi IIDA
Session ID: 4N2-GS-7-01
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
With the advancement of autonomous driving technologies, accurately estimating traffic accident risks has become increasingly important. Advanced Driver-Assistance Systems (ADAS), which use sensors for obstacle detection and provide collision mitigation and evasive steering, have reduced traffic accidents. However, the diversity and complexity of accident scenarios limit the potential of traditional ADAS technologies to achieve further reductions in accidents. Recently, Vision-Language Models (VLMs) have been applied to the autonomous driving field. While VLMs possess broad knowledge and achieve reasonable accuracy in traffic scene understanding, they struggle to evaluate accident risks involving detailed and complex factors. Fine-tuning a VLM specialized for traffic accident risk estimation is necessary, but the significant cost of data collection and annotation poses practical challenges. This study proposes a traffic accident risk explanation method using multimodal Retrieval-Augmented Generation (RAG) to improve explanation performance efficiently with minimal data. By leveraging a small amount of manually annotated data for retrieval and reference, the proposed method enhances explanatory capabilities for previously unseen images. Experimental results show that the proposed method improves traffic accident scene understanding compared to the baseline model.
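The retrieval setup (encoder, index, prompt format) is not given in this abstract; the snippet below is a hypothetical sketch of the retrieval-and-prompt step of a multimodal RAG pipeline, where embed_image stands in for any off-the-shelf image encoder and all prompts are invented.

```python
import numpy as np

def embed_image(image) -> np.ndarray:
    # Hypothetical stand-in: replace with a real image encoder (e.g. a CLIP-like model).
    raise NotImplementedError("replace with a real image encoder")

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def build_prompt(query_image, db_embeddings, db_annotations, k=3):
    """Retrieve the k most similar annotated scenes and splice their expert
    risk explanations into the prompt given to the VLM."""
    q = embed_image(query_image)
    scores = [cosine(q, e) for e in db_embeddings]
    top = np.argsort(scores)[-k:][::-1]
    references = "\n".join(db_annotations[i] for i in top)
    return (f"Reference accident-risk explanations for similar scenes:\n{references}\n\n"
            "Explain the accident risks in the attached driving scene.")
```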
-
Masahiro OKANO, Tomoyasu NANAUMI, Shuhei TSUKUI, Junuchiro FUJII
Session ID: 4N2-GS-7-02
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS
In recent years, due to the decreasing number of skilled engineers in the civil engineering field, there is an increasing demand for further efficiency improvements in tasks such as geological surveys. The quality evaluation of cores in boring surveys relies on visual observation and measurement by skilled engineers based on multiple evaluation indicators, such as maximum core length and Rock Quality Designation (RQD). Moreover, conventional automation methods have been limited to recognition through image binarization processing, requiring parameter adjustments depending on the imaging environment and geological variations. This study focuses on maximum core length and RQD, which have relatively clear quantitative criteria and can be determined from images. As a generalizable automatic measurement method that does not require additional training, we propose using the features of normal maps obtained from an image-based foundation model. To verify the effectiveness of the proposed method, accuracy evaluations are conducted using practical industry standards as criteria (maximum core length: MSE of 3 or less; RQD: MSE of 10 or less). The results confirmed the feasibility of automatic measurement within practical application standards.
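RQD is conventionally defined as the percentage of the core run made up of intact pieces 10 cm or longer; measuring the piece lengths from images is the paper's contribution and is not shown here. The snippet below only illustrates how maximum core length and RQD follow from measured piece lengths, with invented numbers.

```python
def core_metrics(piece_lengths_cm, run_length_cm):
    """Compute maximum core length and RQD from measured core-piece lengths.
    RQD (%) = sum of pieces >= 10 cm / total core-run length * 100."""
    max_core_length = max(piece_lengths_cm)
    rqd = 100.0 * sum(p for p in piece_lengths_cm if p >= 10.0) / run_length_cm
    return max_core_length, rqd

# Hypothetical pieces measured from one 100 cm core run.
pieces = [23.0, 5.0, 14.0, 8.0, 31.0, 12.0, 7.0]
print(core_metrics(pieces, run_length_cm=100.0))   # (31.0, 80.0)
```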
-
Taketo SASAKI, Ryohei ORIHARA, Yasuyuki TAHARA, Akihiko OHSUGA, Yuichi ...
Session ID: 4N2-GS-7-03
Published: 2025
Released on J-STAGE: July 01, 2025
CONFERENCE PROCEEDINGS
FREE ACCESS