-
Lingfeng Zhao, Christina Garcia, Shunsuke Komizunai, Noriyo Colley, At ...
2025 Volume 2025 Issue 1 Pages 1-28
Published: May 19, 2025
Released on J-STAGE: May 16, 2025
JOURNAL
OPEN ACCESS
In this paper, we improve nursing activity recognition in gastrostomy tube feeding (GTF) under temporal variations and sequential errors by integrating activity context into a Large Language Model (LLM) for guided feature selection and post-processing. GTF is a delicate nursing procedure that allows direct stomach access in children for supplemental feeding or medication, but it is underrepresented in datasets, posing challenges for accurate detection. Manual feature engineering may overlook subtle but important motion cues, particularly in opening and closing the gastrostomy cover, where changes are minimal and localized to the hands. Additionally, sequence inconsistencies and missed activities limit the effectiveness of pose estimation methods in healthcare. Leveraging the contextual adaptability of LLMs, we generate new features suggested by the language model and combine them with hand-crafted features to optimize the model. For post-processing, a sliding window smoothing method based on majority voting is applied. To mitigate duration-based discrepancies, priority handling is incorporated for short-duration activities to preserve their recognition accuracy while addressing repeated labels caused by long-duration actions. In particular, we applied activity recognition to our unique GTF dataset, collected from recorded video of two nurses, two students, and two staff members over three days, with 17 labeled activities. Keypoints are extracted using YOLO11. Compared to the baseline, applying the LLM to GTF nurse activity recognition with pose estimation improved the Random Forest F1-score from 54% to 57%. Incorporating the sliding window smoothing approach based on majority voting with short-duration action priority resulted in a further 3% increase.
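A minimal sketch of the post-processing idea described above: sliding-window majority-vote smoothing with a protected set of short-duration labels. The window size, the protected label set, and the function names are illustrative assumptions, not the authors' exact implementation.

```python
from collections import Counter

def smooth_predictions(labels, window=7, protected=frozenset()):
    """Majority-vote smoothing over a sliding window.

    Labels in `protected` (e.g. short-duration activities) are kept
    as-is so that they are not overwritten by long, repeated labels.
    """
    half = window // 2
    smoothed = []
    for i, label in enumerate(labels):
        if label in protected:          # preserve short-duration activities
            smoothed.append(label)
            continue
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        majority, _ = Counter(labels[lo:hi]).most_common(1)[0]
        smoothed.append(majority)
    return smoothed

# Example: a short activity ("open_cover") survives the smoothing pass.
preds = ["prepare"] * 5 + ["open_cover"] + ["feed"] * 6
print(smooth_predictions(preds, window=5, protected={"open_cover"}))
```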
-
Taihei Fujioka, Christina Garcia, Sozo Inoue
2025 Volume 2025 Issue 1 Pages 1-28
Published: May 19, 2025
Released on J-STAGE: May 16, 2025
JOURNAL
OPEN ACCESS
In this study, we propose to optimize temporal parameters with pose estimation data of simulated abnormal activities of developmentally disabled individuals by incorporating behavior context into Large Language Models (LLMs). Facilities for the developmentally disabled face the challenge of detecting abnormal behaviors because of limited staff and the difficulty of spotting subtle movements. Traditional methods often struggle to identify these behaviors because abnormal actions are irregular and unpredictable, leading to frequent misses or misclassifications. The main contributions of this work are the creation of a unique dataset with labeled abnormal behaviors and the proposed application of LLMs to this dataset, comparing Zero-Shot and Few-Shot results. Our method leverages the context of the collected abnormal activity data to prompt LLMs to suggest the window size, overlap rate, and LSTM sequence length tailored to the specific characteristics of these activities. The dataset includes labeled video data collected over four days from five normal participants performing eight activities with four abnormal behaviors. The data were collected with normal participants who simulated the activities; no individuals with disabilities were involved. For evaluation, we assessed all normal versus abnormal activities and per-abnormal-activity recognition, comparing with the baseline without the LLM. The results showed that Few-Shot prompting delivered the best performance, with F1-score improvements of 7.69% for throwing things, 7.31% for attacking, 4.68% for head banging, and 1.24% for nail biting compared to the baseline. Zero-Shot prompting also demonstrated strong recognition capabilities, achieving F1-scores above 96% across all abnormal behaviors. By using LLM-driven suggestions with YOLOv7 pose data, we optimize temporal parameters, enhancing sensitivity to abnormal behaviors and generalization across activities. The model reliably identifies short, complex behaviors, making it suitable for real-world caregiving applications.
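To illustrate the kind of prompting described above, the following sketch composes a prompt that embeds activity context and asks an LLM to suggest window size, overlap rate, and sequence length. The prompt wording, `activity_stats` format, and function name are assumptions, not the authors' actual prompt.

```python
def build_parameter_prompt(activity_stats, examples=None):
    """Compose a prompt asking an LLM to suggest temporal parameters.

    `activity_stats` summarizes the behavior context (e.g. typical
    durations); `examples` holds optional few-shot demonstrations.
    """
    prompt = (
        "You are tuning a sensor-based activity recognition pipeline.\n"
        f"Activity context: {activity_stats}\n"
        "Suggest a sliding-window size (seconds), an overlap rate (0-1), "
        "and an LSTM sequence length. Answer as JSON with keys "
        "window_size, overlap_rate, sequence_length."
    )
    if examples:  # Few-Shot variant; omit for Zero-Shot
        prompt = "\n".join(examples) + "\n\n" + prompt
    return prompt

stats = {"head banging": "short, repetitive", "throwing things": "sudden, brief"}
print(build_parameter_prompt(stats))
# The returned string would then be sent to the LLM client of your choice.
```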
-
Kanae Matsui, Hibiki Kaneko
2025 Volume 2025 Issue 1 Pages 1-26
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
Recent advances in ubiquitous sensing technologies have enabled systems to recognize and support daily human activities for behavioral change interventions. While previous research has explored activity recognition in various healthcare domains, automated monitoring and support systems for daily skin care activities remain understudied. This paper presents a novel system that combines continuous sensor monitoring of skin conditions with personalized behavior change support. The system employs precision sensors to capture fine-grained temporal changes in skin moisture and sebum levels, enabling detailed activity recognition of users’ skin care routines and their effects. Based on the collected multimodal sensor data and identified behavioral patterns, the system implements an adaptive intervention mechanism that provides personalized recommendations aligned with individual skin types and care activities. Our research aims to promote sustainable behavior changes in skin care routines by helping users objectively understand the relationship between their daily activities and skin condition variations. Through experimental evaluation, we demonstrate how the system effectively recognizes skin care activities and influences user behavior for improved skin health maintenance. This work contributes to expanding activity recognition and behavior computing applications in the personal care domain.
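A toy sketch of an adaptive recommendation step driven by moisture and sebum readings. The thresholds, skin-type handling, and messages are placeholders and do not reflect the paper's actual intervention mechanism.

```python
def recommend(moisture, sebum, skin_type="normal"):
    """Toy rule-based recommendation from one sensor reading.

    Thresholds are placeholders; a real system would adapt them to the
    user's skin type and to trends in the multimodal sensor history.
    """
    if moisture < 30:
        return "Moisture is low: apply a hydrating lotion."
    if sebum > 70 and skin_type == "oily":
        return "Sebum is high: consider a lighter moisturizer."
    return "Skin condition is stable: keep your current routine."

print(recommend(moisture=25, sebum=40))
```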
-
Kaito Takayama, Shoko Kimura, Guillaume Lopez
2025 Volume 2025 Issue 1 Pages 1-18
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
To alleviate the tension that performers feel during a performance, we proposed a system that judges the audience's laughter in real time using machine learning and conveys the feedback to the performers. Conventional threshold-based laughter judgment was insufficient to alleviate tension. Therefore, this study adopted a method that accurately judges laughter using machine learning and reduces stress by providing vibrational feedback. In this experiment, changes in the tension level of a comedy performer were evaluated using the system, and a statistically significant tension-relieving effect was obtained. The questionnaire results also suggested areas for improvement, such as the usefulness of visual feedback. In addition, a comparison of the actual laughter timing with the machine-learning-based laughter judgment showed that the system recognized laughter at the correct timing approximately 60% of the time.
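A minimal sketch of the real-time loop implied above: classify each audio window and trigger a vibration cue on laughter. The classifier, features, and the `read_audio_window`/`send_vibration` helpers are hypothetical stand-ins for the device I/O and model used in the study.

```python
import numpy as np

def laughter_feedback_loop(classifier, read_audio_window, send_vibration):
    """Classify each incoming audio window; vibrate when laughter is detected."""
    while True:
        window = read_audio_window()          # e.g. one second of waveform
        if window is None:                    # stream ended
            break
        features = np.array([[window.mean(), window.std(), np.abs(window).max()]])
        if classifier.predict(features)[0] == 1:   # 1 = laughter
            send_vibration()                  # haptic cue to the performer

# Stub usage: a dummy classifier that always reports laughter.
class AlwaysLaughter:
    def predict(self, X):
        return [1] * len(X)

stream = iter([np.random.randn(16000), None])
laughter_feedback_loop(AlwaysLaughter(), lambda: next(stream), lambda: print("buzz"))
```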
-
Zeyu Liu, Guillaume Lopez, Shoko Kimura
2025 Volume 2025 Issue 1 Pages 1-21
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
A 2023 survey of over 4,000 regular fitness enthusiasts revealed that wearable devices have remained a prominent topic. Studies have shown that planned fitness activities and long-term progress tracking can enhance motivation and provide accurate fitness evaluations. However, most current fitness tracking applications require manual data entry before and after workouts, potentially distracting users and reducing workout effectiveness. Few applications can automatically record diverse fitness movements for extended periods.
To address these challenges, this study aims to develop a system using Android-based smartwatches and smartphones to recognize and quantify users’ fitness-related movements automatically. By eliminating manual operation, the system offers long-term feedback on fitness activities. The research comprises four key components: (1) Training two machine learning models to recognize motion states and fitness movements with feature dimensionality reduction for real-time mobile operation; (2) Proposing the Double-Layer Sliding Window method to recognize and count fitness movements during exercise based on sliding windows and peak detection; (3) Developing an algorithm for automatic fitness movement recognition and counting in long-term exercise environments; (4) Conducting experiments in Python environments and analyzing statistical data. Results demonstrated the potential of this approach in automatic fitness activity recording.
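As a simplified stand-in for the peak-detection-based repetition counting mentioned in component (2), the sketch below counts repetitions in an acceleration-magnitude signal with a single pass of peak detection; it is not the full Double-Layer Sliding Window method, and the sampling rate, spacing, and prominence values are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

def count_reps(accel_magnitude, fs=50, min_period=0.8, prominence=1.0):
    """Count repetitions in an acceleration-magnitude signal via peak detection."""
    distance = int(min_period * fs)          # peaks at least `min_period` s apart
    peaks, _ = find_peaks(accel_magnitude, distance=distance,
                          prominence=prominence)
    return len(peaks)

# Synthetic signal: ~10 repetitions at 1 Hz sampled at 50 Hz.
t = np.arange(0, 10, 1 / 50)
signal = 2 * np.sin(2 * np.pi * 1.0 * t) + 0.1 * np.random.randn(t.size)
print(count_reps(signal))   # expect roughly 10
```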
-
Shunsuke Miyazawa, Shoko Kimura, Guillaume Lopez
2025 Volume 2025 Issue 1 Pages 1-17
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
By analyzing sports movements, performance and skill evaluation can be performed, which can assist players in improving their skills. In basketball shooting, joint movement is essential, particularly at the elbow, shoulder, and wrist. Most previous studies have relied on camera-based systems, which can be costly and environmentally dependent, while wearable-device-based systems are limited in evaluation metrics and often lack sufficient data for technical support. This study focuses on beginner basketball players, aiming to guide them toward an ideal shooting form by providing feedback on the forearm angle in the set position so that they achieve an optimal forearm angle. The ideal angle was set at 32±5 degrees based on the forearm angles observed during shots by three experienced players. The system uses a smartwatch and smart glasses: the smartwatch measures the forearm angle, and based on this data, the smart glasses provide real-time feedback. When the angle is ideal, the screen shows green; when it is not, the screen shows red, prompting the user to adjust the arm angle until it turns green. In this experiment, 20 inexperienced players shot 10 regular shots and 10 free throws using the system. The evaluation metrics included the forearm angle at the set position, its variability, and the system's usability as measured by the System Usability Scale (SUS). When using the system, all participants achieved the ideal forearm angle at the set position, improving their shooting form during the set phase. Regarding forearm angle variability, most players reduced their variability, leading to a more stable shooting form. The SUS evaluation showed an average score of 73.8, indicating good usability. Post-experiment surveys revealed minimal discomfort during shooting due to the devices, suggesting a minor impact from wearing them. Many players reported feeling an improvement in shooting form by focusing on the arm angle, confirming the training effectiveness of the system from a subjective perspective. As a prospect, since some subjects found it difficult to hold the position above their heads during the set, it is necessary to examine whether feedback during the hold and during the set would help them acquire the ideal shooting form. Furthermore, creating a system to suggest the appropriate force for throwing the ball could further improve shooting accuracy.
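A minimal sketch of the green/red feedback rule described above, using the 32±5 degree range from the abstract; the function name and the way the angle arrives from the smartwatch are assumptions.

```python
IDEAL_ANGLE = 32.0   # degrees, from experienced players' set positions
TOLERANCE = 5.0

def feedback_color(forearm_angle_deg):
    """Return the smart-glasses screen color for the measured forearm angle."""
    if abs(forearm_angle_deg - IDEAL_ANGLE) <= TOLERANCE:
        return "green"   # within 32 +/- 5 degrees: hold this set position
    return "red"         # outside the ideal range: adjust the arm

for angle in (28.0, 33.5, 40.2):
    print(angle, feedback_color(angle))
```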
-
Iqbal Hassan, Nazmun Nahid, Sozo Inoue
2025 Volume 2025 Issue 1 Pages 1-25
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
-
Milyun Ni’ma Shoumi, Defry Hamdhana, Kazumasa Harada, Hitomi Oshita, S ...
2025 Volume 2025 Issue 1 Pages 1-50
Published: June 05, 2025
Released on J-STAGE: June 05, 2025
JOURNAL
OPEN ACCESS
In this paper, we propose a modular framework that integrates few-shot and Generated Knowledge Prompting (FS-GKP) for health information extraction and summarization from nurse-elderly conversation transcripts. This task is essential for monitoring elderly patients and assisting nurses in completing the visiting nurse form. FS-GKP generates additional domain-specific knowledge from the transcription data, which serves as the basis for more accurate extraction and summarization. FS-GKP uses a structured chain of prompts in which each step builds on the previous one, improving interpretability and precision. Experiments reveal that GKP with the few-shot technique significantly enhances extraction performance, with an average accuracy of 78.57% across all health categories, outperforming individual methods such as zero-shot (52.49%) and few-shot (45.24%). FS-GKP also provides the best results for the summarization task compared with the other five techniques (zero-shot, few-shot, Chain-of-Thought (CoT), Self-consistency, and Few-shot CoT), with ROUGE-1: 0.43, ROUGE-2: 0.22, ROUGE-L: 0.32, BLEU: 0.28, BERTScore Precision: 0.75, Recall: 0.72, F1: 0.73, and SBERT Cosine Similarity: 0.83. These results highlight the potential of FS-GKP to improve the accuracy of health information extraction and streamline the summarization process, effectively aligning it with the categories in visiting nurse forms.
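A sketch of the chained-prompt structure described above (knowledge generation, then extraction, then summarization). The prompt wording is illustrative only, and `llm` is any callable mapping a prompt string to a completion string; it is not a specific API.

```python
def fs_gkp_pipeline(transcript, llm, examples):
    """Chain of prompts: generate knowledge, then extract, then summarize."""
    knowledge = llm(
        "Here are labeled examples of nurse-elderly conversations:\n"
        f"{examples}\n"
        "Generate domain knowledge relevant to the following transcript:\n"
        f"{transcript}"
    )
    extraction = llm(
        f"Using this knowledge:\n{knowledge}\n"
        f"Extract the health information (per category) from:\n{transcript}"
    )
    summary = llm(
        "Summarize the extracted information for the visiting nurse form:\n"
        f"{extraction}"
    )
    return extraction, summary

# Usage with a stub LLM that simply echoes the first line of each prompt.
echo = lambda prompt: prompt.splitlines()[0]
print(fs_gkp_pipeline("Nurse: How did you sleep?", echo, "Example: ..."))
```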
-
Ankur Bhatt, Ko Watanabe, Jayasankar Santhosh, Andreas Dengel, Shoya I ...
2025 Volume 2025 Issue 1 Pages 1-24
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
OPEN ACCESS
Self-confidence is a pivotal trait that profoundly impacts performance across various life domains. It fosters positive outcomes by facilitating quick decision-making and timely actions. In the context of video-based learning, accurate detection of self-confidence is critical, as it enables the provision of personalized feedback, thereby enhancing learners' experiences and improving their confidence levels. This study addresses the challenge of self-confidence detection by evaluating and comparing traditional machine-learning methods with deep-learning approaches using eye-tracking data collected through two distinct modalities: an eye tracker and an appearance-based model. Our experimental setup involved fourteen participants, each of whom viewed eight distinct videos and provided corresponding responses. To analyze the collected data, we implemented and compared the following algorithms: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), a deep-learning-based 1D Convolutional Neural Network (1D CNN), and a Transformer model. The 1D CNN model achieved the highest macro F1-scores under leave-one-participant-out cross-validation (LOPOCV), with 0.662 on eye-tracking data and 0.635 on appearance-based data. In contrast, under leave-one-question-out cross-validation (LOQOCV), Logistic Regression demonstrated superior performance for eye-tracking data (F1-score: 0.560), while the Transformer-based model yielded the highest F1-score (0.616) for appearance-based data. These findings underscore the effectiveness of deep learning in capturing complex gaze behavior patterns, thereby providing a robust framework for estimating self-confidence in video-based learning environments.
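For readers unfamiliar with the LOPOCV protocol mentioned above, the sketch below runs leave-one-participant-out cross-validation with scikit-learn on synthetic data; the feature dimensions, labels, and choice of Logistic Regression are placeholders, not the study's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import LeaveOneGroupOut

# Toy gaze-feature data: 14 participants, 8 questions each (labels are random).
rng = np.random.default_rng(0)
X = rng.normal(size=(14 * 8, 10))
y = rng.integers(0, 2, size=14 * 8)
participants = np.repeat(np.arange(14), 8)

# Leave-one-participant-out cross-validation (LOPOCV).
scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=participants):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(f1_score(y[test_idx], model.predict(X[test_idx]),
                           average="macro"))
print(f"macro F1 (LOPOCV): {np.mean(scores):.3f}")
```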
-
Yukihiro Kadono, Naoya Yoshimura, Takuya Maekawa, Takahiro Hara
2025 Volume 2025 Issue 1 Pages 1-28
Published: June 28, 2025
Released on J-STAGE: June 28, 2025
JOURNAL
OPEN ACCESS
This study investigates how to handle erroneous results of probabilistic activity recognition models in industrial applications. Sensor-based automated methods to recognize work activities have various practical applications in industrial domains, such as detecting bottlenecks in the flow of tasks, detecting outliers in work processes, reviewing work processes, and managing labor. Although errors in the results of activity recognition models that rely on probabilistic machine learning are inevitable, to the best of our knowledge, methods for handling these erroneous results in industrial applications have not been thoroughly investigated. As a case study on outlier detection in work processes, we experimentally investigate methods for performing reliable outlier detection based on the outputs of probabilistic activity recognition models, using large-scale data on packaging work processes.
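A generic illustration of the underlying idea, treating low-confidence recognition results separately so that model errors are less likely to be reported as process deviations; this is not the paper's concrete handling scheme, and the confidence threshold and expected-sequence check are assumptions.

```python
import numpy as np

def reliable_outlier_flags(class_probs, expected_sequence, confidence=0.8):
    """Flag deviations from the expected work sequence, ignoring
    low-confidence recognition results.

    `class_probs` is an (n_steps, n_classes) array of model outputs;
    steps whose top probability is below `confidence` are treated as
    'unknown' rather than as outliers.
    """
    predicted = class_probs.argmax(axis=1)
    confident = class_probs.max(axis=1) >= confidence
    flags = []
    for step, (label, ok) in enumerate(zip(predicted, confident)):
        expected = expected_sequence[step % len(expected_sequence)]
        flags.append("outlier" if ok and label != expected else
                     "ok" if ok else "unknown")
    return flags

probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]])
print(reliable_outlier_flags(probs, expected_sequence=[0, 1, 0]))
```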
-
Ko Watanabe, Yuki Matsuda, Yugo Nakamura, Yutaka Arakawa, Shoya Ishim ...
2025 Volume 2025 Issue 1 Pages 1-17
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
OPEN ACCESS
In programming education, fostering self-regulated learning (SRL) skills is essential for both students and teachers. This paper introduces Track-ThinkDashboard, an application designed to visualize the learning workflow by combining web browsing and programming logs in one unified view. The system aims to (1) help students monitor and reflect on their problem-solving processes, identify knowledge gaps, and cultivate effective SRL strategies, and (2) enable teachers to identify at-risk learners more effectively and provide targeted, data-driven guidance. We conducted a study with 33 participants (32 male, one female) from Japanese universities—some with prior programming instruction and some without—to explore differences in web browsing and coding patterns. The dashboards revealed multiple learning approaches (e.g., try-and-error, try-and-search, and more) and highlighted how domain knowledge influenced overall activity flow. We discuss how this visualization can be used continuously or in one-off experiments, the privacy considerations involved, and opportunities for expanding data sources for richer behavioral insights.
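A small sketch of the data-combination step behind such a dashboard: merging web browsing and coding logs into one chronological view. The log schemas, timestamps, and event strings are hypothetical and do not reflect the system's actual data format.

```python
import pandas as pd

# Hypothetical log schemas: the real dashboard's data sources may differ.
browsing = pd.DataFrame({
    "time": pd.to_datetime(["2025-01-01 10:00", "2025-01-01 10:05"]),
    "event": ["search: list comprehension", "view: stackoverflow answer"],
})
coding = pd.DataFrame({
    "time": pd.to_datetime(["2025-01-01 10:02", "2025-01-01 10:07"]),
    "event": ["edit: solution.py", "run: tests (2 failed)"],
})

# Merge both logs into a single chronological workflow view.
browsing["source"] = "web"
coding["source"] = "ide"
timeline = (pd.concat([browsing, coding])
              .sort_values("time")
              .reset_index(drop=True))
print(timeline)
```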
-
Koji Yokoyama, Goshiro Yamamoto, Chang Liu, Sho Mitarai, Kazumasa Kish ...
2025 Volume 2025 Issue 1 Pages 1-27
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
OPEN ACCESS
Vision-based human activity recognition has achieved high-accuracy activity classification on supervised data. However, applying these models to complex real-world videos requires a field-specific analysis of unique activity features. This research aims to automatically identify and classify the individual role of each member of the surgical team. We introduce a categorical variational autoencoder-based role classification model using semi-supervised training. In addition, we analyze how the activities' features represented by the model's latent variables change according to intraoperative situations. To test our algorithm, we use surveillance videos of three entire operations as testbeds. Results show that changes in the latent variables of our role classification model, as well as the prediction results, have the potential to describe intraoperative situations and events. Our algorithm is expected to improve surgical evaluation, education, and event detection, contributing to improved safety and operational efficiency.
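A compact sketch of a categorical VAE with a Gumbel-softmax latent role code, intended only to make the model family concrete; the input dimensions, number of roles, and loss terms are placeholders, and the paper's actual architecture and semi-supervised objective are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalVAE(nn.Module):
    """Minimal categorical VAE: the latent code is a relaxed one-hot role vector."""
    def __init__(self, in_dim=34, n_roles=5, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_roles))
        self.decoder = nn.Sequential(nn.Linear(n_roles, hidden), nn.ReLU(),
                                     nn.Linear(hidden, in_dim))

    def forward(self, x, tau=0.5):
        logits = self.encoder(x)                           # role logits
        z = F.gumbel_softmax(logits, tau=tau, hard=False)  # relaxed one-hot
        return self.decoder(z), logits

model = CategoricalVAE()
x = torch.randn(8, 34)                                     # e.g. pose features
recon, logits = model(x)
# Unsupervised reconstruction term plus a supervised term for labeled frames.
loss = F.mse_loss(recon, x) + F.cross_entropy(logits, torch.randint(0, 5, (8,)))
print(float(loss))
```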
-
Ignatius Iwan, Bernardo Nugroho Yahya, Seok-Lyong Lee
2025 Volume 2025 Issue 1 Pages 1-17
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
OPEN ACCESS
User or client activity data often contains sensitive personal information, making privacy a critical concern. Federated Learning (FL) has emerged as a promising solution for training activity recognition models without directly sharing user data. However, real-world sensor data collected from client devices is typically sparse and exhibits imbalanced distributions across clients, a phenomenon known as non-independently and identically distributed (non-IID) data. This non-IID nature degrades the performance of traditional FL frameworks, which aim to produce a single global model for all clients; because each client is unique as an individual, such a single global model is insufficient to capture the activity patterns specific to each client. Instead of relying on a single model, this work draws inspiration from the mixture-of-experts model and proposes a novel framework that synergizes the global model with a local model that can be personalized. The global model learns generalizable features applicable across clients, while the local model captures user-specific patterns. During prediction, a fusion mechanism optimally combines knowledge from the global and local models to improve classification accuracy. Extensive experiments on public datasets collected from both controlled environments and real-world settings demonstrate the framework's effectiveness in improving model performance and adapting to diverse clients.
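A minimal sketch of the fusion idea described above: combining the global and personalized local models' outputs at prediction time. The fixed weight `alpha` is an assumption; in the spirit of the paper this weighting would be learned or chosen per client.

```python
import torch
import torch.nn.functional as F

def fused_prediction(global_logits, local_logits, alpha=0.5):
    """Combine the shared global model with a client's personalized local model."""
    probs = (alpha * F.softmax(global_logits, dim=-1)
             + (1 - alpha) * F.softmax(local_logits, dim=-1))
    return probs.argmax(dim=-1)

g = torch.tensor([[2.0, 0.5, 0.1]])   # global model favors class 0
l = torch.tensor([[0.1, 0.2, 2.5]])   # local model knows this user does class 2
print(fused_prediction(g, l, alpha=0.3))
```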
-
Kyoya Iwatsuru, Shoya Ishimaru, Andrew Vargo, Koichi Kise
2025 Volume 2025 Issue 1 Pages 1-15
Published: July 01, 2025
Released on J-STAGE: July 01, 2025
JOURNAL
OPEN ACCESS
This paper presents a system that supports readers by implicitly identifying difficult paragraphs using eye tracking and generating comprehension questions with a Large Language Model (LLM). Our aim is to evaluate the potential of generative AI as a tool to augment human cognitive capabilities. If readers answer the questions correctly, this confirms that they understand the text. Conversely, if they answer incorrectly, it indicates a lack of understanding, allowing the system to provide explanations for the misunderstood parts, thereby improving the readers' understanding of the text. Our experiment showed that half of the participants improved their comprehension of the text. Further analysis revealed that the motivation to read and to use the system is one of the most important factors in increasing the effectiveness of the proposed system.
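A sketch of the two steps described above: flag paragraphs with long gaze dwell time as difficult, then build an LLM prompt for each. The fixation threshold and prompt wording are illustrative assumptions, not the study's calibrated values.

```python
def question_prompts(paragraphs, fixation_seconds, threshold=30.0):
    """Build LLM prompts for paragraphs the reader seems to struggle with.

    A paragraph is treated as 'difficult' when its total fixation time
    exceeds `threshold`.
    """
    prompts = []
    for text, fix in zip(paragraphs, fixation_seconds):
        if fix > threshold:
            prompts.append(
                "Write one multiple-choice comprehension question "
                f"about the following paragraph:\n{text}"
            )
    return prompts

paras = ["Easy introduction...", "Dense technical argument..."]
print(question_prompts(paras, fixation_seconds=[12.0, 48.0]))
# Each prompt would then be sent to an LLM; a wrong answer triggers an explanation.
```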
-
Haruki Kai, Tsuyoshi Okita
2025 Volume 2025 Issue 1 Pages 1-25
Published: July 02, 2025
Released on J-STAGE: July 02, 2025
JOURNAL
OPEN ACCESS
We developed a deep learning algorithm for human activity recognition using sensor signals as input. In this study, we built a pre-trained language model based on the Transformer architecture, which is widely used in natural language processing. By leveraging this pre-trained model, we aimed to improve performance on the downstream task of human activity recognition. While this task can be addressed using a vanilla Transformer, we propose an enhanced n-dimensional numerical processing Transformer that incorporates three key features: embedding n-dimensional numerical data through a linear layer, binning-based preprocessing, and a linear transformation in the output layer. We evaluated the effectiveness of our proposed model across five different datasets. Compared to the vanilla Transformer, our model demonstrated a 10%–15% improvement in accuracy.
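The sketch below wires up the three ingredients listed above (linear embedding of n-dimensional numerical input, simple binning-based preprocessing, and a linear output layer) around a standard Transformer encoder; all dimensions and the pooling/binning choices are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NumericTransformer(nn.Module):
    """Linear numerical embedding + Transformer encoder + linear output head."""
    def __init__(self, n_channels=6, d_model=64, n_classes=10, n_bins=0):
        super().__init__()
        self.n_bins = n_bins
        self.embed = nn.Linear(n_channels, d_model)           # numerical embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)             # linear output layer

    def forward(self, x):                    # x: (batch, time, n_channels)
        if self.n_bins:                      # simple binning-based preprocessing
            x = torch.floor(x * self.n_bins) / self.n_bins
        h = self.encoder(self.embed(x))
        return self.head(h.mean(dim=1))      # pool over time, then classify

model = NumericTransformer(n_bins=32)
logits = model(torch.rand(4, 100, 6))        # 4 windows of 100 IMU samples
print(logits.shape)                          # torch.Size([4, 10])
```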
-
Kunpeng Zhao, Asahi Miyazaki, Tsuyoshi Okita
2025 Volume 2025 Issue 1 Pages 1-36
Published: July 02, 2025
Released on J-STAGE: July 02, 2025
JOURNAL
OPEN ACCESS
Human Activity Recognition (HAR) has recently witnessed advancements with Transformer-based models. In particular, ActionFormer offers a new perspective for HAR in that it produces additional outputs that detect the boundaries of activities as well as the activity labels. ActionFormer was originally proposed for image/video input, but it has also been adapted to take sensor signals as input. We analyze this adaptation extensively in terms of deep learning architectures. Motivated by reports that high temporal dynamics limit the model's ability to capture subtle changes effectively, and that spatial and temporal features are interdependent, we propose a modified ActionFormer that reduces these defects for sensor signals. The key to our approach lies in the Sequence-and-Excitation strategy, which minimizes the increase in additional parameters, and in the choice of the swish activation function to retain directional information in the negative range. Experiments on the WEAR dataset show that our method achieves a substantial improvement of 16.01% in average mAP for inertial data.
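As a rough illustration of an excitation-style recalibration block with the swish (SiLU) activation, the sketch below re-weights feature channels with a small bottleneck. This is a generic squeeze-and-excitation-style module, not the paper's exact Sequence-and-Excitation design; the reduction ratio and tensor layout are assumptions.

```python
import torch
import torch.nn as nn

class ExcitationBlock(nn.Module):
    """Channel recalibration with a small bottleneck and swish (SiLU).

    The reduction ratio keeps the added parameter count small, and SiLU
    preserves information in the negative range.
    """
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.SiLU(),                      # swish activation
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (batch, time, channels)
        weights = self.fc(x.mean(dim=1))    # squeeze over the time axis
        return x * weights.unsqueeze(1)     # re-weight each channel

features = torch.randn(2, 120, 64)          # e.g. inertial feature maps
print(ExcitationBlock(64)(features).shape)  # torch.Size([2, 120, 64])
```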
-
Gao Huayu, Huang Tengjiu, Ye Xiaolong, Tsuyoshi Okita
2025 Volume 2025 Issue 1 Pages 1-32
Published: July 02, 2025
Released on J-STAGE: July 02, 2025
JOURNAL
OPEN ACCESS
AI-based motion capture is an emerging technology that offers a cost-effective alternative to traditional motion capture systems. However, current AI motion capture methods rely entirely on observed video sequences, similar to conventional motion capture. This means that all human actions must be predefined, and movements outside the observed sequences are not possible. To address this limitation, we aim to apply AI motion capture to virtual humans, where flexible actions beyond the observed sequences are required. We assume that while many action fragments exist in the training data, the transitions between them may be missing. To bridge these gaps, we propose a diffusion-model-based action completion technique that generates complementary human motion sequences, ensuring smooth and continuous movements. By introducing a gate module and a position-time embedding module, our approach achieves competitive results on the Human3.6M dataset. Our experimental results show that (1) MDC-Net outperforms existing methods in ADE, FDE, and MMADE but is slightly less accurate in MMFDE, (2) MDC-Net has a smaller model size (16.84M) compared to HumanMAC (28.40M), and (3) MDC-Net generates more natural and coherent motion sequences. Additionally, we propose a method for extracting sensor data, including acceleration and angular velocity, from human motion sequences.
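Regarding the final point on deriving sensor data from motion sequences, the sketch below uses plain finite differences to obtain acceleration from joint positions and angular velocity from joint angles; the frame rate, data layout, and differentiation scheme are assumptions, and the paper's actual extraction pipeline may differ.

```python
import numpy as np

def acceleration_from_positions(positions, fs=50.0):
    """Second-order finite difference of joint positions -> acceleration.

    `positions` has shape (frames, joints, 3) in meters; `fs` is the frame rate.
    """
    velocity = np.gradient(positions, 1.0 / fs, axis=0)       # m/s
    return np.gradient(velocity, 1.0 / fs, axis=0)            # m/s^2

def angular_velocity_from_angles(euler_angles, fs=50.0):
    """First-order finite difference of unwrapped joint angles -> angular velocity."""
    return np.gradient(np.unwrap(euler_angles, axis=0), 1.0 / fs, axis=0)

motion = np.cumsum(np.random.randn(200, 17, 3) * 0.01, axis=0)  # fake sequence
print(acceleration_from_positions(motion).shape)                # (200, 17, 3)
```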