Silent speech interfaces (SSIs) have become a promising pathway for human-computer interaction, particularly for individuals with speech impairments. In this study, we explore the classification of inner speech data using the Native Arabic Silent Speech Dataset, collected by the Qatar University Machine Learning Group. This dataset is designed for the classification and analysis of inner (silent) speech from EEG signals, with a focus on six specific commands: up, down, right, left, select, and cancel. The challenge was to perform inter-session and inter-subject classification of these commands. For both settings, we experimented with state-of-the-art Artificial Intelligence (AI) models, including Convolutional Neural Networks (CNNs), Transformers, and Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. After comparing the results of these models, we propose a hybrid AI model, EEGCNN GRU, which combines 1D CNNs and GRUs and is designed to capture the spatio-temporal features present in the EEG data. For inter-session classification, an individual model was trained for each subject on data from that subject's first three sessions and used to predict the labels of the unlabelled fourth session. For inter-subject classification, a single model was trained on three sessions from 8 subjects and used to predict the labels of the four unlabelled sessions of the remaining two subjects. Our proposed model achieved an average F1-score, recall, precision, and test accuracy of 48.90%, 48.97%, 50.00%, and 49.97%, respectively, for inter-session classification. For inter-subject classification, the corresponding values were 20.97%, 20.98%, 21.08%, and 20.98%.
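To make the described hybrid architecture concrete, the following is a minimal PyTorch sketch of a 1D-CNN followed by a GRU for six-class EEG command classification. The channel count, window length, filter sizes, and hidden dimension are illustrative assumptions and not the paper's exact EEGCNN GRU configuration.

# Minimal sketch of a 1D-CNN + GRU hybrid for EEG command classification.
# Channel count (14), window length (256 samples), and layer sizes are
# illustrative assumptions; the EEGCNN GRU architecture may differ.
import torch
import torch.nn as nn


class CNNGRUClassifier(nn.Module):
    def __init__(self, n_channels=14, n_classes=6, hidden_size=64):
        super().__init__()
        # 1D convolutions over time extract local features from the
        # multi-channel EEG signal.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The GRU models temporal dependencies across the CNN feature sequence.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden_size,
                          batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, n_samples)
        feats = self.cnn(x)                 # (batch, 64, n_samples // 4)
        feats = feats.permute(0, 2, 1)      # (batch, time, features)
        _, h_n = self.gru(feats)            # h_n: (1, batch, hidden_size)
        return self.fc(h_n.squeeze(0))      # (batch, n_classes) logits


# Example: a batch of 8 EEG windows, 14 channels, 256 samples each.
logits = CNNGRUClassifier()(torch.randn(8, 14, 256))
print(logits.shape)  # torch.Size([8, 6])

The key design choice this sketch reflects is letting the convolutional front end compress the raw EEG into a shorter feature sequence before the recurrent layer, so the GRU only has to model higher-level temporal structure rather than every raw sample.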