International Journal of Activity and Behavior Computing

Summary of the Virtual Data Generation for Complex Industrial Activity Recognition

Qingxin Xia, Christina Garcia, Umang Dobhal, Min Xinyi, Tanigaki Kei, ...

2025 年2025 巻2 号 p. 1-12
発行日: 2025/05/21
公開日: 2025/05/21

DOIhttps://doi.org/10.60401/ijabc.101

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

This paper presents a summary of the virtual data generation for complex industrial activity recognition Challenge, which focused on exploring virtual data generation techniques to improve the performance of human activity recognition (HAR) in complex industrial environments. The challenge utilized the OpenPack dataset, a large-scale multimodal collection of sensor data captured during real-world packaging operations. Participants were tasked with generating synthetic accelerometer data to augment a baseline HAR model. Four teams from different countries proposed diverse approaches, including interpolation, classical augmentations, variational autoencoders, and GAN-based methods. Their submissions were evaluated using micro F1 score across multiple random seeds to test robustness. The results reveal that while deep generative models offer strong potential, simpler signal-based techniques also perform competitively when wellaligned with the data structure. Additionally, incorporating finer-grained action labels within each operation can help guide more realistic virtual data generation, leading to improved HAR model performance by better capturing intra-operation dynamics. Based on these findings, we discuss key insights and suggest future directions for designing robust, semantically consistent, and computationally efficient virtual data generation pipelines for industrial HAR applications.

抄録全体を表示

PDF形式でダウンロード (1507K)
Understanding Normal and Unusual Activities Related to Parkinson's Disease from Sensor Data: Summary of Activity and Behavior Computing Tremor Challenge

Shahera Hossain, Tahera Hossain, Christina Garcia, Tahia Tazin, Md. Ma ...

2025 年2025 巻2 号 p. 1-18
発行日: 2025/09/13
公開日: 2025/09/13

DOIhttps://doi.org/10.60401/ijabc.103

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Recognizing normal and unusual activities (e.g., tremor) from sensor data, particularly those related to the wear-off phenomenon of Parkinson's disease (PD), is crucial to improving the quality of life of elderly individuals. This paper presents five selected methods from the Activity and Behavior Computing Tremor Challenge 2025, which focused on developing robust models to recognize Parkinson's disease-related activities using wearable triaxial accelerometer data. These approaches range from traditional machine learning pipelines based on handcrafted multi-domain features to advanced deep learning architectures incorporating temporal modeling and attention mechanisms. The review also highlights innovative strategies such as adaptive window segmentation and data augmentation using conditional variational autoencoders. Each method addressed key challenges, including data label misalignment, class imbalance, noisy sensor readings, and the limited size of the dataset. The explored models ranged from traditional XGBoost classifiers to deep learning architectures such as CNN-LSTM hybrids, attention-enhanced DeepConvLSTM and Transformer-based classifiers, each demonstrating varying levels of success. This summary provides insights into the trade-offs between model interpretability and predictive performance, emphasizes the effectiveness of combining feature engineering with deep learning, and outlines directions for future research in scalable PD monitoring systems. The findings highlight the need for more advanced approaches to reliably decipher these challenging activities.

抄録全体を表示

PDF形式でダウンロード (592K)
Class-Conditioned Variational Autoencoder with Evolutionary Optimization for the Virtual Data Generation Challenge

Yuuki Tachioka

2025 年2025 巻2 号 p. 1-14
発行日: 2025/05/22
公開日: 2025/05/22

DOIhttps://doi.org/10.60401/ijabc.105

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Human activity recognition (HAR) plays a crucial role in optimizing packing operations in industrial settings. However, HAR performance is often constrained by the limited availability of labeled data. To address this issue, we propose a novel data augmentation framework using conditional variational autoencoders (CVAE) to generate high-quality synthetic data. Our approach ensures class consistency while increasing data diversity by conditioning the generation process on activity labels. Additionally, we optimize hyperparameters for data generation using evolutionary computation, further improving recognition accuracy. The proposed method is validated on the OpenPack dataset, demonstrating its effectiveness in enhancing HAR performance without modifying the recognition model itself. Our key contributions include the introduction of a robust data augmentation pipeline, the application of CVAE for HAR, and the use of evolutionary computation to optimize data generation. Our model trained with data augmentation achieved an F1 score of 53.4%, while the recognition model trained without data augmentation achieved an F1 score of 48.1%.

抄録全体を表示

PDF形式でダウンロード (1165K)
A Data-Centric Approach to Human Activity Recognition: Enhancing Industrial HAR with Virtual Data Generation

Matthias Tschope, Kirsten Harms, Daniel Eckhoff, Paul Lukowicz, Andrea ...

2025 年2025 巻2 号 p. 1-16
発行日: 2025/05/22
公開日: 2025/05/22

DOIhttps://doi.org/10.60401/ijabc.106

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Machine learning algorithms depend heavily on the availability of data. Since data collection can be both time-consuming and costly, virtual data generation and augmentation techniques are commonly used. Given that inertial measurement unit (IMU) datasets for industrial human activity recognition (HAR) are typically small, we propose an approach that combines traditional augmentation techniques with class-independent and class-dependent sliding window generation to enhance the performance of the given HAR classifier. This work was part of the Virtual Data Generation for Complex Industrial Activity Recognition challenge. We consider five augmentation methods and three resampling methods. Our experiments demonstrate an improvement of 7.04 percentage points over the baseline accuracy of 55.89%. The code is available on GitHub https://github.com/b02a8S0E/ABCDChallengePacking.

抄録全体を表示

PDF形式でダウンロード (455K)
Virtual Data Generation for Complex Industrial Activity Recognition : An Approach Based on Interpolation Algorithms and Data Augmentation Techniques

Atsushi Yanagisawa

2025 年2025 巻2 号 p. 1-21
発行日: 2025/05/22
公開日: 2025/05/22

DOIhttps://doi.org/10.60401/ijabc.107

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

In this study, we propose a virtual data generation method tailored for the OpenPack Challenge, which leverages acceleration sensor data along with associated operation and action labels. Our approach integrates simple yet effective data augmentation techniques (e.g., jitter, axis switching, and time warping) with advanced interpolation algorithms (including various forms of RBF and FFT-based methods) to generate high-quality virtual data. Experimental findings indicate that data-driven methods specifically adapted to the characteristics of sensor signals can substantially improve HAR model performance, offering a promising solution in scenarios where large-scale data collection is impractical. Specifically, by grouping the data according to the action and operation labels and applying an RBF-based interpolation algorithm to each group, an F1 score of 0.6379 was achieved. However, employing more advanced data generation algorithms such as GANs may further improve performance.

抄録全体を表示

PDF形式でダウンロード (275K)
Depression Detection via Facial Expressions and Movement Analysis Using Machine Learning with Optimized Feature Selection

Bagus Hardiansyah, Fajar Astuti Hermawati, Dymas Adi Saputra, Danara D ...

2025 年2025 巻2 号 p. 1-25
発行日: 2025/05/22
公開日: 2025/05/22

DOIhttps://doi.org/10.60401/ijabc.108

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Mental health disorders affect millions of people globally, posing considerable challenges for the detection and monitoring of depression. In this study, we aim to introduce several feature selection and machine learning approaches to optimize the data on facial behavior AUs, probabilities classification, head Euler angles, and facial landmarks with 25 participants. We computed statistical features for AUs, probabilities classification, head Euler angles, and facial landmarks such as min, max, mean, median, sum, and standard deviation of each facial data point from 25 participants for detection value data from PHQ-9 label depression episode between 0 and 1. Therefore, we achieve significant indicators of depressive episodes, achieving 0.94% of AUROC with the model SVC approach, 0.92% of AUROC with the model random forest classifier approach, 0.91% of AUROC with the model XGBClassifier approach, and 0.91% of AUROC with the model ANN machine learning approach, respectively.

抄録全体を表示

PDF形式でダウンロード (654K)
Transformer-based Depression Detection Model using FacePsy Data using Facial Action Units

Sujay Saha, Maharshi Niloy, Brototy Saha

2025 年2025 巻2 号 p. 1-12
発行日: 2025/05/22
公開日: 2025/05/22

DOIhttps://doi.org/10.60401/ijabc.109

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

As a psychological disease, depression needs to be detected and treated. However, the detection of depression is hard, and there is hardly any automated way to assess depression. In this paper, we worked on a dataset named ”FacePsy” which consists of facial landmark data of 25 participants. The depression among the participants was measured using the phq9 scale over a period of time. Using the dataset, we build a transformer-based model with an attention mechanism. The major challenge was to prepare the dataset to fit into the transformer model. The dataset was not evenly balanced, which created a bias in the results. However, we tried to minimize that by adding the self-attention mechanism to our model. In the universal method for evaluation, our proposed method achieved an accuracy of 93%, a Precision of 91%, and an F1-score of 84%. While in the hybrid model evaluation, our proposed method achieved an accuracy of 62% and an F1-score of 46%. This paper was submitted in ”BeyondSmile: Detecting Depression through Facial Behavior and Head Gestures” challenge as part Activity and Behavior Computing 2025 Conference.

抄録全体を表示

PDF形式でダウンロード (259K)
Temporal-aware Ensemble Learning Facial Behavior Analysis for Accurate Depression Assessment

Muhammad Abdul Latief, Andi Prademon Yunus, Raphon Galuh Chandraningty ...

2025 年2025 巻2 号 p. 1-25
発行日: 2025/05/23
公開日: 2025/05/23

DOIhttps://doi.org/10.60401/ijabc.110

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Depression is a significant global health burden, affecting over 322 million people worldwide and projected to surpass cardiovascular disease as the leading cause of disability by 2030. Despite advancements in mental health services, accurate and accessible diagnostic methods remain a critical challenge. Traditional approaches, such as psychiatric consultations, face limitations due to the physician-patient ratio and reliance on subjective self-report scales, which can lead to inaccuracies. Recent research has explored alternative methods, including facial behavior analysis, for objective depression assessment. This approach is cost-effective, non-invasive, and suitable for real-world applications. This study builds upon existing research, such as FacePsy and MoodCapture, by introducing a temporalaware ensemble learning framework that enhances depression assessment by integrating multiple models to capture both static and dynamic facial behavior patterns. Through comprehensive experiments, we demonstrate that models trained on specific facial features, such as Eyes Open and Smiling Probability, achieve superior performance, with F1 Scores ranging from 0.7093 to 0.7388, accuracy from 0.6607 to 0.7058, and AUROC from 0.7432 to 0.7845. In contrast, models trained on full feature sets exhibit lower performance, highlighting the importance of effective feature selection and pre-processing. The incorporation of temporal modeling further refines depression detection by capturing subtle facial dynamics that static models may overlook. This study bridges theoretical research with practical applications, fostering the development of innovative solutions for addressing the mental health crisis.

抄録全体を表示

PDF形式でダウンロード (395K)
Leveraging Machine Learning and Deep Learning for Enhanced Parkinson’s Disease Symptom Analysis

Happy Gery Pangestu, Andi Prademon Yunus, Raphon Galuh Chandraningtyas ...

2025 年2025 巻2 号 p. 1-22
発行日: 2025/05/23
公開日: 2025/05/23

DOIhttps://doi.org/10.60401/ijabc.111

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Parkinson’s disease (PD) remains a significant challenge in medical diagnosis and monitoring, requiring robust machine learning techniques to enhance predictive accuracy and improve patient outcomes. This study explores the use of Logistic Regression, ensemble learning models, and deep learning LSTM model to classify PD symptoms more effectively and reliably. Additionally, feature engineering techniques such as velocity, acceleration, sliding window, rolling mean, rolling variance, and time difference are incorporated to enhance model performance, capture essential movement patterns, and mitigate data noise. Experimental results demonstrate that feature engineering significantly improves classification accuracy, with Decision Tree achieving the highest testing accuracy of 0.989, Other tree models such as XGBoost and Random forest also give good result in accuration of 0.979 and 0.961 when combined using sliding window. Furthermore, the integration of deep recurrent architectures enhances the detection of temporal patterns in PD tremors, with LSTM achieving a testing accuracy of 0.73. These findings highlight the effectiveness of combining ensemble learning with deep learning for PD detection, offering potential improvements for early diagnosis, continuous patient monitoring, personalized treatment strategies, and disease progression assessment.

抄録全体を表示

PDF形式でダウンロード (696K)
Deep Convolutional Long Short-Term Memory Attention for Parkinson’s Activity Recognition

Shangai Li, Demirhan Hilmi

2025 年2025 巻2 号 p. 1-18
発行日: 2025/05/23
公開日: 2025/05/23

DOIhttps://doi.org/10.60401/ijabc.112

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

This research studies the difficulties in identifying activity patterns in individuals with Parkinson’s disease through triaxial accelerometer data, where the fundamental discrepancy between millisecond-level sensor outputs and minute-level activity labels presents considerable preprocessing obstacles. To maintain label consistency, we omitted minutes displaying mixed activity, resulting in some data loss. Traditional machine learning techniques, such sliding window segmentation, time and frequency domain feature extraction, and random forests, provide valuable insights. Our principal contribution resides in deep learning. We suggested a hybrid model for deep learning named DeepConvLSTM-Attention. This model integrates convolutional neural networks for the extraction of robust local features with recurrent neural networks augmented by attention mechanisms to proficiently capture temporal dependencies. The findings from our studies indicate that DeepConvLSTM Attention significantly improves both the F1 score and overall accuracy compared to standalone LSTM and CNN integrated with LSTM architectures. This underscores its capacity for enhanced activity identification and tailored therapy approaches in the management of Parkinson’s disease.

抄録全体を表示

PDF形式でダウンロード (2340K)
Tackling Data Scarcity in Tremor Challenge: Class-Conditional Data Augmentation

Yuuki Tachioka

2025 年2025 巻2 号 p. 1-18
発行日: 2025/05/23
公開日: 2025/05/23

DOIhttps://doi.org/10.60401/ijabc.113

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

With the progression of an aging society, early detection of neurodegenerative diseases such as Parkinson’s disease (PD) has become increasingly important. Tremors, a primary symptom of PD, serve as key indicators of disease progression and treatment efficacy. However, the scarcity of tremor-related data poses a significant challenge in developing robust human activity recognition (HAR) models. To address this issue, we propose a data augmentation framework using a conditional variational autoencoder to generate high-quality synthetic data conditioned on activity labels. Additionally, we employ evolutionary computation to optimize hyperparameters for data generation, further improving model performance. Our approach enhances both the diversity and robustness of training datasets, enabling the development of more accurate recognition models. Experimental results demonstrate that our framework significantly improves HAR, particularly in scenarios with limited real-world data. This method provides a scalable solution to advance PD detection and monitoring through multimodal sensor-based HAR. Our model trained with data augmentation achieved an F1 score of 45.6% in three different sets, while the recognition model trained without data augmentation achieved an F1 score of 31.6%.

抄録全体を表示

PDF形式でダウンロード (1354K)
Recognizing Normal and Unusual Activities for Parkinson’s Disease using Multimodal Sensor Data

Manqi Zhang, Xinyan Hu, Gulustan Dogan

2025 年2025 巻2 号 p. 1-16
発行日: 2025/05/23
公開日: 2025/05/23

DOIhttps://doi.org/10.60401/ijabc.114

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Early detection of Parkinson’s disease (PD) motor symptoms is crucial for improving clinical outcomes. This study presents a convolutional neural network-long short-term memory (CNN-LSTM) model for classifying PD-related activities using accelerometer data, developed as part of the ABC Challenge 2024. Our approach integrates a sliding window strategy with signal mutation detection to address data alignment challenges, combined with time-frequency feature extraction and temporal pattern modeling. Evaluated on a dataset of 9 subjects performing 10 activities, the model achieves an F1 score of 80.2% through leave-one-subject-out cross-validation. The results demonstrate the framework’s capability for continuous monitoring of symptoms such as tremors and freezing of gait. Future work will focus on expanding the dataset and optimizing the model architecture for real-world deployment.

抄録全体を表示

PDF形式でダウンロード (298K)
Native Arabic EEG-based Silent Speech Decoding Using Deep Learning Techniques

Taveena Lotey, Salini Yadav, Partha Pratim Roy

2025 年2025 巻2 号 p. 1-19
発行日: 2025/06/05
公開日: 2025/06/05

DOIhttps://doi.org/10.60401/ijabc.115

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Silent Speech Recognition (SSR) using electroencephalography (EEG) is an emerging area in brain-computer interface (BCI) research, enabling communication without vocal articulation. However, EEG-based SSR remains challenging due to low signal-to-noise ratio, inter-subject variability, and limited training data. This study explores machine learning (ML) and deep learning (DL) models for EEG-based silent speech decoding, utilizing the Native Arabic Silent Speech Dataset, which consists of EEG recordings from ten participants performing six distinct silent speech commands. A comprehensive preprocessing pipeline, including epoching, baseline correction, Independent Component Analysis (ICA) for artifact removal, and bandpass filtering, is applied to enhance signal quality. We evaluate both traditional ML classifiers (Support Vector Machines, Random Forests, and K-Nearest Neighbors) and DL models such as ShallowNet, EEGNet, Long-Short Term Memory (LSTM), and EEG-Conformer to assess their effectiveness in silent speech decoding. Our best-performing model, LSTM, achieved an accuracy of 19.57% and 22.79% under cross-subject evaluation and subject-wise evaluation, respectively. The study highlights the challenges in generalizing EEG-based SSR models and the need for improved domain adaptation techniques for better classification performance. This research is part of the Silent Speech Decoding Challenge (SSDC) challenge of Activity and Behavior Computing (ABC) conference.

抄録全体を表示

PDF形式でダウンロード (1447K)
Development of an EEG-Based Silent Speech Recognition Model on the Native Arabic Silent Speech Dataset Using Light BERT Architecture

Masaki Shuzo, Reiya Hiramoto, Ryoma Ishigaki, Shingo Ando, Motoki Saka ...

2025 年2025 巻2 号 p. 1-16
発行日: 2025/06/05
公開日: 2025/06/05

DOIhttps://doi.org/10.60401/ijabc.116

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

This study participated in the Silent Speech Decoding Challenge (SSDC) and investigated the application of a lightweight BERT-based architecture for EEG-based silent speech recognition. We pre-trained a foundation model using publicly available EEG data and fine-tuned it on the SSDC dataset. The model was evaluated on six silent speech commands: “right,” “left,” “up,” “down,” “select,” and “cancel.” The average accuracy and F1 score across all eight subjects were 0.165 and 0.137, respectively. Subject 5 achieved the highest discrimination performance, with an accuracy of 0.239 and an F1 score of 0.223. However, the overall classification performance remained below 25%. The confusion matrix analysis revealed frequent misclassifications across multiple classes, highlighting the challenges of EEG-based silent speech recognition. Accuracy varied across subjects, with the highest exceeding 20% and the lowest below 10%. These findings indicate that while pre-training captured meaningful EEG signal representations, the classification accuracy after fine-tuning was limited, emphasizing the difficulty of silent speech recognition using EEG. Despite these challenges, our approach provides insights into EEG-based classification and demonstrates the potential of BERT-based architectures for future research in this domain.

抄録全体を表示

PDF形式でダウンロード (592K)
Inner Speech Recognition Using EEG Signals with Reservoir Computing

Arie Rachmad Syulistyo, Yuichiro Tanaka, Hakaru Tamukoh

2025 年2025 巻2 号 p. 1-19
発行日: 2025/06/05
公開日: 2025/06/05

DOIhttps://doi.org/10.60401/ijabc.117

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Inner speech recognition using electroencephalography (EEG) signals holds significant potential for advancing brain-computer interface (BCI), particularly for individuals with speech impairments. However, decoding inner speech from EEG data remains a challenging task due to the nonlinear, high-dimensional and temporal dynamic nature of neural signals. In order to address the challenges, this study explores the application of reservoir computing (RC), with a particular focus on bidirectional RC, for the purpose of classifying inner speech from EEG signals. Contrary to unidirectional RC, which processes data in a single time direction, bidirectional RC captures both past and future temporal dependencies. This enhancement of the extraction of meaningful EEG features for classification is a significant contribution of this study. We evaluate the performance of both unidirectional and bidirectional RC architectures across a range of reservoir sizes (400, 500, and 600 units) to identify the most effective configuration for this task. Our results demonstrate that the bidirectional RC consistently outperforms the unidirectional RC in terms of accuracy and F1-score across all reservoir sizes, highlighting its superior ability to extract comprehensive temporal features from EEG data. The optimal performance is achieved by bidirectional RC with 600 reservoir units, yielding an accuracy of 18.94% and an F1-score of 19.02%, which surpasses all other configurations. In contrast, unidirectional RC with 600 units exhibits a lower accuracy of 17.22%. These findings underscore the potential of bidirectional RC in EEG signals classification for inner speech recognition, offering a promising direction for developing efficient BCI with low-cost computation.

抄録全体を表示

PDF形式でダウンロード (2063K)
A Novel Hybrid AI model for Silent Speech Decoding Challenge with EEG Data

Anam Suri, Suraiya Jabin

2025 年2025 巻2 号 p. 1-15
発行日: 2025/06/05
公開日: 2025/06/05

DOIhttps://doi.org/10.60401/ijabc.118

ジャーナルオープンアクセス

抄録を表示する抄録を非表示にする

Silent speech interfaces (SSIs) have become a promising pathway for human-computer interaction, particularly for individuals with speech impairments. Through this study, we have explored the classification of speech data using Native Arabic Silent Speech Dataset, collected by Qatar University Machine Learning Group. This dataset is designed for the classification and analysis of inner (silent) speech based on EEG signals, with a focus on six specific commands: up, down, right, left, select, and cancel. The challenge was to perform inter-session and inter-subject classification of these tasks. For both inter-subject and inter-session classification, we have experimented with different state-of-the-art Artificial Intelligence (AI) models like Convolutional Neural Network (CNN), Transformer, and Recurrent Neural Networks (RNNs) like Long Short Term Memory (LSTM) and Gated Recurrent Units (GRUs). After comparing the results from all these models, we have proposed a hybrid AI model, EEGCNN GRU, combining 1D CNNs and GRUs. This model was made as to capture the spatio-temporal features present in the EEG data. For inter-session classification, individual model for each subject was trained on data from the first three sessions, which were used to predict labels of the fourth unlabelled session of each subject. The inter-subject model was trained with three sessions from 8 subjects which was used to predict labels of two subjects having unlabelled data of four sessions. Our proposed model achieved an average F1-score, recall, precision and test accuracy of 48.90%, 48.97%, 50.00% and 49.97% respectively, for intersession classification. For inter-subject classification, the corresponding values were 20.97%, 20.98%, 21.08%, and 20.98%.

抄録全体を表示

PDF形式でダウンロード (1302K)

J-STAGEへの登録はこちら（無料）