Abstract
In this study, we propose optimizing temporal parameters for pose-estimation-based recognition of simulated abnormal activities of developmentally disabled individuals by providing behavior context to Large Language Models (LLMs). Facilities for the developmentally disabled face the challenge of detecting abnormal behaviors because of limited staff and the difficulty of spotting subtle movements. Traditional methods often struggle to identify these behaviors because abnormal actions are irregular and unpredictable, leading to frequent misses or misclassifications. The main
contributions of this work are the creation of a unique dataset with labeled abnormal behaviors and the application of LLMs to this dataset, comparing Zero-Shot and Few-Shot prompting results. Our method leverages the context of the collected abnormal activity data to prompt LLMs to suggest a window size, overlap rate, and LSTM sequence length tailored to the specific characteristics of these activities. The dataset comprises labeled video data collected over four days from five participants performing eight activities, including four abnormal behaviors. The data was collected from typically developing participants simulating the activities; no individuals with disabilities were involved. For evaluation, we assessed recognition of all normal versus abnormal activities, as well as recognition of each abnormal activity individually, comparing against
the baseline without LLM. The results showed that Few-Shot prompting delivered the best performance, with F1-score improvements of 7.69% for throwing things, 7.31% for attacking, 4.68% for head banging, and 1.24% for nail biting as compared to the baseline. Zero-Shot prompting also
demonstrated strong recognition capabilities, achieving F1 scores above 96% across all abnormal behaviors. By using LLM-driven suggestions with YOLOv7 pose data, we optimize temporal parameters, enhancing sensitivity to abnormal behaviors and generalization across activities. The model reliably identifies short, complex behaviors, making it ideal for real-world caregiving applications.
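The abstract does not specify implementation details; the following is a minimal sketch of how behavior context might be turned into a prompt and the LLM's suggested temporal parameters parsed. All names here (`call_llm`, `BEHAVIOR_CONTEXT`, the JSON schema) are hypothetical assumptions, not the paper's actual interface.

```python
import json

# Hypothetical behavior context summarizing the dataset; the actual
# context used in the paper is not given in the abstract.
BEHAVIOR_CONTEXT = (
    "Pose keypoints (YOLOv7) from video. Abnormal behaviors: "
    "head banging, attacking, throwing things, nail biting. "
    "Behaviors are short, irregular, and unpredictable."
)

PROMPT_TEMPLATE = (
    "Given the following activity context:\n{context}\n"
    "Suggest temporal parameters for an LSTM-based recognizer. "
    'Reply as JSON: {{"window_size": int, "overlap_rate": float, '
    '"sequence_length": int}}'
)

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call (Zero-Shot as written; a Few-Shot
    variant would prepend labeled examples to the prompt). Returns a
    canned reply so the sketch runs without network access."""
    return '{"window_size": 60, "overlap_rate": 0.5, "sequence_length": 30}'

def suggest_temporal_parameters(context: str) -> dict:
    """Query the LLM with the behavior context and validate its reply."""
    reply = call_llm(PROMPT_TEMPLATE.format(context=context))
    params = json.loads(reply)
    # Basic sanity checks before the parameters reach the pipeline.
    assert params["window_size"] > 0
    assert 0.0 <= params["overlap_rate"] < 1.0
    assert params["sequence_length"] > 0
    return params

if __name__ == "__main__":
    print(suggest_temporal_parameters(BEHAVIOR_CONTEXT))
```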