Abstract
We developed a deep learning algorithm for human activity recognition that takes sensor signals as input. We built a pre-trained language model based on the Transformer architecture, which is widely used in natural language processing, and used it to improve performance on the downstream task of human activity recognition. Although a vanilla Transformer can address this task, we propose an enhanced n-dimensional numerical processing Transformer with three key features: embedding of n-dimensional numerical data through a linear layer, binning-based preprocessing, and a linear transformation in the output layer. We evaluated the proposed model on five datasets. Compared with the vanilla Transformer, it improved accuracy by 10%–15%.
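As an illustration only, the three features can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in this study: the function names (`bin_signal`, `linear`), the equal-width binning scheme that maps each value to its bin center, the value range, and all sizes (`n_dims`, `d_model`, `seq_len`, `n_bins`) are hypothetical choices, and the Transformer encoder itself is omitted.

```python
import numpy as np


def bin_signal(x, n_bins, lo, hi):
    """Quantize values into n_bins equal-width bins over [lo, hi].

    Assumption: each value is replaced by the center of its bin.
    Other binning schemes (e.g. quantile bins) are equally plausible.
    """
    edges = np.linspace(lo, hi, n_bins + 1)
    # Interior edges yield indices 0..n_bins-1; clip guards out-of-range input.
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers[idx]


def linear(x, w, b):
    """Affine map applied independently to every time step."""
    return x @ w + b


rng = np.random.default_rng(0)
n_dims, d_model, seq_len, n_bins = 3, 16, 50, 10  # hypothetical sizes

# Synthetic stand-in for a window of n-dimensional sensor readings.
x = rng.uniform(-2.0, 2.0, size=(seq_len, n_dims))

# (1) Binning-based preprocessing.
x_binned = bin_signal(x, n_bins, lo=-2.0, hi=2.0)

# (2) Linear embedding of each n-dimensional sample into d_model dimensions.
w_in = rng.normal(size=(n_dims, d_model))
b_in = np.zeros(d_model)
tokens = linear(x_binned, w_in, b_in)  # shape: (seq_len, d_model)

# A Transformer encoder would process `tokens` here (omitted in this sketch).

# (3) Linear transformation in the output layer, assumed here to map
# back to n_dims numerical values per time step.
w_out = rng.normal(size=(d_model, n_dims))
b_out = np.zeros(n_dims)
outputs = linear(tokens, w_out, b_out)  # shape: (seq_len, n_dims)
```

In this sketch the linear embedding replaces the token-lookup embedding of a text Transformer, so each time step of the sensor signal becomes one input vector regardless of how many channels the sensor provides.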