Silent speech interfaces (SSIs) have become a promising pathway for human-computer interaction, particularly for individuals with speech impairments. In this study, we explore the classification of inner speech data using the Native Arabic Silent Speech Dataset, collected by the Qatar University Machine Learning Group. This dataset is designed for the classification and analysis of inner (silent) speech from EEG signals, with a focus on six specific commands: up, down, right, left, select, and cancel. The challenge was to perform inter-session and inter-subject classification of these commands. For both settings, we experimented with state-of-the-art Artificial Intelligence (AI) models, including Convolutional Neural Networks (CNNs), Transformers, and Recurrent Neural Networks (RNNs) such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. After comparing the results of these models, we propose a hybrid AI model, EEGCNN GRU, which combines 1D CNNs and GRUs and is designed to capture the spatio-temporal features present in the EEG data. For inter-session classification, an individual model was trained for each subject on data from that subject's first three sessions and used to predict the labels of the unlabelled fourth session. For inter-subject classification, a single model was trained on three sessions from 8 subjects and used to predict the labels of the four unlabelled sessions of the remaining two subjects. Our proposed model achieved an average F1-score, recall, precision, and test accuracy of 48.90%, 48.97%, 50.00%, and 49.97%, respectively, for inter-session classification. For inter-subject classification, the corresponding values were 20.97%, 20.98%, 21.08%, and 20.98%.
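To make the described hybrid architecture concrete, the following is a minimal PyTorch sketch of a 1D-CNN followed by a GRU for six-class EEG command classification. The channel count, window length, filter sizes, and hidden dimension are illustrative assumptions and not the paper's exact EEGCNN GRU configuration.

# Minimal sketch of a 1D-CNN + GRU hybrid for EEG command classification.
# Channel count (14), window length (256 samples), and layer sizes are
# illustrative assumptions; the EEGCNN GRU architecture may differ.
import torch
import torch.nn as nn


class CNNGRUClassifier(nn.Module):
    def __init__(self, n_channels=14, n_classes=6, hidden_size=64):
        super().__init__()
        # 1D convolutions over time extract local features from the
        # multi-channel EEG signal.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        # The GRU models temporal dependencies across the CNN feature sequence.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden_size,
                          batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, n_samples)
        feats = self.cnn(x)                 # (batch, 64, n_samples // 4)
        feats = feats.permute(0, 2, 1)      # (batch, time, features)
        _, h_n = self.gru(feats)            # h_n: (1, batch, hidden_size)
        return self.fc(h_n.squeeze(0))      # (batch, n_classes) logits


# Example: a batch of 8 EEG windows, 14 channels, 256 samples each.
logits = CNNGRUClassifier()(torch.randn(8, 14, 256))
print(logits.shape)  # torch.Size([8, 6])

The key design choice this sketch reflects is letting the convolutional front end compress the raw EEG into a shorter feature sequence before the recurrent layer, so the GRU only has to model higher-level temporal structure rather than every raw sample.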