Abstract
Self-confidence is a pivotal trait that profoundly influences performance across many life domains, fostering positive outcomes by enabling quick decision-making and timely action. In video-based learning, accurate detection of self-confidence is critical because it enables personalized feedback, enhancing learners’ experiences and improving their confidence levels. This study addresses self-confidence detection by evaluating and comparing traditional machine-learning methods with deep-learning approaches on eye-tracking data collected through two distinct modalities: a dedicated eye-tracker and an appearance-based model. In our experimental setup, fourteen participants each viewed eight distinct videos and provided corresponding responses. To analyze the collected data, we implemented and compared six algorithms: Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), and two deep-learning models, a one-dimensional Convolutional Neural Network (1D CNN) and a Transformer. The 1D CNN achieved the highest macro F1-scores under leave-one-participant-out cross-validation (LOPOCV): 0.662 on eye-tracker data and 0.635 on appearance-based data. Under leave-one-question-out cross-validation (LOQOCV), Logistic Regression performed best on eye-tracker data (macro F1-score: 0.560), while the Transformer yielded the highest macro F1-score (0.616) on appearance-based data. These findings underscore the effectiveness of deep learning in capturing complex gaze-behavior patterns and provide a robust framework for estimating self-confidence in video-based learning environments.
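The leave-one-participant-out evaluation protocol described above can be sketched as follows. This is a minimal illustration using scikit-learn's `LeaveOneGroupOut` with synthetic placeholder data (14 participants × 8 trials, random features and binary labels); the feature set, labels, and classifier hyperparameters are assumptions for demonstration, not the study's actual data or configuration.

```python
# Hypothetical sketch of leave-one-participant-out cross-validation (LOPOCV)
# with macro F1 scoring. Data here is synthetic, not the study's gaze features.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_participants, trials_per_participant, n_features = 14, 8, 6

# One row per (participant, video) trial; binary self-confidence label.
X = rng.normal(size=(n_participants * trials_per_participant, n_features))
y = rng.integers(0, 2, size=n_participants * trials_per_participant)
groups = np.repeat(np.arange(n_participants), trials_per_participant)

# Each fold holds out all trials of one participant, so the model is
# always tested on a person it has never seen during training.
logo = LeaveOneGroupOut()
scores = []
for train_idx, test_idx in logo.split(X, y, groups):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append(f1_score(y[test_idx], pred, average="macro", zero_division=0))

print(f"LOPOCV folds: {len(scores)}, mean macro F1: {np.mean(scores):.3f}")
```

Leave-one-question-out (LOQOCV) follows the same pattern with the group variable set to the video/question index instead of the participant index, testing generalization to unseen content rather than unseen people.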