Article ID: 2024EDL8094
Recognizing driver fatigue is essential for improving road safety. Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have been applied to identify driver states; however, these models often suffer from large parameter counts and limited detection effectiveness. To address these challenges, we propose the Dual-Lightweight-Swin-Transformer (DLS) for driver drowsiness detection. We further propose the Spatial-Temporal Fusion Model (STFM) and the Global Saliency Fusion Model (GSFM): STFM fuses spatial-temporal features, and GSFM fuses features from different layers of STFM to enhance detection efficiency. Simulation results show that DLS increases accuracy by 0.33%, reduces computational complexity by 49.3%, and shortens the running time per test epoch by 33.1%.