Article ID: 2025EDP7044
Current research on deepfake speech detection focuses on generalizing detection systems to different spoofing methods, mainly for clean, noise-free speech. However, speech anti-spoofing countermeasure (CM) systems often perform poorly in more complicated scenarios, such as those involving noise and reverberation. To improve the robustness of CM systems, we propose a transfer learning-based hybrid approach with Speech Enhancement front-end and Counter Measure back-end Joint optimization (SECM-Joint), and investigate its effectiveness against noise and reverberation. Experimental results show that, compared to a Conformer-based CM baseline system without pre-training, our SECM-Joint method achieves a relative equal error rate (EER) reduction of 19.11% to 64.05% in most noisy conditions and 23.23% to 30.67% in reverberant environments. Additionally, our dual-path U-Net (DUMENet) further enhances robustness for real-world applications. These results demonstrate that the proposed method effectively enhances the robustness of CM systems in noisy and reverberant conditions. The code and experimental data supporting this work are publicly available at: https://github.com/ikou-austin/SECM-Joint
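The relative EER figures quoted in the abstract follow the usual convention of dividing the absolute improvement by the baseline value. The short sketch below illustrates that arithmetic only. The function name `relative_reduction` and the EER values are hypothetical placeholders, not results or code from this work.

```python
def relative_reduction(baseline_eer: float, system_eer: float) -> float:
    """Return the relative EER reduction, in percent, of a system over a baseline.

    Both arguments are EERs expressed in the same unit (e.g. percent).
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - system_eer) / baseline_eer


# Hypothetical values for illustration only (not taken from the paper).
baseline = 10.0   # baseline CM system EER, in percent
proposed = 7.5    # EER after adding an enhancement front-end, in percent

print(f"{relative_reduction(baseline, proposed):.2f}% relative reduction")
# prints "25.00% relative reduction"
```

Under this convention, a drop from 10.0% to 7.5% EER is a 2.5-point absolute reduction but a 25% relative reduction, which is why relative figures can look large even when the absolute change is modest.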