IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532


Deepfake speech detection: approaches from acoustic features related to auditory perception to deep neural networks
Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAMAN
Advance online publication

Article ID: 2024MUI0001

Abstract

Skillfully fabricated artificial replicas of authentic media produced with advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increasing distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated using advanced AI-based speech analysis/synthesis techniques, can be abused to spoof and tamper with authentic speech signals. This enables attackers to commit serious offenses such as fraud through voice impersonation and the circumvention of speaker-verification systems. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the acoustic features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing that combines these features with deep neural networks (DNNs).
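As a rough, self-contained illustration of the detection pipeline outlined above (a minimal sketch, not the authors' implementation), the following Python/PyTorch code feeds an auditory-motivated feature representation into a small DNN binary classifier. The log-mel front end is an illustrative stand-in for the perception-related acoustic features discussed in the paper, and all layer sizes and hyperparameters are arbitrary.

# Minimal sketch (not the authors' method): auditory-inspired features + DNN
# for bona fide vs. deepfake speech classification. The log-mel spectrogram
# stands in for the perception-related features proposed in the paper.
import torch
import torch.nn as nn
import torchaudio

class DeepfakeSpeechDetector(nn.Module):
    def __init__(self, sample_rate: int = 16000, n_mels: int = 64):
        super().__init__()
        # Auditory-motivated front end: a mel filterbank approximates the
        # ear's nonuniform frequency resolution.
        self.frontend = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_fft=512, hop_length=160, n_mels=n_mels
        )
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Back end: small CNN encoder followed by a linear head that emits
        # two logits (bona fide vs. deepfake).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> features: (batch, 1, n_mels, frames)
        feats = self.to_db(self.frontend(waveform)).unsqueeze(1)
        return self.head(self.encoder(feats).flatten(1))

# Usage: score one second of (here, random) 16-kHz audio.
model = DeepfakeSpeechDetector()
logits = model(torch.randn(1, 16000))
print(logits.softmax(dim=-1))  # [P(bona fide), P(deepfake)]

In practice, the front end would be replaced by the auditory-perception features the paper proposes, and the classifier would be trained on an anti-spoofing corpus (e.g., ASVspoof).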

© 2024 The Institute of Electronics, Information and Communication Engineers