Paper ID: 2024MUI0001
Skillfully fabricated artificial replicas of authentic media, produced with advanced AI-based generators, are known as “deepfakes.” Deepfakes have become a growing concern because of their increasing circulation in cyber-physical spaces. In particular, deepfake speech, fabricated using advanced AI-based speech analysis/synthesis techniques, can be abused to spoof and tamper with authentic speech signals, enabling attackers to commit serious offenses such as fraud by voice impersonation and unauthorized access through speaker verification. Our research project aims to establish a foundation of auditory-media signal processing for defending against deepfake speech attacks. To this end, we first survey current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then review the fundamentals of acoustic features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing that combines these features with deep neural networks (DNNs).
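
To make the proposed pipeline concrete, the following is a minimal sketch, not the authors' actual system, of the kind of detector the abstract describes: auditory-perception-related features (here, a log-mel spectrogram stands in for the auditory features; the paper's own feature set is not specified in this section) fed to a small DNN that scores a clip as bona fide vs. deepfake. All layer sizes, sampling rate, and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of an auditory-feature + DNN deepfake-speech detector.
# Feature choice (log-mel) and architecture are assumptions, not the paper's method.
import torch
import torch.nn as nn
import torchaudio

SAMPLE_RATE = 16_000  # assumed input sampling rate

# Log-mel spectrogram: a common proxy for auditory filterbank features.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE, n_fft=512, hop_length=160, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

class DeepfakeSpeechDetector(nn.Module):
    """Tiny CNN over (batch, 1, n_mels, frames) feature maps -> one logit."""
    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool time/frequency to one value per channel
        )
        self.head = nn.Linear(32, 1)   # single logit: probability the clip is fake

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h = self.conv(feats).flatten(1)
        return self.head(h).squeeze(-1)

# Usage on a dummy 2-second waveform; an untrained model gives a meaningless score.
wave = torch.randn(1, SAMPLE_RATE * 2)          # (batch, samples)
feats = to_db(mel(wave)).unsqueeze(1)           # (batch, 1, 64, frames)
score = torch.sigmoid(DeepfakeSpeechDetector()(feats))
print(f"deepfake probability (untrained): {score.item():.3f}")
```

In practice such a detector would be trained on labeled bona-fide and spoofed corpora, and features closer to auditory models (e.g., gammatone or constant-Q representations) could replace the log-mel front end.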