IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532


Deepfake speech detection: approaches from acoustic features related to auditory perception to deep neural networks
Masashi UNOKI, Kai LI, Anuwat CHAIWONGYEN, Quoc-Huy NGUYEN, Khalid ZAMAN
Advance online publication

Article ID: 2024MUI0001

Abstract

Skillfully fabricated artificial replicas of authentic media produced with advanced AI-based generators are known as “deepfakes.” Deepfakes have become a growing concern due to their increasing distribution in cyber-physical spaces. In particular, deepfake speech, which is fabricated using advanced AI-based speech analysis/synthesis techniques, can be abused to spoof and tamper with authentic speech signals. This enables attackers to commit serious offenses such as fraud through voice impersonation and the circumvention of speaker-verification systems. Our research project aims to construct the basis of auditory-media signal processing for defending against deepfake speech attacks. To this end, we introduce current challenges and state-of-the-art techniques for deepfake speech detection and examine current trends and remaining issues. We then introduce the acoustic features related to auditory perception and propose methods for detecting deepfake speech based on auditory-media signal processing that combines these features with deep neural networks (DNNs).
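As a rough, self-contained illustration of the detection pipeline outlined above (a minimal sketch, not the authors' implementation), the following Python/PyTorch code feeds an auditory-motivated feature representation into a small DNN binary classifier. The log-mel front end is an illustrative stand-in for the perception-related acoustic features discussed in the paper, and all layer sizes and hyperparameters are arbitrary.

# Minimal sketch (not the authors' method): auditory-inspired features + DNN
# for bona fide vs. deepfake speech classification. The log-mel spectrogram
# stands in for the perception-related features proposed in the paper.
import torch
import torch.nn as nn
import torchaudio

class DeepfakeSpeechDetector(nn.Module):
    def __init__(self, sample_rate: int = 16000, n_mels: int = 64):
        super().__init__()
        # Auditory-motivated front end: a mel filterbank approximates the
        # ear's nonuniform frequency resolution.
        self.frontend = torchaudio.transforms.MelSpectrogram(
            sample_rate=sample_rate, n_fft=512, hop_length=160, n_mels=n_mels
        )
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Back end: small CNN encoder followed by a linear head that emits
        # two logits (bona fide vs. deepfake).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> features: (batch, 1, n_mels, frames)
        feats = self.to_db(self.frontend(waveform)).unsqueeze(1)
        return self.head(self.encoder(feats).flatten(1))

# Usage: score one second of (here, random) 16-kHz audio.
model = DeepfakeSpeechDetector()
logits = model(torch.randn(1, 16000))
print(logits.softmax(dim=-1))  # [P(bona fide), P(deepfake)]

In practice, the front end would be replaced by the auditory-perception features the paper proposes, and the classifier would be trained on an anti-spoofing corpus (e.g., ASVspoof).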

© 2024 The Institute of Electronics, Information and Communication Engineers