Formant estimation of high-pitched noisy speech using homomorphic deconvolution of higher-order group delay spectrum

Husne Ara Chowdhury; Mohammad Shahidur Rahman

doi:10.1250/ast.44.84

Abstract

Estimating the formant frequencies of high-pitched speech is essential in many speech processing applications. Unfortunately, most existing methods cannot accurately estimate formant frequencies from high-pitched speech, and the available formant estimators are not robust to noise. In this paper, we propose a deconvolution method based on the higher-order group delay (GD) spectrum for more accurate formant estimation from high-pitched noisy speech. Although the cepstrum is known to provide source-filter separation to some extent, it is affected by ambient noise. We apply the spectral-root deconvolution technique to the third-order GD spectrum, which yields a noise-robust cepstrum. This cepstrum significantly improves formant frequency estimation. We evaluated the proposed method on five synthetic vowels, by calculating the formant frequency estimation error, and on natural vowels spoken by male and female speakers, using standard F2–F1 plots. An utterance from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database was used to plot formant contours on its spectrogram. We compared the results with three state-of-the-art methods. The proposed technique outperforms all of them, particularly for high-pitched speech in noisy environments.
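The abstract combines two building blocks: the group delay (GD) spectrum and spectral-root deconvolution. The following is a minimal, hypothetical Python sketch of textbook versions of each. It is not the authors' third-order GD method, whose exact construction is not given in the abstract. `group_delay` uses the standard identity tau(w) = Re{Y(w) X*(w)} / |X(w)|^2, where Y is the DFT of n*x[n], so no phase unwrapping is needed. `spectral_root_envelope` replaces the logarithm of the ordinary cepstrum with a power |X|^gamma, keeps only low-quefrency coefficients, and transforms back to obtain a smoothed envelope whose peaks serve as formant candidates. All function names, gamma = 0.3, and the lifter length are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT; fine for short illustrative frames (use numpy.fft in practice)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spec):
    """Naive inverse DFT matching dft() above."""
    n_pts = len(spec)
    return [
        sum(spec[k] * cmath.exp(2j * math.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
        for n in range(n_pts)
    ]


def group_delay(x, eps=1e-12):
    """First-order group delay tau(w) = -d arg X(w)/dw, in samples, per DFT bin.

    Uses tau = (X_R*Y_R + X_I*Y_I) / |X|^2 with Y = DFT{n*x[n]},
    which avoids explicit phase unwrapping.
    """
    X = dft(x)
    Y = dft([n * v for n, v in enumerate(x)])
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


def spectral_root_envelope(x, gamma=0.3, n_lifter=20):
    """Smoothed spectral envelope via spectral-root homomorphic deconvolution.

    The ordinary cepstrum takes log|X|; the spectral-root variant takes |X|**gamma.
    Coefficients with quefrency |n| <= n_lifter are kept as the vocal-tract part;
    the result is transformed back and raised to 1/gamma.
    """
    n_pts = len(x)
    root_spec = [abs(v) ** gamma for v in dft(x)]
    ceps = idft(root_spec)  # real and even (up to rounding) for a real input frame
    liftered = [
        c if (n <= n_lifter or n >= n_pts - n_lifter) else 0.0
        for n, c in enumerate(ceps)
    ]
    smooth = dft(liftered)
    # Clamp tiny/negative values before undoing the root so the power is defined.
    return [max(v.real, 1e-12) ** (1.0 / gamma) for v in smooth]


def pick_peaks(env, n_peaks=3):
    """Return the DFT bins of the n_peaks strongest local maxima below Nyquist."""
    half = len(env) // 2
    peaks = [k for k in range(1, half) if env[k - 1] < env[k] >= env[k + 1]]
    peaks.sort(key=lambda k: env[k], reverse=True)
    return sorted(peaks[:n_peaks])
```

For example, a 64-sample impulse response of a two-pole resonator tuned to DFT bin 8, r**n * sin(w0*(n+1)) / sin(w0) with w0 = 2*pi*8/64, can be passed to `spectral_root_envelope`, and `pick_peaks(env, n_peaks=1)` should report a bin at or next to 8. The naive `dft` helper only keeps the sketch dependency-free; `numpy.fft.fft` is the usual choice for real frames.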
