Speech intelligibility prediction with the dynamic compressive gammachirp filterbank and modulation power spectrum

Katsuhiko Yamamoto; Toshio Irino; Toshie Matsui; Shoko Araki; Keisuke Kinoshita; Tomohiro Nakatani

doi:10.1250/ast.40.84

Abstract

The speech-based envelope power spectrum model (sEPSM) was developed to predict the speech intelligibility of sounds produced by nonlinear speech enhancement algorithms such as spectral subtraction. It is a linear model with a linear, level-independent gammatone (GT) filterbank as the front-end. It therefore seems difficult to evaluate speech sounds at low and high sound pressure levels (SPLs) consistently, because speech intelligibility depends on the SPL as well as on the signal-to-noise ratio. In this study, the sEPSM was extended with the dynamic compressive gammachirp (dcGC) auditory filterbank and a "common" normalization factor of the modulation power spectrum component to improve the predictability of the model. To evaluate the proposed model, we performed subjective experiments on the intelligibility of speech sounds enhanced by spectral subtraction and a Wiener filter algorithm. We compared the subjective speech intelligibility scores with the objective scores predicted by the proposed dcGC-sEPSM, the original GT-sEPSM, and other well-known conventional methods such as the short-time objective intelligibility measure (STOI), the coherence speech intelligibility index (CSII), and the hearing aid speech perception index (HASPI). The results show that the proposed dcGC-sEPSM predicted the subjective results better than the other methods did.

Predecessor journal

Journal of the Acoustical Society of Japan (E)
