Abstract
In this paper, we propose a speech intelligibility estimation method that uses Support Vector Regression (SVR) with the normalized segmental Signal-to-Noise Ratio in 25 critical bands (cbSNRseg). In the proposed method, estimation is performed for 32 target noise environments, which are classified into 3 clusters by an ambient noise clustering method based on Music Information Retrieval (MIR) features and the x-means algorithm. We first compared cbSNRseg with the SNRseg in 1/3-octave bands (obSNRseg), using the cross-validation RMSE of 5 regression methods including SVR. The weighted sum of RMSE using cbSNRseg was lower than that using obSNRseg, with an RMSE reduction factor of about 0.8, for all of the regression methods. Finally, we compared the performance of each regression method in open tests. The best regression method was SVR with the RBF kernel, whose RMSE was reduced by a factor of about 0.7 compared to the other regression methods.
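
To make the input feature concrete, the following is a minimal sketch of a segmental SNR computation, not the exact feature extraction used in this paper. It works on the full-band time-domain signal, whereas cbSNRseg applies the same idea separately within each of the 25 critical bands. The frame length, the clamping limits of -10 dB and 35 dB, and the min-max normalization are assumptions chosen for illustration, and the names `segmental_snr` and `normalize` are hypothetical.

```python
import math


def segmental_snr(clean, noisy, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean per-frame SNR in dB between a clean signal and its noisy version.

    frame_len, floor_db and ceil_db are illustrative assumptions, not values
    taken from the paper.
    """
    if len(clean) != len(noisy):
        raise ValueError("signals must have equal length")
    n_frames = len(clean) // frame_len
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    frame_snrs = []
    for m in range(n_frames):
        start = m * frame_len
        s = clean[start:start + frame_len]
        x = noisy[start:start + frame_len]
        signal_energy = sum(v * v for v in s)
        noise_energy = sum((a - b) ** 2 for a, b in zip(x, s))
        if signal_energy == 0.0:
            snr_db = floor_db  # silent frame: treat as the lowest SNR
        elif noise_energy == 0.0:
            snr_db = ceil_db   # noise-free frame: treat as the highest SNR
        else:
            snr_db = 10.0 * math.log10(signal_energy / noise_energy)
        # Clamp each frame so that outliers do not dominate the mean.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))
    return sum(frame_snrs) / n_frames


def normalize(snr_db, floor_db=-10.0, ceil_db=35.0):
    """Map a clamped SNR in dB onto the range [0, 1] (assumed normalization)."""
    return (snr_db - floor_db) / (ceil_db - floor_db)
```

Under these assumptions, applying `segmental_snr` to the band-filtered clean and noisy signals of each critical band and passing the results through `normalize` would give a 25-dimensional feature vector, which could then be fed to a regressor such as SVR.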