The mismatch between training and testing environments is a major cause of performance degradation in speech recognition. In this paper, to further reduce this mismatch, we apply temporal filtering, either Auto-Regression and Moving-Average (ARMA) filtering or RelAtive SpecTrAl (RASTA) filtering, as a post-processor for speech features based on log-Energy dynamic Range Normalization and Cepstral Mean and Variance Normalization (ERN-CMVN). The resulting methods are referred to as [EC]-ARMA and [EC]-RASTA, respectively. In experiments on the Aurora 2.0 database, the integrated approaches with temporal filtering show the best performance among the integrated approaches compared.
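As an illustration of the post-processing idea, the following is a minimal sketch of per-utterance CMVN followed by ARMA smoothing along the time axis. It is not the authors' implementation: the ERN step is omitted, the ARMA form (averaging the previous filtered frames with the current and following unfiltered frames) is an assumption based on the form commonly used with mean and variance normalization, and the function names, the filter order of 2, and the 13-dimensional feature size are hypothetical choices.

```python
import numpy as np


def cmvn(features, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    features: array of shape (num_frames, num_coeffs).
    eps is a hypothetical guard against division by zero.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)


def arma_filter(features, order=2):
    """ARMA smoothing along the time axis (assumed form).

    For frames with enough context on both sides:
        y[t] = (y[t-order] + ... + y[t-1] + x[t] + ... + x[t+order]) / (2*order + 1)
    Frames within `order` of either end are left unfiltered.
    """
    x = np.asarray(features, dtype=float)
    y = x.copy()
    num_frames = x.shape[0]
    for t in range(order, num_frames - order):
        past = y[t - order:t].sum(axis=0)          # already-filtered outputs
        future = x[t:t + order + 1].sum(axis=0)    # current and upcoming inputs
        y[t] = (past + future) / (2 * order + 1)
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for one utterance's features: 50 frames, 13 coefficients.
    utterance = rng.normal(loc=3.0, scale=2.0, size=(50, 13))
    smoothed = arma_filter(cmvn(utterance), order=2)
    print(smoothed.shape)
```

The filter acts as a low-pass smoother on each normalized feature trajectory; a RASTA post-processor would replace `arma_filter` with a band-pass filter in the same position of the pipeline.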