Representations of Speech by Sparse Coding Algorithm

Kotani Manabu; Shirata Yasunobu; Satoshi Maekawa; Ozawa Seiichi; Akazawa Kenzo

doi:10.1541/ieejeiss1987.120.12_1996

Abstract

A sparse coding algorithm has been reported to produce, for natural images, a set of basis functions that are spatially localized, oriented, and bandpass. Applying Independent Component Analysis (ICA) to natural images has been shown to yield results similar to those of sparse coding. However, ICA can be applied only when the basis function matrix is non-singular, that is, invertible. The sparse coding algorithm has no such limitation. This property allows the code to be overcomplete, that is, the number of code elements can be greater than the effective dimensionality of the input space. The purpose of this paper is to examine which characteristics of speech the sparse coding algorithm extracts from natural sounds. The speech data were the five Japanese vowels uttered by a female speaker for about 1 s. After training, most of the basis functions were localized in frequency. Some basis functions resembled one another, differing only by a shift in time. Comparing each basis function with the speech data showed that some basis functions responded selectively to individual vowels. Frequency analysis of the basis functions showed that some of them extracted the pitch frequency and the formant of each vowel.
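As an illustration of the overcomplete setting and of the frequency analysis of a basis function, the following is a minimal sketch and not the authors' implementation. It assumes an L1 sparseness penalty solved by iterative soft thresholding (ISTA), a random dictionary standing in for trained basis functions, and an 8 kHz sampling rate; the names `ista_sparse_code`, `peak_frequency`, `Phi`, and `fs` are hypothetical.

```python
import numpy as np


def ista_sparse_code(x, Phi, lam=0.05, n_iter=500):
    """Infer coefficients a that minimize 0.5*||x - Phi a||^2 + lam*||a||_1.

    ISTA is an assumed stand-in for the inference step; the abstract does
    not state which optimizer the authors used.
    """
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = a - step * (Phi.T @ (Phi @ a - x))  # gradient step on the squared error
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)  # soft threshold
    return a


def peak_frequency(basis, fs):
    """Return the frequency (Hz) at which the basis function's spectrum peaks."""
    spectrum = np.abs(np.fft.rfft(basis))
    freqs = np.fft.rfftfreq(len(basis), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]


rng = np.random.default_rng(0)
n_dim, n_basis = 64, 128  # overcomplete: more basis functions than input dimensions
Phi = rng.standard_normal((n_dim, n_basis))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm columns (hypothetical dictionary)

# Synthetic input built from three basis functions (illustrative data only).
true_a = np.zeros(n_basis)
true_a[[3, 40, 97]] = [1.5, -2.0, 1.0]
x = Phi @ true_a
a_hat = ista_sparse_code(x, Phi)

# Frequency analysis of a sinusoidal "basis function" at an assumed fs of 8 kHz.
fs = 8000
t = np.arange(n_dim) / fs
f_peak = peak_frequency(np.sin(2 * np.pi * 500 * t), fs)
```

Because `Phi` has more columns than rows, it cannot be inverted, so the coefficients must be inferred by optimization rather than by a single matrix multiplication, which is the situation the abstract contrasts with ICA.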
