Representations of Speech by Sparse Coding Algorithm

Kotani Manabu; Shirata Yasunobu; Satoshi Maekawa; Ozawa Seiichi; Akazawa Kenzo

doi:10.1541/ieejeiss1987.120.12_1996

抄録

It was reported that a sparse coding algorithm produced a set of basis functions being spatially localized, oriented, and bandpass for natural images. The application of Independent Component Analysis (ICA) to the natural images has shown to be similar results to the sparse coding's result. However, the ICA can be applied in the case of basis function matrices to be non-singular and invertible. There are not such limitations in the sparse coding algorithm. This property allows that the code is overcomplete, that is, the number of code elements is greater than the effective dimensionality of the input space. The purpose of this paper is to examine what characteristics of speech the sparse coding algorithm extracts from natural sounds. Speech data was Japanese five vowels uttered by a female speaker during about 1sec. Most of the basis functions were localized in frequency after the training. Some basis functions only shifted in time and resembled each other. Each basis function was compared with the speech data and the result was that some basis functions responded selectively to each vowel. The frequency analysis for the basis function showed that some basis functions extracted the pitch frequency and the formant of each vowel.

著者関連情報

お気に入り & アラート

閲覧履歴

発行機関からのお知らせ

【電気学会会員の方】購読している論文誌を無料でご覧いただけます（会員ご本人のみの個人としての利用に限ります）。購読者番号欄にMyページへのログインIDを，パスワード欄に生年月日8ケタ（西暦，半角数字。例：19800303）を入力して下さい。

ダウンロード

論文(PDF)の閲覧方法はこちら
閲覧方法 (389.7K)

前身誌

電気学会論文誌. C

電氣學會雜誌

責任著者(Corresponding author)

J-STAGEへの登録はこちら（無料）