Abstract
We describe the characteristics that independent component analysis can extract from Japanese continuous speech. The speech data, uttered by a female speaker, were taken from the ATR database. The data were recorded at a sampling frequency of 20 kHz and pre-processed with a whitening filter. The network was trained with the information-maximization approach proposed by Bell and Sejnowski. After learning, most of the basis functions, which are the columns of the mixing matrix, were localized in both time and frequency. Furthermore, we confirmed that some basis functions extracted acoustic features such as the pitch and the formants of each vowel.
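The pipeline summarized above (whitening, information-maximization learning, and reading the basis functions from the columns of the mixing matrix) can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the natural-gradient form of the infomax update with a logistic nonlinearity, and it uses synthetic Laplacian sources in place of the ATR speech segments. The function names `whiten` and `infomax_ica`, the step size, the batch size, and the mixing matrix `A_mix` are all hypothetical choices made for illustration.

```python
import numpy as np

def whiten(X):
    """Remove the mean and whiten X of shape (n_samples, n_dims)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigval, eigvec = np.linalg.eigh(cov)
    # Symmetric whitening matrix V = cov^(-1/2).
    V = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return Xc @ V.T, V

def infomax_ica(Z, n_epochs=200, lr=0.01, batch=100, seed=0):
    """Infomax ICA on whitened data Z (natural-gradient variant, assumed here)."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W = np.eye(d)
    for _ in range(n_epochs):
        idx = rng.permutation(n)
        for start in range(0, n - batch + 1, batch):
            x = Z[idx[start:start + batch]].T      # (d, batch)
            u = W @ x
            # For the logistic output y = 1 / (1 + exp(-u)),
            # 1 - 2y equals -tanh(u / 2).
            g = -np.tanh(u / 2.0)
            W += lr * (np.eye(d) + g @ u.T / batch) @ W
    return W

# Synthetic stand-in for the speech segments: two super-Gaussian sources.
rng = np.random.default_rng(1)
S = rng.laplace(size=(5000, 2))
A_mix = np.array([[1.0, 0.6], [0.4, 1.0]])   # hypothetical mixing matrix
X = S @ A_mix.T

Z, V = whiten(X)
W = infomax_ica(Z)

# Basis functions are the columns of the estimated mixing matrix,
# recovered only up to scaling and permutation.
basis = np.linalg.inv(W @ V)
```

For speech, each row of `X` would instead be a short windowed segment of the waveform, so that each column of `basis` is a waveform-like basis function whose time and frequency localization can be inspected.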