We conducted human language identification experiments with Japanese and bilingual subjects, using signals with reduced segmental information. American English and Japanese excerpts from the OGI Multi-Language Telephone Speech Corpus were processed by spectral-envelope removal (SER), vowel extraction from SER (VES), and temporal-envelope modulation (TEM), and the processed excerpts were presented as stimuli in perceptual experiments. From the subjects’ responses we calculated D indices, which range from -2 to +2, with positive values indicating correct responses and negative values indicating incorrect ones. With the SER signal, in which the spectral envelope is eliminated, listeners could still identify the languages fairly successfully: the overall D index of Japanese subjects for this signal was +1.17. With the VES signal, which retains only the vowel sections of the SER signal, the D index was lower (+0.35). With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands, the D index rose from +0.29 to +1.69 as the number of bands increased from 1 to 4. Results varied depending on the stimulus language, and Japanese and bilingual subjects scored differently from each other. These results indicate that humans can identify languages using signals with drastically reduced segmental information. They also suggest variation due to the phonetic typologies of the languages and the subjects’ knowledge.
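The TEM processing described above (intensity envelopes from several frequency bands driving white noise) can be illustrated with a minimal sketch. This is not the authors' implementation: the band edges, the second-order band-pass filter, the 50 Hz envelope smoothing, and the function names `biquad_bandpass`, `envelope`, and `tem` are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def biquad_bandpass(x, fs, lo, hi):
    """Second-order band-pass filter between lo and hi Hz (assumed filter choice)."""
    f0 = math.sqrt(lo * hi)              # geometric centre frequency
    q = f0 / (hi - lo)                   # quality factor from the bandwidth
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha               # b1 is zero for this filter form
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def envelope(x, fs, cutoff=50.0):
    """Intensity envelope: full-wave rectification, then one-pole low-pass smoothing."""
    a = math.exp(-2 * math.pi * cutoff / fs)
    env, state = [], 0.0
    for xn in x:
        state = a * state + (1 - a) * abs(xn)
        env.append(state)
    return env


def tem(signal, fs, band_edges, seed=0):
    """Sum of band-limited white noise carriers, each scaled by its band's envelope."""
    rng = random.Random(seed)
    out = [0.0] * len(signal)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        env = envelope(biquad_bandpass(signal, fs, lo, hi), fs)
        noise = [rng.uniform(-1.0, 1.0) for _ in signal]   # white noise source
        carrier = biquad_bandpass(noise, fs, lo, hi)       # limit noise to the band
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out
```

Under these assumptions, `tem(samples, 8000, [100, 3400])` would give a one-band signal and `tem(samples, 8000, [100, 600, 1500, 2500, 3400])` a four-band signal. The output keeps the temporal intensity pattern of each band while discarding the fine spectral detail within it.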