Recent advances in neuro-cognitive and brain-imaging studies of speech communication are reviewed, and their potential contributions to phonetic science are discussed. Topics are drawn from brain-imaging research on the role of emotional prosody in reading the speaker's mind from speech, on the neural processes of reading and sentence generation, and on the acquisition of phonetic categories and neural plasticity.
The most pervasive brain-network model of spoken language processing is the classical Wernicke-Lichtheim view, revived by Geschwind in 1965 as the 'disconnexion' account. This view was established through neuropsychological observations of language disorders in patients with aphasia, alexia and related conditions. The recent development of brain-imaging techniques, e.g., PET (positron emission tomography), fMRI (functional MRI), MEG (magneto-encephalography) and NIRS (near-infrared spectroscopy), has enabled us to investigate normal brain activity directly, and studies using these techniques challenge the disconnexion view. In this paper a brief summary of the classical view is given, and recent brain-imaging studies are reviewed, focusing primarily on the neural substrates of speech perception, the semantic system, phonological/phonetic encoding and motor programming.
This review focuses on the localization and function of the neural center for speech motor control, based on recent re-evaluations of the traditional views and on our own brain-imaging experiments using functional MRI. The first experiment, involving repetitions of short Japanese phrases, indicated that the region of the inferior cerebellum serving speech breathing and vocalization lies more medially than that serving articulation. The second experiment, contrasting changing with repetitive syllables, revealed activity in the left anterior insula only in the changing-syllable task, suggesting that the insula is involved in phonetic encoding and motor planning.
A layman's view of the brain-science approach to speech is presented. The most obvious advantage of the brain-science approach over the traditional behavioral approach lies in its potential to clarify the processes of speech encoding and decoding, processes that remain largely untouched in the behavioral approach. In particular, the construction of a computational model of speech perception will open up a brand-new landscape for speech science. To achieve these goals effectively, however, brain science should pay more attention to the fruits of the behavioral approach. Brain-science experiments should be designed on the basis of a sufficient understanding of the linguistic properties of the experimental stimuli. It is especially important to pay enough attention to the probabilistic properties of linguistic signs. For example, the distinction between semantically normal and abnormal stimuli is not necessarily a dichotomy: it is possible to compute the occurrence probability of semantically deviant sentences using powerful computers and large-scale annotated language corpora. In the same vein, a search of a speech corpus reveals that the long-believed distributional restriction on the /tyu/ mora in Japanese is not strict as far as spontaneous speech is concerned. The results of brain-science experiments will greatly enrich our understanding of speech, provided the experiments are designed correctly.
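The graded view of semantic deviance sketched above can be illustrated with a minimal bigram language model. This is only a sketch: the toy corpus and the add-alpha smoothing constant are hypothetical stand-ins for the large-scale annotated corpora the abstract has in mind.

```python
from collections import Counter

# Toy corpus standing in for a large-scale annotated corpus (hypothetical data).
corpus = [
    "the dog chased the cat",
    "the cat saw the dog",
    "the dog saw the cat",
]

# Count unigrams and bigrams over sentences padded with boundary markers.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def sentence_probability(sentence, alpha=1.0):
    """Bigram probability with add-alpha smoothing, so an unseen
    (semantically deviant) word sequence receives a small but non-zero
    probability rather than an all-or-nothing judgment."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab = len(unigrams)
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return p

# A conventional sentence scores higher than an unattested one,
# but the unattested sentence is merely improbable, not impossible.
print(sentence_probability("the dog chased the cat") >
      sentence_probability("the cat chased the dog"))
```

Under this view, "semantically abnormal" is simply the low-probability tail of the same distribution, which is the continuity the abstract argues experimental design should respect.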
Differences between the typical pronunciation of Tokyo and that of Osaka were studied using "Kyonen Narano momizio Yumito mita," a sentence in which both varieties of Japanese have the same lexical tonal pattern. Twelve speakers from Tokyo and 12 from Osaka uttered the test sentence, and the recordings were presented to 36 listeners from Tokyo and 32 from Osaka, who judged each speaker's provenance. A correlation analysis revealed that the utterances successfully recognized as Tokyo speech by the Tokyo listeners had a shorter duration of [jo] in "kyonen" and of the first [a] in "Nara," as well as irregular vocal-cord vibration at the end of the sentence, while those recognized as Osaka speech by the Osaka listeners had earlier F0 rise-falls and less dynamic pitch movement.
The present study deals with the relation between the +VOT tendency in the Japanese voiced alveolar plosive /d/ and the speakers' year of birth. Acoustic analysis of word-initial /d/ showed a strong correlation between the +VOT tendency and year of birth, suggesting that the manner of producing word-initial /d/ has been changing over roughly 60 years among speakers born between 1933 and 1988. However, the results of a listening test suggest that the +VOT tendency does not influence the perception of word-initial /d/ and /t/.
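The kind of birth-year/VOT correlation reported here can be sketched as follows. The measurement pairs below are invented for illustration and do not reproduce the study's data; they merely mimic the pattern of voice onset time drifting from strongly negative toward positive across birth cohorts.

```python
import statistics

# Hypothetical measurements: (birth year, mean VOT in ms for word-initial /d/).
# Increasingly positive values stand in for the "+VOT tendency".
data = [
    (1933, -85), (1945, -70), (1958, -40),
    (1966, -25), (1975, -10), (1988, 5),
]

years = [y for y, _ in data]
vots = [v for _, v in data]

def pearson_r(xs, ys):
    """Plain Pearson product-moment correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

r = pearson_r(years, vots)
print(f"r = {r:.3f}")  # strongly positive on this toy data: later birth, longer VOT
```

A strongly positive r on such data is what the abstract's "strong correlation" amounts to; the production/perception mismatch it reports would then show up as listeners' /d/-/t/ judgments being insensitive to this acoustic drift.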
I hypothesized that plosive perception contributes to the perception of Japanese speech. I had noted that Shanghai speakers, whose native variety has a voicing contrast but who have only a modest knowledge of Japanese grammar, generally score higher in listening comprehension than Mandarin speakers, whose native variety lacks the contrast but who have a better knowledge of Japanese grammar. To verify this hypothesis, I examined the relationship between plosive perception and speech perception in the two groups. The results indicate that confusion among plosive consonants contributes to confusion in speech perception and, in particular, that confusion between the voiceless plosives [t] and [k], the two most frequent in Japanese speech, is a significant factor preventing Mandarin speakers from perceiving Japanese speech.