The vibration of the vocal folds produces the primary sound source for vowels. This paper first reviews vocal fold anatomy and the kinematics associated with typical vibratory motion. A brief historical background is then presented on the basic physics of vocal fold vibration and various efforts directed at mathematical modeling of the vocal folds. Finally, a low-dimensional model is used to simulate the vocal fold vibration under various conditions of vocal tract loading. In particular, a “no-tract” case is compared to two cases in which the voice source is coupled to vocal tract area functions representing the vowels /i/ and /a/, respectively.
Recent developments in observation techniques such as magnetic resonance imaging make it possible to obtain an accurate description of the vocal tract shape. It is thus possible to analyze the acoustic characteristics of three-dimensional vocal tracts at higher frequencies, where the assumption of plane wave propagation does not hold. Historical and conventional one-dimensional models of the vocal tract are briefly described, followed by recent findings on the acoustic characteristics of three-dimensional vocal tracts.
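As a rough illustration of where the plane-wave assumption begins to fail, the sketch below computes the resonances of a uniform tube closed at one end (the textbook one-dimensional idealization of a neutral vocal tract) and the cut-on frequency of the first transverse mode in a rigid circular duct. The tube length (17 cm), radius (1.5 cm), and sound speed (350 m/s) are illustrative assumptions, not values taken from the work summarized here.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def uniform_tube_formants(length_m, count=4, c=SPEED_OF_SOUND):
    """Quarter-wavelength resonances of a tube closed at one end, open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


def first_transverse_cut_on(radius_m, c=SPEED_OF_SOUND):
    """Cut-on frequency of the first non-planar mode in a rigid circular duct."""
    # 1.8412 is the first zero of the derivative of the Bessel function J1.
    return 1.8412 * c / (2.0 * math.pi * radius_m)


# Assumed dimensions: 17 cm tract length, 1.5 cm radius.
formants = uniform_tube_formants(0.17)
cut_on = first_transverse_cut_on(0.015)

print("Uniform-tube resonances (Hz):", [round(f) for f in formants])
print("First transverse mode cut-on (Hz):", round(cut_on))
```

Under these assumed dimensions the first resonance falls near 515 Hz and the first transverse mode cuts on near 6.8 kHz, which is consistent with plane-wave models being adequate mainly for the lower formants.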
Our ability to discriminate sounds such as vowels is not uniform throughout acoustic space. That is, our auditory perceptual spaces are warped representations of acoustic space. One example of auditory space warping, the perceptual magnet effect, arises from exposure to the phonemes of an infant’s native language. We have developed a neural model that accounts for this effect. The model is based on the idea that category learning during infancy changes the distribution of the firing preferences of neurons in auditory cortical maps and thus changes the discriminability of sounds from different parts of acoustic space. The model predicts that it should be possible to induce a perceptual magnet effect for non-speech stimuli. This prediction was verified by a psychophysical experiment in which subjects underwent categorization training involving non-speech auditory stimuli that were not “categorical” prior to training. The model further predicts that the magnet effect arises because prototypical vowels have smaller auditory cortical representations than non-prototypical vowels. This prediction was supported by a functional magnetic resonance imaging (fMRI) experiment involving prototypical and non-prototypical examples of the vowel /i/. The model thus provides an account of phoneme category learning that unifies observations from auditory psychophysics, cortical neurophysiology, and neural modeling.
The Dispersion-Focalization Theory (DFT) attempts to predict vowel systems from the competition between two perceptual costs: (i) dispersion, based on inter-vowel distances, and (ii) local focalization, based on intra-vowel spectral salience related to formant proximity. The first cost is related to the global structure of the system and the second to the internal structure of each vowel element. The competition takes place in an auditory (formant-based) space and is controlled by two parameters: λ, which sets the respective weights of F1 and the higher formants in auditory distances, and α, which sets the respective weights of the dispersion and focalization costs. We describe a new methodology for testing the DFT predictions: for a given number of vowels, the so-called “phase spaces” allow us to determine the DFT winner in the (λ, α) space. We present a refined analysis of the UPSID database inventory of vowel systems. From the comparison between experimental phase spaces and UPSID data, we derive not only a (λ, α) region for which DFT predictions fit the phonological inventories quite well and are compatible with the preferred 3-to-7 vowel systems, but also the possible variants in the systems and the order in which they can appear.
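The competition between the two costs can be sketched numerically. The example below assumes one commonly cited form of the DFT energy: dispersion as the sum of inverse squared inter-vowel distances, with the higher-formant dimension scaled by λ, and focalization as a negative sum of inverse squared gaps between neighbouring formants within each vowel, weighted by α. The abstract does not state these formulas, so the functional forms, the use of F2 in place of an effective second formant, and the formant values in Bark are all illustrative assumptions.

```python
from itertools import combinations


def dispersion_cost(vowels, lam):
    """Sum of 1/d^2 over all vowel pairs; lam scales the higher-formant axis."""
    total = 0.0
    for a, b in combinations(vowels, 2):
        d2 = (a[0] - b[0]) ** 2 + (lam * (a[1] - b[1])) ** 2
        total += 1.0 / d2
    return total


def focalization_cost(vowels):
    """Negative sum of 1/gap^2 between adjacent formants within each vowel."""
    total = 0.0
    for formants in vowels:
        for lo, hi in zip(formants, formants[1:]):
            total -= 1.0 / (hi - lo) ** 2
    return total


def dft_energy(vowels, lam, alpha):
    """Assumed total cost: dispersion plus alpha-weighted focalization."""
    return dispersion_cost(vowels, lam) + alpha * focalization_cost(vowels)


# Hypothetical (F1, F2, F3) values in Bark for two candidate 3-vowel systems.
peripheral = [(2.5, 14.0, 16.0), (7.0, 10.5, 15.0), (2.5, 6.0, 14.5)]
centralized = [(4.0, 12.0, 15.0), (6.0, 10.5, 15.0), (4.0, 8.0, 14.5)]

for name, system in [("peripheral", peripheral), ("centralized", centralized)]:
    print(name, round(dft_energy(system, lam=0.3, alpha=0.3), 3))
```

With these made-up values the peripheral system has the lower energy, so it would be the "winner" at this particular (λ, α) point; sweeping λ and α over a grid and recording the winner at each point is one way to picture the phase spaces described above.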