The Converter/Distributor (C/D) model of phonetic implementation represents the temporal organization of speech signals as articulatory and phonatory actions, depicted as complex, multi-dimensional, semi-quantitative patterns, with syllables as the minimal phonological units of concatenation. Beyond F0 and intensity in prosodic description, the C/D model describes voice quality control by relating the temporal characteristics of glottal changes, linked with respiratory control, to articulatory movements of the tongue, lips, velum, and mandible. Stress is represented as syllable pulse height, i.e., syllable magnitude, while the rhythm of an utterance is the pattern in which syllable magnitudes are distributed over time in phrasal speech production.
The C/D model is a theory of the phonology-phonetics interface. This paper presents my personal understanding of the C/D model, based on my reading of Osamu Fujimura's work as well as my personal interactions with him, and points out some of the model's key features as a theory of that interface.
The Converter/Distributor (C/D) model (Fujimura 2000) provides a comprehensive and explicit framework for modeling how phonological, prosodic organization is mapped onto actual speech production. The goal of this paper is (i) to walk readers through how to construct the prosodic representations of the C/D model from actual articulation data, and (ii) to discuss some crucial concepts of the C/D model. Some basic hypotheses of the C/D model are that (1) phonological syllable magnitude increases with increased sentence stress, (2) the amount of jaw displacement is the articulatory correlate of syllable magnitude, (3) phonological syllable timing is calculated from the speed patterns of the crucial articulators of onset and coda consonants, and (4) once syllable magnitude and syllable timing are determined, phonological phrasing patterns can be calculated automatically, with phrase boundaries that come with predicted durations. All of these computational aspects of the C/D model can and should be tested empirically. In this paper, we attempt to explain and discuss these aspects of the C/D model in detail, especially for readers who are not already familiar with the model.
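As a rough illustration of hypotheses (2) and (3), the sketch below builds a syllable pulse train from per-syllable measurements. It assumes, as one plausible operationalization rather than a definitive procedure, that each pulse lies midway between the times at which the onset and coda crucial articulators reach their iceberg regions, and that magnitude is the syllable's maximum jaw opening multiplied by a hypothetical scale factor. The names `SyllablePulse` and `syllable_pulses` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class SyllablePulse:
    time: float       # syllable pulse location (s)
    magnitude: float  # syllable magnitude (arbitrary units)


def syllable_pulses(onset_times, coda_times, jaw_openings, scale=1.0):
    """Build a syllable pulse train from articulatory measurements.

    Assumptions (illustrative, not an official C/D procedure):
      - onset_times[i] / coda_times[i] are the times (s) at which the crucial
        articulator of syllable i's onset / coda reaches its iceberg region;
      - the pulse is placed midway between these two times;
      - magnitude is the maximum jaw opening times a hypothetical scale factor.
    """
    if not (len(onset_times) == len(coda_times) == len(jaw_openings)):
        raise ValueError("need one onset time, coda time and jaw value per syllable")
    pulses = []
    for onset, coda, jaw in zip(onset_times, coda_times, jaw_openings):
        if coda < onset:
            raise ValueError("coda event precedes onset event")
        pulses.append(SyllablePulse(time=(onset + coda) / 2.0, magnitude=scale * jaw))
    return pulses
```

With hypothetical onset times of 0.10 s and 0.40 s, coda times of 0.30 s and 0.56 s, and jaw openings of 12 mm and 8 mm, this yields pulses at 0.20 s and 0.48 s with magnitudes 12 and 8.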
Japanese vowel devoicing typically occurs on a high vowel between voiceless consonants. Such devoicing is regular and complete. By contrast, atypical devoicing (of non-high vowels, or in non-devoicing environments) is irregular and gradient. Physiological studies have demonstrated distinct muscle activity and glottal spreading patterns for these varying manifestations, and a phonological account postulates distinct feature specifications for the two types. This study attempts to incorporate these proposals into the C/D model to generate the different manifestations, and further discusses the model's potential to map phonological specifications onto their variable phonetic output.
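The typical devoicing environment can be stated as a simple segmental rule. This sketch uses a hypothetical, non-exhaustive romanized segment inventory and flags only high vowels flanked by voiceless consonants; the atypical, gradient cases, as well as factors such as accent or consecutive devoicing, are deliberately outside its scope.

```python
# Hypothetical, simplified romanized inventory; not exhaustive.
VOICELESS = {"p", "t", "k", "s", "sh", "ch", "ts", "h", "f"}
HIGH_VOWELS = {"i", "u"}


def typical_devoicing_sites(segments):
    """Return indices of high vowels flanked by voiceless consonants.

    Covers only the 'typical' (regular, complete) environment; ignores pitch
    accent, consecutive devoicing, phrase-final position and speaker variation.
    """
    sites = []
    for i in range(1, len(segments) - 1):
        if (segments[i] in HIGH_VOWELS
                and segments[i - 1] in VOICELESS
                and segments[i + 1] in VOICELESS):
            sites.append(i)
    return sites
```

For example, `["sh", "i", "t", "a"]` (shita) yields `[1]`, while `["k", "a", "k", "i"]` yields `[]` because the final /i/ has no following consonant in this simplified rule.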
The dominant view in the field of Japanese phonetics and phonology is that Japanese metrical prominence, if anything, manifests itself as pitch accent, whose primary acoustic correlate is an F0 fall. Work by Osamu Fujimura has challenged this view by arguing that Japanese has stress, which is realized through increases in jaw opening. He further argues that jaw displacement patterns show declination within a phrase, just as F0 does. This paper reports an experiment using EMA (ElectroMagnetic Articulograph) that examined these claims. The results show that Fujimura's claims are, in principle, empirically correct, and that they hold for all six native speakers of Japanese tested. In addition, the results reveal that Japanese exhibits final stress, a new finding that goes beyond the original insights offered by Fujimura's work. A further acoustic analysis shows that initial and final stress manifest themselves in high F1 and, surprisingly, low intensity. All in all, we conclude that Japanese has both initial and final stress, with declination observed across phrase-internal syllables.
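One way to quantify the declination claim is the least-squares slope of jaw opening against syllable position. Because the study reports separate initial and final stress, this sketch fits only the phrase-internal syllables; a negative slope is consistent with declination. The function name and input format (one jaw-opening value per syllable, in phrase order) are assumptions, and the usage values are invented for illustration.

```python
def declination_slope(jaw_openings):
    """Least-squares slope of jaw opening vs. syllable position.

    Only phrase-internal syllables are used (first and last are excluded,
    since they may carry initial/final stress). Negative slope = declination.
    """
    internal = jaw_openings[1:-1]
    n = len(internal)
    if n < 2:
        raise ValueError("need at least two phrase-internal syllables")
    positions = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(internal) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, internal))
    sxx = sum((x - mean_x) ** 2 for x in positions)
    return sxy / sxx
```

For hypothetical openings `[15, 12, 11, 10, 9, 14]`, the internal values `[12, 11, 10, 9]` give a slope of -1.0 per syllable.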
The Converter/Distributor model provides a comprehensive framework for the prosodic organization of speech production. To explore some of the hypotheses of the C/D model, we examine articulatory events involved in the production of contrastive emphasis, and report the following findings: (1) syllable magnitude increases with increased sentence stress, (2) the amount of jaw displacement is commensurate with syllable magnitude, and (3) the speeds of the crucial articulators within the "iceberg" region for a demisyllabic onset or coda are relatively invariant with respect to the amount of excursion of those articulators, regardless of prosodic changes.
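A minimal way to locate an iceberg event and measure articulator speed there is sketched below. It assumes, as a simplification, that the iceberg region can be approximated by a single position threshold on a sampled trajectory; the crossing time is linearly interpolated, and speed is the finite-difference speed over the bracketing sample interval. `iceberg_crossing` and its parameters are hypothetical.

```python
def iceberg_crossing(times, positions, threshold):
    """Return (time, speed) at the first crossing of `threshold`, or None.

    `threshold` is a hypothetical fixed position level standing in for the
    iceberg region of a crucial articulator. Works for rising or falling paths.
    """
    for i in range(len(positions) - 1):
        p0, p1 = positions[i], positions[i + 1]
        # A sign change (or touching zero) of p - threshold brackets a crossing.
        if (p0 - threshold) * (p1 - threshold) <= 0 and p0 != p1:
            t0, t1 = times[i], times[i + 1]
            frac = (threshold - p0) / (p1 - p0)  # linear interpolation
            speed = abs(p1 - p0) / (t1 - t0)     # finite-difference speed
            return t0 + frac * (t1 - t0), speed
    return None
```

Comparing the returned speeds across tokens with different excursion sizes is one way to probe the invariance reported in finding (3).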
The C/D model (Fujimura 1992, 2007) is an explicit framework that calculates the continuous physical gestures and quantitative phonetic information of speech sounds from input data consisting of qualitative phonological information. Fujimura proposes that this input information is given in the form of a syllable-based set of unary, underspecified phonological features rather than the sets of binary features used by many phonological theories. This study argues for structuring the syllable-based input information with 'mora sets', and for representations of unary features that define the qualitative characteristics of the impulse response functions. The argument is developed through a discussion of vowel devoicing in Japanese.
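To illustrate how a syllable pulse train can drive a continuous gesture, the sketch superposes one impulse response per syllable pulse, each scaled by the syllable's magnitude. The raised-cosine shape, its 0.2 s default width, the linear scaling, and the function names are all assumptions; in the model, the qualitative character of each impulse response function is defined by the features, which this sketch does not represent.

```python
import math


def raised_cosine_irf(t, width):
    """Hypothetical impulse response: a raised-cosine bump of `width` seconds,
    centred on the triggering pulse (1.0 at the pulse, 0.0 at the edges)."""
    if abs(t) >= width / 2:
        return 0.0
    return 0.5 * (1 + math.cos(2 * math.pi * t / width))


def gesture_signal(pulses, sample_times, width=0.2):
    """Superpose one IRF per syllable pulse, scaled by its magnitude.

    pulses: list of (time_s, magnitude) pairs.
    """
    return [sum(mag * raised_cosine_irf(t - pt, width) for pt, mag in pulses)
            for t in sample_times]
```

For pulses `[(0.2, 10.0), (0.5, 6.0)]`, the signal peaks at 10 at 0.2 s and at 6 at 0.5 s, and is zero at 0.35 s, where neither response is active.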
This study compares the acoustically and articulatorily derived rhythmic structures of American English utterances of a short three-digit phrase produced in a semi-spontaneous dialogue. The Converter/Distributor model of Osamu Fujimura provides the framework for determining the articulatory rhythm. Within this model, syllable stress is measured as a scalar representation of articulatory strength. Articulatory syllable duration is an abstract representation of syllable magnitude and is directly proportional to it. Results show that the acoustic rhythmic structure differs greatly from the articulatory rhythmic structure for these semi-spontaneous utterances. Emphasis was shown to change syllable durations, syllable magnitudes, and boundary magnitudes.
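The proportionality between articulatory syllable duration and magnitude can be sketched as follows. Each pulse `(time, magnitude)` is mapped to an interval of duration `k * magnitude`, assumed here to be symmetric about the pulse (a triangle of fixed slope whose base grows with its height), and the gap between consecutive intervals is read as a boundary duration. The constant `k`, the symmetry, and the function names are assumptions for illustration.

```python
def syllable_intervals(pulses, k=0.02):
    """Map each (time, magnitude) pulse to a (start, end) interval whose
    duration is k * magnitude, centred on the pulse.

    k is a hypothetical proportionality constant (seconds per magnitude unit).
    """
    return [(t - k * m / 2, t + k * m / 2) for t, m in pulses]


def boundary_gaps(pulses, k=0.02):
    """Gap (s) between consecutive syllable intervals.

    Positive gaps are read as boundary durations; overlaps are clipped to 0.
    """
    spans = syllable_intervals(pulses, k)
    return [max(0.0, nxt[0] - cur[1]) for cur, nxt in zip(spans, spans[1:])]
```

With `k = 0.02`, pulses `(0.2, 10)` and `(0.5, 5)` occupy 0.1 to 0.3 s and 0.45 to 0.55 s, leaving a 0.15 s gap; the same comparison of magnitudes and gaps could be run on pulse trains derived under different emphasis conditions.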
A syllable-based phonological description of a language should represent all the meaningful oppositional patterns within the syllabic domain. Syllable feature underspecification allows for a parsimonious description and provides a framework for capturing dialectal variation in phonetic implementation in natural discourse. With the entire syllable as the domain, syntagmatic relationships reveal co-occurrence rules among the feature sets of onset, coda, nucleus, and any affixes. In English, syllable features show that the same phoneme functions differently in syllable onset and coda, with constraints restricting tautosyllabic co-occurrence. Moreover, the grammar of English phonology requires a coda feature in every non-reduced syllable.
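The English coda requirement can be expressed as a check over a simple syllable record. The sketch assumes a representation with one set of unary features per slot (onset, nucleus, coda) plus a `reduced` flag; the class, function, and feature labels are hypothetical and do not reproduce any published feature inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """Hypothetical syllable record: a set of unary features per slot."""
    onset: frozenset = frozenset()
    nucleus: frozenset = frozenset()
    coda: frozenset = frozenset()
    reduced: bool = False


def violates_coda_requirement(syl):
    """True if a non-reduced syllable carries no coda feature."""
    return not syl.reduced and not syl.coda
```

Further co-occurrence constraints among onset, nucleus, and coda features could be added as additional predicates over the same record.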
This paper describes an idea for building a syllable-based speech synthesizer as part of a text-to-speech (TTS) system. The synthesizer is designed to emulate speech production using methods informed by what is known about natural speech production processes, while aiming for high-quality speech output. The goal is to build a detailed, modular, and partially hierarchical time-varying dynamic system that is controlled at the top level by discrete multidimensional processes in an abstract space of feature variables. It is assumed, without further investigation here, that these can be derived from the input text. The features used to describe speech are closely related to speech articulation, rhythm, and pausing, as well as to other identifiable prosodic parameters. Together with phrase/rhythm and prosodic features, the syllable is used as the atomic speech unit, and syllable features play a central role in controlling the synthesizer. Fujimura's C/D model provides a large part of the framework for this. The system is designed as a generative model of observable speech production processes, which makes it possible to use system identification procedures, analysis-by-synthesis methods, and machine learning methods. This data-driven approach will be necessary to obtain the large number of parameters needed to specify, with some accuracy, the properties of speech observed from one or more speakers, so that the model can generalize to produce high-quality speech from arbitrary text.
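One ingredient of this data-driven approach, estimating control parameters from observed data, can be illustrated with a tiny system-identification example. Assuming known pulse times and a fixed, hypothetical raised-cosine impulse response, the synthesized trajectory is linear in the syllable magnitudes, so the magnitudes can be fitted by ordinary least squares via the normal equations. This is a stand-in for, not an implementation of, the analysis-by-synthesis procedures envisaged; `irf`, `solve`, and `fit_magnitudes` are hypothetical names.

```python
import math


def irf(t, width=0.2):
    """Hypothetical raised-cosine impulse response centred on the pulse."""
    if abs(t) >= width / 2:
        return 0.0
    return 0.5 * (1 + math.cos(2 * math.pi * t / width))


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_magnitudes(pulse_times, sample_times, observed, width=0.2):
    """Least-squares magnitudes m_j for the model sum_j m_j * irf(t - t_j)."""
    basis = [[irf(t - pt, width) for pt in pulse_times] for t in sample_times]
    n = len(pulse_times)
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * y for row, y in zip(basis, observed)) for i in range(n)]
    return solve(ata, atb)
```

In a realistic setting the pulse times and response shapes would also have to be estimated, which makes the problem nonlinear and calls for iterative analysis-by-synthesis or learning methods.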
Morpheme segmentation is a core function in the cognitive neural processing of spoken as well as written sentences. This study investigated whether morpheme parsing of Japanese Kana phrases, in which multiple morphemes are aligned continuously without any separating spaces, is triggered automatically. To avoid conscious attention to phrase meanings, 28 healthy volunteers performed a short-term memory task in which each participant judged whether a specified character had appeared in a phrase previously displayed for 500 ms. Brain activity was measured by fMRI while participants performed the task. The results suggest that morpheme parsing is automatically activated in the left frontotemporal language network of native Japanese speakers when they view multi-morpheme Japanese Kana phrases.
This paper sheds light on variable speaking units, the "gears", of common Japanese speech. Through the description of four types of linguistic resources for these units, i.e., letter character, intonation, clause, and phrase (bunsetsu in Japanese), we see that various linguistic phenomena are speaking mode-dependent: (i) the lexical determinacy of Japanese accent does not hold for letter character-mode speech; (ii) non-conflict between intonation and lexical accent does not hold for intonation-mode speech; (iii) inversion is possible only in clause-mode speech; and (iv) phrase-mode speech allows for sentential items such as copulas, attitudinal particles, and intonation jump-up.
The DIVA (Directions Into Velocities of Articulators) model provides a computational and neuroanatomical account of speech acquisition and production; however, its ability to predict speech perception and production in Mandarin is limited. The aim of this study is to modify the original DIVA model to simulate both normal and disordered speech production in Mandarin. The proposed version of the model adds functions for speech perception, tonal acquisition, and diphthong production. Computer simulations of the modified DIVA model verify its ability to simulate Mandarin tonal production in diphthongs and speech perception across vowels.