Anthropological Science
Online ISSN : 1348-8570
Print ISSN : 0918-7960
ISSN-L : 0918-7960
Reviews
Non-linear dynamics in mammalian voice production
ISAO T. TOKUDA

2018 Volume 126 Issue 1 Pages 35-41

Abstract

Animal vocalizations range from tonal sounds to irregular atonal sounds and are generated from non-linear oscillations of the vocal folds as well as from turbulent noise in the glottis. Comprehensive studies of bioacoustic signals indicate the existence of diverse non-linear phenomena in animal vocalizations, such as limit cycles, subharmonics, biphonation, chaos, and bifurcations, which may provide keys to understanding animal communication. In this paper, we review the concept of non-linear dynamics and its methodology as applicable to bioacoustics. Acoustical analysis of recorded sounds, simulation of a biomechanical model of the voice production system, and physical experiments on the vocal tract and vocal folds are presented to demonstrate non-linear features inherent in animal vocalizations. We focus on source–filter interaction as one of the main regulators of the non-linear properties that can lead either to efficient vocalization or to voice instability in animal sounds. A combination of different approaches is suggested to be of great use for extracting the essential features of non-linear dynamics in animal vocalizations.

Introduction

One of the main research interests in bioacoustics is the basic mechanism of sound production in animal vocalizations. This new interdisciplinary field has been made possible by the combination of acoustic measurements in vivo and ex vivo, neurophysiological and anatomical investigation of the vocal apparatus, and the physics of sound production. Findings from bioacoustics may provide a comprehensive understanding of animal communications (Tembrock, 1996; Bradbury and Vehrencamp, 1998) and clues about the evolution of acoustic communication systems, possibly leading to the origin of language (Hauser, 1996; Fitch, 2010). In many species, the voice is produced via a two-stage process (Chiba and Kajiyama, 1941; Fant, 1960; Titze, 1994; Taylor and Reby, 2010): (i) the airflow produced from the lungs induces tissue vibration of the vocal folds, generating the ‘source’ sound (see Figure 1a); (ii) the spectral structure of the source sound is shaped by the vocal-tract ‘filter.’ Through the filtering process, frequency components near the resonance frequencies determined by the vocal-tract configuration are amplified, while other frequency components are diminished. The source sound determines the lowest frequency of the voice (i.e. the fundamental frequency), while the filter forms the overall spectral structure. Although fundamental frequency has been considered to convey central information in animal communications, recent studies have suggested that the formants, which are defined as spectral peaks of the sound spectrum (Fant, 1960) and are closely related to the resonance frequencies of the vocal tract, are controlled not only by humans but also by animals and might be used in their acoustic communications.

Figure 1

(a) Schematic illustrations of vocal apparatus. (b) Asymmetric two-mass model (Steinecke and Herzel, 1995). The model describes the vocal folds as a set of two masses coupled by springs and dampers. The asymmetry parameter, Q, determines the tension imbalance between the left and right vocal folds. The subglottal pressure, Ps, induces an airflow that passes through the vocal folds and supports their oscillations.

The vocal-fold oscillations are due to the combined non-linear effects of pressure, airflow, tissue elasticity, and collision between the vocal folds (Titze, 1994). Such complex vocal-fold oscillations are known to be well characterized by non-linear equations of motion with only a few dynamic variables. In fact, in studies of the human voice, a variety of voice instabilities have been described by non-linear dynamics. For instance, desynchronized oscillations of the left and right vocal folds (Ishizaka and Isshiki, 1976; Steinecke and Herzel, 1995), desynchronization of anterior–posterior vibratory modes of the vocal folds (Berry et al., 1994; Neubauer et al., 2001), excessively high subglottal pressure (Jiang et al., 2001), interference with supraglottal resonances (Hatzikirou et al., 2006), vocal cord nodules and polyps (Titze et al., 1993), and register transitions (Tokuda et al., 2007) can lead to chaotic oscillations of the vocal folds.

One example of this non-linear effect that is commonly observed in our daily life is a frequency jump (register break) during singing. When we switch our singing style (register) from chest to falsetto or vice versa, we may sometimes sing out of tune. In terms of non-linear dynamics, this kind of vocal instability is due to bifurcations of different oscillatory states, which will be explained in detail in this article.

Animal vocalizations range from almost harmonic to irregular noisy sounds (Hauser, 1996; Tembrock, 1996; Bradbury and Vehrencamp, 1998). In order to achieve a systematic understanding of such a complex dynamic process, which should play an important role in animal vocalizations, the concept of ‘non-linear dynamics’ has been introduced (Fee et al., 1998; Wilden et al., 1998; Fitch et al., 2002). The idea of non-linear dynamics implies that irregular animal utterances might be generated from deterministic non-linear dynamics with only a few state variables. To detect such non-linear dynamics, several approaches have been proposed. One is to record animal vocalizations in a natural or controlled environment and to analyze the data with a conventional spectrogram or, alternatively, a specialized technique suitable for characterizing non-linear dynamics in bioacoustic signals (Wilden et al., 1998; Fitch et al., 2002; Tokuda et al., 2002). Another is to carry out an excised larynx experiment to investigate the properties of vibrating tissues (van den Berg, 1968; Berry et al., 1996; Herbst et al., 2012). A third approach is to develop biomechanical models that mathematically describe the process of voice production. For instance, vocal membranes have been modeled as additional vibrating tissues (Mergell et al., 1999) and labia in songbirds have been described by a flapping model (Amador et al., 2008). Physical models have also been utilized to experimentally examine the acoustic functions of the air sac in mammals (Riede et al., 2008). For a comprehensive understanding of animal vocalizations, the combined use of different approaches is highly important.

The aim of the present review is to introduce the basics of non-linear dynamics in voice production and to describe their importance in animal communications. Among the various factors that regulate non-linear properties, the latter part of the review focuses on source–filter interaction as a mechanism that can lead either to efficient vocalization or to voice instability in animal sounds.

Classification of dynamic states and bifurcations

Bioacoustic signals show a rich variety of non-linear phenomena such as limit cycles, subharmonics, biphonation, chaos, and transitions between them (Wilden et al., 1998; Fitch et al., 2002; Herbst et al., 2013). This section describes the basic characteristics of these phenomena from the perspective of non-linear dynamics. As an example of a vocal system that produces various non-linear phenomena, a two-mass model of the vocal folds is utilized. The model was introduced by Ishizaka and Flanagan (1972) and adapted by Steinecke and Herzel (1995) to study the effect of asymmetry between the left and right vocal folds on their vibrations. Figure 1b shows a schematic representation of the two-mass model. Each vocal-fold tissue is divided into upper and lower masses, which are coupled by springs. For simplicity, the flow inside the glottis is assumed to obey the Bernoulli principle below the narrowest part of the glottis. Despite its simplified formulation, with only four degrees of freedom, the model is known to capture essential features of various vocalizations including voice pathology. Here, as the key parameters controlling the vocal-fold vibrations, we use the asymmetry parameter Q, which determines the tension imbalance between the left and right vocal folds, and the subglottal pressure Ps, which supports their oscillations. Other parameters are set to the standard values that are widely applied to human voice and animal vocalizations. Note that this particular model and the associated parameters are used merely as an example of a voice production model; a variety of other vocal-fold models may produce essentially the same non-linear phenomena as their parameters vary.

By setting different values for the two parameters, five types of dynamic states are drawn in Figure 2. On the left panels, dynamic trajectories are drawn in three-dimensional state space. To display the time waveforms, the flow rates generated from the glottis are displayed against time (middle panels). The power spectra, obtained by the fast Fourier transform of the corresponding time series, are also shown in the right panels.

Figure 2

Five dynamic states generated from the asymmetric two-mass model. The parameter values were set as: (a–c) {Ps, Q} = {0.002 g/(cm·ms²), 0.8}; (d–f) {Ps, Q} = {0.01 g/(cm·ms²), 0.8}; (g–i) {Ps, Q} = {0.01 g/(cm·ms²), 0.754}; (j–l) {Ps, Q} = {0.009 g/(cm·ms²), 0.598}; (m–o) {Ps, Q} = {0.01 g/(cm·ms²), 0.598}. (a, d, g, j, m) display the three-dimensional trajectory in state space. (b, e, h, k, n) represent the time series of the volume flow, whereas (c, f, i, l, o) are the power spectra corresponding to the time-series data.

  1. Stable equilibrium: No oscillation of the vocal folds occurs in the stable equilibrium state (single point in the state space of Figure 2a). Because the system state does not change in time, the time series indicates a constant value (Figure 2b). The corresponding spectrum shows no power at any frequency (Figure 2c). In phonation, this state corresponds to aphonia (i.e. no sound).
  2. Limit cycle: Periodic oscillation, which traces a closed orbit in the state space of the vocal folds (Figure 2d), corresponds to a limit cycle. The time series gives rise to a periodic waveform (Figure 2e). Its spectrum is composed of the fundamental frequency, which is given by the inverse of the oscillation period of the vocal folds, and higher harmonics that are integer multiples of the fundamental frequency (Figure 2f). A normal voiced sound, e.g. sustained phonation of vowels, provides an example of this state.
  3. Subharmonics: In the state space, a subharmonic oscillation traces a closed orbit with several loops passing close to each other (Figure 2g). The time series produces a periodic waveform composed of similar but slightly different patterns, where each pattern corresponds to one cycle of the vocal-fold oscillation (Figure 2h). In the power spectrum, in addition to the fundamental frequency of the vocal-fold oscillation, spectral components appear in the harmonic stack, typically at multiples of 1/2 or 1/3 of the fundamental frequency (Figure 2i). This state is often observed in a transition from one type of vocalization to another.
  4. Biphonation: Biphonation (also called ‘torus’ in terms of non-linear dynamics) produces a tube-like form, where the dynamic trajectory never returns to the same state (Figure 2j). The time series gives rise to a non-periodic waveform (Figure 2k). The corresponding power spectrum shows two independent fundamental frequencies and their harmonics (Figure 2l). An example of this state is represented by desynchronized oscillations of the left and right vocal folds that have different frequencies.
  5. Chaos: Chaos represents a strange attractor, on which an irregular dynamic trajectory never returns to the same state (Figure 2m). The corresponding time series gives rise to a non-periodic irregular waveform (Figure 2n). The power spectrum shows a broadband segment with no particular harmonics (Figure 2o). An example of this state is a pathologically rough voice, e.g. induced by papilloma of the vocal folds (Titze et al., 1993). One of the main interests of bioacoustics has been to clarify how noisy irregular sounds, which are ubiquitous in animal vocalizations, arise from low-dimensional non-linear dynamics. The mechanisms known to induce chaotic dynamics in the human voice, such as asymmetry between the left and right vocal folds (Steinecke and Herzel, 1995), desynchronization of anterior–posterior modes of the vocal folds (Neubauer et al., 2001), excessively high subglottal pressure (Jiang et al., 2001), and source–filter interaction (Hatzikirou et al., 2006), may also play a very important role in animal voices.
  6. Bifurcations: Bifurcations are transitions between different types of dynamic states. A slight change in a system parameter can induce such transitions. In the spectrogram shown in Figure 3a, the asymmetry parameter was decreased from Q = 0.6 to Q = 0.37. Transitions from the harmonic segment (limit cycle) to the subharmonic segment and the chaotic broadband segment are discernible without intervening silent intervals. Such bifurcations are observed in humans (Mende et al., 1990; Herbst et al., 2013) as well as in bioacoustic data including birds (Fee et al., 1998) and mammals (Wilden et al., 1998; Fitch et al., 2002). A sound example measured from a chimpanzee call is shown in Figure 3b, and resembles the qualitative features of bifurcations produced by the two-mass model.
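The spectral signatures listed above can be illustrated with a minimal sketch. It does not simulate the two-mass model; it only builds synthetic sums of sinusoids whose components mimic a limit cycle, a subharmonic regime, and biphonation, and then lists the frequencies that stand out in a discrete Fourier transform. The sampling rate, the 200 Hz fundamental, the 270 Hz second source, and all amplitudes are assumptions chosen purely for illustration.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short signals)."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc) / n)
    return mags

def peak_frequencies(signal, fs, rel_threshold=0.1):
    """Frequencies (Hz) whose magnitude exceeds rel_threshold * strongest bin."""
    mags = magnitude_spectrum(signal)
    strongest = max(mags)
    n = len(signal)
    return [k * fs / n for k, m in enumerate(mags) if m > rel_threshold * strongest]

def tone_sum(components, fs, n):
    """Sum of sinusoids given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * i / fs) for f, a in components)
            for i in range(n)]

# Illustrative values only: 4 kHz sampling, 400 samples, hence 10 Hz bin spacing.
FS, N, F0 = 4000, 400, 200.0

# Limit cycle: fundamental plus an integer harmonic.
limit_cycle = tone_sum([(F0, 1.0), (2 * F0, 0.5)], FS, N)
# Subharmonics: extra components at multiples of F0 / 2.
subharmonic = tone_sum([(F0 / 2, 0.3), (F0, 1.0), (1.5 * F0, 0.2), (2 * F0, 0.5)], FS, N)
# Biphonation: a second fundamental that is not a harmonic of the first.
biphonation = tone_sum([(F0, 1.0), (270.0, 0.8)], FS, N)

for name, sig in (("limit cycle", limit_cycle),
                  ("subharmonic", subharmonic),
                  ("biphonation", biphonation)):
    print(name, peak_frequencies(sig, FS))
```

In a real recording the components would not fall exactly on analysis bins, so a windowed transform and a peak-picking step would be needed; chaos would appear as broadband energy rather than a short list of peaks.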

Figure 3

(a) Bifurcation diagrams of the asymmetric two-mass model. The asymmetry parameter was decreased from Q = 0.6 to Q = 0.37. (b) Spectrogram of an acoustic signal measured from a chimpanzee call.
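The simplest of these transitions, from stable equilibrium to a limit cycle, can be sketched with a generic textbook equation rather than the two-mass model itself. The example below integrates the amplitude equation dr/dt = r(mu - r^2), the radial part of the Hopf normal form, where the hypothetical control parameter mu stands in for a quantity such as subglottal pressure. For mu < 0 the oscillation amplitude decays to zero; for mu > 0 it settles at sqrt(mu). The step size, initial amplitude, and parameter values are assumptions.

```python
def settled_amplitude(mu, r0=0.1, dt=0.01, steps=20000):
    """Integrate dr/dt = r * (mu - r**2) with forward Euler; return the final r."""
    r = r0
    for _ in range(steps):
        r += dt * r * (mu - r * r)
    return r

# Sweep the control parameter across the bifurcation point mu = 0.
for mu in (-0.5, 0.25, 1.0):
    print(mu, round(settled_amplitude(mu), 4))
```

The amplitude switches from zero to a finite value as mu crosses zero, although the equation itself changes only slightly. This is the sense in which a small parameter change can produce a qualitative change in the dynamic state.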

Source–filter interaction

As described in the introductory section, the source–filter theory provides a basis for the mechanism of sound production in humans (Titze, 1994) and animals (Taylor and Reby, 2010). The sound source, generated by the vocal-fold vibrations, is characterized by the fundamental frequency (f0) and its higher harmonics. The vocal tract functions as a filter to amplify the harmonics of f0 near the vocal-tract resonances. The sound wave is radiated from the lips and is partially reflected back to the glottis through the vocal tract. The source–filter theory has been applied successfully to normal human speech, wherein vocal-fold vibration is assumed to be only weakly influenced by the vocal tract. With such weak interaction between the source and the filter, f0 can be controlled independently of the vocal-tract acoustics and vice versa. This can be advantageous for acoustic communications with language, which requires expression of various phonemes through flexible maneuvering of the vocal-tract configuration. By contrast, strong source–filter interaction, as seen in some musical instruments, e.g. woodwinds (Fletcher and Rossing, 1998), implies that vocal-fold vibration is strongly influenced by the vocal tract, the resonances of which primarily determine f0. Such a strong interaction prevents the flexible and sophisticated modifications of the timbre of the voice seen in human speech. Whether or not animal vocalizations allow such flexibility in vocal-tract control has been disputed (Lieberman, 2007; Fitch et al., 2017). Recent findings suggest that many animal vocalizations belong to a class of relatively weak source–filter interactions. Experiments with heliox showed that no major change in f0 was induced by a change in vocal-tract acoustics (Koda et al., 2012).

Even in the human voice, the source–filter interaction is known to be strengthened under certain conditions (Flanagan, 1968; Rothenberg, 1981; Titze, 2006, 2008). For instance, during glissando singing, f0 can sometimes cross one of the resonances of the vocal tract. Such a situation is called ‘resonance tuning.’ When f0 and the lowest vocal-tract resonance are close to each other, the reflection from the vocal tract to the vocal folds becomes non-negligible and the interaction between source and filter, which are coupled through a non-linear function, plays an important role. This resonance tuning produces both linear and non-linear effects. The linear effect utilizes the vocal-tract acoustics as a linear filter. The power of the source sound is concentrated on the resonance frequency, at which the vocal tract transfers the source sound waves most efficiently. Because little source power is lost, such a pure-tone loud voice produced by trained singers can be heard clearly in a large concert hall. In addition to the linear effect, the non-linear effect influences the vocal-fold oscillations, thereby possibly making the source sound even louder. The technique of resonance tuning has been found not only in singing voices (Joliveau et al., 2004) but also in animal vocalizations (Riede et al., 2006; Koda et al., 2012). The technique is considered to be efficient for long-distance alarm calls and pure-tone singing.
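The linear part of resonance tuning can be made concrete with a small sketch. It evaluates the magnitude response of a single two-pole resonance, a common idealisation of one formant, and compares the gain applied to the fundamental when f0 sits on the resonance with the gain when f0 lies well below it. The resonance frequency of 500 Hz, the 50 Hz bandwidth, and the detuned f0 of 200 Hz are hypothetical values, and the non-linear feedback onto the vocal folds is not represented.

```python
import math

def resonance_gain(f, resonance_hz, bandwidth_hz):
    """Magnitude response of one two-pole resonance, normalised to 1 at f = 0."""
    half_bw = bandwidth_hz / 2.0
    numerator = resonance_hz ** 2 + half_bw ** 2
    denominator = (math.hypot(f - resonance_hz, half_bw)
                   * math.hypot(f + resonance_hz, half_bw))
    return numerator / denominator

F1, B1 = 500.0, 50.0  # assumed resonance frequency and bandwidth (Hz)

tuned = resonance_gain(500.0, F1, B1)    # f0 placed on the resonance
detuned = resonance_gain(200.0, F1, B1)  # f0 far below the resonance
print(f"gain at f0 = F1: {tuned:.2f}, gain at f0 = 200 Hz: {detuned:.2f}")
```

With these assumed numbers the fundamental is amplified roughly tenfold when tuned and only slightly when detuned, which is why concentrating source power at the resonance yields a loud, nearly pure tone.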

Non-linear interaction has two aspects (Story et al., 2000). On one hand, vocal-tract acoustics facilitate vocal-fold oscillations and contribute to the production of a loud vocal sound, which is highly efficient in terms of acoustic energy. On the other hand, vocal-tract acoustics inhibit vocal-fold oscillations and consequently induce voice instability. Non-linear phenomena such as bifurcations into subharmonics and chaos have been observed here. To demonstrate the effect of the source–filter interaction, we present experimental results using a physical silicone model of the vocal folds (Murray and Thomson, 2012). The physical model has the advantages of enabling well-controlled experiments and reproducibility. To realize the body-cover structure of the vocal folds, a so-called M5 model having symmetric left–right geometry (Scherer et al., 2001) and a two-layer structure was constructed. The stiffness of each layer can be adjusted by changing the mixing ratio of the silicone materials. Next, a physical model of the vocal tract was made from two polyvinyl chloride (PVC) tubes. By connecting them, the total tube length could be adjusted from 60 to 105 cm (Figure 4a). Accordingly, the lowest resonance frequency decreased from 137 to 72 Hz (Figure 4b). The f0 of the vocal-fold physical model was approximately 120 Hz, which lies within this range, so F1 can cross f0. Figure 4a shows the experimental setup. A compressor (Hitachi SC820) sends air to the system. The flow rate is controlled by a regulator (Fairchild 10202U) and measured by a digital mass flow controller (Azbil CMQ-V). In this experiment, the aerodynamic input to the vocal-fold model was set to be constant. Only the tube length, L, was changed during the experiment. The tube length was increased so that the first resonance frequency, F1, decreased. The subglottal pressure and the acoustic sounds were measured as the major parameters of the experiment.
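As a rough plausibility check on these numbers, the tube can be idealised as a uniform pipe closed at the vocal-fold end and open at the far end, whose lowest resonance is the quarter-wavelength frequency c / (4L). The sketch below assumes a sound speed of 343 m/s and ignores end corrections, wall losses, and the subglottal system, none of which are specified in the text, so it is only an order-of-magnitude estimate.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature

def quarter_wave_f1(length_m, c=SPEED_OF_SOUND):
    """Lowest resonance (Hz) of an ideal closed-open tube of the given length."""
    return c / (4.0 * length_m)

def crossing_length(f0_hz, c=SPEED_OF_SOUND):
    """Tube length (m) at which the ideal first resonance equals f0."""
    return c / (4.0 * f0_hz)

for length in (0.60, 1.05):
    print(f"L = {length:.2f} m -> F1 = {quarter_wave_f1(length):.1f} Hz")
print(f"F1 = 120 Hz at L = {crossing_length(120.0):.3f} m")
```

The ideal estimates (about 143 Hz at 60 cm and 82 Hz at 105 cm) lie somewhat above the reported 72 to 137 Hz range, as would be expected once end corrections and the real geometry are included. They also place the crossing with a 120 Hz source at a tube length of roughly 0.7 m.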

Figure 4

Experiment to investigate source–filter interaction. (a) Experimental setup. Oscillation of the physical vocal-fold model is induced by flow coming from the left side of this figure. The length of the vocal-tract tube was varied. The microphone records sound produced from the vocal-fold model. (b) During the course of the experiment, the vocal-tract tube was extended. Accordingly, the first resonance frequency was lowered where it crosses the fundamental frequency at 6 s. (c) Time series of subglottal pressure. At the point when the resonance frequency crosses the fundamental frequency, the oscillation amplitude was strongly enhanced by the source–filter interaction. After the crossing, the oscillation was stopped by the quenching effect. (d) In the simulation study, the vocal tract was lengthened so that the resonance frequency (solid line) crosses the fundamental frequency (dotted line). (e) At a vocal tract length of 70 cm, the oscillation amplitude of the two-mass model was enhanced. A further increase in vocal-tract length resulted in quenching.

Figure 4c shows subglottal pressure recorded from the M5 model under a flow rate of 95 L/min (because of the simplified layer structure of the M5 model, a relatively high flow rate is needed to induce vocal-fold oscillations). At approximately 5 s, the oscillation amplitude was at its largest, implying the occurrence of resonance. After 8 s, the amplitude of the pressure was at its smallest, where the oscillation stopped. This corresponds to aphonia or, in terms of non-linear dynamics, ‘quenching’ (‘oscillation death’). It should be noted that source–filter coupling has a strong effect in the present experiment. Extrapolating this result directly to the human voice, however, requires caution, because the M5 model, which typically exhibits incomplete glottal closure and only minor mucosal wave-like motions, is a simplified vocal-fold model. The real human vocal folds, supported by a detailed multilayer surface structure, are capable of much more dynamic movements and should be more susceptible to the effect of source–filter interaction.

To elucidate the mechanism that underlies the observed non-linear phenomena, a mathematical model is useful. To reproduce the experiment, vocal-fold vibration was simulated by the two-mass model (Steinecke and Herzel, 1995), whereas vocal-tract acoustics were modeled by the wave-reflection system, a time-domain model of the propagation of one-dimensional planar acoustic waves through a collection of uniform cylindrical tubes (Titze, 2006). The area functions for the sub- and supraglottal tract were designed to match the experiment. To couple the sub- and supraglottal systems to the vocal-fold model, an interactive source–filter coupling was realized according to the standard formula (Titze, 2008). The vocal-tract length was increased from 50 to 100 cm (Figure 4d). The simulation result (Figure 4e) agrees quite well with our experiment. As the lowest resonance frequency approached f0, the oscillatory amplitude of the vocal folds was enhanced. After F1 crossed f0, vocal-fold oscillation stopped. Because vocal-tract acoustics are the only factor that was changed during this simulation, reproduction of the experiment by the mathematical model implies that the source–filter interaction is the main regulator of the non-linear oscillations observed in the physical model of the vocal folds.
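The core ingredient of a wave-reflection model can be sketched in a few lines. When the tract is approximated by a chain of uniform cylindrical sections, a pressure wave arriving at the junction between sections of cross-sectional area A_k and A_(k+1) is partly reflected, with coefficient (A_k - A_(k+1)) / (A_k + A_(k+1)) in the usual Kelly-Lochbaum convention. The area values below are hypothetical, sign conventions differ between authors, and this is not the specific implementation used in the simulation described above.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction of adjacent sections.

    areas: cross-sectional areas of consecutive tube sections (any consistent unit).
    Returns one coefficient per junction, for waves travelling from section k
    into section k + 1.
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]

# A uniform tube has no internal reflections.
print(reflection_coefficients([3.0, 3.0, 3.0]))
# A widening (1 -> 3) reflects with inverted sign; a narrowing (3 -> 1) does not invert.
print(reflection_coefficients([1.0, 3.0, 1.0]))
```

In a full time-domain model these coefficients would be applied to forward- and backward-travelling waves at every sample, with additional reflection conditions at the glottis and at the open end.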

Discussion

In this review article, the concept of non-linear dynamics was described and its importance for understanding animal vocalization was reviewed. Simulations and experimental data provided examples of distinctive dynamic states (aphonia, limit cycles, subharmonics, biphonation, and chaos) and their bifurcations. Among the various regulators of non-linear dynamics, source–filter interaction was discussed in depth using a physical model of the vocal folds.

When noisy, irregular sounds are measured from animal vocalizations, the idea of non-linear dynamics raises the possibility that they do not necessarily arise from high-dimensional turbulent noise but may come from a non-linear system with only a few degrees of freedom. When abrupt changes are observed, they have often been considered to be the result of complex neural control of the vocal production system (e.g. respiratory system, larynx, and vocal tract). For instance, in songbirds, superfast syringeal muscles have been found to control sound production (Elemans et al., 2008). In contrast, studies of non-linear dynamics revealed that, even without complex nervous system control, the vocal system can exhibit a variety of non-linear phenomena such as chaos and bifurcations (Titze et al., 1993; Wilden et al., 1998; Fitch et al., 2002). Observations of low-dimensional dynamics therefore imply that various complex features of animal vocalizations are traceable primarily to the biomechanical properties of the vocal production systems rather than their neural control.

In the experiment investigating non-linear source–filter interaction using a physical model of the vocal folds, both resonance and quenching were observed. This is consistent with the theory (Story et al., 2000) that vocal-fold oscillation is facilitated by vocal-tract acoustics when f0 is slightly smaller than F1. Conversely, vocal-fold oscillation is inhibited when f0 is slightly larger than F1. This suggests that matching f0 exactly to F1 is not desirable; keeping f0 slightly below F1 is more beneficial for energy feedback to the sound source. Although this facilitating use of source–filter interaction is highly advantageous for long-distance calls, it should be noted that resonance tuning produces a pure-tone sound because it emphasizes only the first resonance F1 and not others. This is inconvenient for more complicated communications using first, second, and higher resonances or formants. For instance, humans manipulate the first and second formants through speech articulation that varies the cross-sectional area and the length of the vocal tract. Whether animals use formants for vocal communication has been questioned, given the limited space in their supraglottal systems (Lieberman, 2007). Recent findings, however, have suggested that their larynxes may descend to extend the vocal-tract system (Nishimura et al., 2003) and, moreover, that formant structure may actually change in animal vocalizations (Riede and Zuberbuehler, 2003; Fitch et al., 2016). For flexible articulatory control of the formants, f0 should be located away from the formants to avoid interference between source and filter, which may lead to voice instabilities.
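One common way to rationalise the asymmetry around F1 uses the input reactance of the vocal tract: below the first resonance the tract presents an inertive (mass-like, positive) reactance to the glottis, which is generally regarded as favourable for vocal-fold oscillation (cf. Titze, 2008), whereas just above it the reactance becomes compliant (negative). The sketch below evaluates this sign change for an ideal lossless tube with an open far end, whose input impedance is j(rho c / A) tan(2 pi f L / c). The tube length of 0.7 m, the area of 5 cm², and the air properties are assumptions for illustration only.

```python
import math

def tube_input_reactance(f, length_m, area_m2, rho=1.2, c=343.0):
    """Input reactance (Pa*s/m^3) of an ideal lossless tube open at the far end."""
    wavenumber = 2.0 * math.pi * f / c
    return (rho * c / area_m2) * math.tan(wavenumber * length_m)

LENGTH, AREA = 0.7, 5e-4           # assumed tube length (m) and area (m^2)
f1 = 343.0 / (4.0 * LENGTH)        # ideal first resonance, about 122.5 Hz

for f0 in (120.0, 125.0):
    x = tube_input_reactance(f0, LENGTH, AREA)
    kind = "inertive" if x > 0 else "compliant"
    print(f"f0 = {f0:.0f} Hz (F1 = {f1:.1f} Hz): reactance is {kind}")
```

Real vocal tracts are lossy and non-uniform, so the transition is smoother than in this idealisation, but the change of sign across the resonance is the feature relevant to facilitation versus inhibition.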

Finally, non-linear analysis of bioacoustic signals should be carried out carefully in some situations. Non-stationarity, which is an inherent characteristic of animal vocalizations, makes reliable analysis difficult. Recording noise and non-linear distortion in acoustic signals can increase errors in the analysis. These factors may be misleading and possibly result in an inaccurate understanding of non-linear properties in animal vocalizations. For a correct interpretation of complex vocalizations, combined use of different approaches such as accompanying simulations of biomechanical models (Fee et al., 1998; Mergell et al., 1999), physical experiments (Riede et al., 2008), and excised larynx experiments (Herbst et al., 2012) should be of great help.

Acknowledgments

The author would like to thank Mr. Kishin Migimatsu for providing the experimental data on physical modeling of the vocal folds. This work was partially supported by Grants-in-Aid for Scientific Research (Nos. 16H04848, 25540074, 23300071) from the Japan Society for the Promotion of Science (JSPS) and by the SPIRITS program from Kyoto University.

References
 
© 2018 The Anthropological Society of Nippon