Speech synthesis based on hidden Markov models (HMMs) processes segmental and prosodic features of speech together in a frame-by-frame manner. One benefit of this approach is that time alignment between the two kinds of features is maintained automatically. However, when the training data are limited, a frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units spanning a wide time span, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. A method is developed to modify F0 contours within the framework of the generation process model (henceforth, the F0 model) by referring to linguistic information of the input text (word boundaries and accent types). The method takes the F0 variances obtained through HMM-based speech synthesis into account during this process. A listening experiment on synthetic speech showed that, on average, the method generates better quality than HMM-based speech synthesis alone. Since the F0 model clearly relates its commands to linguistic (and para-/non-linguistic) information, the method has an additional advantage: speech styles can be changed, and further information (such as emphasis) can be added, simply by manipulating the commands.
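As a concrete illustration, the generation process (Fujisaki) model superposes phrase and accent components on a baseline value in the log-F0 domain. The sketch below is a minimal implementation of that superposition; the time constants and command values are illustrative assumptions, not parameters taken from the study above.

```python
import numpy as np

def fujisaki_f0(t, fb, phrase_cmds, accent_cmds, alpha=3.0, beta=20.0, gamma=0.9):
    """Generation process (Fujisaki) model: log F0 equals a baseline ln(fb)
    plus phrase components (impulse responses) and accent components
    (step responses).  alpha, beta, gamma are illustrative constants."""
    def gp(x):  # phrase control mechanism: critically damped impulse response
        x = np.maximum(x, 0.0)
        return alpha ** 2 * x * np.exp(-alpha * x)

    def ga(x):  # accent control mechanism: clipped step response
        x = np.maximum(x, 0.0)
        return np.minimum(1.0 - (1.0 + beta * x) * np.exp(-beta * x), gamma)

    lnf0 = np.full_like(t, np.log(fb), dtype=float)
    for t0, ap in phrase_cmds:          # (onset time, magnitude)
        lnf0 += ap * gp(t - t0)
    for t1, t2, aa in accent_cmds:      # (onset, offset, amplitude)
        lnf0 += aa * (ga(t - t1) - ga(t - t2))
    return np.exp(lnf0)

# Example: one phrase command and one accent command over a 2-second span.
t = np.linspace(0.0, 2.0, 201)
f0 = fujisaki_f0(t, 120.0, [(0.2, 0.5)], [(0.5, 1.0, 0.4)])
```

Manipulating the command lists (their timings and magnitudes) is what makes style changes or added emphasis straightforward in this framework.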
In popular music, the drums and bass guitar create the rhythm. In 2008, we developed a drum pattern database containing thousands of music excerpts to investigate the role of drums. However, no studies have been reported that use a large number of samples to investigate the role of the bass guitar in popular music, and it has been difficult for automatic-arrangement systems to generate appropriate bass phrases. We propose a method for identifying the bass part in Musical Instrument Digital Interface (MIDI) excerpts. In this method, each one-bar segment is classified as belonging to the bass guitar part or not; if it does, it is labeled a ``bass guitar pattern.'' Basic information about each bass guitar pattern, such as its onset, interval, and dynamics profiles, was extracted, and a database of these profiles was constructed in order to extract features common to the patterns. We apply principal component analysis (PCA) to the patterns in this database to derive a set of automatic-arrangement parameters, which we call ``eigenphrases of the bass guitar.'' We then propose a method of arranging the bass guitar part that generates an appropriate pattern as a musical phrase by summing the principal component vectors of the eigenphrases with relative weights. The method is confirmed to be effective as a means of generating bass guitar parts that sound natural rather than artificial.
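The eigenphrase idea above amounts to PCA over fixed-length pattern vectors followed by a weighted recombination of the leading components. The following sketch illustrates that procedure under assumed data: the random matrix stands in for the onset/interval/dynamics profiles, and the vector length and weights are arbitrary choices, not values from the study.

```python
import numpy as np

# Hypothetical data: each row encodes a one-bar bass pattern as a
# fixed-length feature vector (stand-in for onset/interval/dynamics profiles).
rng = np.random.default_rng(0)
patterns = rng.random((200, 48))

# PCA via SVD of the mean-centred pattern matrix; the rows of vt are the
# principal component vectors ("eigenphrases" in the terminology above).
mean = patterns.mean(axis=0)
_, _, vt = np.linalg.svd(patterns - mean, full_matrices=False)
eigenphrases = vt[:5]                      # keep the top five components

# Generate a new pattern as the mean pattern plus a weighted sum of
# eigenphrases; varying the weights varies the arranged phrase.
weights = np.array([1.2, -0.5, 0.3, 0.0, 0.8])
new_pattern = mean + weights @ eigenphrases
```

In practice the resulting vector would be decoded back into note onsets and velocities; that decoding step is omitted here.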
In this paper, we describe an interferometric optical fiber hydrophone using a pair of fiber Bragg gratings (FBGs) with a polarization-maintaining fiber (PMF). Signal fading induced by random fluctuations in the state of polarization of the interfering beams is a common problem for all interferometric optical fiber sensors. To overcome this problem, a PMF was used to construct an interferometric optical fiber hydrophone with a pair of FBGs, and the performance of the PMF hydrophone was compared with that of a conventional single-mode optical fiber hydrophone. In our experiment, we adopted a 3×3 coupler scheme for the demodulation of acoustic signals to reduce the computational load, and the effectiveness of the proposed optical fiber hydrophone with the 3×3 coupler scheme was confirmed.
We describe analytical results for the pitch patterns of English sentences uttered by Japanese speakers (henceforth, Japanese English). In this study, the differences between Japanese English and native English are evaluated in terms of sentence structure, classifying words by their positions and functions. The results suggest that intonation patterns in Japanese English are flat: most function words are spoken with high pitch at any position in a sentence, whereas content words are spoken with low pitch at the beginnings of sentences and at the ends of interrogative sentences, but with high pitch at the ends of declarative sentences.
Listening to a piece of music evokes many physiological responses, including changes in respiration. The present study shows that respiration becomes entrained to musical timing by comparing changes in respiration timing and respiration period as listening experience is gained. Participants listened to the same track once a day for 10 days. The distribution of respiration timings over the music track was calculated using kernel density estimation, and the probability of coincidence was evaluated statistically using surrogate data. The results show that, as they gained listening experience, participants unconsciously shifted their respiration timing to coincide with the music track, whereas their respiration period did not change.
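The analysis described above, kernel density estimation of respiration timings followed by a surrogate-data test of coincidence, can be sketched as follows. The bandwidth, the circular-shift surrogates, and the overlap statistic are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def kde(onsets, grid, bw=0.5):
    """Gaussian kernel density estimate of respiration-onset times (seconds)."""
    d = grid[:, None] - onsets[None, :]
    k = np.exp(-0.5 * (d / bw) ** 2) / (bw * np.sqrt(2.0 * np.pi))
    return k.sum(axis=1) / len(onsets)

def coincidence_stat(onsets_a, onsets_b, grid, bw=0.5):
    """Overlap integral of two onset densities; larger values mean the
    two sets of timings coincide more closely on the track."""
    dx = grid[1] - grid[0]
    return np.sum(kde(onsets_a, grid, bw) * kde(onsets_b, grid, bw)) * dx

def surrogate_p(onsets_a, onsets_b, track_len, grid, n=200, seed=0):
    """One-sided p-value: fraction of circularly time-shifted surrogates
    whose coincidence statistic is at least the observed value."""
    rng = np.random.default_rng(seed)
    obs = coincidence_stat(onsets_a, onsets_b, grid)
    hits = sum(
        coincidence_stat(onsets_a,
                         (onsets_b + rng.uniform(0.0, track_len)) % track_len,
                         grid) >= obs
        for _ in range(n)
    )
    return (hits + 1) / (n + 1)

# Example with hypothetical onset times on a 16-second excerpt.
grid = np.linspace(-2.0, 18.0, 201)
onsets = np.array([1.0, 2.5, 7.0, 11.5])
p = surrogate_p(onsets, onsets, 16.0, grid)
```

A small p-value would indicate that the observed alignment of respiration with the track is unlikely to arise from an arbitrary time offset.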