Journal of the Acoustical Society of Japan (E)
Online ISSN : 2185-3509
Print ISSN : 0388-2861
ISSN-L : 0388-2861
Volume 7, Issue 1
  • Ken'iti Kido
    1986 Volume 7 Issue 1 Pages 1
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
  • Shizuo Hiki
    1986 Volume 7 Issue 1 Pages 3-4
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
  • Yorinobu Sonoda, Kazuto Nakakido
    1986 Volume 7 Issue 1 Pages 5-12
    Published: 1986
    Released on J-STAGE: May 20, 2011
    JOURNAL FREE ACCESS
    The purpose of this experiment is to study the effect of speaking rate on the articulatory movements of the jaw. Two subjects read a list of nonsense words of the form /V1 V2 V3/, where V1 and V3 took all possible combinations of /i, o, e/ and V2 was one of the five vowels /i, u, o, a, e/. We measured the articulatory dynamics (displacement, transition time and velocity) of jaw movements at two speaking rates and estimated the dynamic characteristics of the jaw articulatory system. The results show that the two subjects adjusted their speaking rate using different strategies: as speaking rate increased, (1) for one subject the velocity of movement did not change but the articulatory displacement decreased, and (2) for the other the velocity increased while the displacement did not vary. The effect of speaking rate on the magnitude of articulatory effort was clearly observed during closing movements. The subject who showed rapid movements at the fast speaking rate may have increased the articulatory force of the jaw muscle system.
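The kinematic measures named above (displacement, transition time and velocity) could be computed from a sampled jaw trajectory roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' procedure: `movement_dynamics` is a hypothetical helper, the input is assumed to be a single opening or closing movement already segmented at its onset and offset, and velocity is estimated with simple finite differences.

```python
from dataclasses import dataclass


@dataclass
class MovementDynamics:
    displacement: float      # same units as the input positions (e.g. mm)
    transition_time: float   # seconds
    peak_velocity: float     # position units per second


def movement_dynamics(positions, fs):
    """Summarize one jaw movement (hypothetical helper, illustration only).

    Assumes `positions` is a list of jaw positions sampled at `fs` Hz and
    trimmed so the first and last samples are the movement onset and offset.
    """
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fs
    n = len(positions)
    velocity = []
    for i in range(n):
        # Central difference inside the segment, one-sided at the two ends.
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        velocity.append((positions[hi] - positions[lo]) / ((hi - lo) * dt))
    return MovementDynamics(
        displacement=abs(positions[-1] - positions[0]),
        transition_time=(n - 1) * dt,
        peak_velocity=max(abs(v) for v in velocity),
    )
```

Comparing these values between the normal and fast rates would separate the two strategies reported: a smaller displacement at unchanged peak velocity versus a higher peak velocity at unchanged displacement.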
  • Koji Tajima, Akio Tanaka, Mitsuo Komura
    1986 Volume 7 Issue 1 Pages 13-20
    Published: 1986
    Released on J-STAGE: May 20, 2011
    JOURNAL FREE ACCESS
    We propose a novel algorithm for connected word recognition that allows reference patterns to overlap and to be split when they are matched to portions of a test pattern. This reduces recognition errors caused by coarticulation, word shortening and silences between words, which are inherent in connected speech. Experiments with Japanese 4-digit strings (8 speakers × 350 strings) show that our method achieves higher recognition accuracy (by as much as 7.85 %) than Sakoe's two-level DP-matching algorithm.
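Both the proposed method and Sakoe's two-level algorithm build on DP matching (dynamic time warping) of a reference pattern against a test pattern. The sketch below shows only that basic recurrence for isolated patterns, assuming Euclidean local distances and a simple symmetric step pattern; it does not implement the overlap-and-split extension or the two-level search, and `dtw_distance` is a hypothetical name.

```python
import math


def dtw_distance(test, ref):
    """Accumulated distance of the best warping path between two patterns.

    `test` and `ref` are lists of equal-length feature vectors (lists of
    floats). Local distance is Euclidean; allowed predecessor cells are
    (i-1, j), (i, j-1) and (i-1, j-1).
    """
    n, m = len(test), len(ref)
    # g[i][j]: minimum accumulated distance aligning test[:i] with ref[:j]
    g = [[math.inf] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(test[i - 1], ref[j - 1])
            g[i][j] = d + min(g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]
```

Isolated-word recognition then amounts to choosing the reference with the smallest accumulated distance, e.g. `min(refs, key=lambda name: dtw_distance(test, refs[name]))` for a dict `refs` of templates; connected-word methods extend this search across word boundaries.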
  • Nobuhiro Miki, Sato Saga, Yoshikazu Miyanaga, Nobuo Nagai
    1986 Volume 7 Issue 1 Pages 21-28
    Published: 1986
    Released on J-STAGE: May 20, 2011
    JOURNAL FREE ACCESS
    Recently, a model identification system (MIS) has been studied for ARMA parameter estimation. Because MIS uses a high-order ARMA estimator together with a model reduction algorithm, its spectral estimates track temporal changes well. In real speech analysis, however, estimation errors are caused by the low-pass filter used for A/D conversion. To compensate for these errors, we propose a weighted MIS (WMIS) that includes compensation for the characteristics of the low-pass filter. The WMIS estimates the input of the reference model and the time-varying ARMA parameters, and achieves rapid convergence by using a high-order model. Furthermore, an algorithm for minimum realization is presented as the model reduction algorithm. The proposed algorithms are applied to synthetic and real speech, and the estimated spectra are shown to represent formant variations adequately, without jitter in the high-frequency region.
  • Yutaka Kobayashi, Yasuhiro Ohmori, Yasuhisa Niimi
    1986 Volume 7 Issue 1 Pages 29-38
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
    In this paper the authors propose a method of vowel recognition in continuous speech and report some experimental results. The dynamic characteristics of speech are analyzed to enhance the intended vowels and to prune spurious ones. Because of the smoothing effects of coarticulation, the physical parameters of vowels in continuous speech rarely reach their preset target values; in many cases, however, the direction toward the targets is detectable. The analysis algorithm uses the temporal movements of the Weighted Likelihood Ratios between the input speech and 6 templates: the 5 Japanese vowels and a nasal group. Recognition experiments were carried out on two sets of speech data spoken by two male speakers, containing 35 and 53 sentences, respectively. With speaker-dependent templates, 76.2 % and 80.7 % of the vowels were correctly recognized, confirming the effectiveness of the enhancement algorithm. The major problems remaining for further improvement are the treatment of long vowels, diphthongs, semi-vowels, devocalization and nasalization.
  • Kiyoshi Hashimoto, Shinsuke Suga
    1986 Volume 7 Issue 1 Pages 39-46
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
    This paper is concerned with the application of a three-dimensional model of the tongue to the problem of estimating a set of muscular tensions of the human tongue from a given X-ray outline. To examine how the muscular tensions change during movement, and to obtain more accurate estimates than the previous single-stage estimation, the present experiment adopts multistage fitting. Starting from the neutral shape, a series of gradually changing outlines leading to the given tongue outline is produced by interpolation, and a set of muscular tensions of the model tongue is obtained each time the model is best fitted to an interpolated outline. Compared with the previous single-stage estimates, the present results are closer to actual human EMG data. The multistage results also reveal nonlinear time patterns in the muscular tensions, presumably due to their participation in pressing the tongue upward against the palate or down against the floor of the mouth.
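The interpolation step of the multistage fitting can be pictured as below. This is an assumption-laden sketch: the abstract does not say how outlines are represented or interpolated, so here each outline is assumed to be a list of corresponding (x, y) points and intermediate outlines are produced by straight linear interpolation. `interpolate_outlines` is hypothetical, and the model-fitting step that would follow each stage is not shown.

```python
def interpolate_outlines(neutral, target, stages):
    """Return `stages` outlines moving linearly from `neutral` to `target`.

    Each outline is a list of (x, y) points, and the two outlines must have
    the same number of corresponding points. The last outline equals `target`.
    """
    if len(neutral) != len(target):
        raise ValueError("outlines must have the same number of points")
    outlines = []
    for s in range(1, stages + 1):
        t = s / stages  # fraction of the way from neutral to target
        outlines.append([
            (xn + t * (xt - xn), yn + t * (yt - yn))
            for (xn, yn), (xt, yt) in zip(neutral, target)
        ])
    return outlines
```

In a multistage scheme each returned outline would be fitted in turn, presumably starting each fit from the tensions found at the previous stage, so that the sequence of fitted tensions traces a time pattern.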
  • Masuzo Yanagida, Youichi Yamashita, Osamu Kakusho, Donatus Graham-Stua ...
    1986 Volume 7 Issue 1 Pages 47-56
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
    Described here are some physical interpretations of the computational process of Givens' reduction, an efficient algorithm for obtaining the least-squares solution of a set of over-determined linear equations. These interpretations come into view when Givens' reduction is applied to the linear predictive analysis of speech, and would not have suggested themselves had Givens' reduction been considered only as a means of solving an ordinary set of over-determined linear equations. The attractive features found in the computational process of Givens' reduction are the following: (1) the forward and backward prediction errors for each data sample are obtained automatically in the working matrix of the augmented Givens' reduction; (2) the direct time-update recursion for the K-parameters, i.e. the reflection coefficients in linear predictive analysis, is performed implicitly, sample by sample; (3) an index representing the rank of the covariance matrix of the input data sequence is available in the working vector. These advantages follow from the physical interpretation of the corresponding parts of the working areas of Givens' reduction. The behavior of these values in the working areas is confirmed empirically by actual analysis of speech.
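For reference, a plain Givens-rotation least-squares solver applied to a covariance-style linear prediction problem might look like the sketch below. It absorbs one equation (one data sample) at a time, which is the row-by-row structure the interpretations above rely on, but it does not reproduce the augmented working matrix, the prediction-error or K-parameter readouts, or the rank index described in the paper. The function names are hypothetical, and the data matrix is assumed to have full column rank.

```python
import math


def givens_least_squares(A, b):
    """Solve min ||A x - b|| with Givens rotations (A is m x n, m >= n).

    Each row of A is rotated into an upper-triangular R one at a time,
    so data can be processed sample by sample.
    """
    n = len(A[0])
    R = [[0.0] * n for _ in range(n)]  # triangular working matrix
    z = [0.0] * n                      # correspondingly rotated right-hand side
    for row, rhs in zip(A, b):
        a = [float(v) for v in row]
        y = float(rhs)
        for k in range(n):
            if a[k] == 0.0:
                continue
            r = math.hypot(R[k][k], a[k])
            c, s = R[k][k] / r, a[k] / r
            # Rotate row k of R against the new row so that a[k] becomes 0.
            for j in range(k, n):
                R[k][j], a[j] = c * R[k][j] + s * a[j], -s * R[k][j] + c * a[j]
            z[k], y = c * z[k] + s * y, -s * z[k] + c * y
    # Back substitution on R x = z.
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (z[k] - sum(R[k][j] * x[j] for j in range(k + 1, n))) / R[k][k]
    return x


def lpc_covariance(signal, order):
    """Predictor a_1..a_p minimizing sum_n (x[n] - sum_k a_k x[n-k])^2
    over n = p .. N-1 (covariance-method formulation)."""
    rows = [[signal[n - k] for k in range(1, order + 1)]
            for n in range(order, len(signal))]
    targets = [signal[n] for n in range(order, len(signal))]
    return givens_least_squares(rows, targets)
```

As a quick check, a signal generated exactly by x[n] = 1.5 x[n-1] - 0.7 x[n-2] should yield predictor coefficients close to [1.5, -0.7] at order 2.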
  • Hiroya Fujisaki, Keikichi Hirose, Miyoko Sugito
    1986 Volume 7 Issue 1 Pages 57-63
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
    To determine the universal and language-specific characteristics of word accent, its acoustic manifestations were analyzed and compared in disyllabic words of English (“permit,” “record,” “object,” etc.) and in two-mora words of Japanese (“ame”). The fundamental frequency contours (F0 contours) of these words were analyzed using the functional model of F0 contour generation proposed by one of the authors. While a marked similarity was observed between the F0 contour characteristics of English and Japanese for both first-syllable-accented and second-syllable-accented words, individual differences in the accent command were much greater for English words accented on the first syllable. Segmental and syllabic durations were measured on the speech waveform, and the accentual changes in duration were found to occur mainly in the second syllable in Japanese, whereas in English they tend to be complementary between the first and second syllables. The intensity and formant frequencies of the syllabic nuclei were also analyzed. The results indicated that duration, intensity and formant frequencies are less stable than the F0 contour as correlates of word accent in both English and Japanese.
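The functional model referred to here is presumably Fujisaki's command-response model, in which log F0 is the sum of a baseline value, phrase-command responses and accent-command responses:

ln F0(t) = ln Fb + Σi Api Gp(t − T0i) + Σj Aaj [Ga(t − T1j) − Ga(t − T2j)],
with Gp(t) = α²t·exp(−αt) and Ga(t) = min[1 − (1 + βt)·exp(−βt), γ] for t ≥ 0, and both zero for t < 0.

The Python sketch below evaluates that commonly cited form. The constants α = 3 s⁻¹, β = 20 s⁻¹ and γ = 0.9 are typical values assumed for illustration, not parameters taken from this paper, and the function names are hypothetical.

```python
import math


def phrase_response(t, alpha=3.0):
    """Gp(t): impulse response of the phrase control mechanism (assumed alpha)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(t, fb, phrases, accents):
    """F0 in Hz at time t (s).

    phrases: list of (Ap, T0) phrase commands (magnitude, onset time).
    accents: list of (Aa, T1, T2) accent commands (amplitude, onset, offset).
    """
    ln_f0 = math.log(fb)
    ln_f0 += sum(ap * phrase_response(t - t0) for ap, t0 in phrases)
    ln_f0 += sum(aa * (accent_response(t - t1) - accent_response(t - t2))
                 for aa, t1, t2 in accents)
    return math.exp(ln_f0)
```

Sampling `f0_contour` on a time grid for, say, one phrase command and one accent command placed on the first or second syllable gives the kind of contour whose command parameters are compared across accent types above.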
  • Yoshinori Sagisaka, Hirokazu Sato
    1986 Volume 7 Issue 1 Pages 65-74
    Published: 1986
    Released on J-STAGE: February 17, 2011
    JOURNAL FREE ACCESS
    This paper focuses on an analysis of accentual characteristics in Japanese phrases and in long compounds for the purpose of fine prosody control in Japanese text-to-speech conversion. With respect to phrase accent, secondary accent generation is shown to depend on (a) whether or not an anterior constituent word is accented, (b) phrase length and (c) the accent attributes of the constituent words. Moreover, the fundamental frequency patterns of phrases having a secondary accent are studied to demonstrate that they can be controlled systematically. For long compounds, the characteristics of their segmentation into smaller utterance groups are discussed in relation to their syntactic structure. Furthermore, their accentuation is analyzed by comparison with the accentuation of ordinary short compounds. These results indicate the possibility of further prosody control and the necessity of syntactic analysis for such control in Japanese text-to-speech conversion.