Segment-Quantization-based HMMs (SQ-HMMs), which are trained with a Segment-Quantized codebook (SQ-codebook) generated by the following two approaches, show higher recognition performance than VQ-HMMs in noisy environments: 1) phoneme-category-dependent SQ-codebook generation from categorized speech data using phoneme label information; 2) speaking-style-dependent SQ-codebook generation from a task close to, or the same as, the input speech. In 18-consonant recognition experiments comparing SQ with our VQ-based baseline method, recognition rates for SQ-HMMs are improved by 3.9%, 8.2% and 9.1% at SNR = ∞, 30 dB and 20 dB, respectively. In Japanese phrase recognition experiments, phrase recognition rates are 88.2%, 84.2% and 52.7% at SNR = ∞, 30 dB and 20 dB, respectively; compared with our VQ-based baseline method, these are improvements of 0.7%, 10.0% and 11.5%.
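The core of the first approach, phoneme-category-dependent codebook lookup, can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: the categories, the 2-D centroid vectors, and the `quantize` helper are all hypothetical, and real SQ-codebooks would be trained on labeled acoustic segments rather than hand-specified.

```python
# Hypothetical sketch: each phoneme category owns its own codebook, and an
# input segment is quantized against the codebook of its labeled category.
# Toy 2-D feature vectors; real systems use high-dimensional spectral features.
from math import dist  # Euclidean distance (Python 3.8+)

# Illustrative category-dependent codebooks: category -> list of centroids
codebooks = {
    "vowel":     [(0.2, 0.1), (0.8, 0.9)],
    "consonant": [(0.5, 0.5), (0.1, 0.9)],
}

def quantize(segment, category):
    """Return the index of the nearest codeword in the category's codebook."""
    book = codebooks[category]
    return min(range(len(book)), key=lambda i: dist(segment, book[i]))

# A segment near (0.2, 0.1), labeled as a vowel, maps to vowel codeword 0.
print(quantize((0.25, 0.15), "vowel"))  # -> 0
```

Restricting each lookup to a category-specific codebook is what lets the codewords model within-category variation more finely than a single shared VQ codebook.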