IEICE Transactions on Information and Systems

Special Section on Robust Speech Processing in Realistic Environments

FOREWORD

Kazuya TAKEDA

2008 Volume E91.D Issue 3 Pages 391-392
Published: March 01, 2008
Released on J-STAGE: July 01, 2018

DOIhttps://doi.org/10.1587/transinf.2008EDF0002

JOURNAL FREE ACCESS

Download PDF (62K)
Signal Processing Techniques for Robust Speech Recognition

Futoshi ASANO

Article type: INVITED PAPER
Subject area: INVITED
2008 Volume E91.D Issue 3 Pages 393-401
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.393

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, signal processing techniques which can be applied to automatic speech recognition to improve its robustness are reviewed. The choice of signal processing techniques is strongly dependent on the scenario of the applications. The analysis of scenario and the choice of suitable signal processing techniques are shown through two examples.

View full abstract

Download PDF (7533K)
Noise Suppression Based on Multi-Model Compositions Using Multi-Pass Search with Multi-Label N-gram Models

Takatoshi JITSUHIRO, Tomoji TORIYAMA, Kiyoshi KOGURE

Article type: PAPER
Subject area: Noisy Speech Recognition
2008 Volume E91.D Issue 3 Pages 402-410
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.402

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose a noise suppression method based on multi-model compositions and multi-pass search. In real environments, input speech for speech recognition includes many kinds of noise signals. To obtain good recognized candidates, suppressing many kinds of noise signals at once and finding target speech is important. Before noise suppression, to find speech and noise label sequences, we introduce multi-pass search with acoustic models including many kinds of noise models and their compositions, their n-gram models, and their lexicon. Noise suppression is frame-synchronously performed using the multiple models selected by recognized label sequences with time alignments. We evaluated this method using the E-Nightingale task, which contains voice memoranda spoken by nurses during actual work at hospitals. The proposed method obtained higher performance than the conventional method.

View full abstract

Download PDF (8518K)
Noisy Speech Recognition Based on Integration/Selection of Multiple Noise Suppression Methods Using Noise GMMs

Norihide KITAOKA, Souta HAMAGUCHI, Seiichi NAKAGAWA

Article type: PAPER
Subject area: Noisy Speech Recognition
2008 Volume E91.D Issue 3 Pages 411-421
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.411

JOURNAL FREE ACCESS

Show abstractHide abstract

To achieve high recognition performance for a wide variety of noise and for a wide range of signal-to-noise ratio, this paper presents methods for integration of four noise reduction algorithms: spectral subtraction with smoothing of time direction, temporal domain SVD-based speech enhancement, GMM-based speech estimation and KLT-based comb-filtering. In this paper, we proposed two types of combination methods of noise suppression algorithms: selection of front-end processor and combination of results from multiple recognition processes. Recognition results on the CENSREC-1 task showed the effectiveness of our proposed methods.kn-abstract=

View full abstract

Download PDF (3508K)
Robust Speech Recognition by Model Adaptation and Normalization Using Pre-Observed Noise

Satoshi KOBASHIKAWA, Satoshi TAKAHASHI

Article type: PAPER
Subject area: Noisy Speech Recognition
2008 Volume E91.D Issue 3 Pages 422-429
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.422

JOURNAL FREE ACCESS

Show abstractHide abstract

Users require speech recognition systems that offer rapid response and high accuracy concurrently. Speech recognition accuracy is degraded by additive noise, imposed by ambient noise, and convolutional noise, created by space transfer characteristics, especially in distant talking situations. Against each type of noise, existing model adaptation techniques achieve robustness by using HMM-composition and CMN (cepstral mean normalization). Since they need an additive noise sample as well as a user speech sample to generate the models required, they can not achieve rapid response, though it may be possible to catch just the additive noise in a previous step. In the previous step, the technique proposed herein uses just the additive noise to generate an adapted and normalized model against both types of noise. When the user's speech sample is captured, only online-CMN need be performed to start the recognition processing, so the technique offers rapid response. In addition, to cover the unpredictable S/N values possible in real applications, the technique creates several S/N HMMs. Simulations using artificial speech data show that the proposed technique increased the character correct rate by 11.62% compared to CMN.

View full abstract

Download PDF (2706K)
Feature Compensation Employing Multiple Environmental Models for Robust In-Vehicle Speech Recognition

Wooil KIM, John H. L. HANSEN

Article type: PAPER
Subject area: Noisy Speech Recognition
2008 Volume E91.D Issue 3 Pages 430-438
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.430

JOURNAL FREE ACCESS

Show abstractHide abstract

An effective feature compensation method is developed for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, used for evaluation, contains a range of speech and noise signals collected for a number of speakers under actual driving conditions. PCGMM-based feature compensation, considered in this paper, utilizes parallel model combination to generate noise-corrupted speech model by combining clean speech and the noise model. In order to address unknown time-varying background noise, an interpolation method of multiple environmental models is employed. To alleviate computational expenses due to multiple models, an Environment Transition Model is employed, which is motivated from Noise Language Model used in Environmental Sniffing. An environment dependent scheme of mixture sharing technique is proposed and shown to be more effective in reducing the computational complexity. A smaller environmental model set is determined by the environment transition model for mixture sharing. The proposed scheme is evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that our feature compensation method is effective for improving speech recognition in real-life in-vehicle conditions. A reduction of 73.10% of the computational requirements was obtained by employing the environment dependent mixture sharing scheme with only a slight change in recognition performance. This demonstrates that the proposed method is effective in maintaining the distinctive characteristics among the different environmental models, even when selecting a large number of Gaussian components for mixture sharing.

View full abstract

Download PDF (4403K)
Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation

Tran HUY DAT, Kazuya TAKEDA, Fumitada ITAKURA

Article type: PAPER
Subject area: Speech Enhancement
2008 Volume E91.D Issue 3 Pages 439-447
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.439

JOURNAL FREE ACCESS

Show abstractHide abstract

We present a multichannel speech enhancement method based on MAP speech spectral magnitude estimation using a generalized gamma model of speech prior distribution, where the model parameters are adapted from actual noisy speech in a frame-by-frame manner. The utilization of a more general prior distribution with its online adaptive estimation is shown to be effective for speech spectral estimation in noisy environments. Furthermore, the multi-channel information in terms of cross-channel statistics are shown to be useful to better adapt the prior distribution parameters to the actual observation, resulting in better performance of speech enhancement algorithm. We tested the proposed algorithm in an in-car speech database and obtained significant improvements of the speech recognition performance, particularly under non-stationary noise conditions such as music, air-conditioner and open window.

View full abstract

Download PDF (3891K)
Recognizing Reverberant Speech Based on Amplitude and Frequency Modulation

Yotaro KUBO, Shigeki OKAWA, Akira KUREMATSU, Katsuhiko SHIRAI

Article type: PAPER
Subject area: ASR under Reverberant Conditions
2008 Volume E91.D Issue 3 Pages 448-456
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.448

JOURNAL FREE ACCESS

Show abstractHide abstract

We have attempted to recognize reverberant speech using a novel speech recognition system that depends on not only the spectral envelope and amplitude modulation but also frequency modulation. Most of the features used by modern speech recognition systems, such as MFCC, PLP, and TRAPS, are derived from the energy envelopes of narrowband signals by discarding the information in the carrier signals. However, some experiments show that apart from the spectral/time envelope and its modulation, the information on the zero-crossing points of the carrier signals also plays a significant role in human speech recognition. In realistic environments, a feature that depends on the limited properties of the signal may easily be corrupted. In order to utilize an automatic speech recognizer in an unknown environment, using the information obtained from other signal properties and combining them is important to minimize the effects of the environment. In this paper, we propose a method to analyze carrier signals that are discarded in most of the speech recognition systems. Our system consists of two nonlinear discriminant analyzers that use multilayer perceptrons. One of the nonlinear discriminant analyzers is HATS, which can capture the amplitude modulation of narrowband signals efficiently. The other nonlinear discriminant analyzer is a pseudo-instantaneous frequency analyzer proposed in this paper. This analyzer can capture the frequency modulation of narrowband signals efficiently. The combination of these two analyzers is performed by the method based on the entropy of the feature introduced by Okawa et al. In this paper, in Sect. 2, we first introduce pseudo-instantaneous frequencies to capture a property of the carrier signal. The previous AM analysis method are described in Sect. 3. The proposed system is described in Sect. 4. The experimental setup is presented in Sect. 5, and the results are discussed in Sect. 6. We evaluate the performance of the proposed method by continuous digit recognition of reverberant speech. The proposed system exhibits considerable improvement with regard to the MFCC feature extraction system.

View full abstract

Download PDF (2520K)
Robust Speech Recognition by Combining Short-Term and Long-Term Spectrum Based Position-Dependent CMN with Conventional CMN

Longbiao WANG, Seiichi NAKAGAWA, Norihide KITAOKA

Article type: PAPER
Subject area: ASR under Reverberant Conditions
2008 Volume E91.D Issue 3 Pages 457-466
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.457

JOURNAL FREE ACCESS

Show abstractHide abstract

In a distant-talking environment, the length of channel impulse response is longer than the short-term spectral analysis window. Conventional short-term spectrum based Cepstral Mean Normalization (CMN) is therefore, not effective under these conditions. In this paper, we propose a robust speech recognition method by combining a short-term spectrum based CMN with a long-term one. We assume that a static speech segment (such as a vowel, for example) affected by reverberation, can be modeled by a long-term cepstral analysis. Thus, the effect of long reverberation on a static speech segment may be compensated by the long-term spectrum based CMN. The cepstral distance of neighboring frames is used to discriminate the static speech segment (long-term spectrum) and the non-static speech segment (short-term spectrum). The cepstra of the static and non-static speech segments are normalized by the corresponding cepstral means. In a previous study, we proposed an environmentally robust speech recognition method based on Position-Dependent CMN (PDCMN) to compensate for channel distortion depending on speaker position, and which is more efficient than conventional CMN. In this paper, the concept of combining short-term and long-term spectrum based CMN is extended to PDCMN. We call this Variable Term spectrum based PDCMN (VTPDCMN). Since PDCMN/VT-PDCMN cannot normalize speaker variations because a position-dependent cepstral mean contains the average speaker characteristics over all speakers, we also combine PDCMN/VT-PDCMN with conventional CMN in this study. We conducted the experiments based on our proposed method using limited vocabulary (100 words) distant-talking isolated word recognition in a real environment. The proposed method achieved a relative error reduction rate of 60.9% over the conventional short-term spectrum based CMN and 30.6% over the shortterm spectrum based PDCMN.

View full abstract

Download PDF (4855K)
Noise Robust Voice Activity Detection Based on Switching Kalman Filter

Masakiyo FUJIMOTO, Kentaro ISHIZUKA

Article type: PAPER
Subject area: Voice Activity Detection
2008 Volume E91.D Issue 3 Pages 467-477
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.467

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper addresses the problem of voice activity detection (VAD) in noisy environments. The VAD method proposed in this paper is based on a statistical model approach, and estimates statistical models sequentially without a priori knowledge of noise. Namely, the proposed method constructs a clean speech/silence state transition model beforehand, and sequentially adapts the model to the noisy environment by using a switching Kalman filter when a signal is observed. In this paper, we carried out two evaluations. In the first, we observed that the proposed method significantly outperforms conventional methods as regards voice activity detection accuracy in simulated noise environments. Second, we evaluated the proposed method on a VAD evaluation framework, CENSREC-1-C. The evaluation results revealed that the proposed method significantly outperforms the baseline results of CENSREC-1-C as regards VAD accuracy in real environments. In addition, we confirmed that the proposed method helps to improve the accuracy of concatenated speech recognition in real environments.

View full abstract

Download PDF (2745K)
Linear Discriminant Analysis Using a Generalized Mean of Class Covariances and Its Application to Speech Recognition

Makoto SAKAI, Norihide KITAOKA, Seiichi NAKAGAWA

Article type: PAPER
Subject area: Feature Extraction
2008 Volume E91.D Issue 3 Pages 478-487
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.478

JOURNAL FREE ACCESS

Show abstractHide abstract

To precisely model the time dependency of features is one of the important issues for speech recognition. Segmental unit input HMM with a dimensionality reduction method has been widely used to address this issue. Linear discriminant analysis (LDA) and heteroscedastic extensions, e. g., heteroscedastic linear discriminant analysis (HLDA) or heteroscedastic discriminant analysis (HDA), are popular approaches to reduce dimensionality. However, it is difficult to find one particular criterion suitable for any kind of data set in carrying out dimensionality reduction while preserving discriminative information. In this paper, we propose a new framework which we call power linear discriminant analysis (PLDA). PLDA can be used to describe various criteria including LDA, HLDA, and HDA with one control parameter. In addition, we provide an efficient selection method using a control parameter without training HMMs nor testing recognition performance on a development data set. Experimental results show that the PLDA is more effective than conventional methods for various data sets.

View full abstract

Download PDF (2736K)
Canonicalization of Feature Parameters for Robust Speech Recognition Based on Distinctive Phonetic Feature (DPF) Vectors

Mohammad NURUL HUDA, Muhammad GHULAM, Takashi FUKUDA, Kouichi KATSURAD ...

Article type: PAPER
Subject area: Feature Extraction
2008 Volume E91.D Issue 3 Pages 488-498
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.488

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper describes a robust automatic speech recognition (ASR) system with less computation. Acoustic models of a hidden Markov model (HMM)-based classifier include various types of hidden factors such as speaker-specific characteristics, coarticulation, and an acoustic environment, etc. If there exists a canonicalization process that can recover the degraded margin of acoustic likelihoods between correct phonemes and other ones caused by hidden factors, the robustness of ASR systems can be improved. In this paper, we introduce a canonicalization method that is composed of multiple distinctive phonetic feature (DPF) extractors corresponding to each hidden factor canonicalization, and a DPF selector which selects an optimum DPF vector as an input of the HMM-based classifier. The proposed method resolves gender factors and speaker variability, and eliminates noise factors by applying the canonicalzation based on the DPF extractors and two-stage Wiener filtering. In the experiment on AURORA-2J, the proposed method provides higher word accuracy under clean training and significant improvement of word accuracy in low signal-to-noise ratio (SNR) under multi-condition training compared to a standard ASR system with mel frequency ceptral coeffient (MFCC) parameters. Moreover, the proposed method requires a reduced, two-fifth, Gaussian mixture components and less memory to achieve accurate ASR.

View full abstract

Download PDF (5440K)
Cost Reduction of Acoustic Modeling for Real-Environment Applications Using Unsupervised and Selective Training

Tobias CINCAREK, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Article type: PAPER
Subject area: Acoustic Modeling
2008 Volume E91.D Issue 3 Pages 499-507
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.499

JOURNAL FREE ACCESS

Show abstractHide abstract

Development of an ASR application such as a speech-oriented guidance system for a real environment is expensive. Most of the costs are due to human labeling of newly collected speech data to construct the acoustic model for speech recognition. Employment of existing models or sharing models across multiple applications is often difficult, because the characteristics of speech depend on various factors such as possible users, their speaking style and the acoustic environment. Therefore, this paper proposes a combination of unsupervised learning and selective training to reduce the development costs. The employment of unsupervised learning alone is problematic due to the task-dependency of speech recognition and because automatic transcription of speech is error-prone. A theoretically well-defined approach to automatic selection of high quality and task-specific speech data from an unlabeled data pool is presented. Only those unlabeled data which increase the model likelihood given the labeled data are employed for unsupervised training. The effectivity of the proposed method is investigated with a simulation experiment to construct adult and child acoustic models for a speech-oriented guidance system. A completely human-labeled database which contains real-environment data collected over two years is available for the development simulation. It is shown experimentally that the employment of selective training alleviates the problems of unsupervised learning, i. e. it is possible to select speech utterances of a certain speaker group but discard noise inputs and utterances with lower recognition accuracy. The simulation experiment is carried out for several selected combinations of data collection and human transcription period. It is found empirically that the proposed method is especially effective if only relatively few of the collected data can be labeled and transcribed by humans.

View full abstract

Download PDF (4418K)
Using Mutual Information Criterion to Design an Efficient Phoneme Set for Chinese Speech Recognition

Jin-Song ZHANG, Xin-Hui HU, Satoshi NAKAMURA

Article type: PAPER
Subject area: Acoustic Modeling
2008 Volume E91.D Issue 3 Pages 508-513
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.508

JOURNAL FREE ACCESS

Show abstractHide abstract

Chinese is a representative tonal language, and it has been an attractive topic of how to process tone information in the state-of-the-art large vocabulary speech recognition system. This paper presents a novel way to derive an efficient phoneme set of tone-dependent units to build a recognition system, by iteratively merging a pair of tone-dependent units according to the principle of minimal loss of the Mutual Information (MI). The mutual information is measured between the word tokens and their phoneme transcriptions in a training text corpus, based on the system lexical and language model. The approach has a capability to keep discriminative tonal (and phoneme) contrasts that are most helpful for disambiguating homophone words due to lack of tones, and merge those tonal (and phoneme) contrasts that are not important for word disambiguation for the recognition task. This enables a flexible selection of phoneme set according to a balance between the MI information amount and the number of phonemes. We applied the method to traditional phoneme set of Initial/Finals, and derived several phoneme sets with different number of units. Speech recognition experiments using the derived sets showed its effectiveness.

View full abstract

Download PDF (1611K)
Development of a Mandarin-English Bilingual Speech Recognition System for Real World Music Retrieval

Qingqing ZHANG, Jielin PAN, Yang LIN, Jian SHAO, Yonghong YAN

Article type: PAPER
Subject area: Acoustic Modeling
2008 Volume E91.D Issue 3 Pages 514-521
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.514

JOURNAL FREE ACCESS

Show abstractHide abstract

In recent decades, there has been a great deal of research into the problem of bilingual speech recognition-to develop a recognizer that can handle inter- and intra-sentential language switching between two languages. This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real world music retrieval. Two of the main difficult issues in handling the bilingual speech recognition systems for real world applications are tackled in this paper. One is to balance the performance and the complexity of the bilingual speech recognition system; the other is to effectively deal with the matrix language accents in embedded language**. In order to process the intra-sentential language switching and reduce the amount of data required to robustly estimate statistical models, a compact single set of bilingual acoustic models derived by phone set merging and clustering is developed instead of using two separate monolingual models for each language. In our study, a novel Two-pass phone clustering method based on Confusion Matrix (TCM) is presented and compared with the log-likelihood measure method. Experiments testify that TCM can achieve better performance. Since potential system users' native language is Mandarin which is regarded as a matrix language in our application, their pronunciations of English as the embedded language usually contain Mandarin accents. In order to deal with the matrix language accents in embedded language, different non-native adaptation approaches are investigated. Experiments show that model retraining method outperforms the other common adaptation methods such as Maximum A Posteriori (MAP). With the effective incorporation of approaches on phone clustering and non-native adaptation, the Phrase Error Rate (PER) of MESRS for English utterances was reduced by 24.47% relatively compared to the baseline monolingual English system while the PER on Mandarin utterances was comparable to that of the baseline monolingual Mandarin system. The performance for bilingual utterances achieved 22.37% relative PER reduction.

View full abstract

Download PDF (3472K)
Language Modeling Using PLSA-Based Topic HMM

Atsushi SAKO, Tetsuya TAKIGUCHI, Yasuo ARIKI

Article type: PAPER
Subject area: Language Modeling
2008 Volume E91.D Issue 3 Pages 522-528
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.522

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic transition. By incorporating the concept of topic transition, it is expected that the recognition performance will be improved. Thus, the proposed method employs a “Topic HMM” instead of a history to estimate the topic distribution. The Topic HMM is an Ergodic HMM that expresses typical topic distributions as well as topic transition probabilities. Word accuracy results from our experiments confirmed the superiority of the proposed method over a trigram and a PLSA-based conventional method that uses a recognized history.

View full abstract

Download PDF (2271K)
A One-Pass Real-Time Decoder Using Memory-Efficient State Network

Jian SHAO, Ta LI, Qingqing ZHANG, Qingwei ZHAO, Yonghong YAN

Article type: PAPER
Subject area: ASR System Architecture
2008 Volume E91.D Issue 3 Pages 529-537
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.529

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper presents our developed decoder which adopts the idea of statically optimizing part of the knowledge sources while handling the others dynamically. The lexicon, phonetic contexts and acoustic model are statically integrated to form a memory-efficient state network, while the language model (LM) is dynamically incorporated on the fly by means of extended tokens. The novelties of our approach for constructing the state network are (1) introducing two layers of dummy nodes to cluster the cross-word (CW) context dependent fan-in and fan-out triphones, (2) introducing a so-called “WI layer” to store the word identities and putting the nodes of this layer in the non-shared mid-part of the network, (3) optimizing the network at state level by a sufficient forward and backward node-merge process. The state network is organized as a multi-layer structure for distinct token propagation at each layer. By exploiting the characteristics of the state network, several techniques including LM look-ahead, LM cache and beam pruning are specially designed for search efficiency. Especially in beam pruning, a layer-dependent pruning method is proposed to further reduce the search space. The layer-dependent pruning takes account of the neck-like characteristics of WI layer and the reduced variety of word endings, which enables tighter beam without introducing much search errors. In addition, other techniques including LM compression, lattice-based bookkeeping and lattice garbage collection are also employed to reduce the memory requirements. Experiments are carried out on a Mandarin spontaneous speech recognition task where the decoder involves a trigram LM and CW triphone models. A comparison with HDecode of HTK toolkits shows that, within 1% performance deviation, our decoder can run 5 times faster with half of the memory footprint.

View full abstract

Download PDF (2650K)
Selection of Optimum Vocabulary and Dialog Strategy for Noise-Robust Spoken Dialog Systems

Akinori ITO, Takanobu OBA, Takashi KONASHI, Motoyuki SUZUKI, Shozo MAK ...

Article type: PAPER
Subject area: ASR System Architecture
2008 Volume E91.D Issue 3 Pages 538-548
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.538

JOURNAL FREE ACCESS

Show abstractHide abstract

Speech recognition in a noisy environment is one of the hottest topics in the speech recognition research. Noise-tolerant acoustic models or noise reduction techniques are often used to improve recognition accuracy. In this paper, we propose a method to improve accuracy of spoken dialog system from a language model point of view. In the proposed method, the dialog system automatically changes its language model and dialog strategy according to the estimated recognition accuracy in a noisy environment in order to keep the performance of the system high. In a noise-free environment, the system accepts any utterance from a user. On the other hand, the system restricts its grammar and vocabulary in a noisy environment. To realize this strategy, we investigated a method to avoid the user's out-of-grammar utterances through an instruction given by the system to a user. Furthermore, we developed a method to estimate recognition accuracy from features extracted from noise signals. Finally, we realized a proposed dialog system according to these investigations.

View full abstract

Download PDF (6069K)
Evaluation of a Noise-Robust Multi-Stream Speaker Verification Method Using F₀ Information

Taichi ASAMI, Koji IWANO, Sadaoki FURUI

Article type: PAPER
Subject area: Speaker Verification
2008 Volume E91.D Issue 3 Pages 549-557
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.549

JOURNAL FREE ACCESS

Show abstractHide abstract

We have previously proposed a noise-robust speaker verification method using fundamental frequency (F₀) extracted using the Hough transform. The method also incorporates an automatic stream-weight and decision threshold estimation technique. It has been confirmed that the proposed method is effective for white noise at various SNR conditions. This paper evaluates the proposed method in more practical in-car and elevator-hall noise conditions. The paper first describes the noise-robust F₀ extraction method and details of our robust speaker verification method using multi-stream HMMs for integrating the extracted F₀ and cepstral features. Details of the automatic stream-weight and threshold estimation method for multi-stream speaker verification framework are also explained. This method simultaneously optimizes stream-weights and a decision threshold by combining the linear discriminant analysis (LDA) and the Adaboost technique. Experiments were conducted using Japanese connected digit speech contaminated by white, in-car, or elevator-hall noise at various SNRs. Experimental results show that the F₀ features improve the verification performance in various noisy environments, and that our stream-weight and threshold optimization method effectively estimates control parameters so that FARs and FRRs are adjusted to achieve equal error rates (EERs) under various noisy conditions.

View full abstract

Download PDF (2276K)
Speaker Verification in Realistic Noisy Environment in Forensic Science

Toshiaki KAMADA, Nobuaki MINEMATSU, Takashi OSANAI, Hisanori MAKINAE, ...

Article type: PAPER
Subject area: Speaker Verification
2008 Volume E91.D Issue 3 Pages 558-566
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.558

JOURNAL FREE ACCESS

Show abstractHide abstract

In forensic voice telephony speaker verification, we may be requested to identify a speaker in a very noisy environment, unlike the conditions in general research. In a noisy environment, we process speech first by clarifying it. However, the previous study of speaker verification from clarified speech did not yield satisfactory results. In this study, we experimented on speaker verification with clarification of speech in a noisy environment, and we examined the relationship between improving acoustic quality and speaker verification results. Moreover, experiments with realistic noise such as a crime prevention alarm and power supply noise was conducted, and speaker verification accuracy in a realistic environment was examined. We confirmed the validity of speaker verification with clarification of speech in a realistic noisy environment.

View full abstract

Download PDF (2316K)
Automatic Language Identification with Discriminative Language Characterization Based on SVM

Hongbin SUO, Ming LI, Ping LU, Yonghong YAN

Article type: PAPER
Subject area: Language Identification
2008 Volume E91.D Issue 3 Pages 567-575
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.567

JOURNAL FREE ACCESS

Show abstractHide abstract

Robust automatic language identification (LID) is the task of identifying the language from a short utterance spoken by an unknown speaker. The mainstream approaches include parallel phone recognition language modeling (PPRLM), support vector machine (SVM) and the general Gaussian mixture models (GMMs). These systems map the cepstral features of spoken utterances into high level scores by classifiers. In this paper, in order to increase the dimension of the score vector and alleviate the inter-speaker variability within the same language, multiple data groups based on supervised speaker clustering are employed to generate the discriminative language characterization score vectors (DLCSV). The back-end SVM classifiers are used to model the probability distribution of each target language in the DLCSV space. Finally, the output scores of back-end classifiers are calibrated by a pair-wise posterior probability estimation (PPPE) algorithm. The proposed language identification frameworks are evaluated on 2003 NIST Language Recognition Evaluation (LRE) databases and the experiments show that the system described in this paper produces comparable results to the existing systems. Especially, the SVM framework achieves an equal error rate (EER) of 4.0% in the 30-second task and outperforms the state-of-art systems by more than 30% relative error reduction. Besides, the performances of proposed PPRLM and GMMs algorithms achieve an EER of 5.1% and 5.0% respectively.

View full abstract

Download PDF (2196K)
Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System

Tobias CINCAREK, Hiromichi KAWANAMI, Ryuichi NISIMURA, Akinobu LEE, Hi ...

Article type: PAPER
Subject area: Applications
2008 Volume E91.D Issue 3 Pages 576-587
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.576

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, the development, long-term operation and portability of a practical ASR application in a real environment is investigated. The target application is a speech-oriented guidance system installed at the local community center. The system has been exposed to ordinary people since November 2002. More than 300 hours or more than 700,000 inputs have been collected during four years. The outcome is a rare example of a large scale real-environment speech database. A simulation experiment is carried out with this database to investigate how the system's performance improves during the first two years of operation. The purpose is to determine empirically the amount of real-environment data which has to be prepared to build a system with reasonable speech recognition performance and response accuracy. Furthermore, the relative importance of developing the main system components, i. e. speech recognizer and the response generation module, is assessed. Although depending on the system's modeling capacities and domain complexity, experimental results show that overall performance stagnates after employing about 10-15k utterances for training the acoustic model, 40-50k utterances for training the language model and 40k-50k utterances for compiling the question and answer database. The Q & A database was most important for improving the system's response accuracy. Finally, the portability of the well-trained first system prototype for a different environment, a local subway station, is investigated. Since collection and preparation of large amounts of real data is impractical in general, only one month of data from the new environment is employed for system adaptation. While the speech recognition component of the first prototype has a high degree of portability, the response accuracy is lower than in the first environment. The main reason is a domain difference between the two systems, since they are installed in different environments. This implicates that it is imperative to take the behavior of users under real conditions into account to build a system with high user satisfaction.

View full abstract

Download PDF (5567K)
Bilingual Cluster Based Models for Statistical Machine Translation

Hirofumi YAMAMOTO, Eiichiro SUMITA

Article type: PAPER
Subject area: Applications
2008 Volume E91.D Issue 3 Pages 588-597
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.588

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose a domain specific model for statistical machine translation. It is well-known that domain specific language models perform well in automatic speech recognition. We show that domain specific language and translation models also benefit statistical machine translation. However, there are two problems with using domain specific models. The first is the data sparseness problem. We employ an adaptation technique to overcome this problem. The second issue is domain prediction. In order to perform adaptation, the domain must be provided, however in many cases, the domain is not known or changes dynamically. For these cases, not only the translation target sentence but also the domain must be predicted. This paper focuses on the domain prediction problem for statistical machine translation. In the proposed method, a bilingual training corpus, is automatically clustered into sub-corpora. Each sub-corpus is deemed to be a domain. The domain of a source sentence is predicted by using its similarity to the sub-corpora. The predicted domain (sub-corpus) specific language and translation models are then used for the translation decoding. This approach gave an improvement of 2.7 in BLEU score on the IWSLT05 Japanese to English evaluation corpus (improving the score from 52.4 to 55.1). This is a substantial gain and indicates the validity of the proposed bilingual cluster based models.

View full abstract

Download PDF (3196K)
Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria

Yuki DENDA, Takanobu NISHIURA, Yoichi YAMASHITA

Article type: PAPER
Subject area: Applications
2008 Volume E91.D Issue 3 Pages 598-606
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.598

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a robust omnidirectional audio-visual (AV) talker localizer for AV applications. The proposed localizer consists of two innovations. One of them is robust omnidirectional audio and visual features. The direction of arrival (DOA) estimation using an equilateral triangular microphone array, and human position estimation using an omnidirectional video camera extract the AV features. The other is a dynamic fusion of the AV features. The validity criterion, called the audioor visual-localization counter, validates each audio- or visual-feature. The reliability criterion, called the speech arriving evaluator, acts as a dynamic weight to eliminate any prior statistical properties from its fusion procedure. The proposed localizer can compatibly achieve talker localization in a speech activity and user localization in a non-speech activity under the identical fusion rule. Talker localization experiments were conducted in an actual room to evaluate the effectiveness of the proposed localizer. The results confirmed that the talker localization performance of the proposed AV localizer using the validity and reliability criteria is superior to that of conventional localizers.

View full abstract

Download PDF (5504K)
Building an Effective Speech Corpus by Utilizing Statistical Multidimensional Scaling Method

Goshu NAGINO, Makoto SHOZAKAI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohi ...

Article type: PAPER
Subject area: Corpus
2008 Volume E91.D Issue 3 Pages 607-614
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.607

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a technique for building an effective speech corpus with lower cost by utilizing a statistical multidimensional scaling method. The statistical multidimensional scaling method visualizes multiple HMM acoustic models into two-dimensional space. At first, a small number of voice samples per speaker is collected; speaker adapted acoustic models trained with collected utterances, are mapped into two-dimensional space by utilizing the statistical multidimensional scaling method. Next, speakers located in the periphery of the distribution, in a plotted map are selected; a speech corpus is built by collecting enough voice samples for the selected speakers. In an experiment for building an isolated-word speech corpus, the performance of an acoustic model trained with 200 selected speakers was equivalent to that of an acoustic model trained with 533 non-selected speakers. It means that a cost reduction of more than 62% was achieved. In an experiment for building a continuous word speech corpus, the performance of an acoustic model trained with 500 selected speakers was equivalent to that of an acoustic model trained with 1179 non-selected speakers. It means that a cost reduction of more than 57% was achieved.

View full abstract

Download PDF (2918K)
An Improved Greedy Search Algorithm for the Development of a Phonetically Rich Speech Corpus

Jin-Song ZHANG, Satoshi NAKAMURA

Article type: PAPER
Subject area: Corpus
2008 Volume E91.D Issue 3 Pages 615-630
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.615

JOURNAL FREE ACCESS

Show abstractHide abstract

An efficient way to develop large scale speech corpora is to collect phonetically rich ones that have high coverage of phonetic contextual units. The sentence set, usually called as the minimum set, should have small text size in order to reduce the collection cost. It can be selected by a greedy search algorithm from a large mother text corpus. With the inclusion of more and more phonetic contextual effects, the number of different phonetic contextual units increased dramatically, making the search not a trivial issue. In order to improve the search efficiency, we previously proposed a so-called least-to-most-ordered greedy search based on the conventional algorithms. This paper evaluated these algorithms in order to show their different characteristics. The experimental results showed that the least-to-most-ordered methods successfully achieved smaller objective sets at significantly less computation time, when compared with the conventional ones. This algorithm has already been applied to the development a number of speech corpora, including a large scale phonetically rich Chinese speech corpus ATRPTH which played an important role in developing our multi-language translation system.

View full abstract

Download PDF (3742K)
Bi-Spectral Acoustic Features for Robust Speech Recognition

Kazuo ONOE, Shoei SATO, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI, Toh ...

Article type: LETTER
2008 Volume E91.D Issue 3 Pages 631-634
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.631

JOURNAL FREE ACCESS

Show abstractHide abstract

The extraction of acoustic features for robust speech recognition is very important for improving its performance in realistic environments. The bi-spectrum based on the Fourier transformation of the third-order cumulants expresses the non-Gaussianity and the phase information of the speech signal, showing the dependency between frequency components. In this letter, we propose a method of extracting short-time bispectral acoustic features with averaging features in a single frame. Merged with the conventional Mel frequency cepstral coefficients (MFCC) based on the power spectrum by the principal component analysis (PCA), the proposed features gave a 6.9% relative lower a word error rate in Japanese broadcast news transcription experiments.

View full abstract

Download PDF (2300K)
Local Peak Enhancement for In-Car Speech Recognition in Noisy Environment

Osamu ICHIKAWA, Takashi FUKUDA, Masafumi NISHIMURA

Article type: LETTER
2008 Volume E91.D Issue 3 Pages 635-639
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.635

JOURNAL FREE ACCESS

Show abstractHide abstract

The accuracy of automatic speech recognition in a car is significantly degraded in a very low SNR (Signal to Noise Ratio) situation such as “Fan high” or “Window open”. In such cases, speech signals are often buried in broadband noise. Although several existing noise reduction algorithms are known to improve the accuracy, other approaches that can work with them are still required for further improvement. One of the candidates is enhancement of the harmonic structures in human voices. However, most conventional approaches are based on comb filtering, and it is difficult to use them in practical situations, because their assumptions for F0 detection and for voiced/unvoiced detection are not accurate enough in realistic noisy environments. In this paper, we propose a new approach that does not rely on such detection. An observed power spectrum is directly converted into a filter for speech enhancement, by retaining only the local peaks considered to be harmonic structures in the human voice. In our experiments, this approach reduced the word error rate by 17% in realistic automobile environments. Also, it showed further improvement when used with existing noise reduction methods.

View full abstract

Download PDF (1510K)

Special Section on Test and Verification of VLSIs

FOREWORD

Seiji KAJIHARA, Michiko INOUE

2008 Volume E91.D Issue 3 Pages 640-641
Published: March 01, 2008
Released on J-STAGE: July 01, 2018

DOIhttps://doi.org/10.1587/transinf.2008EDF0003

JOURNAL FREE ACCESS

Download PDF (79K)
A Conservative Framework for Safety-Failure Checking

Frédéric BÉAL, Tomohiro YONEDA, Chris J. MYERS

Article type: PAPER
Subject area: Verification and Timing Analysis
2008 Volume E91.D Issue 3 Pages 642-654
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.642

JOURNAL FREE ACCESS

Show abstractHide abstract

We present a new framework for checking safety failures. The approach is based on the conservative inference of the internal states of a system by the observation of the interaction with its environment. It is based on two similar mechanisms: forward implication, which performs the analysis of the consequences of an input applied to the system, and backward implication, that performs the same task for an output transition. While being a very simple approach, it is general and we believe it can yield efficient algorithms in different safety-failure checking problems. As a case study, we have applied this framework to an existing problem, the hazard checking in (speed-independent) asynchronous circuits. Our new methodology yields an efficient algorithm that performs better or as well as all existing algorithms, while being more general than the fastest one.

View full abstract

Download PDF (2814K)
Timing Analysis Considering Temporal Supply Voltage Fluctuation

Masanori HASHIMOTO, Junji YAMAGUCHI, Takashi SATO, Hidetoshi ONODERA

Article type: PAPER
Subject area: Verification and Timing Analysis
2008 Volume E91.D Issue 3 Pages 655-660
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.655

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes an approach to cope with temporal power/ground voltage fluctuation for static timing analysis. The proposed approach replaces temporal noise with an equivalent power/ground voltage. This replacement reduces complexity that comes from the variety in noise waveform shape, and improves compatibility of power/ground noise aware timing analysis with conventional timing analysis framework. Experimental results show that the proposed approach can compute gate propagation delay considering temporal noise within 10% error in maximum and 0.5% in average.

View full abstract

Download PDF (1971K)
A Method of Locating Open Faults on Incompletely Identified Pass/Fail Information

Koji YAMAZAKI, Yuzo TAKAMATSU

Article type: PAPER
Subject area: Fault Diagnosis
2008 Volume E91.D Issue 3 Pages 661-666
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.661

JOURNAL FREE ACCESS

Show abstractHide abstract

In order to reduce the test cost, built-in self test (BIST) is widely used. One of the serious problems of BIST is that the compacted signature in BIST has very little information for fault diagnosis. Especially, it is difficult to determine which tests detect a fault. Therefore, it is important to develop an efficient fault diagnosis method by using incompletely identified pass/fail information. Where the incompletely identified pass/fail information means that a failing test block consists of at least one failing test and some passing tests, and all of the tests in passing test blocks are the passing test. In this paper, we propose a method to locate open faults by using incompletely identified pass/fail information. Experimental results for ISCAS'85 and ITC'99 benchmark circuits show that the number of candidate faults becomes less than 5 in many cases.

View full abstract

Download PDF (1632K)
A Novel Per-Test Fault Diagnosis Method Based on the Extended X-Fault Model for Deep-Submicron LSI Circuits

Yuta YAMATO, Yusuke NAKAMURA, Kohei MIYASE, Xiaoqing WEN, Seiji KAJIHA ...

Article type: PAPER
Subject area: Fault Diagnosis
2008 Volume E91.D Issue 3 Pages 667-674
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.667

JOURNAL FREE ACCESS

Show abstractHide abstract

Per-test diagnosis based on the X-fault model is an effective approach for a circuit with physical defects of non-deterministic logic behavior. However, the extensive use of vias and buffers in a deep-submicron circuit and the unpredictable order relation among threshold voltages at the fanout branches of a gate have not been fully addressed by conventional per-test X-fault diagnosis. To take these factors into consideration, this paper proposes an improved per-test X-fault diagnosis method, featuring (1) an extended X-fault model to handle vias and buffers and (2) the use of occurrence probabilities of logic behaviors for a physical defect to handle the unpredictable relation among threshold voltages. Experimental results show the effectiveness of the proposed method.

View full abstract

Download PDF (2025K)
Fault Diagnosis on Multiple Fault Models by Using Pass/Fail Information

Yuzo TAKAMATSU, Hiroshi TAKAHASHI, Yoshinobu HIGAMI, Takashi AIKYO, Ko ...

Article type: PAPER
Subject area: Fault Diagnosis
2008 Volume E91.D Issue 3 Pages 675-682
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.675

JOURNAL FREE ACCESS

Show abstractHide abstract

In general, we do not know which fault model can explain the cause of the faulty values at the primary outputs in a circuit under test before starting diagnosis. Moreover, under Built-In Self Test (BIST) environment, it is difficult to know which primary output has a faulty value on the application of a failing test pattern. In this paper, we propose an effective diagnosis method on multiple fault models, based on only pass/fail information on the applied test patterns. The proposed method deduces both the fault model and the fault location based on the number of detections for the single stuck-at fault at each line, by performing single stuck-at fault simulation with both passing and failing test patterns. To improve the ability of fault diagnosis, our method uses the logic values of lines and the condition whether the stuck-at faults at the lines are detected or not by passing and failing test patterns. Experimental results show that our method can accurately identify the fault models (stuck-at fault model, AND/OR bridging fault model, dominance bridging fault model, or open fault model) for 90% faulty circuits and that the faulty sites are located within two candidate faults.

View full abstract

Download PDF (2366K)
On Detection of Bridge Defects with Stuck-at Tests

Kohei MIYASE, Kenta TERASHIMA, Xiaoqing WEN, Seiji KAJIHARA, Sudhakar ...

Article type: PAPER
Subject area: Defect-Based Testing
2008 Volume E91.D Issue 3 Pages 683-689
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.683

JOURNAL FREE ACCESS

Show abstractHide abstract

If a test set for more complex faults than stuck-at faults is generated, higher defect coverage would be obtained. Such a test set, however, would have a large number of test vectors, and hence the test costs would go up. In this paper we propose a method to detect bridge defects with a test set initially generated for stuck-at faults in a full scan sequential circuit. The proposed method doesn't add new test vectors to the test set but modifies test vectors. Therefore there are no negative impacts on test data volume and test application time. The initial fault coverage for stuck-at faults of the test set is guaranteed with modified test vectors. In this paper we focus on detecting as many as possible non-feedback AND-type, OR-type and 4-way bridging faults, respectively. Experimental results show that the proposed method increases the defect coverage.

View full abstract

Download PDF (2033K)
Fault Simulation and Test Generation for Transistor Shorts Using Stuck-at Test Tools

Yoshinobu HIGAMI, Kewal K. SALUJA, Hiroshi TAKAHASHI, Shin-ya KOBAYASH ...

Article type: PAPER
Subject area: Defect-Based Testing
2008 Volume E91.D Issue 3 Pages 690-699
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.690

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper presents methods for detecting transistor short faults using logic level fault simulation and test generation. The paper considers two types of transistor level faults, namely strong shorts and weak shorts, which were introduced in our previous research. These faults are defined based on the values of outputs of faulty gates. The proposed fault simulation and test generation are performed using gate-level tools designed to deal with stuck-at faults, and no transistor-level tools are required. In the test generation process, a circuit is modified by inserting inverters, and a stuck-at test generator is used. The modification of a circuit does not mean a design-for-testability technique, as the modified circuit is used only during the test generation process. Further, generated test patterns are compacted by fault simulation. Also, since the weak short model involves uncertainty in its behavior, we define fault coverage and fault efficiency in three different way, namely, optimistic, pessimistic and probabilistic and assess them. Finally, experimental results for ISCAS benchmark circuits are used to demonstrate the effectiveness of the proposed methods.

View full abstract

Download PDF (2815K)
Ramp Voltage Testing for Detecting Interconnect Open Faults

Yukiya MIURA

Article type: PAPER
Subject area: Defect-Based Testing
2008 Volume E91.D Issue 3 Pages 700-705
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.700

JOURNAL FREE ACCESS

Show abstractHide abstract

A method for detecting interconnect open faults of CMOS combinational circuits by applying a ramp voltage to the power supply terminal is proposed. The method can assign a known logic value to a fault location automatically by applying a ramp voltage and as a result, it requires only one test vector to detect a fault as a delay fault or an erroneous logic value at primary outputs. In this paper, we show fault detectability and effectiveness of the proposed method by simulation-based and theoretical analysis. We also expose that the method can be applicable to every fault location in a circuit and open faults with any value. Finally, we show ATPG results that are suitable to the proposed method.

View full abstract

Download PDF (1911K)
Study on Expansion of Convolutional Compactors over Galois Field

Masayuki ARAI, Satoshi FUKUMOTO, Kazuhiko IWASAKI

Article type: PAPER
Subject area: Test Compression
2008 Volume E91.D Issue 3 Pages 706-712
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.706

JOURNAL FREE ACCESS

Show abstractHide abstract

Convolutional compactors offer a promising technique of compacting test responses. In this study we expand the architecture of convolutional compactor onto a Galois field in order to improve compaction ratio as well as reduce X-masking probability, namely, the probability that an error is masked by unknown values. While each scan chain is independently connected by EOR gates in the conventional arrangement, the proposed scheme treats q signals as an element over GF(2^q), and the connections are configured on the same field. We show the arrangement of the proposed compactors and the equivalent expression over GF(2). We then evaluate the effectiveness of the proposed expansion in terms of X-masking probability by simulations with uniform distribution of X-values, as well as reduction of hardware overheads. Furthermore, we evaluate a multi-weight arrangement of the proposed compactors for non-uniform X distributions.

View full abstract

Download PDF (2801K)
An Architecture of Embedded Decompressor with Reconfigurability for Test Compression

Hideyuki ICHIHARA, Tomoyuki SAIKI, Tomoo INOUE

Article type: PAPER
Subject area: Test Compression
2008 Volume E91.D Issue 3 Pages 713-719
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.713

JOURNAL FREE ACCESS

Show abstractHide abstract

Test compression/decompression scheme for reducing the test application time and memory requirement of an LSI tester has been proposed. In the scheme, the employed coding algorithms are tailored to a given test data, so that the tailored coding algorithm can highly compress the test data. However, these methods have some drawbacks, e. g., the coding algorithm is ineffective in extra test data except for the given test data. In this paper, we introduce an embedded decompressor that is reconfigurable according to coding algorithms and given test data. Its reconfigurability can overcome the drawbacks of conventional decompressors with keeping high compression ratio. Moreover, we propose an architecture of reconfigurable decompressors for four variable-length codings. In the proposed architecture, the common functions for four codings are implemented as fixed (or non-reconfigurable) components so as to reduce the configuration data, which is stored on an ATE and sent to a CUT. Experimental results show that (1) the configuration data size becomes reasonably small by reducing the configuration part of the decompressor, (2) the reconfigurable decompressor is effective for SoC testing in respect of the test data size, and (3) it can achieve an optimal compression of test data by Huffman coding.

View full abstract

Download PDF (2200K)
Study on Test Data Reduction Combining Illinois Scan and Bit Flipping

Masayuki ARAI, Satoshi FUKUMOTO, Kazuhiko IWASAKI

Article type: PAPER
Subject area: Test Compression
2008 Volume E91.D Issue 3 Pages 720-725
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.720

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we propose a scheme for test data reduction which uses broadcaster along with bit-flipping circuit. The proposed scheme can reduce test data without degrading the fault coverage of ATPG, and without requiring or modifying the arrangement of CUT. We theoretically analyze the test data size by the proposed scheme. The numerical examples obtained by the analysis and experimental results show that our scheme can effectively reduce test data if the care-bit rate is not so much low according to the number of scan chains. We also discuss the hybrid scheme of random-pattern-based flipping and single-input-based flipping.

View full abstract

Download PDF (1261K)
Test Data Compression for Scan-Based BIST Aiming at 100x Compression Rate

Masayuki ARAI, Satoshi FUKUMOTO, Kazuhiko IWASAKI, Tatsuru MATSUO, Tak ...

Article type: PAPER
Subject area: Test Compression
2008 Volume E91.D Issue 3 Pages 726-735
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.726

JOURNAL FREE ACCESS

Show abstractHide abstract

We developed test data compression scheme for scan-based BIST, aiming to compress test stimuli and responses by more than 100 times. As scan-BIST architecture, we adopt BIST-Aided Scan Test (BAST), and combines four techniques: the invert-and-shift operation, run-length compression, scan address partitioning, and LFSR pre-shifting. Our scheme achieved a 100x compression rate in environments where Xs do not occur without reducing the fault coverage of the original ATPG vectors. Furthermore, we enhanced the masking logic to reduce data for X-masking so that test data is still compressed to 1/100 in a practical environment where Xs occur. We applied our scheme to five real VLSI chips, and the technique compressed the test data by 100x for scan-based BIST.

View full abstract

Download PDF (5144K)
Scheduling Power-Constrained Tests through the SoC Functional Bus

Fawnizu Azmadi HUSSIN, Tomokazu YONEDA, Alex ORAILOǦLU, Hideo FUJ ...

Article type: PAPER
Subject area: High-Level Testing
2008 Volume E91.D Issue 3 Pages 736-746
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.736

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a test methodology for core-based testing of System-on-Chips by utilizing the functional bus as a test access mechanism. The functional bus is used as a transportation channel for the test stimuli and responses from a tester to the cores under test (CUT). To enable test concurrency, local test buffers are added to all CUTs. In order to limit the buffer area overhead while minimizing the test application time, we propose a packet-based scheduling algorithm called PAcket Set Scheduling (PASS), which finds the complete packet delivery schedule under a given power constraint. The utilization of test packets, consisting of a small number of bits of test data, for test data delivery allow an efficient sharing of bus bandwidth with the help of an effective buffer-based test architecture. The experimental results show that the methodology is highly effective, especially for smaller bus widths, compared to previous approaches that do not use the functional bus.

View full abstract

Download PDF (3438K)
Test Scheduling for Multi-Clock Domain SoCs under Power Constraint

Tomokazu YONEDA, Kimihiko MASUDA, Hideo FUJIWARA

Article type: PAPER
Subject area: High-Level Testing
2008 Volume E91.D Issue 3 Pages 747-755
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.747

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper presents a power-constrained test scheduling method for multi-clock domain SoCs that consist of cores operating at different clock frequencies during test. In the proposed method, we utilize virtual TAM to solve the frequency gaps between cores and the ATE. Moreover, we present a technique to reduce power consumption of cores during test while the test time of the cores remain the same or increase a little by using virtual TAM. Experimental results show the effectiveness of the proposed method.

View full abstract

Download PDF (3423K)
A Self-Test of Dynamically Reconfigurable Processors with Test Frames

Tomoo INOUE, Takashi FUJII, Hideyuki ICHIHARA

Article type: PAPER
Subject area: High-Level Testing
2008 Volume E91.D Issue 3 Pages 756-762
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.756

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a self-test method of coarse grain dynamically reconfigurable processors (DRPs) without hardware overhead. In the method, processor elements (PEs) compose a test frame, which consists of test pattern generators (TPGs), processor elements under test (PEUTs) and response analyzers (RAs), while testing themselves one another by changing test frames appropriately. We design several test frames with different structures, and discuss the relationship of the structures to the numbers of contexts and test frames for testing all the functions of PEs. A case study shows that there exists an optimal test frame which minimizes the test application time under a constraint.

View full abstract

Download PDF (1714K)
Design for Testability Method to Avoid Error Masking of Software-Based Self-Test for Processors

Masato NAKAZATO, Michiko INOUE, Satoshi OHTAKE, Hideo FUJIWARA

Article type: PAPER
Subject area: High-Level Testing
2008 Volume E91.D Issue 3 Pages 763-770
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.763

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we propose a design for testability method for test programs of software-based self test using test program templates. Software-based self-test using templates has a problem of error masking where some faults detected in a test generation for a module are not detected by the test program synthesized from the test. The proposed method achieves 100% template level fault efficiency, that is, it completely avoids the error masking. Moreover, the proposed method has no performance degradation (adds only observation points) and enables at-speed testing.

View full abstract

Download PDF (2717K)
Post-BIST Fault Diagnosis for Multiple Faults

Hiroshi TAKAHASHI, Yoshinobu HIGAMI, Shuhei KADOYAMA, Yuzo TAKAMATSU, ...

Article type: LETTER
2008 Volume E91.D Issue 3 Pages 771-775
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.771

JOURNAL FREE ACCESS

Show abstractHide abstract

With the increasing complexity of LSI, Built-In Self Test (BIST) is a promising technique for production testing. We herein propose a method for diagnosing multiple stuck-at faults based on the compressed responses from BIST. We refer to fault diagnosis based on the ambiguous test pattern set obtained by the compressed responses of BIST as post-BIST fault diagnosis [1]. In the present paper, we propose an effective method by which to perform post-BIST fault diagnosis for multiple stuck-at faults. The efficiency of the success ratio and the feasibility of diagnosing large circuits are discussed.

View full abstract

Download PDF (784K)
A Secure Test Technique for Pipelined Advanced Encryption Standard

Youhua SHI, Nozomu TOGAWA, Masao YANAGISAWA, Tatsuo OHTSUKI

Article type: LETTER
2008 Volume E91.D Issue 3 Pages 776-780
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.776

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we presented a Design-for-Secure-Test (DFST) technique for pipelined AES to guarantee both the security and the test quality during testing. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.

View full abstract

Download PDF (891K)

Regular Section

A Randomness Based Analysis on the Data Size Needed for Removing Deceptive Patterns

Kazuya HARAGUCHI, Mutsunori YAGIURA, Endre BOROS, Toshihide IBARAKI

Article type: PAPER
Subject area: Algorithm Theory
2008 Volume E91.D Issue 3 Pages 781-788
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.781

JOURNAL FREE ACCESS

Show abstractHide abstract

We consider a data set in which each example is an n-dimensional Boolean vector labeled as true or false. A pattern is a cooccurrence of a particular value combination of a given subset of the variables. If a pattern appears frequently in the true examples and infrequently in the false examples, we consider it a good pattern. In this paper, we discuss the problem of determining the data size needed for removing “deceptive” good patterns; in a data set of a small size, many good patterns may appear superficially, simply by chance, independently of the underlying structure. Our hypothesis is that, in order to remove such deceptive good patterns, the data set should contain a greater number of examples than that at which a random data set contains few good patterns. We justify this hypothesis by computational studies. We also derive a theoretical upper bound on the needed data size in view of our hypothesis.

View full abstract

Download PDF (1656K)
Near-Optimal Block Alignments

Kuo-Tsung TSENG, Chang-Biau YANG, Kuo-Si HUANG, Yung-Hsing PENG

Article type: PAPER
Subject area: Algorithm Theory
2008 Volume E91.D Issue 3 Pages 789-795
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.789

JOURNAL FREE ACCESS

Show abstractHide abstract

The optimal alignment of two given biosequences is mathematically optimal, but it may not be a biologically optimal one. To investigate more possible alignments with biological meaning, one can relax the scoring functions to get near-optimal alignments. Though the near optimal alignments increase the possibility of finding the correct alignment, they may confuse the biologists because the size of candidates is large. In this paper, we present the filter scheme for the near-optimal alignments. An easy method for tracing the near-optimal alignments and an algorithm for filtering those alignments are proposed. The time complexity of our algorithm is O(dmn) in the worst case, where d is the maximum distance between the near-optimal alignments and the optimal alignment, and m and n are the lengths of the input sequences, respectively.

View full abstract

Download PDF (2093K)
Dynamic Scheduling Real-Time Task Using Primary-Backup Overloading Strategy for Multiprocessor Systems

Wei SUN, Chen YU, Xavier DÉFAGO, Yasushi INOGUCHI

Article type: PAPER
Subject area: Dependable Computing
2008 Volume E91.D Issue 3 Pages 796-806
Published: March 01, 2008
Released on J-STAGE: March 01, 2010

DOIhttps://doi.org/10.1093/ietisy/e91-d.3.796

JOURNAL FREE ACCESS

Show abstractHide abstract

The scheduling of real-time tasks with fault-tolerant requirements has been an important problem in multiprocessor systems. The primary-backup (PB) approach is often used as a fault-tolerant technique to guarantee the deadlines of tasks despite the presence of faults. In this paper we propose a dynamic PB-based task scheduling approach, wherein an allocation parameter is used to search the available time slots for a newly arriving task, and the previously scheduled tasks can be re-scheduled when there is no available time slot for the newly arriving task. In order to improve the schedulability we also propose an overloading strategy for PB-overloading and Backup-backup (BB) overloading. Our proposed task scheduling algorithm is compared with some existing scheduling algorithms in the literature through simulation studies. The results have shown that the task rejection ratio of our real-time task scheduling algorithm is almost 50% lower than the compared algorithms.

View full abstract

Download PDF (4328K)

Register with J-STAGE for free!