IEICE Transactions on Information and Systems

Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction

FOREWORD

Takao KOBAYASHI

2010 Volume E93.D Issue 9 Pages 2347
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2347

JOURNAL FREE ACCESS

Acoustic Model Adaptation for Speech Recognition

Koichi SHINODA

Article type: INVITED PAPER
2010 Volume E93.D Issue 9 Pages 2348-2362
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2348

JOURNAL FREE ACCESS

Statistical speech recognition using continuous-density hidden Markov models (CDHMMs) has yielded many practical applications. However, in general, mismatches between the training data and input data significantly degrade recognition accuracy. Various acoustic model adaptation techniques using a few input utterances have been employed to overcome this problem. In this article, we survey these adaptation techniques, including maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenvoice. We also present a schematic view called the adaptation pyramid to illustrate how these methods relate to each other.
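
As a rough illustration of one of the surveyed techniques, the sketch below applies the textbook MAP update to a single Gaussian mean, interpolating between a prior (speaker-independent) mean and the adaptation data. It is a minimal sketch under simplifying assumptions (one Gaussian, hard frame assignment, positive prior weight); the names `map_adapt_mean` and `tau` are hypothetical and are not taken from the article.

```python
def map_adapt_mean(prior_mean, frames, tau):
    """MAP estimate of a Gaussian mean with a conjugate prior centred on prior_mean.

    tau (assumed > 0) is the prior weight: it behaves like tau
    pseudo-observations located at prior_mean.
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the adaptation frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted average of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


speaker_independent_mean = [0.0, 0.0]
adaptation_frames = [[1.0, 2.0]] * 10
adapted = map_adapt_mean(speaker_independent_mean, adaptation_frames, tau=10.0)
print(adapted)
```

With few adaptation frames the estimate stays close to the prior mean; as more frames arrive it moves toward their sample mean.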

Unsupervised Speaker Adaptation Using Speaker-Class Models for Lecture Speech Recognition

Tetsuo KOSAKA, Yuui TAKEDA, Takashi ITO, Masaharu KATO, Masaki KOHDA

Article type: PAPER
Subject area: Adaptation
2010 Volume E93.D Issue 9 Pages 2363-2369
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2363

JOURNAL FREE ACCESS

In this paper, we propose a new speaker-class modeling method and its adaptation method for an LVCSR system, and evaluate them on the Corpus of Spontaneous Japanese (CSJ). In this method, for each evaluation speaker, speakers close to that speaker are selected from the training speakers, and acoustic models are trained using their utterances. One of the major issues of the speaker-class model is determining the selection range of speakers. To solve this problem, several models covering different speaker ranges are prepared in advance for each evaluation speaker, and the most appropriate model is selected on a likelihood basis in the recognition step. In addition, we improved the recognition performance using unsupervised speaker adaptation with the speaker-class models. In the recognition experiments, the proposed speaker adaptation based on speaker-class models yielded a significant improvement over the conventional adaptation method.

Learning Speech Variability in Discriminative Acoustic Model Adaptation

Shoei SATO, Takahiro OKU, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI

Article type: PAPER
Subject area: Adaptation
2010 Volume E93.D Issue 9 Pages 2370-2378
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2370

JOURNAL FREE ACCESS

We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We focus on differences in expressions and speaking styles between tasks, and aim to improve the recognition accuracy of indistinctly pronounced phrases that depend on a speaking style. The adaptation appends subword models for frequently observed variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from words with higher frequency in the task's adaptation data by using their word lattices. HMM parameters of subword models dependent on the words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, subword accuracy discriminating between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative in a Japanese conversational broadcast task.

A Hybrid Acoustic and Pronunciation Model Adaptation Approach for Non-native Speech Recognition

Yoo Rhee OH, Hong Kook KIM

Article type: PAPER
Subject area: Adaptation
2010 Volume E93.D Issue 9 Pages 2379-2387
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2379

JOURNAL FREE ACCESS

In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models by accommodating the pronunciation variants in the pronunciation dictionary, and adapts acoustic models by clustering the states of triphone acoustic models using the acoustic variants. The triphone-modeling level hybrid method, in contrast, initially adapts pronunciation models in the same way as the state-tying level hybrid method; for the acoustic model adaptation, however, the triphone acoustic models are re-estimated based on the adapted pronunciation models, and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. In Korean-spoken English speech recognition experiments, ASR systems employing the state-tying and triphone-modeling level adaptation methods reduce the average word error rates (WERs) for non-native speech by 17.1% and 22.1% relative, respectively, compared with a baseline ASR system.

Adaptation to Pronunciation Variations in Indonesian Spoken Query-Based Information Retrieval

Dessi Puji LESTARI, Sadaoki FURUI

Article type: PAPER
Subject area: Adaptation
2010 Volume E93.D Issue 9 Pages 2388-2396
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2388

JOURNAL FREE ACCESS

Recognition errors of proper nouns and foreign words significantly decrease the performance of ASR-based speech applications such as voice dialing systems, speech summarization, spoken document retrieval, and spoken query-based information retrieval (IR). The reason is that proper nouns and words that come from other languages are usually the most important key words. The loss of such words due to misrecognition in turn leads to a loss of significant information from the speech source. This paper focuses on how to improve the performance of Indonesian ASR by alleviating the problem of pronunciation variation of proper nouns and foreign words (English words in particular). To improve the proper noun recognition accuracy, proper-noun specific acoustic models are created by supervised adaptation using maximum likelihood linear regression (MLLR). To improve English word recognition, the pronunciation of English words contained in the lexicon is fixed by using rule-based English-to-Indonesian phoneme mapping. The effectiveness of the proposed method was confirmed through spoken query based Indonesian IR. We used Inference Network-based (IN-based) IR and compared its results with those of the classical Vector Space Model (VSM) IR, both using a tf-idf weighting schema. Experimental results show that IN-based IR outperforms VSM IR.

Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

Longbiao WANG, Kazue MINAMI, Kazumasa YAMAMOTO, Seiichi NAKAGAWA

Article type: PAPER
Subject area: Speaker Recognition
2010 Volume E93.D Issue 9 Pages 2397-2406
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2397

JOURNAL FREE ACCESS

In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. MFCCs, which dominantly capture vocal tract information, use only the magnitude of the Fourier transform of time-domain speech frames, and phase information has been ignored. The phase information and MFCCs are expected to be highly complementary because the phase information includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in the phase depending on the clipping position of the input speech, and the combination of the phase information and MFCCs performed remarkably better than MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added, were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. With clean speech training models, the phase information alone was even better than MFCCs in many cases. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.

Speech Recognition under Multiple Noise Environment Based on Multi-Mixture HMM and Weight Optimization by the Aspect Model

Seong-Jun HAHM, Yuichi OHKAWA, Masashi ITO, Motoyuki SUZUKI, Akinori I ...

Article type: PAPER
Subject area: Robust Speech Recognition
2010 Volume E93.D Issue 9 Pages 2407-2416
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2407

JOURNAL FREE ACCESS

In this paper, we propose an acoustic model that is robust to multiple noise environments, as well as a method for adapting the acoustic model to an environment to improve the model. The model is called “the multi-mixture model,” which is based on a mixture of different HMMs, each of which is trained using speech under different noise conditions. Speech recognition experiments showed that the proposed model performs better than the conventional multi-condition model. The method for adaptation is based on the aspect model, which is a “mixture-of-mixture” model. To realize adaptation using an extremely small amount of adaptation data (i.e., a few seconds), we train a small number of mixture models, which can be interpreted as models for “clusters” of noise environments. Then, the models are mixed using weights, which are determined according to the adaptation data. The experimental results showed that the adaptation based on the aspect model improved the word accuracy in a heavy noise environment and showed no performance deterioration for any noise condition, while the conventional methods either did not improve the performance or showed both improvement and degradation of recognition performance depending on the noise conditions.

Acoustic Feature Optimization Based on F-Ratio for Robust Speech Recognition

Yanqing SUN, Yu ZHOU, Qingwei ZHAO, Yonghong YAN

Article type: PAPER
Subject area: Robust Speech Recognition
2010 Volume E93.D Issue 9 Pages 2417-2430
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2417

JOURNAL FREE ACCESS

This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1 kHz and 3 kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features and make each subband contain equal significance for speech unit classification. Under comparable conditions, the modified features achieve relative decreases in sentence error rate, compared with the MFCC, of 43.20% for emotion-affected speech recognition, 35.54% and 23.03% for noisy speech recognition at 15 dB and 0 dB SNR (signal-to-noise ratio), respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.

Enhancing the Robustness of the Posterior-Based Confidence Measures Using Entropy Information for Speech Recognition

Yanqing SUN, Yu ZHOU, Qingwei ZHAO, Pengyuan ZHANG, Fuping PAN, Yongho ...

Article type: PAPER
Subject area: Robust Speech Recognition
2010 Volume E93.D Issue 9 Pages 2431-2439
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2431

JOURNAL FREE ACCESS

In this paper, the robustness of the posterior-based confidence measures is improved by utilizing entropy information, which is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Using different normalization methods, two posterior-based entropy confidence measures are proposed. Practical details are discussed for two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and both levels are compared in terms of their performances. Experiments show that the entropy information results in significant improvements in the posterior-based confidence measures. The absolute improvements of the out-of-vocabulary (OOV) rejection rate are more than 20% for both the phoneme-level confidence measures and the state-level confidence measures for our embedded test sets, without a significant decline of the in-vocabulary accuracy.
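
To make the idea of entropy over posteriors concrete, the sketch below computes the Shannon entropy of a posterior distribution over competing speech units and turns it into a score in [0, 1]. This is a generic illustration only: the normalization by the logarithm of the number of units and the function names are assumptions, and the abstract does not specify the two normalization methods the authors actually propose.

```python
from math import log


def normalized_entropy(posteriors):
    """Shannon entropy of a posterior distribution, scaled to the range [0, 1]."""
    if len(posteriors) < 2:
        return 0.0
    total = sum(posteriors)
    # Renormalize and drop zero entries, since 0 * log(0) is taken as 0.
    probs = [p / total for p in posteriors if p > 0]
    entropy = -sum(p * log(p) for p in probs)
    # Divide by the maximum possible entropy (uniform distribution).
    return entropy / log(len(posteriors))


def entropy_confidence(posteriors):
    """Higher when the posterior mass is concentrated on one unit."""
    return 1.0 - normalized_entropy(posteriors)


print(entropy_confidence([0.97, 0.01, 0.01, 0.01]))  # peaked posterior
print(entropy_confidence([0.25, 0.25, 0.25, 0.25]))  # flat posterior
```

A flat posterior, which is more typical of out-of-vocabulary input, yields a confidence near zero, while a peaked posterior yields a confidence near one.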

Intentional Voice Command Detection for Trigger-Free Speech Interface

Yasunari OBUCHI, Takashi SUMIYOSHI

Article type: PAPER
Subject area: Robust Speech Recognition
2010 Volume E93.D Issue 9 Pages 2440-2450
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2440

JOURNAL FREE ACCESS

In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.

Distant Speech Recognition Using a Microphone Array Network

Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO

Article type: PAPER
Subject area: Microphone Array
2010 Volume E93.D Issue 9 Pages 2451-2462
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2451

JOURNAL FREE ACCESS

In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
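
The delay-and-sum beamformer mentioned above can be sketched as follows. This is a minimal illustration assuming integer sample delays that are already known; in the paper the delays are refined from the ANN position estimate, which is not reproduced here. The function name and the toy signals are hypothetical.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: list of equal-length sample lists, one per microphone.
    delays:   per-channel arrival delay in samples relative to a reference.
    Samples that fall outside a channel after shifting are treated as zero.
    """
    length = len(channels[0])
    output = []
    for t in range(length):
        total = 0.0
        for samples, delay in zip(channels, delays):
            index = t + delay  # advance the later-arriving channel
            if 0 <= index < length:
                total += samples[index]
        output.append(total / len(channels))
    return output


# The same impulse reaches the second microphone two samples later.
mic_a = [0.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 1.0, 0.0]
enhanced = delay_and_sum([mic_a, mic_b], delays=[0, 2])
print(enhanced)
```

Once the channels are aligned, the source adds coherently while uncorrelated noise tends to average out, which is why accurate delay estimates matter.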

Multiple Sound Source Localization Based on Inter-Channel Correlation Using a Distributed Microphone System in a Real Environment

Kook CHO, Hajime OKUMURA, Takanobu NISHIURA, Yoichi YAMASHITA

Article type: PAPER
Subject area: Microphone Array
2010 Volume E93.D Issue 9 Pages 2463-2471
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2463

JOURNAL FREE ACCESS

In real environments, the presence of ambient noise and room reverberations seriously degrades the accuracy of sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a distributed microphone system, that is, a recording system with multiple microphones dispersed over a wide area. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient between multiple channel pairs. After the first sound source is estimated, a typical pattern of the accumulated correlation for a single sound source is subtracted from the observed distribution of the accumulated correlation, and the search is then repeated for the second sound source. To evaluate the effectiveness of the proposed method, localization experiments with two sound sources were carried out in an office room. The results show that the sound source localization accuracy is about 99.7%. The proposed method could localize multiple sound sources robustly and stably.
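
The first step of the method, picking the position that maximizes the correlation accumulated over channel pairs, can be sketched in a simplified setting. The example below assumes a one-dimensional geometry, a single source, integer sample delays, and noise-free signals; all names and the geometry are hypothetical, and the subtraction step for the second source is not shown.

```python
from itertools import combinations
from math import sqrt
import random


def correlation_at_lag(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlap."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def localize(signals, mic_positions, candidates, samples_per_unit=1):
    """Return the candidate position with the largest accumulated correlation."""
    def delay(position, mic):
        return round(abs(position - mic) * samples_per_unit)

    best_position, best_score = None, float("-inf")
    for position in candidates:
        score = 0.0
        for i, j in combinations(range(len(signals)), 2):
            # Lag between the two channels implied by this candidate position.
            lag = delay(position, mic_positions[j]) - delay(position, mic_positions[i])
            score += correlation_at_lag(signals[i], signals[j], lag)
        if score > best_score:
            best_position, best_score = position, score
    return best_position


# Synthetic test signal: a noise-like source at position 3 on a line.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(200)]
mics = [0, 4, 10]
true_position = 3
recordings = []
for mic in mics:
    d = abs(true_position - mic)
    recordings.append([source[t - d] if t >= d else 0.0 for t in range(len(source))])

estimate = localize(recordings, mics, candidates=range(0, 11))
print(estimate)
```

Only the true position makes every pairwise lag line up at once, so its accumulated score stands out from candidates that match just one pair.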

Esophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models

Hironori DOI, Keigo NAKAMURA, Tomoki TODA, Hiroshi SARUWATARI, Kiyohir ...

Article type: PAPER
Subject area: Voice Conversion
2010 Volume E93.D Issue 9 Pages 2472-2482
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2472

JOURNAL FREE ACCESS

This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.

HMM-Based Voice Conversion Using Quantized F0 Context

Takashi NOSE, Yuhei OTA, Takao KOBAYASHI

Article type: PAPER
Subject area: Voice Conversion
2010 Volume E93.D Issue 9 Pages 2483-2490
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2483

JOURNAL FREE ACCESS

We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.

Improvements of the One-to-Many Eigenvoice Conversion System

Yamato OHTANI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO

Article type: PAPER
Subject area: Voice Conversion
2010 Volume E93.D Issue 9 Pages 2491-2499
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2491

JOURNAL FREE ACCESS

We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.

Regular Section

A Low Power SOC Architecture for the V2.0+EDR Bluetooth Using a Unified Verification Platform

Jeonghun KIM, Suki KIM, Kwang-Hyun BAEK

Article type: PAPER
Subject area: Computer System
2010 Volume E93.D Issue 9 Pages 2500-2508
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2500

JOURNAL FREE ACCESS

This paper presents a low-power System on Chip (SOC) architecture for the v2.0+EDR (Enhanced Data Rate) Bluetooth and its applications. Our design includes a link controller, modem, RF transceiver, Sub-Band Codec (SBC), Expanded Instruction Set Computer (EISC) processor, and peripherals. To decrease the power consumption of the proposed SOC, we reduce data transfer using a dual-port memory, include a power management unit, and adopt a clock-gated approach. We also address some of the issues and benefits of a reusable and unified environment built on a centralized data structure and SOC verification platform. This includes flexibility in meeting the final requirements using technology-independent tools wherever possible in various processes and projects. The other aims of this work are to minimize design effort by avoiding the same work being done twice by different people and to reuse a similar environment and platform for different projects. The chip occupies a die size of 30 mm² in 0.18 µm CMOS, and the worst-case current of the total chip is 54 mA.

Nested Interrupt Analysis of Low Cost and High Performance Embedded Systems Using GSPN Framework

Cheng-Min LIN

Article type: PAPER
Subject area: Software System
2010 Volume E93.D Issue 9 Pages 2509-2519
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2509

JOURNAL FREE ACCESS

Interrupt service routines are a key technology for embedded systems. In this paper, we introduce the standard approach for using Generalized Stochastic Petri Nets (GSPNs) as a high-level model for generating Continuous-Time Markov Chains (CTMCs) and then use Markov Reward Models (MRMs) to compute the performance of embedded systems. This framework is employed to analyze two low-cost, high-performance embedded controllers, ARM7 and Cortex-M3. Cortex-M3 is designed with a tail-chaining mechanism to improve on the performance of ARM7 when a nested interrupt occurs on an embedded controller. The Platform Independent Petri net Editor 2 (PIPE2) tool is used to model and evaluate the controllers in terms of power consumption and interrupt overhead performance. The numerical results show that Cortex-M3 performs better than ARM7 in terms of both power consumption and interrupt overhead.

MV-OPES: Multivalued-Order Preserving Encryption Scheme: A Novel Scheme for Encrypting Integer Value to Many Different Values

Hasan KADHEM, Toshiyuki AMAGASA, Hiroyuki KITAGAWA

Article type: PAPER
Subject area: Data Engineering, Web Information Systems
2010 Volume E93.D Issue 9 Pages 2520-2533
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2520

JOURNAL FREE ACCESS

Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. However, existing encryption approaches are vulnerable to statistical attacks because each value is always encrypted to the same fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued-Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to multiple different values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be directly applied to encrypted data. Using calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload as much of the query execution load as possible to the database server, thereby making better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems, as it is designed to work with existing indexing structures. It is robust against statistical attacks and the estimation of true values. Experiments with MV-OPES show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
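
The two properties described above, one plaintext mapping to many ciphertexts while order is preserved, can be illustrated with a toy mapping. The sketch below is not the MV-OPES construction and is not secure (it has no key, and the bucket layout is trivially recoverable); `BUCKET_WIDTH`, `encrypt`, and `decrypt` are hypothetical names used only to show the idea.

```python
import secrets

BUCKET_WIDTH = 1000  # hypothetical: number of distinct ciphertexts per plaintext


def encrypt(value):
    """Map an integer to a random point inside its own disjoint, ordered bucket."""
    return value * BUCKET_WIDTH + secrets.randbelow(BUCKET_WIDTH)


def decrypt(ciphertext):
    """Recover the plaintext by identifying which bucket the ciphertext is in."""
    return ciphertext // BUCKET_WIDTH


first = encrypt(42)
second = encrypt(42)
print(first, second)               # usually two different ciphertexts for 42
print(encrypt(41) < encrypt(42))   # comparisons still work on ciphertexts
print(decrypt(first), decrypt(second))
```

Because buckets never overlap, a range or inequality predicate evaluated on ciphertexts gives the same answer as on plaintexts, which is what lets the server run comparison queries without decrypting.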

DIWSAN: Distributed Intelligent Wireless Sensor and Actuator Network for Heterogeneous Environment

Cheng-Min LIN, Jyh-Horng LIN, Jen-Cheng CHIU

Article type: PAPER
Subject area: Information Network
2010 Volume E93.D Issue 9 Pages 2534-2543
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2534

JOURNAL FREE ACCESS

In a WSAN (Wireless Sensor and Actuator Network), most resources, including sensors and actuators, are designed for certain applications in a dedicated environment. Many researchers have proposed the use of gateways to infer and annotate heterogeneous data; however, such centralized methods create network bottlenecks and computation overhead on the gateways, which lengthens the response time in activity processing and worsens performance. This work proposes two distributed inference mechanisms, regionalized and sequential, to reduce the response time in activity processing. Finally, experimental results for the proposed inference mechanisms are presented, and they show that our mechanisms outperform the traditional centralized inference mechanism.

A Comparative Study of Unsupervised Anomaly Detection Techniques Using Honeypot Data

Jungsuk SONG, Hiroki TAKAKURA, Yasuo OKABE, Daisuke INOUE, Masashi ETO ...

Article type: PAPER
Subject area: Information Network
2010 Volume E93.D Issue 9 Pages 2544-2554
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2544

JOURNAL FREE ACCESS

Intrusion Detection Systems (IDSs) have received considerable attention among network security researchers as one of the most promising countermeasures to defend our crucial computer systems or networks against attackers on the Internet. Over the past few years, many machine learning techniques have been applied to IDSs so as to improve their performance and to construct them with low cost and effort. In particular, unsupervised anomaly detection techniques have a significant advantage in their capability to identify unforeseen attacks, i.e., 0-day attacks, and to build intrusion detection models without any labeled (i.e., pre-classified) training data in an automated manner. In this paper, we conduct a set of experiments to evaluate and analyze the performance of the major unsupervised anomaly detection techniques using real traffic data obtained at our honeypots deployed inside and outside of the campus network of Kyoto University, and using various evaluation criteria, i.e., performance evaluation by similarity measurements and the size of training data, overall performance, detection ability for unknown attacks, and time complexity. Our experimental results give some practical and useful guidelines to IDS researchers and operators, so that they can gain insight into applying these techniques to the area of intrusion detection and devise more effective intrusion detection models.

Least Absolute Policy Iteration — A Robust Approach to Value Function Approximation

Masashi SUGIYAMA, Hirotaka HACHIYA, Hisashi KASHIMA, Tetsuro MORIMURA

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2010 Volume E93.D Issue 9 Pages 2555-2565
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2555

JOURNAL FREE ACCESS

Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
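
The robustness argument can be illustrated with a toy example. The sketch below is not the paper's linear-programming formulation; it only assumes the textbook fact that, when fitting a single constant to samples, the squared loss is minimized by the mean and the absolute loss by the median. The function names and the `rewards` data are hypothetical.

```python
import statistics


def squared_loss_fit(samples):
    """Constant that minimizes the sum of squared errors: the mean."""
    return statistics.mean(samples)


def absolute_loss_fit(samples):
    """Constant that minimizes the sum of absolute errors: the median."""
    return statistics.median(samples)


# Hypothetical observed rewards; the last value is an outlier.
rewards = [1.0, 1.1, 0.9, 1.0, 50.0]

if __name__ == "__main__":
    print("squared-loss estimate:", squared_loss_fit(rewards))
    print("absolute-loss estimate:", absolute_loss_fit(rewards))
```

The single outlier drags the squared-loss estimate far from the bulk of the data, while the absolute-loss estimate stays with the majority of the samples.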

Structural Models that Manage IT Portfolio Affecting Business Value of Enterprise Architecture

Takaaki KAMOGAWA

Article type: PAPER
Subject area: Office Information Systems, e-Business Modeling
2010 Volume E93.D Issue 9 Pages 2566-2576
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2566

JOURNAL FREE ACCESS

This paper examines the structural relationships between Information Technology (IT) governance and Enterprise Architecture (EA), with the objective of enhancing business value in the enterprise society. Structural models consisting of four related hypotheses reveal the relationship between IT governance and EA in the improvement of business values. We statistically examined the hypotheses by analyzing validated questionnaire items from respondents within firms listed on the Japanese stock exchange who were qualified to answer them. We concluded that firms which have organizational ability controlled by IT governance are more likely to deliver business value based on IT portfolio management.

Color Independent Components Based SIFT Descriptors for Object/Scene Classification

Dan-ni AI, Xian-hua HAN, Xiang RUAN, Yen-wei CHEN

Article type: PAPER
Subject area: Pattern Recognition
2010 Volume E93.D Issue 9 Pages 2577-2586
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2577

JOURNAL FREE ACCESS

In this paper, we present a novel color independent components based SIFT descriptor (termed CIC-SIFT) for object/scene classification. We first learn an efficient color transformation matrix based on independent component analysis (ICA), which is adaptive to each category in a database. The ICA-based color transformation can enhance contrast between the objects and the background in an image. Then we compute CIC-SIFT descriptors over all three transformed color independent components. Since the ICA-based color transformation can boost the objects and suppress the background, the proposed CIC-SIFT can extract more effective and discriminative local features for object/scene classification. The comparison is performed among seven SIFT descriptors, and the experimental classification results show that our proposed CIC-SIFT is superior to other conventional SIFT descriptors.

Position-Invariant Robust Features for Long-Term Recognition of Dynamic Outdoor Scenes

Aram KAWEWONG, Sirinart TANGRUAMSUB, Osamu HASEGAWA

Article type: PAPER
Subject area: Image Recognition, Computer Vision
2010 Volume E93.D Issue 9 Pages 2587-2601
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2587

JOURNAL FREE ACCESS

A novel Position-Invariant Robust Feature, designated as PIRF, is presented to address the problem of highly dynamic scene recognition. The PIRF is obtained by identifying existing local features (e.g., SIFT) that have wide-baseline visibility within a place (one place contains more than one sequential image). These wide-baseline visible features are then represented as a single PIRF, which is computed as an average of all descriptors associated with the PIRF. In particular, PIRFs are robust against highly dynamic changes in a scene: a single PIRF can be matched correctly against many features from many dynamic images. This paper also describes an approach to using these features for scene recognition. Recognition proceeds by matching an individual PIRF to a set of features from test images, with subsequent majority voting to identify the place with the most matched PIRFs. The PIRF system is trained and tested on 2000+ outdoor omnidirectional images and on COLD datasets. Despite its simplicity, PIRF offers a markedly better rate of recognition for dynamic outdoor scenes (ca. 90%) than the use of other features. Additionally, a robot navigation system based on PIRF (PIRF-Nav) can outperform other incremental topological mapping methods in terms of time (70% less) and memory. The number of PIRFs can be reduced further to reduce the time while retaining high accuracy, which makes it suitable for long-term recognition and localization.

Computing Spatio-Temporal Multiple View Geometry from Mutual Projections of Multiple Cameras

Cheng WAN, Jun SATO

Article type: PAPER
Subject area: Image Recognition, Computer Vision
2010 Volume E93.D Issue 9 Pages 2602-2613
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2602

JOURNAL FREE ACCESS

The spatio-temporal multiple view geometry can represent the geometry of multiple images in the case where non-rigid arbitrary motions are viewed from multiple translational cameras. However, it requires many corresponding points and is sensitive to image noise. In this paper, we investigate mutual projections of cameras in four-dimensional space and show that they enable us to reduce the number of corresponding points required for computing the spatio-temporal multiple view geometry. Surprisingly, taking three views as an example, we no longer need any corresponding points to calculate the spatio-temporal multiple view geometry if all the cameras are mutually projected to the other cameras for two time intervals. We also show that the stability of the computation of spatio-temporal multiple view geometry is drastically improved by considering the mutual projections of cameras.

Multi-Objective Genetic Programming with Redundancy-Regulations for Automatic Construction of Image Feature Extractors

Ukrit WATCHAREERUETAI, Tetsuya MATSUMOTO, Yoshinori TAKEUCHI, Hiroaki ...

Article type: PAPER
Subject area: Biocybernetics, Neurocomputing
2010 Volume E93.D Issue 9 Pages 2614-2625
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2614

JOURNAL FREE ACCESS

We propose a new multi-objective genetic programming (MOGP) method for automatic construction of image feature extraction programs (FEPs). The proposed method is derived from a well-known multi-objective evolutionary algorithm (MOEA), i.e., NSGA-II. The key difference is that redundancy-regulation mechanisms are applied in three main processes of the MOGP, i.e., population truncation, sampling, and offspring generation, to improve population diversity as well as convergence rate. Experimental results indicate that the proposed MOGP-based FEP construction system outperforms two conventional MOEAs (i.e., NSGA-II and SPEA2) on a test problem. Moreover, we compared the programs constructed by the proposed MOGP with four human-designed object recognition programs. The results show that the constructed programs are better than two of the human-designed methods and comparable with the other two on the test problem.

Signal and Noise Covariance Estimation Based on ICA for High-Resolution Cortical Dipole Imaging

Junichi HORI, Kentarou SUNAGA, Satoru WATANABE

Article type: PAPER
Subject area: Biological Engineering
2010 Volume E93.D Issue 9 Pages 2626-2634
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2626

JOURNAL FREE ACCESS

We investigated suitable spatial inverse filters for cortical dipole imaging from the scalp electroencephalogram (EEG). The effects of incorporating statistical information on signal and noise into inverse procedures were examined by computer simulations and experimental studies. The parametric projection filter (PPF) and parametric Wiener filter (PWF) were applied to an inhomogeneous three-sphere volume conductor head model. The noise covariance matrix was estimated by applying independent component analysis (ICA) to scalp potentials. The present simulation results suggest that the PPF and the PWF provided excellent performance when the noise covariance was estimated from the differential noise between the EEG and the signal separated using ICA, and the signal covariance was estimated from the separated signal. Moreover, the spatial resolution of the cortical dipole imaging was improved while the influence of noise was suppressed by including the differential noise at the instant of the imaging and by adjusting the duration of the noise sample according to the signal-to-noise ratio. We applied the proposed imaging technique to human experimental data of visual evoked potentials and obtained reasonable results that are consistent with physiological knowledge.

DDR3 SDRAM with a Complete Predictor

Vladimir V. STANKOVIC, Nebojsa Z. MILENKOVIC

Article type: LETTER
Subject area: Computer System
2010 Volume E93.D Issue 9 Pages 2635-2638
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2635

JOURNAL FREE ACCESS

In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They enable hiding the latencies when accessing cache or main memory. In our previous work we proposed a DDR SDRAM controller with predictors that not only close the opened DRAM row but also predict the next row to be opened. In this paper we explore the possibilities of trying the same techniques on the latest type of DRAM memory, DDR3 SDRAM, with further improvements of the predictors.

Indexing of Tagged Moving Objects over Localized Trajectory with Time Intervals in RFID Systems

Jongwan KIM, Dukshin OH, Keecheon KIM

Article type: LETTER
Subject area: Data Engineering, Web Information Systems
2010 Volume E93.D Issue 9 Pages 2639-2642
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2639

JOURNAL FREE ACCESS

Since a radio frequency identification (RFID) transponder (tag) generates both location and time information when it enters and leaves a reader, the trajectory of a moving, tagged object can be traced. Due to the time intervals between entries to successive readers, during which tags are not tracked, accurate tracing of complete trajectories can be difficult. To overcome this problem, we propose a tag trajectory indexing scheme called TR-tree (R-tree-based tag trajectory index) that can trace tags by combining the local trajectories at each reader. In experiments, this scheme showed superior performance compared with other indices.

Sexual Dimorphism Analysis and Gender Classification in 3D Human Face

Yuan HU, Li LU, Jingqi YAN, Zhi LIU, Pengfei SHI

Article type: LETTER
Subject area: Pattern Recognition
2010 Volume E93.D Issue 9 Pages 2643-2646
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2643

JOURNAL FREE ACCESS

In this paper, we present a sexual dimorphism analysis of the 3D human face and perform gender classification based on the result of this analysis. Four types of features are extracted from a 3D human-face image. Using statistical methods, the existence of sexual dimorphism in the 3D human face is demonstrated based on these features. The contribution of each feature to sexual dimorphism is quantified according to a novel criterion. The best gender classification rate is 94%, obtained by using SVMs and the Matcher Weighting fusion method. This research adds to the knowledge of sexual dimorphism in 3D faces and provides a foundation that could be used to distinguish between male and female 3D faces.

Hellinger Distance-Based Parameter Tuning for ε-Filter

Noriaki SUETAKE, Go TANAKA, Hayato HASHII, Eiji UCHINO

Article type: LETTER
Subject area: Image Processing and Video Processing
2010 Volume E93.D Issue 9 Pages 2647-2650
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2647

JOURNAL FREE ACCESS

In this letter, we propose a new method for tuning the ε value, a parameter of the ε-filter, using a metric between signal distributions, i.e., the Hellinger distance. In the proposed method, the difference between the input and output signals is evaluated using the Hellinger distance and used for the parameter tuning.
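
The two ingredients named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' tuning procedure: it assumes a common 1-D form of the ε-filter with uniform weights, a simple clamped border, and a Hellinger distance computed between normalized histograms of the input and output. The names `epsilon_filter`, `histogram`, and `hellinger`, and all parameter choices, are hypothetical.

```python
import math


def epsilon_filter(signal, eps, radius=2):
    """Smooth a 1-D signal, ignoring neighbors that differ by more than eps.

    Assumed form: y[i] = x[i] + (1/W) * sum_k F(x[i+k] - x[i]),
    where F(d) = d if |d| <= eps else 0, and W = 2 * radius + 1.
    """
    n = len(signal)
    width = 2 * radius + 1
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i + k, 0), n - 1)  # clamp indices at the borders
            d = signal[j] - signal[i]
            if abs(d) <= eps:
                acc += d
        out.append(signal[i] + acc / width)
    return out


def histogram(values, bins, lo, hi):
    """Normalized histogram (sums to 1) of values over [lo, hi]."""
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * bins)
        counts[min(max(idx, 0), bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


if __name__ == "__main__":
    # Hypothetical noisy step signal.
    x = [0.0, 0.2, -0.1, 0.1, 10.0, 10.2, 9.9, 10.1]
    for eps in (0.0, 0.5, 20.0):
        y = epsilon_filter(x, eps)
        d = hellinger(histogram(x, 8, -1.0, 11.0), histogram(y, 8, -1.0, 11.0))
        print(eps, round(d, 3))
```

Scanning ε and inspecting how the distance between the input and output distributions changes is one plausible way to use such a metric; the actual selection rule is defined in the letter itself.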

Commercial Shot Classification Based on Multiple Features Combination

Nan LIU, Yao ZHAO, Zhenfeng ZHU, Rongrong NI

Article type: LETTER
Subject area: Image Processing and Video Processing
2010 Volume E93.D Issue 9 Pages 2651-2655
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2651

JOURNAL FREE ACCESS

This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.

Feature Interaction Descriptor for Pedestrian Detection

Hui CAO, Koichiro YAMAGUCHI, Mitsuhiko OHTA, Takashi NAITO, Yoshiki NI ...

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2010 Volume E93.D Issue 9 Pages 2656-2659
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2656

JOURNAL FREE ACCESS

We propose a novel representation called Feature Interaction Descriptor (FIND) to capture high-level properties of object appearance by computing pairwise interactions of adjacent region-level features. To address the pedestrian detection task, we employ localized oriented gradient histograms as region-level features and measure interactions between adjacent histogram elements with a suitable histogram-similarity function. The experimental results show that our descriptor improves upon HOG significantly and outperforms related high-level features such as GLAC and CoHOG.

Correcting Syntactic Annotation Errors Using a Synchronous Tree Substitution Grammar

Yoshihide KATO, Shigeki MATSUBARA

Article type: LETTER
Subject area: Natural Language Processing
2010 Volume E93.D Issue 9 Pages 2660-2663
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2660

JOURNAL FREE ACCESS

This paper proposes a method of correcting annotation errors in a treebank. Using a synchronous grammar, the method transforms parse trees containing annotation errors into trees in which the errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.

Errata

Erratum: Improving Automatic English Writing Assessment Using Regression Trees and Error-Weighting [IEICE Transactions on Information and Systems E93.D (2010), No. 8, pp. 2281-2290]

Kong-Joo LEE, Jee-Eun KIM

2010 Volume E93.D Issue 9 Pages 2669_e1
Published: September 01, 2010
Released on J-STAGE: September 01, 2010

DOI: https://doi.org/10.1587/transinf.E93.D.2669_e1

JOURNAL FREE ACCESS
