IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E93.D, Issue 9
Displaying 1-36 of 36 articles from this issue
Special Section on Processing Natural Speech Variability for Improved Verbal Human-Computer Interaction
  • Takao KOBAYASHI
    2010 Volume E93.D Issue 9 Pages 2347
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Download PDF (50K)
  • Koichi SHINODA
    Article type: INVITED PAPER
    2010 Volume E93.D Issue 9 Pages 2348-2362
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Statistical speech recognition using continuous-density hidden Markov models (CDHMMs) has yielded many practical applications. However, in general, mismatches between the training data and input data significantly degrade recognition accuracy. Various acoustic model adaptation techniques using a few input utterances have been employed to overcome this problem. In this article, we survey these adaptation techniques, including maximum a posteriori (MAP) estimation, maximum likelihood linear regression (MLLR), and eigenvoice. We also present a schematic view called the adaptation pyramid to illustrate how these methods relate to each other.
    Download PDF (312K)
  • Tetsuo KOSAKA, Yuui TAKEDA, Takashi ITO, Masaharu KATO, Masaki KOHDA
    Article type: PAPER
    Subject area: Adaptation
    2010 Volume E93.D Issue 9 Pages 2363-2369
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we propose a new speaker-class modeling method and its adaptation for an LVCSR system, and evaluate the method on the Corpus of Spontaneous Japanese (CSJ). In this method, for each evaluation speaker, acoustically close speakers are selected from the training speakers and the acoustic models are trained using their utterances. One of the major issues of the speaker-class model is determining the selection range of speakers. To solve this problem, several models with different speaker ranges are prepared for each evaluation speaker in advance, and the most appropriate model is selected on a likelihood basis in the recognition step. In addition, we improved the recognition performance using unsupervised speaker adaptation with the speaker-class models. In the recognition experiments, the proposed speaker adaptation based on speaker-class models yielded a significant improvement over the conventional adaptation method.
    Download PDF (394K)
  • Shoei SATO, Takahiro OKU, Shinichi HOMMA, Akio KOBAYASHI, Toru IMAI
    Article type: PAPER
    Subject area: Adaptation
    2010 Volume E93.D Issue 9 Pages 2370-2378
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We present a new discriminative method of acoustic model adaptation that deals with task-dependent speech variability. We have focused on differences of expressions or speaking styles between tasks and set the objective of this method as improving the recognition accuracy of indistinctly pronounced phrases dependent on a speaking style. The adaptation appends subword models for frequently observable variants of subwords in the task. To find the task-dependent variants, low-confidence words are statistically selected from words with higher frequency in the task's adaptation data by using their word lattices. HMM parameters of subword models dependent on the words are discriminatively trained by using linear transforms with a minimum phoneme error (MPE) criterion. For the MPE training, subword accuracy discriminating between the variants and the originals is also investigated. In speech recognition experiments, the proposed adaptation with the subword variants reduced the word error rate by 12.0% relative in a Japanese conversational broadcast task.
    Download PDF (668K)
  • Yoo Rhee OH, Hong Kook KIM
    Article type: PAPER
    Subject area: Adaptation
    2010 Volume E93.D Issue 9 Pages 2379-2387
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we propose a hybrid model adaptation approach in which pronunciation and acoustic models are adapted by incorporating the pronunciation and acoustic variabilities of non-native speech in order to improve the performance of non-native automatic speech recognition (ASR). Specifically, the proposed hybrid model adaptation can be performed at either the state-tying or triphone-modeling level, depending on the level at which acoustic model adaptation is performed. In both methods, we first analyze the pronunciation variant rules of non-native speakers and then classify each rule as either a pronunciation variant or an acoustic variant. The state-tying level hybrid method then adapts pronunciation models and acoustic models by accommodating the pronunciation variants in the pronunciation dictionary and by clustering the states of triphone acoustic models using the acoustic variants, respectively. On the other hand, the triphone-modeling level hybrid method initially adapts pronunciation models in the same way as in the state-tying level hybrid method; however, for the acoustic model adaptation, the triphone acoustic models are then re-estimated based on the adapted pronunciation models and the states of the re-estimated triphone acoustic models are clustered using the acoustic variants. Korean-spoken English speech recognition experiments show that ASR systems employing the state-tying and triphone-modeling level adaptation methods reduce the average word error rates (WERs) for non-native speech by a relative 17.1% and 22.1%, respectively, compared to a baseline ASR system.
    Download PDF (584K)
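The abstract above reports relative (not absolute) WER reductions. As a minimal sketch of that arithmetic, the following computes a relative reduction from a baseline and an adapted WER. The function name and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - adapted) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer * 100.0


# Hypothetical WERs in percent, chosen only to illustrate the formula.
hypothetical_baseline = 20.0
hypothetical_adapted = 16.0
reduction = relative_reduction(hypothetical_baseline, hypothetical_adapted)
print(f"relative WER reduction: {reduction:.1f}%")
```

In this made-up case, a 4-point absolute drop from a 20% baseline is a 20% relative reduction.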
  • Dessi Puji LESTARI, Sadaoki FURUI
    Article type: PAPER
    Subject area: Adaptation
    2010 Volume E93.D Issue 9 Pages 2388-2396
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Recognition errors of proper nouns and foreign words significantly decrease the performance of ASR-based speech applications such as voice dialing systems, speech summarization, spoken document retrieval, and spoken query-based information retrieval (IR). The reason is that proper nouns and words that come from other languages are usually the most important key words. The loss of such words due to misrecognition in turn leads to a loss of significant information from the speech source. This paper focuses on how to improve the performance of Indonesian ASR by alleviating the problem of pronunciation variation of proper nouns and foreign words (English words in particular). To improve the proper noun recognition accuracy, proper-noun specific acoustic models are created by supervised adaptation using maximum likelihood linear regression (MLLR). To improve English word recognition, the pronunciation of English words contained in the lexicon is fixed by using rule-based English-to-Indonesian phoneme mapping. The effectiveness of the proposed method was confirmed through spoken query based Indonesian IR. We used Inference Network-based (IN-based) IR and compared its results with those of the classical Vector Space Model (VSM) IR, both using a tf-idf weighting schema. Experimental results show that IN-based IR outperforms VSM IR.
    Download PDF (289K)
  • Longbiao WANG, Kazue MINAMI, Kazumasa YAMAMOTO, Seiichi NAKAGAWA
    Article type: PAPER
    Subject area: Speaker Recognition
    2010 Volume E93.D Issue 9 Pages 2397-2406
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods are based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier transform of time-domain speech frames is used and phase information has been ignored. The phase information and MFCCs are expected to be highly complementary because the phase information includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in the phase depending on the clipping position of the input speech, and the combination of the phase information and MFCCs performed remarkably better than MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method skipping frames with low energy/signal-to-noise (SN) ratio, and noisy speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database with added stationary/non-stationary noise were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. With clean speech training models, the individual result of the phase information was even better than that of MFCCs in many cases. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
    Download PDF (858K)
  • Seong-Jun HAHM, Yuichi OHKAWA, Masashi ITO, Motoyuki SUZUKI, Akinori I ...
    Article type: PAPER
    Subject area: Robust Speech Recognition
    2010 Volume E93.D Issue 9 Pages 2407-2416
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we propose an acoustic model that is robust to multiple noise environments, as well as a method for adapting the acoustic model to an environment to improve the model. The model is called “the multi-mixture model,” which is based on a mixture of different HMMs each of which is trained using speech under different noise conditions. Speech recognition experiments showed that the proposed model performs better than the conventional multi-condition model. The method for adaptation is based on the aspect model, which is a “mixture-of-mixture” model. To realize adaptation using an extremely small amount of adaptation data (i.e., a few seconds), we train a small number of mixture models, which can be interpreted as models for “clusters” of noise environments. Then, the models are mixed using weights, which are determined according to the adaptation data. The experimental results showed that the adaptation based on the aspect model improved the word accuracy in a heavy noise environment and showed no performance deterioration for all noise conditions, while the conventional methods either did not improve the performance or showed both improvement and degradation of recognition performance according to noise conditions.
    Download PDF (669K)
  • Yanqing SUN, Yu ZHOU, Qingwei ZHAO, Yonghong YAN
    Article type: PAPER
    Subject area: Robust Speech Recognition
    2010 Volume E93.D Issue 9 Pages 2417-2430
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper focuses on the problem of performance degradation in mismatched speech recognition. The F-Ratio analysis method is utilized to analyze the significance of different frequency bands for speech unit classification, and we find that frequencies around 1kHz and 3kHz, which are the upper bounds of the first and the second formants for most of the vowels, should be emphasized in comparison to the Mel-frequency cepstral coefficients (MFCC). The analysis result is further observed to be stable in several typical mismatched situations. Similar to the Mel-Frequency scale, another frequency scale called the F-Ratio-scale is thus proposed to optimize the filter bank design for the MFCC features and to make each subband carry equal significance for speech unit classification. Under comparable conditions, with the modified features we get a relative 43.20% decrease compared with the MFCC in sentence error rate for the emotion affected speech recognition, 35.54% and 23.03% for the noisy speech recognition at 15dB and 0dB SNR (signal to noise ratio) respectively, and 64.50% for the three years' 863 test data. The application of the F-Ratio analysis on the clean training set of the Aurora2 database demonstrates its robustness over languages, texts and sampling rates.
    Download PDF (1256K)
  • Yanqing SUN, Yu ZHOU, Qingwei ZHAO, Pengyuan ZHANG, Fuping PAN, Yongho ...
    Article type: PAPER
    Subject area: Robust Speech Recognition
    2010 Volume E93.D Issue 9 Pages 2431-2439
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, the robustness of the posterior-based confidence measures is improved by utilizing entropy information, which is calculated for speech-unit-level posteriors using only the best recognition result, without requiring a larger computational load than conventional methods. Using different normalization methods, two posterior-based entropy confidence measures are proposed. Practical details are discussed for two typical levels of hidden Markov model (HMM)-based posterior confidence measures, and both levels are compared in terms of their performances. Experiments show that the entropy information results in significant improvements in the posterior-based confidence measures. The absolute improvements of the out-of-vocabulary (OOV) rejection rate are more than 20% for both the phoneme-level confidence measures and the state-level confidence measures for our embedded test sets, without a significant decline of the in-vocabulary accuracy.
    Download PDF (1265K)
  • Yasunari OBUCHI, Takashi SUMIYOSHI
    Article type: PAPER
    Subject area: Robust Speech Recognition
    2010 Volume E93.D Issue 9 Pages 2440-2450
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper we introduce a new framework of audio processing, which is essential to achieve a trigger-free speech interface for home appliances. If the speech interface works continually in real environments, it must extract occasional voice commands and reject everything else. It is extremely important to reduce the number of false alarms because the number of irrelevant inputs is much larger than the number of voice commands even for heavy users of appliances. The framework, called Intentional Voice Command Detection, is based on voice activity detection, but enhanced by various speech/audio processing techniques such as emotion recognition. The effectiveness of the proposed framework is evaluated using a newly-collected large-scale corpus. The advantages of combining various features were tested and confirmed, and the simple LDA-based classifier demonstrated acceptable performance. The effectiveness of various methods of user adaptation is also discussed.
    Download PDF (834K)
  • Alberto Yoshihiro NAKANO, Seiichi NAKAGAWA, Kazumasa YAMAMOTO
    Article type: PAPER
    Subject area: Microphone Array
    2010 Volume E93.D Issue 9 Pages 2451-2462
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this work, spatial information consisting of the position and orientation angle of an acoustic source is estimated by an artificial neural network (ANN). The estimated position of a speaker in an enclosed space is used to refine the estimated time delays for a delay-and-sum beamformer, thus enhancing the output signal. On the other hand, the orientation angle is used to restrict the lexicon used in the recognition phase, assuming that the speaker faces a particular direction while speaking. To compensate for the effect of the transmission channel inside a short frame analysis window, a new cepstral mean normalization (CMN) method based on a Gaussian mixture model (GMM) is investigated and shows better performance than the conventional CMN for short utterances. The performance of the proposed method is evaluated through Japanese digit/command recognition experiments.
    Download PDF (451K)
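For reference, the conventional CMN that the abstract above uses as its baseline subtracts the per-coefficient mean over the utterance from every frame. The following is a minimal sketch of that baseline only; the GMM-based variant proposed in the paper is not described in enough detail here to reproduce. The function name and toy data are hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient utterance mean from every cepstral frame."""
    if not frames:
        return []
    dim = len(frames[0])
    count = len(frames)
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / count for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy 2-dimensional "cepstra" for a three-frame utterance (made-up values).
toy_frames = [[1.0, 4.0], [3.0, 8.0], [5.0, 6.0]]
normalized = cepstral_mean_normalize(toy_frames)
print(normalized)
```

With few frames the utterance mean is a noisy estimate of the channel, which is one plausible reason the conventional method struggles on short utterances.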
  • Kook CHO, Hajime OKUMURA, Takanobu NISHIURA, Yoichi YAMASHITA
    Article type: PAPER
    Subject area: Microphone Array
    2010 Volume E93.D Issue 9 Pages 2463-2471
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In real environments, the presence of ambient noise and room reverberations seriously degrades the accuracy of sound source localization. In addition, conventional sound source localization methods cannot localize multiple sound sources accurately in real noisy environments. This paper proposes a new method of multiple sound source localization using a distributed microphone system, that is, a recording system with multiple microphones dispersed over a wide area. The proposed method localizes a sound source by finding the position that maximizes the accumulated correlation coefficient between multiple channel pairs. After the estimation of the first sound source, a typical pattern of the accumulated correlation for a single sound source is subtracted from the observed distribution of the accumulated correlation. Subsequently, the second sound source is searched for in the same way. To evaluate the effectiveness of the proposed method, localization experiments with two sound sources were carried out in an office room. The results show that the sound source localization accuracy is about 99.7%. The proposed method can thus localize multiple sound sources robustly and stably.
    Download PDF (908K)
  • Hironori DOI, Keigo NAKAMURA, Tomoki TODA, Hiroshi SARUWATARI, Kiyohir ...
    Article type: PAPER
    Subject area: Voice Conversion
    2010 Volume E93.D Issue 9 Pages 2472-2482
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn't require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech. A spectral parameter and excitation parameters of target normal speech are separately estimated from a spectral parameter of the esophageal speech based on Gaussian mixture models. The experimental results demonstrate that the proposed method yields significant improvements in intelligibility and naturalness. We also apply one-to-many eigenvoice conversion to esophageal speech enhancement to make it possible to flexibly control the voice quality of enhanced speech.
    Download PDF (849K)
  • Takashi NOSE, Yuhei OTA, Takao KOBAYASHI
    Article type: PAPER
    Subject area: Voice Conversion
    2010 Volume E93.D Issue 9 Pages 2483-2490
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
    Download PDF (563K)
  • Yamato OHTANI, Tomoki TODA, Hiroshi SARUWATARI, Kiyohiro SHIKANO
    Article type: PAPER
    Subject area: Voice Conversion
    2010 Volume E93.D Issue 9 Pages 2491-2499
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We have developed a one-to-many eigenvoice conversion (EVC) system that allows us to convert a single source speaker's voice into an arbitrary target speaker's voice using an eigenvoice Gaussian mixture model (EV-GMM). This system is capable of effectively building a conversion model for an arbitrary target speaker by adapting the EV-GMM using only a small amount of speech data uttered by the target speaker in a text-independent manner. However, the conversion performance is still insufficient for the following reasons: 1) the excitation signal is not precisely modeled; 2) the oversmoothing of the converted spectrum causes muffled sounds in converted speech; and 3) the conversion model is affected by redundant acoustic variations among a lot of pre-stored target speakers used for building the EV-GMM. In order to address these problems, we apply the following promising techniques to one-to-many EVC: 1) mixed excitation; 2) a conversion algorithm considering global variance; and 3) adaptive training of the EV-GMM. The experimental results demonstrate that the conversion performance of one-to-many EVC is significantly improved by integrating all of these techniques into the one-to-many EVC system.
    Download PDF (1095K)
Regular Section
  • Jeonghun KIM, Suki KIM, Kwang-Hyun BAEK
    Article type: PAPER
    Subject area: Computer System
    2010 Volume E93.D Issue 9 Pages 2500-2508
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper presents a low-power System on Chip (SOC) architecture for the v2.0+EDR (Enhanced Data Rate) Bluetooth and its applications. Our design includes a link controller, modem, RF transceiver, Sub-Band Codec (SBC), Expanded Instruction Set Computer (ESIC) processor, and peripherals. To decrease the power consumption of the proposed SOC, we reduce data transfer using a dual-port memory, include a power management unit, and adopt a clock-gating approach. We also address some of the issues and benefits of a reusable and unified environment based on a centralized data structure and SOC verification platform. This includes flexibility in meeting the final requirements using technology-independent tools wherever possible in various processes and for projects. The other aims of this work are to minimize design effort by avoiding the same work being done twice by different people and to reuse a similar environment and platform for different projects. This chip occupies a die size of 30 mm² in 0.18 µm CMOS, and the worst-case current of the total chip is 54 mA.
    Download PDF (600K)
  • Cheng-Min LIN
    Article type: PAPER
    Subject area: Software System
    2010 Volume E93.D Issue 9 Pages 2509-2519
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Interrupt service routines are a key technology for embedded systems. In this paper, we introduce the standard approach for using Generalized Stochastic Petri Nets (GSPNs) as a high-level model for generating Continuous-Time Markov Chains (CTMCs) and then use Markov Reward Models (MRMs) to compute the performance of embedded systems. This framework is employed to analyze two embedded controllers with low cost and high performance, ARM7 and Cortex-M3. Cortex-M3 is designed with a tail-chaining mechanism to improve on the performance of ARM7 when a nested interrupt occurs on an embedded controller. The Platform Independent Petri net Editor 2 (PIPE2) tool is used to model and evaluate the controllers in terms of power consumption and interrupt overhead performance. The numerical results show that, whether measured by power consumption or by interrupt overhead, Cortex-M3 performs better than ARM7.
    Download PDF (809K)
  • Hasan KADHEM, Toshiyuki AMAGASA, Hiroyuki KITAGAWA
    Article type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2010 Volume E93.D Issue 9 Pages 2520-2533
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Encryption can provide strong security for sensitive data against inside and outside attacks. This is especially true in the “Database as Service” model, where confidentiality and privacy are important issues for the client. In fact, existing encryption approaches are vulnerable to a statistical attack because each value is encrypted to another fixed value. This paper presents a novel database encryption scheme called MV-OPES (Multivalued — Order Preserving Encryption Scheme), which allows privacy-preserving queries over encrypted databases with an improved security level. Our idea is to encrypt a value to different multiple values to prevent statistical attacks. At the same time, MV-OPES preserves the order of the integer values to allow comparison operations to be directly applied on encrypted data. Using calculated distance (range), we propose a novel method that allows a join query between relations based on inequality over encrypted values. We also present techniques to offload query execution load to a database server as much as possible, thereby making a better use of server resources in a database outsourcing environment. Our scheme can easily be integrated with current database systems as it is designed to work with existing indexing structures. It is robust against statistical attack and the estimation of true values. MV-OPES experiments show that security for sensitive data can be achieved with reasonable overhead, establishing the practicability of the scheme.
    Download PDF (852K)
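The core idea in the abstract above, encrypting one plaintext to several possible ciphertexts while preserving order, can be illustrated with a deliberately simplified toy. The sketch below is not MV-OPES and is not secure: it has no key, and `BUCKET_WIDTH`, `toy_encrypt`, and `toy_decrypt` are hypothetical names introduced only for illustration. Each integer is mapped to a random point inside its own disjoint interval, so equal plaintexts can produce different ciphertexts while order comparisons still work on the ciphertexts.

```python
import random

BUCKET_WIDTH = 1000  # hypothetical interval width; not a parameter from the paper


def toy_encrypt(value: int, rng: random.Random) -> int:
    """Map an integer to a random point inside its own disjoint interval."""
    return value * BUCKET_WIDTH + rng.randrange(BUCKET_WIDTH)


def toy_decrypt(cipher: int) -> int:
    """Recover the plaintext by identifying which interval the cipher lies in."""
    return cipher // BUCKET_WIDTH


rng = random.Random(0)
c1 = toy_encrypt(42, rng)  # two encryptions of the same value
c2 = toy_encrypt(42, rng)  # usually differ from each other
c3 = toy_encrypt(43, rng)  # a larger value always encrypts to a larger cipher
print(c1, c2, c3)
```

Because the intervals are disjoint and ordered, a server could evaluate `c1 < c3` without seeing 42 or 43, which is the property that lets range and inequality queries run over encrypted data.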
  • Cheng-Min LIN, Jyh-Horng LIN, Jen-Cheng CHIU
    Article type: PAPER
    Subject area: Information Network
    2010 Volume E93.D Issue 9 Pages 2534-2543
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In a WSAN (Wireless Sensor and Actuator Network), most resources, including sensors and actuators, are designed for certain applications in a dedicated environment. Many researchers have proposed the use of gateways to infer and annotate heterogeneous data; however, such centralized methods produce a network bottleneck and computation overhead on the gateways, which lengthens the response time in activity processing and worsens performance. This work proposes two distributed inference mechanisms, regionalized and sequential inference, to reduce the response time in activity processing. Finally, experimental results for the proposed inference mechanisms are presented, and they show that our mechanisms outperform the traditional centralized inference mechanism.
    Download PDF (1942K)
  • Jungsuk SONG, Hiroki TAKAKURA, Yasuo OKABE, Daisuke INOUE, Masashi ETO ...
    Article type: PAPER
    Subject area: Information Network
    2010 Volume E93.D Issue 9 Pages 2544-2554
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Intrusion Detection Systems (IDSs) have received considerable attention among network security researchers as one of the most promising countermeasures to defend our crucial computer systems or networks against attackers on the Internet. Over the past few years, many machine learning techniques have been applied to IDSs so as to improve their performance and to construct them with low cost and effort. In particular, unsupervised anomaly detection techniques have a significant advantage in their capability to identify unforeseen attacks, i.e., 0-day attacks, and to build intrusion detection models without any labeled (i.e., pre-classified) training data in an automated manner. In this paper, we conduct a set of experiments to evaluate and analyze the performance of the major unsupervised anomaly detection techniques using real traffic data obtained at our honeypots deployed inside and outside of the campus network of Kyoto University, and using various evaluation criteria, i.e., performance evaluation by similarity measurements and the size of training data, overall performance, detection ability for unknown attacks, and time complexity. Our experimental results give some practical and useful guidelines to IDS researchers and operators, so that they can acquire insight to apply these techniques to the area of intrusion detection, and devise more effective intrusion detection models.
    Download PDF (1225K)
  • Masashi SUGIYAMA, Hirotaka HACHIYA, Hisashi KASHIMA, Tetsuro MORIMURA
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2010 Volume E93.D Issue 9 Pages 2555-2565
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Least-squares policy iteration is a useful reinforcement learning method in robotics due to its computational efficiency. However, it tends to be sensitive to outliers in observed rewards. In this paper, we propose an alternative method that employs the absolute loss for enhancing robustness and reliability. The proposed method is formulated as a linear programming problem which can be solved efficiently by standard optimization software, so the computational advantage is not sacrificed for gaining robustness and reliability. We demonstrate the usefulness of the proposed approach through a simulated robot-control task.
    Download PDF (2186K)
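The robustness argument in the abstract above rests on a general property of the two losses: a squared loss is minimized by the mean, which outliers can drag arbitrarily far, while an absolute loss is minimized by the median, which they barely move. The sketch below shows only that scalar property on made-up reward samples; it does not implement policy iteration or the paper's linear program.

```python
from statistics import mean, median

# Made-up reward observations; the last one is an outlier.
rewards = [1.0, 1.1, 0.9, 1.0, 50.0]

# The constant that minimizes the sum of squared errors is the mean.
ls_estimate = mean(rewards)

# The constant that minimizes the sum of absolute errors is the median.
abs_estimate = median(rewards)

print(f"least-squares estimate: {ls_estimate:.2f}")
print(f"absolute-loss estimate: {abs_estimate:.2f}")
```

A single corrupted reward shifts the least-squares estimate by an order of magnitude here, while the absolute-loss estimate stays at the typical value.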
  • Takaaki KAMOGAWA
    Article type: PAPER
    Subject area: Office Information Systems, e-Business Modeling
    2010 Volume E93.D Issue 9 Pages 2566-2576
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper examines the structural relationships between Information Technology (IT) governance and Enterprise Architecture (EA), with the objective of enhancing business value in the enterprise society. Structural models consisting of four related hypotheses reveal the relationship between IT governance and EA in the improvement of business values. We statistically examined the hypotheses by analyzing validated questionnaire items from respondents within firms listed on the Japanese stock exchange who were qualified to answer them. We concluded that firms which have organizational ability controlled by IT governance are more likely to deliver business value based on IT portfolio management.
    Download PDF (177K)
  • Dan-ni AI, Xian-hua HAN, Xiang RUAN, Yen-wei CHEN
    Article type: PAPER
    Subject area: Pattern Recognition
    2010 Volume E93.D Issue 9 Pages 2577-2586
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we present a novel color independent components based SIFT descriptor (termed CIC-SIFT) for object/scene classification. We first learn an efficient color transformation matrix based on independent component analysis (ICA), which is adaptive to each category in a database. The ICA-based color transformation can enhance contrast between the objects and the background in an image. Then we compute CIC-SIFT descriptors over all three transformed color independent components. Since the ICA-based color transformation can boost the objects and suppress the background, the proposed CIC-SIFT can extract more effective and discriminative local features for object/scene classification. The comparison is performed among seven SIFT descriptors, and the experimental classification results show that our proposed CIC-SIFT is superior to other conventional SIFT descriptors.
    Download PDF (1275K)
  • Aram KAWEWONG, Sirinart TANGRUAMSUB, Osamu HASEGAWA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2010 Volume E93.D Issue 9 Pages 2587-2601
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    A novel Position-Invariant Robust Feature, designated PIRF, is presented to address the problem of highly dynamic scene recognition. A PIRF is obtained by identifying existing local features (i.e., SIFT) that have wide-baseline visibility within a place (one place contains more than one sequential image). These wide-baseline visible features are then represented as a single PIRF, which is computed as the average of all descriptors associated with the PIRF. In particular, PIRFs are robust against highly dynamic changes in a scene: a single PIRF can be matched correctly against many features from many dynamic images. This paper also describes an approach to using these features for scene recognition. Recognition proceeds by matching individual PIRFs to a set of features from test images, with subsequent majority voting to identify the place with the most matched PIRFs. The PIRF system is trained and tested on 2000+ outdoor omnidirectional images and on COLD datasets. Despite its simplicity, PIRF offers a markedly better recognition rate for dynamic outdoor scenes (ca. 90%) than other features. Additionally, a robot navigation system based on PIRF (PIRF-Nav) can outperform other incremental topological mapping methods in terms of time (70% less) and memory. The number of PIRFs can be reduced further to reduce the time while retaining high accuracy, which makes the approach suitable for long-term recognition and localization.
    Download PDF (2664K)
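    The two steps named in the abstract above, averaging descriptors into a single PIRF and majority voting over matched PIRFs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Euclidean distance, and the fixed matching threshold `max_dist` are all assumptions made for illustration, since the abstract does not specify the matching rule.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def average_descriptor(descriptors: List[Vector]) -> List[float]:
    """Average a group of descriptors component-wise into one vector."""
    n = len(descriptors)
    dim = len(descriptors[0])
    return [sum(d[i] for d in descriptors) / n for i in range(dim)]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance (assumed here; the abstract names no metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_place(place_pirfs: Dict[str, List[Vector]],
                    test_features: List[Vector],
                    max_dist: float) -> str:
    """Return the place whose PIRFs match the most test features.

    A PIRF counts as matched if any test feature lies within max_dist
    (a hypothetical threshold rule).
    """
    votes = {}
    for place, pirfs in place_pirfs.items():
        votes[place] = sum(
            1 for p in pirfs
            if any(euclidean(p, f) <= max_dist for f in test_features)
        )
    return max(votes, key=votes.get)
```

    In this sketch, each place is represented by a list of averaged descriptors, and a test image is assigned to the place that collects the most votes.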
  • Cheng WAN, Jun SATO
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2010 Volume E93.D Issue 9 Pages 2602-2613
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    The spatio-temporal multiple view geometry can represent the geometry of multiple images in the case where non-rigid arbitrary motions are viewed from multiple translational cameras. However, it requires many corresponding points and is sensitive to image noise. In this paper, we investigate mutual projections of cameras in four-dimensional space and show that they enable us to reduce the number of corresponding points required for computing the spatio-temporal multiple view geometry. Surprisingly, in the case of three views, for instance, we no longer need any corresponding points to calculate the spatio-temporal multiple view geometry if all the cameras are projected to the other cameras mutually for two time intervals. We also show that the stability of the computation of the spatio-temporal multiple view geometry is drastically improved by considering the mutual projections of cameras.
    Download PDF (1322K)
  • Ukrit WATCHAREERUETAI, Tetsuya MATSUMOTO, Yoshinori TAKEUCHI, Hiroaki ...
    Article type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2010 Volume E93.D Issue 9 Pages 2614-2625
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We propose a new multi-objective genetic programming (MOGP) method for automatic construction of image feature extraction programs (FEPs). The proposed method is derived from a well-known multi-objective evolutionary algorithm (MOEA), i.e., NSGA-II. The key difference is that redundancy-regulation mechanisms are applied in three main processes of the MOGP, i.e., population truncation, sampling, and offspring generation, to improve population diversity as well as convergence rate. Experimental results indicate that the proposed MOGP-based FEP construction system outperforms two conventional MOEAs (i.e., NSGA-II and SPEA2) on a test problem. Moreover, we compared the programs constructed by the proposed MOGP with four human-designed object recognition programs. The results show that the constructed programs are better than two of the human-designed methods and comparable with the other two on the test problem.
    Download PDF (1042K)
  • Junichi HORI, Kentarou SUNAGA, Satoru WATANABE
    Article type: PAPER
    Subject area: Biological Engineering
    2010 Volume E93.D Issue 9 Pages 2626-2634
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We investigated suitable spatial inverse filters for cortical dipole imaging from the scalp electroencephalogram (EEG). The effects of incorporating statistical information about signal and noise into inverse procedures were examined by computer simulations and experimental studies. The parametric projection filter (PPF) and parametric Wiener filter (PWF) were applied to an inhomogeneous three-sphere volume conductor head model. The noise covariance matrix was estimated by applying independent component analysis (ICA) to scalp potentials. The simulation results suggest that the PPF and the PWF provided excellent performance when the noise covariance was estimated from the differential noise between the EEG and the signal separated using ICA, and the signal covariance was estimated from the separated signal. Moreover, the spatial resolution of the cortical dipole imaging was improved while the influence of noise was suppressed by including the differential noise at the instant of imaging and by adjusting the duration of the noise sample according to the signal-to-noise ratio. We applied the proposed imaging technique to human experimental data of visual evoked potentials and obtained reasonable results that are consistent with physiological knowledge.
    Download PDF (705K)
  • Vladimir V. STANKOVIC, Nebojsa Z. MILENKOVIC
    Article type: LETTER
    Subject area: Computer System
    2010 Volume E93.D Issue 9 Pages 2635-2638
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In the arsenal of resources for improving computer memory system performance, predictors have gained an increasing role in the past few years. They make it possible to hide latencies when accessing cache or main memory. In our previous work we proposed a DDR SDRAM controller with predictors that not only close the opened DRAM row but also predict the next row to be opened. In this paper we explore the possibility of applying the same techniques to the latest type of DRAM memory, DDR3 SDRAM, with further improvements to the predictors.
    Download PDF (187K)
  • Jongwan KIM, Dukshin OH, Keecheon KIM
    Article type: LETTER
    Subject area: Data Engineering, Web Information Systems
    2010 Volume E93.D Issue 9 Pages 2639-2642
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    Since a radio frequency identification (RFID) transponder (tag) generates both location and time information when it enters and leaves a reader, the trajectory of a moving, tagged object can be traced. Due to the time intervals between entries to successive readers, during which tags are not tracked, accurate tracing of complete trajectories can be difficult. To overcome this problem, we propose a tag trajectory indexing scheme called TR-tree (R-tree-based tag trajectory index) that can trace tags by combining the local trajectories at each reader. In experiments, this scheme showed superior performance compared with other indices.
    Download PDF (635K)
  • Yuan HU, Li LU, Jingqi YAN, Zhi LIU, Pengfei SHI
    Article type: LETTER
    Subject area: Pattern Recognition
    2010 Volume E93.D Issue 9 Pages 2643-2646
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this paper, we present an analysis of sexual dimorphism in the 3D human face and perform gender classification based on the results of this analysis. Four types of features are extracted from a 3D human-face image. Using statistical methods, the existence of sexual dimorphism in the 3D human face is demonstrated based on these features. The contribution of each feature to sexual dimorphism is quantified according to a novel criterion. The best gender classification rate, 94%, is obtained using SVMs and the Matcher Weighting fusion method. This research adds to the knowledge of sexual dimorphism in 3D faces and provides a foundation that could be used to distinguish between male and female 3D faces.
    Download PDF (628K)
  • Noriaki SUETAKE, Go TANAKA, Hayato HASHII, Eiji UCHINO
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2010 Volume E93.D Issue 9 Pages 2647-2650
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    In this letter, we propose a new method for tuning the ε value, a parameter of the ε-filter, using a metric between signal distributions, namely the Hellinger distance. In the proposed method, the difference between the input and output signals is evaluated using the Hellinger distance and used for the parameter tuning.
    Download PDF (542K)
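    For reference, the Hellinger distance between two discrete distributions p and q is sqrt(0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2), which ranges from 0 (identical) to 1 (disjoint support). The sketch below computes it between normalized histograms of two signals. How the letter turns this distance into a tuning rule for ε is not stated in the abstract, so none is shown; the histogram helper, its bin count, and its range parameters are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s / 2.0)


def normalized_histogram(samples: Sequence[float], bins: int,
                         lo: float, hi: float) -> List[float]:
    """Histogram of samples over [lo, hi), normalized to sum to 1.

    Hypothetical helper: values outside the range are clamped to the
    first or last bin.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        idx = int((x - lo) / width)
        idx = max(0, min(idx, bins - 1))
        counts[idx] += 1
    return [c / len(samples) for c in counts]
```

    A caller could, for example, build histograms of a filter's input and output samples and compare them with `hellinger` for several candidate ε values.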
  • Nan LIU, Yao ZHAO, Zhenfeng ZHU, Rongrong NI
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2010 Volume E93.D Issue 9 Pages 2651-2655
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper presents a commercial shot classification scheme combining well-designed visual and textual features to automatically detect TV commercials. To identify the inherent difference between commercials and general programs, a special mid-level textual descriptor is proposed, aiming to capture the spatio-temporal properties of the video texts typical of commercials. In addition, we introduce an ensemble-learning based combination method, named Co-AdaBoost, to interactively exploit the intrinsic relations between the visual and textual features employed.
    Download PDF (943K)
  • Hui CAO, Koichiro YAMAGUCHI, Mitsuhiko OHTA, Takashi NAITO, Yoshiki NI ...
    Article type: LETTER
    Subject area: Image Recognition, Computer Vision
    2010 Volume E93.D Issue 9 Pages 2656-2659
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    We propose a novel representation called Feature Interaction Descriptor (FIND) to capture high-level properties of object appearance by computing pairwise interactions of adjacent region-level features. To address the pedestrian detection task, we employ localized oriented gradient histograms as region-level features and measure interactions between adjacent histogram elements with a suitable histogram-similarity function. The experimental results show that our descriptor improves upon HOG significantly and outperforms related high-level features such as GLAC and CoHOG.
    Download PDF (320K)
  • Yoshihide KATO, Shigeki MATSUBARA
    Article type: LETTER
    Subject area: Natural Language Processing
    2010 Volume E93.D Issue 9 Pages 2660-2663
    Published: September 01, 2010
    Released on J-STAGE: September 01, 2010
    JOURNAL FREE ACCESS
    This paper proposes a method for correcting annotation errors in a treebank. Using a synchronous grammar, the method transforms parse trees containing annotation errors into trees in which the errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
    Download PDF (232K)
Errata