IEICE Transactions on Information and Systems

Special Section on Information-Based Induction Sciences and Machine Learning

FOREWORD

Masashi SUGIYAMA

2011 Volume E94.D Issue 10 Pages 1845
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1845

JOURNAL FREE ACCESS

Download PDF (58K)
Kernel Methods for Chemical Compounds: From Classification to Design

Tatsuya AKUTSU, Hiroshi NAGAMOCHI

Article type: INVITED PAPER
2011 Volume E94.D Issue 10 Pages 1846-1853
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1846

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, we briefly review kernel methods for analysis of chemical compounds with focusing on the authors' works. We begin with a brief review of existing kernel functions that are used for classification of chemical compounds and prediction of their activities. Then, we focus on the pre-image problem for chemical compounds, which is to infer a chemical structure that is mapped to a given feature vector, and has a potential application to design of novel chemical compounds. In particular, we consider the pre-image problem for feature vectors consisting of frequencies of labeled paths of length at most K. We present several time complexity results that include: NP-hardness result for a general case, polynomial time algorithm for tree structured compounds with fixed K, and polynomial time algorithm for K=1 based on graph detachment. Then we review practical algorithms for the pre-image problem, which are based on enumeration of chemical structures satisfying given constraints. We also briefly review related results which include efficient enumeration of stereoisomers of tree-like chemical compounds and efficient enumeration of outerplanar graphs.

View full abstract

Download PDF (459K)
A Short Introduction to Learning to Rank

Hang LI

Article type: INVITED PAPER
2011 Volume E94.D Issue 10 Pages 1854-1862
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1854

JOURNAL FREE ACCESS

Show abstractHide abstract

Learning to rank refers to machine learning techniques for training the model in a ranking task. Learning to rank is useful for many applications in Information Retrieval, Natural Language Processing, and Data Mining. Intensive studies have been conducted on the problem and significant progress has been made[1],[2]. This short paper gives an introduction to learning to rank, and it specifically explains the fundamental problems, existing approaches, and future work of learning to rank. Several learning to rank methods using SVM techniques are described in details.

View full abstract

Download PDF (312K)
Boosting Learning Algorithm for Pattern Recognition and Beyond

Osamu KOMORI, Shinto EGUCHI

Article type: INVITED PAPER
2011 Volume E94.D Issue 10 Pages 1863-1869
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1863

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper discusses recent developments for pattern recognition focusing on boosting approach in machine learning. The statistical properties such as Bayes risk consistency for several loss functions are discussed in a probabilistic framework. There are a number of loss functions proposed for different purposes and targets. A unified derivation is given by a generator function U which naturally defines entropy, divergence and loss function. The class of U-loss functions associates with the boosting learning algorithms for the loss minimization, which includes AdaBoost and LogitBoost as a twin generated from Kullback-Leibler divergence, and the (partial) area under the ROC curve. We expand boosting to unsupervised learning, typically density estimation employing U-loss function. Finally, a future perspective in machine learning is discussed.

View full abstract

Download PDF (155K)
Dimensionality Reduction for Histogram Features Based on Supervised Non-negative Matrix Factorization

Mitsuru AMBAI, Nugraha P. UTAMA, Yuichi YOSHIDA

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1870-1879
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1870

JOURNAL FREE ACCESS

Show abstractHide abstract

Histogram-based image features such as HoG, SIFT and histogram of visual words are generally represented as high-dimensional, non-negative vectors. We propose a supervised method of reducing the dimensionality of histogram-based features by using non-negative matrix factorization (NMF). We define a cost function for supervised NMF that consists of two terms. The first term is the generalized divergence term between an input matrix and a product of factorized matrices. The second term is the penalty term that reflects prior knowledge on a training set by assigning predefined constants to cannot-links and must-links in pairs of training data. A multiplicative update rule for minimizing the newly-defined cost function is also proposed. We tested our method on a task of scene classification using histograms of visual words. The experimental results revealed that each of the low-dimensional basis vectors obtained from the proposed method only appeared in a single specific category in most cases. This interesting characteristic not only makes it easy to interpret the meaning of each basis but also improves the power of classification.

View full abstract

Download PDF (930K)
Augmenting Training Samples with a Large Number of Rough Segmentation Datasets

Mitsuru AMBAI, Yuichi YOSHIDA

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1880-1888
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1880

JOURNAL FREE ACCESS

Show abstractHide abstract

We revisit the problem with generic object recognition from the point of view of human-computer interaction. While many existing algorithms for generic object recognition first try to detect target objects before features are extracted and classified in processing, our work is motivated by the belief that solving the task of detection by computer is not always necessary in many practical situations, such as those involving mobile recognition systems with touch displays and cameras. It is natural for these systems to ask users to input the segmentation data for targets through their touch displays. Speaking from the perspective of usability, such systems should involve rough segmentation to reduce the user workload. In this situation, different people would provide different segmentation data. Here, an interesting question arises - if multiple training samples are generated from a single image by using various segmentation data created by different people, what would happen to the accuracy of classification? We created “20 wild bird datasets” that had a large number of rough segmentation datasets made by 383 people in an attempt to answer this question. Our experiments revealed two interesting facts: (i) generating multiple training samples from a single image had positive effects on classification accuracies, especially when image features including spatial information were used and (ii) augmenting training samples with artificial segmentation data synthesized with a morphing technique also had slightly positive effects on classification accuracies.

View full abstract

Download PDF (1591K)
A Bayesian Model of Transliteration and Its Human Evaluation When Integrated into a Machine Translation System

Andrew FINCH, Keiji YASUDA, Hideo OKUMA, Eiichiro SUMITA, Satoshi NAKA ...

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1889-1900
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1889

JOURNAL FREE ACCESS

Show abstractHide abstract

The contribution of this paper is two-fold. Firstly, we conduct a large-scale real-world evaluation of the effectiveness of integrating an automatic transliteration system with a machine translation system. A human evaluation is usually preferable to an automatic evaluation, and in the case of this evaluation especially so, since the common machine translation evaluation methods are affected by the length of the translations they are evaluating, often being biassed towards translations in terms of their length rather than the information they convey. We evaluate our transliteration system on data collected in field experiments conducted all over Japan. Our results conclusively show that using a transliteration system can improve machine translation quality when translating unknown words. Our second contribution is to propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the overfitting problem inherent in maximum likelihood training. We demonstrate the effectiveness of our Bayesian segmentation by using it to build a translation model for a phrase-based statistical machine translation (SMT) system trained to perform transliteration by monotonic transduction from character sequence to character sequence. The Bayesian segmentation was used to construct a phrase-table and we compared the quality of this phrase-table to one generated in the usual manner by the state-of-the-art GIZA++ word alignment process used in combination with phrase extraction heuristics from the MOSES statistical machine translation system, by using both to perform transliteration generation within an identical framework. In our experiments on English-Japanese data from the NEWS2010 transliteration generation shared task, we used our technique to bilingually co-segment the training corpus. We then derived a phrase-table from the segmentation from the sample at the final iteration of the training procedure, and the resulting phrase-table was used to directly substitute for the phrase-table extracted by using GIZA++/MOSES. The phrase-table resulting from our Bayesian segmentation model was approximately 30% smaller than that produced by the SMT system's training procedure, and gave an increase in transliteration quality measured in terms of both word accuracy and F-score.

View full abstract

Download PDF (620K)
Kernel Optimization Based Semi-Supervised KBDA Scheme for Image Retrieval

Xu YANG, Huilin XIONG, Xin YANG

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1901-1908
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1901

JOURNAL FREE ACCESS

Show abstractHide abstract

Kernel biased discriminant analysis (KBDA), as a subspace learning algorithm, has been an attractive approach for the relevance feedback in content-based image retrieval. Its performance, however, still suffers from the “small sample learning” problem and “kernel learning” problem. Aiming to solve these problems, in this paper, we present a new semi-supervised scheme of KBDA (S-KBDA), in which the projection learning and the “kernel learning” are interweaved into a constrained optimization framework. Specifically, S-KBDA learns a subspace that preserves both the biased discriminant structure among the labeled samples, and the geometric structure among all training samples. In kernel optimization, we directly optimize the kernel matrix, rather than a kernel function, which makes the kernel learning more flexible and appropriate for the retrieval task. To solve the constrained optimization problem, a fast algorithm based on gradient ascent is developed. The image retrieval experiments are given to show the effectiveness of the S-KBDA scheme in comparison with the original KBDA, and the other two state-of-the-art algorithms.

View full abstract

Download PDF (665K)
Enhancing Eigenspace-Based MLLR Speaker Adaptation Using a Fuzzy Logic Learning Control Scheme

Ing-Jr DING

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1909-1916
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1909

JOURNAL FREE ACCESS

Show abstractHide abstract

This study develops a fuzzy logic control mechanism in eigenspace-based MLLR speaker adaptation. Specifically, this mechanism can determine hidden Markov model parameters to enhance overall recognition performance despite ordinary or adverse conditions in both training and operating stages. The proposed mechanism regulates the influence of eigenspace-based MLLR adaptation given insufficient training data from a new speaker. This mechanism accounts for the amount of adaptation data available in transformation matrix parameter smoothing, and thus ensures the robustness of eigenspace-based MLLR adaptation against data scarcity. The proposed adaptive learning mechanism is computationally inexpensive. Experimental results show that eigenspace-based MLLR adaptation with fuzzy control outperforms conventional eigenspace-based MLLR, and especially when the adaptation data acquired from a new speaker is insufficient.

View full abstract

Download PDF (227K)
Adaptive Online Prediction Using Weighted Windows

Shin-ichi YOSHIDA, Kohei HATANO, Eiji TAKIMOTO, Masayuki TAKEDA

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1917-1923
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1917

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose online prediction algorithms for data streams whose characteristics might change over time. Our algorithms are applications of online learning with experts. In particular, our algorithms combine base predictors over sliding windows with different length as experts. As a result, our algorithms are guaranteed to be competitive with the base predictor with the best fixed-length sliding window in hindsight.

View full abstract

Download PDF (528K)
Multiscale Bagging and Its Applications

Hidetoshi SHIMODAIRA, Takafumi KANAMORI, Masayoshi AOKI, Kouta MINE

Article type: PAPER
2011 Volume E94.D Issue 10 Pages 1924-1932
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1924

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose multiscale bagging as a modification of the bagging procedure. In ordinary bagging, the bootstrap resampling is used for generating bootstrap samples. We replace it with the multiscale bootstrap algorithm. In multiscale bagging, the sample size m of bootstrap samples may be altered from the sample size n of learning dataset. For assessing the output of a classifier, we compute bootstrap probability of class label; the frequency of observing a specified class label in the outputs of classifiers learned from bootstrap samples. A scaling-law of bootstrap probability with respect to σ²=n/m has been developed in connection with the geometrical theory. We consider two different ways for using multiscale bagging of classifiers. The first usage is to construct a confidence set of class labels, instead of a single label. The second usage is to find inputs close to decision boundaries in the context of query by bagging for active learning. It turned out, interestingly, that an appropriate choice of m is m=-n, i.e., σ²=-1, for the first usage, and m=∞ , i.e., σ²=0, for the second usage.

View full abstract

Download PDF (1947K)
MQDF Retrained on Selected Sample Set

Yanwei WANG, Xiaoqing DING, Changsong LIU

Article type: LETTER
2011 Volume E94.D Issue 10 Pages 1933-1936
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1933

JOURNAL FREE ACCESS

Show abstractHide abstract

This letter has retrained an MQDF classifier on the retraining set, which is constructed by samples locating near classification boundary. The method is evaluated on HCL2000 and HCD Chinese handwriting sets. The results show that the retrained MQDF outperforms MQDF and cascade MQDF on all test sets.

View full abstract

Download PDF (168K)
Statistical Mechanics of Adaptive Weight Perturbation Learning

Ryosuke MIYOSHI, Yutaka MAEDA, Seiji MIYOSHI

Article type: LETTER
2011 Volume E94.D Issue 10 Pages 1937-1940
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1937

JOURNAL FREE ACCESS

Show abstractHide abstract

Weight perturbation learning was proposed as a learning rule in which perturbation is added to the variable parameters of learning machines. The generalization performance of weight perturbation learning was analyzed by statistical mechanical methods and was found to have the same asymptotic generalization property as perceptron learning. In this paper we consider the difference between perceptron learning and AdaTron learning, both of which are well-known learning rules. By applying this difference to weight perturbation learning, we propose adaptive weight perturbation learning. The generalization performance of the proposed rule is analyzed by statistical mechanical methods, and it is shown that the proposed learning rule has an outstanding asymptotic property equivalent to that of AdaTron learning.

View full abstract

Download PDF (204K)
Statistical Mechanics of On-Line Learning Using Correlated Examples

Kento NAKAO, Yuta NARUKAWA, Seiji MIYOSHI

Article type: LETTER
2011 Volume E94.D Issue 10 Pages 1941-1944
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1941

JOURNAL FREE ACCESS

Show abstractHide abstract

We consider a model composed of nonlinear perceptrons and analytically investigate its generalization performance using correlated examples in the framework of on-line learning by a statistical mechanical method. In Hebbian and AdaTron learning, the larger the number of examples used in an update, the slower the learning. In contrast, Perceptron learning does not exhibit such behaviors, and the learning becomes fast in some time region.

View full abstract

Download PDF (268K)

Regular Section

On the Generative Power of Cancel Minimal Linear Grammars with Single Nonterminal Symbol except the Start Symbol

Kaoru FUJIOKA, Hirofumi KATSUNO

Article type: PAPER
Subject area: Fundamentals of Information Systems
2011 Volume E94.D Issue 10 Pages 1945-1954
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1945

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper concerns cancel minimal linear grammars ([5]) that was introduced to generalize Geffert normal forms for phrase structure grammars. We consider the generative power of restricted cancel minimal linear grammars: the grammars have only one nonterminal symbol C except the start symbol S, and their productions consist of context-free type productions, the left-hand side of which is S and the right-hand side contains at most one occurrence of S, and a unique cancellation production C^m → ε that replaces the string C^m by the empty string ε. We show that, for any given positive integer m, the class of languages generated by cancel minimal linear grammars with C^m → ε, is properly included in the class of linear languages. Conversely, we show that for any linear language L, there exists some positive integer m such that a cancel minimal linear grammar with C^m → ε generates L. We also show how the generative power of cancel minimal linear grammars with a unique cancellation production C^m → ε vary according to changes of m and restrictions imposed on occurrences of terminal symbols in the right-hand side of productions.

View full abstract

Download PDF (231K)
Efficient and Secure Aggregation of Sensor Data against Multiple Corrupted Nodes

Atsuko MIYAJI, Kazumasa OMOTE

Article type: PAPER
Subject area: Information Network
2011 Volume E94.D Issue 10 Pages 1955-1965
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1955

JOURNAL FREE ACCESS

Show abstractHide abstract

Wireless Sensor Networks (WSNs) rely on in-network aggregation for efficiency, that is, readings from sensor nodes are aggregated at intermediate nodes to reduce the communication cost. However, the previous optimally secure in-network aggregation protocols against multiple corrupted nodes require two round-trip communications between each node and the base station, including the result-checking phase whose congestion is O(log n) where n is the total number of sensor nodes. In this paper, we propose an efficient and optimally secure sensor network aggregation protocol against multiple corrupted nodes by a random-walk adversary. Our protocol achieves one round-trip communication to satisfy optimal security without the result-checking phase, by conducting aggregation along with the verification, based on the idea of TESLA technique. Furthermore, we show that the congestion complexity, communication complexity and computational cost in our protocol are constant, i.e., O(1).

View full abstract

Download PDF (381K)
Query-Trail-Mediated Cooperative Behaviors of Peers in Unstructured P2P File Sharing Networks

Kei OHNISHI, Hiroshi YAMAMOTO, Masato UCHIDA, Yuji OIE

Article type: PAPER
Subject area: Information Network
2011 Volume E94.D Issue 10 Pages 1966-1980
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1966

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose two types of autonomic and distributed cooperative behaviors of peers for peer-to-peer (P2P) file-sharing networks. Cooperative behaviors of peers are mediated by query trails, and allows the exploration of better trade-off points between file search and storage load balancing performance. Query trails represent previous successful search paths and indicate which peers contributed to previous file searches and were at the same time exposed to the storage load. The first type of cooperative behavior is to determine the locations of replicas of files through the medium of query trails. Placement of replicas of files on strong query trails contributes to improvement of search performance, but a heavy load is generated due to writing files in storage to peers on the strong query trails. Therefore, we attempt to achieve storage load balancing between peers, while avoiding significant degradation of the search performance by creating replicas of files in peers adjacent to peers on strong query trails. The second type of cooperative behavior is to determine whether peers provide requested files through the medium of query trails. Provision of files by peers holding requested files on strong query trails contributes to better search performance, but such provision of files generates a heavy load for reading files from storage to peers on the strong query trails. Therefore, we attempt to achieve storage load balancing while making only small sacrifices in search performance by having peers on strong query trails refuse to provide files. Simulation results show that the first type of cooperative behavior provides equal or improved ability to explore trade-off points between storage load balancing and search performance in a static and nearly homogeneous P2P environment, without the need for fine tuning parameter values, compared to replication methods that require fine tuning of their parameters values. In addition, the combination of the second type and the first type of cooperative behavior yields better storage load balancing performance with little degradation of search performance. Moreover, even in a dynamic and heterogeneous P2P environment, the two types of cooperative behaviors yield good ability to explore trade-off points between storage load balancing and search performance.

View full abstract

Download PDF (833K)
Sensor Node Localization by Three Mobile Anchors in the Wireless Sensor Networks

Seunghak LEE, Namgi KIM, Heeyoul KIM, Younho LEE, Hyunsoo YOON

Article type: PAPER
Subject area: Information Network
2011 Volume E94.D Issue 10 Pages 1981-1988
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1981

JOURNAL FREE ACCESS

Show abstractHide abstract

For the deployment of sensor networks, the sensor localization, which finds the position of sensor nodes, is very important. Most previous localization schemes generally use the GPS signal for the sensor localization. However, the GPS signal is unavailable when there is an obstacle between the sensor nodes and satellites. Therefore, in this paper, we propose a new localization scheme which does not use the GPS signal. The proposed scheme localizes the sensors by using three mobile anchors. Because the three mobile anchors collaboratively move by themselves, it is self-localizable and can be adopted even when the sensors are randomly and sparsely deployed in the target field.

View full abstract

Download PDF (1067K)
Voting-Based Ensemble Classifiers to Detect Hedges and Their Scopes in Biomedical Texts

Huiwei ZHOU, Xiaoyan LI, Degen HUANG, Yuansheng YANG, Fuji REN

Article type: PAPER
Subject area: Artificial Intelligence, Data Mining
2011 Volume E94.D Issue 10 Pages 1989-1997
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1989

JOURNAL FREE ACCESS

Show abstractHide abstract

Previous studies of pattern recognition have shown that classifiers ensemble approaches can lead to better recognition results. In this paper, we apply the voting technique for the CoNLL-2010 shared task on detecting hedge cues and their scope in biomedical texts. Six machine learning-based systems are combined through three different voting schemes. We demonstrate the effectiveness of classifiers ensemble approaches and compare the performance of three different voting schemes for hedge cue and their scope detection. Experiments on the CoNLL-2010 evaluation data show that our best system achieves an F-score of 87.49% on hedge detection task and 60.87% on scope finding task respectively, which are significantly better than those of the previous systems.

View full abstract

Download PDF (394K)
Automatic Scale Detection for Contour Fragment Based on Difference of Curvature

Kei KAWAMURA, Daisuke ISHII, Hiroshi WATANABE

Article type: PAPER
Subject area: Pattern Recognition
2011 Volume E94.D Issue 10 Pages 1998-2005
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.1998

JOURNAL FREE ACCESS

Show abstractHide abstract

Scale-invariant features are widely used for image retrieval and shape classification. The curvature of a planar curve is a fundamental feature and it is geometrically invariant with respect it the coordinate system. The curvature-based feature varies in position when multi-scale analysis is performed. Therefore, it is important to recognize the scale in order to detect the feature point. Numerous shape descriptors based on contour shapes have been developed in the field of pattern recognition and computer vision. A curvature scale-space (CSS) representation cannot be applied to a contour fragment and requires the tracking of feature points. In a gradient-based curvature computation, although the gradient computation considers the scale, the curvature is normalized with respect to not the scale but the contour length. The scale-invariant feature transform algorithm that detects feature points from an image solves similar problems by using the difference of Gaussian (DoG). It is difficult to apply the SIFT algorithm to a planar curve for feature extraction. In this paper, an automatic scale detection method for a contour fragment is proposed. The proposed method detects the appropriate scales and their positions on the basis of the difference of curvature (DoC) without the tracking of feature points. To calculate the differences, scale-normalized curvature is introduced. An advantage of the DoC algorithm is that the appropriate scale can be obtained from a contour fragment as a local feature. It then extends the application area. The validity of the proposed method is confirmed by experiments. The proposed method provides the most stable and robust scales of feature points among conventional methods such as curvature scale-space and gradient-based curvature.

View full abstract

Download PDF (368K)
Probabilistic Concatenation Modeling for Corpus-Based Speech Synthesis

Shinsuke SAKAI, Tatsuya KAWAHARA, Hisashi KAWAI

Article type: PAPER
Subject area: Speech and Hearing
2011 Volume E94.D Issue 10 Pages 2006-2014
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2006

JOURNAL FREE ACCESS

Show abstractHide abstract

The measure of the goodness, or inversely the cost, of concatenating synthesis units plays an important role in concatenative speech synthesis. In this paper, we present a probabilistic approach to concatenation modeling in which the goodness of concatenation is measured by the conditional probability of observing the spectral shape of the current candidate unit given the previous unit and the current phonetic context. This conditional probability is modeled by a conditional Gaussian density whose mean vector has a form of linear transform of the past spectral shape. Decision tree-based parameter tyingis performed to achieve robust trainingthat balances between model complexity and the amount of training data available. The concatenation models are implemented for a corpus-based speech synthesizer, and the effectiveness of the proposed method wasconfirmed by an objective evaluation as well as a subjective listening test. We also demonstrate that the proposed method generalizes some popular conventional methods in that those methods can be derived as the special cases of the proposed method.

View full abstract

Download PDF (324K)
Committee-Based Active Learning for Speech Recognition

Yuzo HAMANAKA, Koichi SHINODA, Takuya TSUTAOKA, Sadaoki FURUI, Tadashi ...

Article type: PAPER
Subject area: Speech and Hearing
2011 Volume E94.D Issue 10 Pages 2015-2023
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2015

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose a committee-based method of active learning for large vocabulary continuous speech recognition. Multiple recognizers are trained in this approach, and the recognition results obtained from these are used for selecting utterances. Those utterances whose recognition results differ the most among recognizers are selected and transcribed. Progressive alignment and voting entropy are used to measure the degree of disagreement among recognizers on the recognition result. Our method was evaluated by using 191-hour speech data in the Corpus of Spontaneous Japanese. It proved to be significantly better than random selection. It only required 63h of data to achieve a word accuracy of 74%, while standard training (i.e., random selection) required 103h of data. It also proved to be significantly better than conventional uncertainty sampling using word posterior probabilities.

View full abstract

Download PDF (585K)
Scalable Object Discovery: A Hash-Based Approach to Clustering Co-occurring Visual Words

Gibran FUENTES PINEDA, Hisashi KOGA, Toshinori WATANABE

Article type: PAPER
Subject area: Image Recognition, Computer Vision
2011 Volume E94.D Issue 10 Pages 2024-2035
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2024

JOURNAL FREE ACCESS

Show abstractHide abstract

We present a scalable approach to automatically discovering particular objects (as opposed to object categories) from a set of images. The basic idea is to search for local image features that consistently appear in the same images under the assumption that such co-occurring features underlie the same object. We first represent each image in the set as a set of visual words (vector quantized local image features) and construct an inverted file to memorize the set of images in which each visual word appears. Then, our object discovery method proceeds by searching the inverted file and extracting visual word sets whose elements tend to appear in the same images; such visual word sets are called co-occurring word sets. Because of unstable and polysemous visual words, a co-occurring word set typically represents only a part of an object. We observe that co-occurring word sets associated with the same object often share many visual words with one another. Hence, to obtain the object models, we further cluster highly overlapping co-occurring word sets in an agglomerative manner. Remarkably, we accelerate both extraction and clustering of co-occurring word sets by Min-Hashing. We show that the models generated by our method can effectively discriminate particular objects. We demonstrate our method on the Oxford buildings dataset. In a quantitative evaluation using a set of ground truth landmarks, our method achieved higher scores than the state-of-the-art methods.

View full abstract

Download PDF (8724K)
A Parallel Framework for Fast Photomosaics

Dongwann KANG, Sang-Hyun SEO, Seung-Taek RYOO, Kyung-Hyun YOON

Article type: PAPER
Subject area: Computer Graphics
2011 Volume E94.D Issue 10 Pages 2036-2042
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2036

JOURNAL FREE ACCESS

Show abstractHide abstract

Main bottleneck of photomosaic algorithm is a search for a best matched image. Unlike several techniques which use fast approximation search for increasing the speed, we propose a parallel framework for fast photomosaic using a programmable GPU. This paper suggests a design of vertex structure for a best match searching on each cell of photomosaic grid and shows a texture representation of image database. The shader programs which are used for searching a best match and rendering image tiles into a display are presented. In addition, a simple duplicate reduction and color correction methods are proposed. Our algorithm not only offers dramatic enhancement of speed, but also always guarantees the ‘exact’ result.

View full abstract

Download PDF (3342K)
An Optimal Algorithm for Searching the Optimal Translation of Query Windows in Quadtree Decomposition

Hao CHEN, Guangcun LUO

Article type: LETTER
Subject area: Data Engineering, Web Information Systems
2011 Volume E94.D Issue 10 Pages 2043-2047
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2043

JOURNAL FREE ACCESS

Show abstractHide abstract

One of the efficient methods to build the index of continuous window queries over moving objects is by means of region quadtree index. In this paper, we present an optimal algorithm to search for the optimal position translation of query windows, where the total number of decomposed quadtree blocks for those windows in quadtree representation is minimal. We exploit the branch-and-bound concept to prune the particular paths of recursions in the search space. Evaluation proves that our optimal algorithm reduces search time greatly and the quadtree index based on optimal position translation works efficiently for continuous window queries. To the best of our knowledge, the algorithms and experiments reported in this paper are novel.

View full abstract

Download PDF (194K)
ROCKET: A Robust Parallel Algorithm for Clustering Large-Scale Transaction Databases

Woong-Kee LOH, Yang-Sae MOON, Heejune AHN

Article type: LETTER
Subject area: Artificial Intelligence, Data Mining
2011 Volume E94.D Issue 10 Pages 2048-2051
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2048

JOURNAL FREE ACCESS

Show abstractHide abstract

We propose a robust and efficient algorithm called ROCKET for clustering large-scale transaction databases. ROCKET is a divisive hierarchical algorithm that makes the most of recent hardware architecture. ROCKET handles the cases with the small and the large number of similar transaction pairs separately and efficiently. Through experiments, we show that ROCKET achieves high-quality clustering with a dramatic performance improvement.

View full abstract

Download PDF (151K)
A Visual Signal Reliability for Robust Audio-Visual Speaker Identification

Md. TARIQUZZAMAN, Jin Young KIM, Seung You NA, Hyoung-Gook KIM, Dongso ...

Article type: LETTER
Subject area: Human-computer Interaction
2011 Volume E94.D Issue 10 Pages 2052-2055
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2052

JOURNAL FREE ACCESS

Show abstractHide abstract

In this paper, a novel visual signal reliability (VSR) measure is proposed to consider video degradation at the signal level in audio-visual speaker identification (AVSI). The VSR estimation is formulated using a Gaussian fuzzy membership function (GFMF) to measure lighting variations. The variance parameters of GFMF are optimized in order to maximize the performance of the overall AVSI. The experimental results show that the proposed method outperforms the score-based reliability measuring technique.

View full abstract

Download PDF (390K)
Fast Shape Matching Using Statistical Features of Shape Contexts

Moon-Jai LIM, Chan-Hee HAN, Si-Woong LEE, Yun-Ho KO

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2011 Volume E94.D Issue 10 Pages 2056-2058
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2056

JOURNAL FREE ACCESS

Show abstractHide abstract

A novel fast algorithm for shape matching using statistical features of shape contexts is presented. By pruning the candidate shapes using the moment-based statistical features of shape contexts, the required number of matching processes is dramatically reduced with negligible performance degradation. Experimental results demonstrate that the proposed algorithm reduces the pruning time up to 1/(r·) compared with the conventional RSC algorithm while maintaining a similar or better performance, where n is the number of sampled points of a shape and r is the number of randomly selected representative shape contexts for the query shape.

View full abstract

Download PDF (487K)
Fast and Simple 2D Shape Retrieval Using Discrete Shock Graph

Solima KHANAM, Seok-Woo JANG, Woojin PAIK

Article type: LETTER
Subject area: Image Recognition, Computer Vision
2011 Volume E94.D Issue 10 Pages 2059-2062
Published: October 01, 2011
Released on J-STAGE: October 01, 2011

DOIhttps://doi.org/10.1587/transinf.E94.D.2059

JOURNAL FREE ACCESS

Show abstractHide abstract

In this letter, we propose an effective method to retrieve images from a 2D shape image database using discrete shock graphs combined with an adaptive selection algorithm. Experimental results show that our method is more accurate and fast than conventional approaches and reduces computational complexity.

View full abstract

Download PDF (654K)

Register with J-STAGE for free!