IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E97.D, Issue 6
Displaying 1-40 of 40 articles from the selected issue
Special Section on Advances in Modeling for Real-world Speech Information Processing and its Application
  • Yoichi YAMASHITA
    2014 Volume E97.D Issue 6 Pages 1402
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
  • Toru NAKASHIKA, Tetsuya TAKIGUCHI, Yasuo ARIKI
    Article type: PAPER
    Subject area: Voice Conversion and Speech Enhancement
    2014 Volume E97.D Issue 6 Pages 1403-1410
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBMs) to build high-order eigen spaces of the source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting that they automatically discover abstractions to express the original input features. Under this concept, if we train an RBM using only the speech of an individual speaker, which includes various phonemes while the speaker individuality is kept unchanged, the output features of its hidden layer can be considered to contain less phoneme-related information and relatively more speaker individuality than the original acoustic features. Having trained RBMs for a source speaker and a target speaker, we then connect and convert the speaker-individuality abstractions using a neural network (NN). The converted abstraction of the source speaker is then mapped back into the acoustic space (e.g., MFCC) using the RBM of the target speaker. We conducted speaker voice conversion experiments and confirmed the efficacy of our method with respect to subjective and objective criteria, comparing it with the conventional Gaussian mixture model (GMM)-based method and an ordinary NN.
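    A minimal numpy sketch of the conversion pipeline described in this abstract: encode a source frame with the source speaker's RBM, map the hidden abstraction with a small NN, and decode with the target speaker's RBM. All weights, dimensions, and the mean-field decoding here are hypothetical placeholders; the paper's actual training (contrastive divergence and fine-tuning) is not shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical, randomly initialized parameters; in the paper these would be
# learned with contrastive divergence (RBMs) and back-propagation (NN).
rng = np.random.default_rng(0)
D, H = 24, 64                                 # cepstral dimension, hidden units
W_src = rng.normal(scale=0.1, size=(H, D))    # source-speaker RBM weights
b_src = np.zeros(H)
W_tgt = rng.normal(scale=0.1, size=(H, D))    # target-speaker RBM weights
c_tgt = np.zeros(D)
W_nn = rng.normal(scale=0.1, size=(H, H))     # NN mapping between hidden spaces
b_nn = np.zeros(H)

def convert_frame(x_src):
    """Convert one source cepstral frame toward the target speaker's space."""
    h_src = sigmoid(W_src @ x_src + b_src)    # encode with the source RBM
    h_tgt = sigmoid(W_nn @ h_src + b_nn)      # map abstractions with the NN
    return W_tgt.T @ h_tgt + c_tgt            # mean-field decode with the target RBM

converted = convert_frame(rng.normal(size=D))
print(converted.shape)   # (24,)
```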
  • Ryo AIHARA, Ryoichi TAKASHIMA, Tetsuya TAKIGUCHI, Yasuo ARIKI
    Article type: PAPER
    Subject area: Voice Conversion and Speech Enhancement
    2014 Volume E97.D Issue 6 Pages 1411-1418
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper presents a voice conversion (VC) technique for noisy environments based on a sparse representation of speech. Sparse-representation-based VC using non-negative matrix factorization (NMF) is employed for noise-added spectral conversion between different speakers. In our previous exemplar-based VC method, source and target exemplars are extracted from parallel training data consisting of the same texts uttered by the source and target speakers. The input source signal is represented using the source exemplars and their weights. Then, the converted speech is constructed from the target exemplars and the weights related to the source exemplars. However, this exemplar-based approach needs to hold all training exemplars (frames), and it requires long computation times to obtain the weights of the source exemplars. In this paper, we propose a framework to train the basis matrices of the source and target exemplars so that they share a common weight matrix. By using the basis matrices instead of the exemplars, the VC is performed with lower computation times than with the exemplar-based method. The effectiveness of this method was confirmed in speaker conversion experiments using noise-added speech data, comparing it with an exemplar-based method and a conventional Gaussian mixture model (GMM)-based method.
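    A minimal numpy sketch of the conversion step underlying this kind of NMF-based VC: activation weights are estimated against the source basis with standard multiplicative updates (basis fixed), then multiplied by the target basis. The basis matrices below are random placeholders; the paper trains them jointly on parallel data so that they share a common weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
F, K, T = 257, 50, 100                     # frequency bins, basis vectors, frames
W_src = np.abs(rng.normal(size=(F, K)))    # source basis (placeholder)
W_tgt = np.abs(rng.normal(size=(F, K)))    # target basis (placeholder)
V_src = np.abs(rng.normal(size=(F, T)))    # source magnitude spectrogram (placeholder)

# Estimate activations H with KL-divergence NMF multiplicative updates,
# keeping the source basis W_src fixed.
H = np.abs(rng.normal(size=(K, T))) + 1e-3
for _ in range(200):
    V_hat = W_src @ H + 1e-12
    H *= (W_src.T @ (V_src / V_hat)) / (W_src.T.sum(axis=1, keepdims=True) + 1e-12)

# Conversion: reuse the shared activations with the target basis.
V_converted = W_tgt @ H
print(V_converted.shape)   # (257, 100)
```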
  • Kazuhiro KOBAYASHI, Tomoki TODA, Hironori DOI, Tomoyasu NAKANO, Masata ...
    Article type: PAPER
    Subject area: Voice Conversion and Speech Enhancement
    2014 Volume E97.D Issue 6 Pages 1419-1428
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The perceived age of a singing voice is the age of the singer as perceived by the listener, and is one of the notable characteristics that determine perceptions of a song. In this paper, we describe an investigation of acoustic features that affect the perceived age, and a novel voice timbre control technique based on the perceived age for singing voice conversion (SVC). Singers can sing expressively by controlling prosody and voice timbre, but the variety of voices that singers can produce is limited by physical constraints. Previous work has attempted to overcome this limitation through the use of statistical voice conversion, which makes it possible to convert the singing voice timbre of an arbitrary source singer into that of an arbitrary target singer. However, it is still difficult to intuitively control singing voice characteristics by manipulating parameters corresponding to specific physical traits, such as gender and age. In this paper, we first investigate the factors that play a part in the listener's perception of the singer's age. Then, we apply a multiple-regression Gaussian mixture model (MR-GMM) to SVC for the purpose of controlling voice timbre based on the perceived age, and we propose SVC based on the modified MR-GMM for manipulating the perceived age while maintaining the singer's individuality. The experimental results show that 1) the perceived age of singing voices corresponds relatively well to the actual age of the singer, 2) prosodic features have a larger effect on the perceived age than spectral features, 3) the individuality of a singer is influenced more heavily by segmental features than by prosodic features, and 4) the proposed voice timbre control method makes it possible to change the singer's perceived age without adversely affecting the perceived individuality.
  • Kou TANAKA, Tomoki TODA, Graham NEUBIG, Sakriani SAKTI, Satoshi NAKAMU ...
    Article type: PAPER
    Subject area: Voice Conversion and Speech Enhancement
    2014 Volume E97.D Issue 6 Pages 1429-1437
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper presents an electrolaryngeal (EL) speech enhancement method capable of significantly improving the naturalness of EL speech while causing no degradation in its intelligibility. An electrolarynx is an external device that artificially generates excitation sounds to enable laryngectomees to produce EL speech. Although proficient laryngectomees can produce quite intelligible EL speech, it sounds very unnatural due to the mechanical excitation produced by the device. Moreover, the excitation sounds produced by the device often leak outside, adding noise to the EL speech. To address these issues, there are two main conventional approaches to EL speech enhancement, based on either noise reduction or statistical voice conversion (VC). The former approach usually causes no degradation in intelligibility but yields only small improvements in naturalness, as the mechanical excitation sounds remain essentially unchanged. On the other hand, the latter approach significantly improves the naturalness of EL speech using spectral and excitation parameters of natural voices converted from acoustic parameters of EL speech, but it usually causes degradation in intelligibility owing to conversion errors. We propose a hybrid approach that uses a noise reduction method for enhancing spectral parameters and a statistical voice conversion method for predicting excitation parameters. Moreover, we further modify the prediction process of the excitation parameters to improve its prediction accuracy and to reduce adverse effects caused by unvoiced/voiced prediction errors. The experimental results demonstrate that the proposed method yields significant improvements in naturalness compared with EL speech while keeping intelligibility high.
  • Kazuhiro NAKAMURA, Kei HASHIMOTO, Yoshihiko NANKAKU, Keiichi TOKUDA
    Article type: PAPER
    Subject area: HMM-based Speech Synthesis
    2014 Volume E97.D Issue 6 Pages 1438-1448
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper proposes a novel approach for integrating spectral feature extraction and acoustic modeling in hidden Markov model (HMM) based speech synthesis. The statistical modeling process of speech waveforms is typically divided into two component modules: the frame-by-frame feature extraction module and the acoustic modeling module. In the feature extraction module, the statistical mel-cepstral analysis technique has been used and the objective function is the likelihood of mel-cepstral coefficients for given speech waveforms. In the acoustic modeling module, the objective function is the likelihood of model parameters for given mel-cepstral coefficients. It is important to improve the performance of each component module for achieving higher quality synthesized speech. However, the final objective of speech synthesis systems is to generate natural speech waveforms from given texts, and the improvement of each component module does not always lead to the improvement of the quality of synthesized speech. Therefore, ideally all objective functions should be optimized based on an integrated criterion which well represents subjective speech quality of human perception. In this paper, we propose an approach to model speech waveforms directly and optimize the final objective function. Experimental results show that the proposed method outperformed the conventional methods in objective and subjective measures.
  • Chen-Yu YANG, Zhen-Hua LING, Li-Rong DAI
    Article type: PAPER
    Subject area: Speech Synthesis and Related Topics
    2014 Volume E97.D Issue 6 Pages 1449-1460
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    In this paper, an automatic and unsupervised method using context-dependent hidden Markov models (CD-HMMs) is proposed for the prosodic labeling of speech synthesis databases. This method consists of three main steps, i.e., initialization, model training and prosodic labeling. The initial prosodic labels are obtained by unsupervised clustering using the acoustic features designed according to the characteristics of the prosodic descriptor to be labeled. Then, CD-HMMs of the spectral parameters, F0s and phone durations are estimated by a means similar to the HMM-based parametric speech synthesis using the initial prosodic labels. These labels are further updated by Viterbi decoding under the maximum likelihood criterion given the acoustic feature sequences and the trained CD-HMMs. The model training and prosodic labeling procedures are conducted iteratively until convergence. The performance of the proposed method is evaluated on Mandarin speech synthesis databases and two prosodic descriptors are investigated, i.e., the prosodic phrase boundary and the emphasis expression. In our implementation, the prosodic phrase boundary labels are initialized by clustering the durations of the pauses between every two consecutive prosodic words, and the emphasis expression labels are initialized by examining the differences between the original and the synthetic F0 trajectories. Experimental results show that the proposed method is able to label the prosodic phrase boundary positions much more accurately than the text-analysis-based method without requiring any manually labeled training data. The unit selection speech synthesis system constructed using the prosodic phrase boundary labels generated by our proposed method achieves similar performance to that using the manual labels. Furthermore, the unit selection speech synthesis system constructed using the emphasis expression labels generated by our proposed method can convey the emphasis information effectively while maintaining the naturalness of synthetic speech.
  • Xiaohong YANG, Mingxing XU, Yufang YANG
    Article type: PAPER
    Subject area: Speech Synthesis and Related Topics
    2014 Volume E97.D Issue 6 Pages 1461-1467
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The research reported in this paper is an attempt to elucidate the predictors of pause duration in read-aloud discourse. Through simple linear regression analysis and stepwise multiple linear regression, we examined how different factors (namely, syntactic structure, discourse hierarchy, topic structure, preboundary length, and postboundary length) influenced pause duration both separately and jointly. Results from simple regression analysis showed that discourse hierarchy, syntactic structure, topic structure, and postboundary length had significant impacts on boundary pause duration. However, when these factors were tested in a stepwise regression analysis, only discourse hierarchy, syntactic structure, and postboundary length were found to have significant impacts on boundary pause duration. The regression model that best predicted boundary pause duration in discourse context was the one that first included syntactic structure, and then included discourse hierarchy and postboundary length. This model could account for about 80% of the variance of pause duration. Tests of mediation models showed that the effects of topic structure and discourse hierarchy were significantly mediated by syntactic structure, which was most closely correlated with pause duration. These results support an integrated model combining the influence of several factors and can be applied to text-to-speech systems.
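    A minimal numpy sketch of fitting a multiple linear regression of pause duration on several boundary factors, in the spirit of the analysis in this abstract. The predictor values below are synthetic stand-ins for the annotated factors (syntactic structure, discourse hierarchy, postboundary length), not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-ins for annotated predictors at each prosodic boundary.
syntactic = rng.integers(1, 5, size=n).astype(float)    # syntactic boundary depth
discourse = rng.integers(1, 4, size=n).astype(float)    # discourse hierarchy level
post_len = rng.integers(2, 12, size=n).astype(float)    # postboundary length (syllables)
noise = rng.normal(scale=30.0, size=n)
pause_ms = 40 * syntactic + 25 * discourse + 8 * post_len + noise  # synthetic durations

# Ordinary least squares with an intercept term.
X = np.column_stack([np.ones(n), syntactic, discourse, post_len])
coef, *_ = np.linalg.lstsq(X, pause_ms, rcond=None)
pred = X @ coef
r2 = 1 - np.sum((pause_ms - pred) ** 2) / np.sum((pause_ms - pause_ms.mean()) ** 2)
print("coefficients:", np.round(coef, 2), "R^2:", round(r2, 3))
```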
  • Keigo KUBO, Sakriani SAKTI, Graham NEUBIG, Tomoki TODA, Satoshi NAKAMU ...
    Article type: PAPER
    Subject area: Speech Synthesis and Related Topics
    2014 Volume E97.D Issue 6 Pages 1468-1476
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Grapheme-to-phoneme (g2p) conversion, used to estimate the pronunciations of out-of-vocabulary (OOV) words, is a highly important part of recognition systems, as well as text-to-speech systems. The current state-of-the-art approach in g2p conversion is structured learning based on the Margin Infused Relaxed Algorithm (MIRA), which is an online discriminative training method for multiclass classification. However, it is known that the aggressive weight update method of MIRA is prone to overfitting, even when the current example is an outlier or noisy. Adaptive Regularization of Weight Vectors (AROW) has been proposed to resolve this problem for binary classification. In addition, AROW's update rule is simpler and more efficient than that of MIRA, allowing for more efficient training. Although AROW has these advantages, it has not yet been applied to g2p conversion. In this paper, we apply AROW to the g2p conversion task, which is a structured learning problem. In an evaluation that employed a dataset generated from collective knowledge on the Web, our proposed approach achieves a 6.8% error reduction rate compared to MIRA in terms of phoneme error rate. Moreover, the learning time of our proposed approach was shorter than that of MIRA on almost all datasets.
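    A minimal numpy sketch of the standard binary (non-structured) AROW update rule that this abstract builds on; the structured-learning variant applied to g2p in the paper is more involved. The toy data stream and the regularization constant are illustrative only.

```python
import numpy as np

def arow_update(mu, Sigma, x, y, r=1.0):
    """One AROW step for a binary example (x, y) with y in {-1, +1}.
    mu: mean weight vector, Sigma: weight covariance, r: regularization."""
    margin = y * (mu @ x)
    confidence = x @ Sigma @ x
    if margin < 1.0:                        # update only on a margin violation
        beta = 1.0 / (confidence + r)
        alpha = max(0.0, 1.0 - margin) * beta
        Sigma_x = Sigma @ x
        mu = mu + alpha * y * Sigma_x
        Sigma = Sigma - beta * np.outer(Sigma_x, Sigma_x)
    return mu, Sigma

# Toy usage on a linearly separable stream.
rng = np.random.default_rng(0)
d = 5
mu, Sigma = np.zeros(d), np.eye(d)
w_true = rng.normal(size=d)
for _ in range(500):
    x = rng.normal(size=d)
    y = 1.0 if w_true @ x > 0 else -1.0
    mu, Sigma = arow_update(mu, Sigma, x, y)

correct = 0
for _ in range(200):
    x = rng.normal(size=d)
    y = 1.0 if w_true @ x > 0 else -1.0
    correct += (np.sign(mu @ x) == y)
print("held-out agreement:", correct / 200)
```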
  • Yu TSAO, Ting-Yao HU, Sakriani SAKTI, Satoshi NAKAMURA, Lin-shan LEE
    Article type: PAPER
    Subject area: Speech Recognition
    2014 Volume E97.D Issue 6 Pages 1477-1487
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This study proposes a variable selection linear regression (VSLR) adaptation framework to improve the accuracy of automatic speech recognition (ASR) with only limited and unlabeled adaptation data. The proposed framework can be divided into three phases. The first phase prepares multiple variable subsets by applying a ranking filter to the original regression variable set. The second phase determines the best variable subset based on a pre-determined performance evaluation criterion and computes a linear regression (LR) mapping function based on the determined subset. The third phase performs adaptation in either model or feature spaces. The three phases can select the optimal components and remove redundancies in the LR mapping function effectively and thus enable VSLR to provide satisfactory adaptation performance even with a very limited number of adaptation statistics. We formulate model space VSLR and feature space VSLR by integrating the VS techniques into the conventional LR adaptation systems. Experimental results on the Aurora-4 task show that model space VSLR and feature space VSLR, respectively, outperform standard maximum likelihood linear regression (MLLR) and feature space MLLR (fMLLR) and their extensions, with notable word error rate (WER) reductions in per-utterance unsupervised adaptation.
  • Shoko YAMAHATA, Yoshikazu YAMAGUCHI, Atsunori OGAWA, Hirokazu MASATAKI ...
    Article type: PAPER
    Subject area: Speech Recognition
    2014 Volume E97.D Issue 6 Pages 1488-1496
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Recognition errors caused by out-of-vocabulary (OOV) words lead to critical problems when developing spoken language understanding systems based on automatic speech recognition technology, and automatic vocabulary adaptation is an essential technique for solving these problems. In this paper, we propose a novel and effective automatic vocabulary adaptation method. Our method selects OOV words from relevant documents using combined scores of semantic and acoustic similarities. Using this combined score, which reflects both semantic and acoustic aspects, only the necessary OOV words can be selected without registering redundant words. In addition, our method estimates the probabilities of OOV words using semantic similarity and a class-based N-gram language model. These probabilities are appropriate since they are estimated by considering both the frequencies of OOV words in the target speech data and the stable class N-gram probabilities. Experimental results show that our method improves OOV selection accuracy and the recognition accuracy of newly registered words in comparison with conventional methods.
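    A minimal sketch of the combined-score selection idea from this abstract: each candidate OOV word receives a weighted sum of a semantic similarity and an acoustic similarity, and only words above a threshold are registered. The candidate words, similarity values, interpolation weight, and threshold are all hypothetical.

```python
# Hypothetical candidate OOV words with pre-computed similarities in [0, 1]:
# semantic similarity to the relevant documents and acoustic similarity to the
# target speech data, as described in the abstract.
candidates = {
    "bronchoscopy": {"semantic": 0.82, "acoustic": 0.74},
    "stent":        {"semantic": 0.65, "acoustic": 0.91},
    "weekendly":    {"semantic": 0.20, "acoustic": 0.35},
}

ALPHA = 0.6        # hypothetical interpolation weight between the two scores
THRESHOLD = 0.6    # hypothetical registration threshold

def combined_score(scores, alpha=ALPHA):
    return alpha * scores["semantic"] + (1.0 - alpha) * scores["acoustic"]

selected = [w for w, s in candidates.items() if combined_score(s) >= THRESHOLD]
print(sorted(selected))   # words to register into the recognition vocabulary
```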
  • Lasguido NIO, Sakriani SAKTI, Graham NEUBIG, Tomoki TODA, Satoshi NAKA ...
    Article type: PAPER
    Subject area: Dialog System
    2014 Volume E97.D Issue 6 Pages 1497-1505
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper describes the design and evaluation of a method for developing a chat-oriented dialog system by utilizing real human-to-human conversation examples from movie scripts and Twitter conversations. The aim of the proposed method is to build a conversational agent that can interact with users in as natural a fashion as possible, while reducing the time required for database design and collection. A number of the challenging design issues we faced are described, including (1) constructing an appropriate dialog corpus from raw movie scripts and Twitter data, and (2) developing a multi-domain chat-oriented dialog management system that can retrieve a proper system response based on the current user query. To build a dialog corpus, we propose a unit of conversation called a tri-turn (a trigram conversation turn), as well as extraction and semantic similarity analysis techniques to help ensure that the content extracted from raw movie/drama script files forms appropriate dialog-pair (query-response) examples. The constructed dialog corpora are then utilized in a data-driven dialog management system. Here, various approaches are investigated, including example-based dialog management (EBDM) and response generation using phrase-based statistical machine translation (SMT). In particular, we use two EBDM variants: syntactic-semantic similarity retrieval and TF-IDF-based cosine similarity retrieval. Experiments are conducted to compare and contrast the EBDM and SMT approaches in building a chat-oriented dialog system, and we investigate a combined method that addresses the advantages and disadvantages of both approaches. System performance was evaluated based on objective metrics (semantic similarity and cosine similarity) and human subjective evaluation from a small user study. Experimental results show that the proposed filtering approach effectively improves performance. Furthermore, the results also show that by combining the EBDM and SMT approaches, we can overcome the shortcomings of each.
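    A minimal scikit-learn sketch of the TF-IDF cosine-similarity retrieval variant mentioned in this abstract: given a user query, return the stored response whose query side is most similar. The tiny query-response corpus is invented for illustration; it stands in for the tri-turn pairs extracted from movie scripts and Twitter in the paper.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny invented dialog-pair corpus (query, response).
pairs = [
    ("how was your weekend", "it was great, I went hiking"),
    ("do you like coffee", "yes, I drink it every morning"),
    ("what are you watching", "an old science fiction movie"),
]

queries = [q for q, _ in pairs]
vectorizer = TfidfVectorizer()
query_matrix = vectorizer.fit_transform(queries)

def respond(user_query):
    """Return the response whose stored query is most similar to the input."""
    sims = cosine_similarity(vectorizer.transform([user_query]), query_matrix)[0]
    return pairs[int(sims.argmax())][1]

print(respond("do you drink coffee"))   # expected: "yes, I drink it every morning"
```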
Regular Section
  • Naoki NISHIKAWA, Keisuke IWAI, Hidema TANAKA, Takakazu KUROKAWA
    Article type: PAPER
    Subject area: Fundamentals of Information Systems
    2014 Volume E97.D Issue 6 Pages 1506-1515
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Computer systems with GPUs are expected to become a strong platform for high-speed encryption processing, while power consumption remains a primary concern for such processing on devices of all sizes. GPU vendors have announced their roadmaps for future GPU architectures: Nvidia Corp. promotes the Kepler architecture and AMD Corp. emphasizes the GCN architecture. We therefore evaluated the throughput and power efficiency of three 128-bit block ciphers on GPUs with the recent Nvidia Kepler and AMD GCN architectures. In our experiments, the throughput and per-watt throughput of AES-128 on a Radeon HD 7970 (2048 cores) with the GCN architecture are 205.0 Gbps and 1.3 Gbps/W respectively, whereas those on a GeForce GTX 680 (1536 cores) with the Kepler architecture are 63.9 Gbps and 0.43 Gbps/W; an approximately 3.2-fold throughput difference is observed between AES-128 on the two GPUs. We then investigate the reasons for this throughput difference using our micro-benchmark suites. Based on the results, we speculate that, for Kepler GPUs to serve better as co-processors for block ciphers, their arithmetic and logical instructions must be improved in terms of both software and hardware.
  • Ahmad Iqbal Hakim SUHAIMI, Yuichi GOTO, Jingde CHENG
    Article type: PAPER
    Subject area: Software Engineering
    2014 Volume E97.D Issue 6 Pages 1516-1527
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Information Security Management Systems (ISMSs) play important roles in helping organizations to manage their information securely. However, establishing, managing, and maintaining ISMSs is not an easy task for most organizations because an ISMS has many participants and tasks, and requires many kinds of documents. Therefore, organizations with ISMSs demand tools that can support them to perform all tasks in ISMS lifecycle processes consistently and continuously. To realize such support tools, a database system that manages ISO/IEC 27000 series, which are international standards for ISMSs, and ISMS documents, which are the products of tasks in ISMS lifecycle processes, is indispensable. The database system should manage data of the standards and documents for all available versions and translations, relationship among the standards and documents, authorization to access the standards and documents, and metadata of the standards and documents. No such database system has existed until now. This paper presents an information security management database system (ISMDS) that manages ISO/IEC 27000 series and ISMS documents. ISMDS is a meta-database system that manages several databases of standards and documents. ISMDS is used by participants in ISMS as well as tools supporting the participants to perform tasks in ISMS lifecycle processes. The users or tools can retrieve data from all versions and translations of the standards and documents. The paper also presents some use cases to show the effectiveness of ISMDS.
  • Taku FUKUSHIMA, Takashi YOSHINO
    Article type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2014 Volume E97.D Issue 6 Pages 1528-1534
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    In this study, we have developed a translation repair method to automatically improve the accuracy of translations. Machine translation (MT) supports multilingual communication; however, it cannot achieve high accuracy. MT produces only one translated sentence; therefore, it is difficult to improve the accuracy of translated sentences. Our method creates multiple translations by adding personal pronouns to the source sentence and by using a word dictionary and a parallel corpus. In addition, it selects an accurate translation from among the multiple translations using the results of a Web search. As a result, the translation repair method improved the accuracy of translated sentences and achieved higher accuracy than MT alone.
  • Dajuan FAN, Zhiqiu HUANG, Lei TANG
    Article type: PAPER
    Subject area: Data Engineering, Web Information Systems
    2014 Volume E97.D Issue 6 Pages 1535-1545
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    One of the most important problems in web service applications is the integration of different existing services into a new composite service. Existing work has the following disadvantages: (i) developers are often required to provide a composite service model first and perform formal verification to check whether the model is correct, which makes the synthesis process of composite services semi-automatic, complex and inefficient; (ii) there is no assurance that composite services synthesized by fully automatic approaches are correct; (iii) some approaches handle only simple composition problems where the existing services are atomic. To address these problems, we propose a correctness-assured approach for automatically synthesizing composite services based on a finite state machine model. The syntax and semantics of the requirement model specifying composition requirements are also proposed. Given a set of abstract BPEL descriptions of existing services and a composition requirement, our approach automatically generates the BPEL implementation of the composite service. Compared with existing approaches, the composite service generated by our approach is guaranteed to be correct and does not require any formal verification. The correctness of our approach is proved. Moreover, a case analysis indicates that our approach is feasible and effective.
  • Naoya ONIZAWA, Akira MOCHIZUKI, Hirokatsu SHIRAHAMA, Masashi IMAI, Tom ...
    Article type: PAPER
    Subject area: Dependable Computing
    2014 Volume E97.D Issue 6 Pages 1546-1556
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper introduces a partially parallel inter-chip link architecture for asynchronous multi-chip Network-on-Chips (NoCs). Multi-chip NoCs that operate as a single large NoC have recently been proposed for very large systems, such as automotive applications. Inter-chip links are key elements in realizing high-performance multi-chip NoCs using a limited number of I/Os. The proposed asynchronous link, based on level-encoded dual-rail (LEDR) encoding, transmits several bits in parallel that are received by detecting the phase information of the LEDR signals at each serial link. It employs burst-mode data transmission that eliminates the per-bit handshake for high-speed operation, but this elimination may cause data-transmission errors due to cross-talk and power-supply noise. To trigger data retransmission, errors are detected from the embedded phase information; error-detection codes are not used. The throughput is theoretically modelled and optimized by considering the bit-error rate (BER) of the link. Using delay parameters estimated for a 0.13 µm CMOS technology, a throughput of 8.82 Gbps is achieved using 10 I/Os, which is 90.5% higher than that of a link using 9 I/Os without an error-detection method operating under a negligibly low BER (<10^-20).
  • Akisato KIMURA, Kevin DUH, Tsutomu HIRAO, Katsuhiko ISHIGURO, Tomoharu ...
    Article type: PAPER
    Subject area: Artificial Intelligence, Data Mining
    2014 Volume E97.D Issue 6 Pages 1557-1566
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Social media such as microblogs have become so pervasive that it is now possible to use them as sensors for real-world events and memes. While much recent research has focused on developing automatic methods for filtering and summarizing these data streams, we explore a different trend called social curation. In contrast to automatic methods, social curation is characterized as a human-in-the-loop and sometimes crowd-sourced mechanism for exploiting social media as sensors. Although social curation web services like Togetter, Naver Matome and Storify are gaining popularity, little academic research has studied the phenomenon. In this paper, our goal is to investigate the phenomenon and potential of this new field of social curation. First, we perform an in-depth analysis of a large corpus of curated microblog data. We seek to understand why and how people participate in this laborious curation process. We then explore new ways in which information retrieval and machine learning technologies can be used to assist curators. In particular, we propose a novel method based on a learning-to-rank framework that increases the curator's productivity and breadth of perspective by suggesting which novel microblogs should be added to the curated content.
  • Atsuhiro NISHI, Masanori YOKOYAMA, Ken-ichiro OGAWA, Taiki OGATA, Taka ...
    Article type: PAPER
    Subject area: Office Information Systems, e-Business Modeling
    2014 Volume E97.D Issue 6 Pages 1567-1573
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The present study aims to investigate the effect of voluntary movements on human temporal perception in multisensory integration. We therefore performed temporal order judgment (TOJ) tasks in audio-tactile integration under three conditions: no movement, involuntary movement, and voluntary movement. It is known that the point of subjective simultaneity (PSS) under the no-movement condition, that is, in normal TOJ tasks, appears when a tactile stimulus is presented before an auditory stimulus. Our experiment showed that involuntary and voluntary movements shift the PSS to a value that reduces the interval between the presentations of the auditory and tactile stimuli. The shift of the PSS under the voluntary movement condition was greater than that under the involuntary movement condition. Remarkably, the PSS under the voluntary movement condition appears when an auditory stimulus slightly precedes a tactile stimulus. In addition, the just noticeable difference (JND) under the voluntary movement condition was smaller than those under the other two conditions. These results reveal that voluntary movements alter the temporal integration of audio-tactile stimuli. In particular, our results suggest that voluntary movements reverse the perceived temporal order of auditory and tactile stimuli and improve the resolution of temporal perception. We discuss the functional mechanism by which voluntary movements shift the PSS observed under the no-movement condition in audio-tactile integration.
  • Kai KANG, Weibin LIU, Weiwei XING
    Article type: PAPER
    Subject area: Pattern Recognition
    2014 Volume E97.D Issue 6 Pages 1574-1582
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper introduces an unsupervised method for motion pattern learning and abnormality detection from video surveillance. In the preprocessing steps, trajectories are segmented based on their locations, and the sub-trajectories are represented as codebooks. Under our framework, Hidden Markov Models (HMMs) are used to characterize the motion pattern features of trajectory groups. The state of a trajectory is represented by an HMM and has a probability distribution over the possible output sub-trajectories. The Bayesian Information Criterion (BIC) is introduced to measure the similarity between groups. Based on the pairwise similarity scores, an affinity matrix is constructed which indicates the distance between different trajectory groups. An Adaptable Dynamic Hierarchical Clustering (ADHC) tree is proposed to gradually merge the most similar groups and form the trajectory motion patterns; it implements a simpler and more tractable dynamic clustering procedure that updates the clustering results with lower time complexity and avoids the traditional overfitting problem. Using the HMMs generated for the obtained trajectory motion patterns, we can recognize motion patterns and detect anomalies by computing the likelihood of a given trajectory, where the HMM with the maximum likelihood indicates the pattern, and a maximum likelihood below a threshold suggests an anomaly. Experiments are performed on EIFPD trajectory datasets from a structureless scene, where pedestrians choose their walking paths randomly. The experimental results show that our method can accurately learn motion patterns and detect anomalies with good performance.
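    A minimal numpy sketch of a BIC-style merge test between two groups of feature vectors, each modeled here by a single Gaussian for simplicity. This illustrates only the merge criterion used to compare trajectory groups; the paper models groups with HMMs over sub-trajectory codebooks and embeds the test in the full ADHC procedure, which is not reproduced. The penalty weight and toy data are illustrative.

```python
import numpy as np

def delta_bic(X1, X2, lam=1.0):
    """BIC difference between modeling X1 and X2 separately vs. merged.
    Negative values favor merging the two groups."""
    X = np.vstack([X1, X2])
    n1, n2, n = len(X1), len(X2), len(X1) + len(X2)
    d = X.shape[1]
    logdet = lambda A: np.linalg.slogdet(np.cov(A, rowvar=False) + 1e-6 * np.eye(d))[1]
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return 0.5 * (n * logdet(X) - n1 * logdet(X1) - n2 * logdet(X2)) - lam * penalty

rng = np.random.default_rng(0)
A = rng.normal(loc=0.0, size=(80, 2))   # two similar trajectory feature groups
B = rng.normal(loc=0.0, size=(90, 2))
C = rng.normal(loc=5.0, size=(70, 2))   # a clearly different group
print(round(delta_bic(A, B), 1))   # negative: similar groups, favor merging
print(round(delta_bic(A, C), 1))   # strongly positive: keep separate
```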
  • Kazu MISHIBA, Takeshi YOSHITOME
    Article type: PAPER
    Subject area: Image Processing and Video Processing
    2014 Volume E97.D Issue 6 Pages 1583-1589
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The relative arrangement, such as the relative positions and orientations among objects, can play an important role in expressing situations such as sports games and race scenes. In this paper, we propose a retargeting method that maintains this relative arrangement. Our proposed retargeting method is based on a warping method which finds an optimal transformation by solving an energy minimization problem. To protect the object arrangement, we introduce an energy that enforces that all objects, and the relative positions among them, are transformed by the same transformation in the retargeting process. In addition, our method imposes the following three types of conditions in order to obtain more satisfactory results: protection of important regions, avoidance of extreme deformation, and cropping with preservation of the balance of visual importance. Experimental results demonstrate that our proposed method maintains the relative arrangement while protecting important regions.
  • Min YAO, Hiroshi NAGAHASHI, Kota AOKI
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2014 Volume E97.D Issue 6 Pages 1590-1598
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    A number of well-known learning-based face detectors can achieve extraordinary performance in controlled environments, but face detection under varying illumination is still challenging. Possible solutions to this illumination problem include creating illumination-invariant features or utilizing skin color information. However, such features and skin colors are not sufficiently reliable under difficult lighting conditions. Another possible solution is to perform illumination normalization (e.g., Histogram Equalization (HE)) prior to executing face detectors. However, applications of normalization to face detection have not been widely studied in the literature. This paper applies and evaluates various existing normalization methods within a framework that combines illumination normalization with two learning-based face detectors (a Haar-like face detector and an LBP face detector). These methods were initially proposed for different purposes (face recognition or image quality enhancement), but some of them significantly improve the original face detectors and lead to better performance than HE according to the results of comparative experiments on two databases. Meanwhile, we propose a new normalization method called segmentation-based half histogram stretching and truncation (SH) for face detection under varying illumination. It first employs the Otsu method to segment the histogram (intensities) of the input image into several spans and then redistributes the segmented spans. In this way, non-uniform illumination can be efficiently compensated and local facial structures can be appropriately enhanced. Our method obtains good performance in the experiments.
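    A minimal numpy sketch in the spirit of the SH idea in this abstract: the intensity histogram is split at an Otsu threshold and each span is stretched independently. The exact half-histogram stretching and truncation rules of the paper are not reproduced; the two-span split, output ranges, and synthetic test patch are assumptions for illustration.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: pick the threshold maximizing between-class variance."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    omega = np.cumsum(p)                      # class-0 probability
    mu = np.cumsum(p * np.arange(256))        # class-0 mean times probability
    mu_t = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b2))

def stretch_by_spans(img):
    """Stretch the two Otsu-separated spans to [0,127] and [128,255]."""
    t = otsu_threshold(img)
    out = np.empty_like(img, dtype=np.float64)
    low, high = img <= t, img > t
    for mask, lo_out, hi_out in [(low, 0, 127), (high, 128, 255)]:
        if mask.any():
            v = img[mask].astype(np.float64)
            span = max(v.max() - v.min(), 1.0)
            out[mask] = lo_out + (v - v.min()) / span * (hi_out - lo_out)
    return out.astype(np.uint8)

rng = np.random.default_rng(0)
# Synthetic unevenly lit patch: dark background plus a brighter inner region.
img = np.clip(rng.normal(60, 15, size=(64, 64)), 0, 255).astype(np.uint8)
img[16:48, 16:48] = np.clip(rng.normal(140, 20, size=(32, 32)), 0, 255).astype(np.uint8)
enhanced = stretch_by_spans(img)
print(otsu_threshold(img), enhanced.min(), enhanced.max())
```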
  • Chuchart PINTAVIROOJ, Fernand S. COHEN, Woranut IAMPA
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2014 Volume E97.D Issue 6 Pages 1599-1613
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This paper addresses the problems of fingerprint identification and verification when a query fingerprint is taken under conditions that differ from those under which the fingerprint of the same person stored in a database was constructed. This occurs when using a different fingerprint scanner with a different pressure, resulting in a fingerprint impression that is smeared and distorted in accordance with a geometric transformation (e.g., affine or even non-linear). Minutiae points on a query fingerprint are matched and aligned to those on one of the fingerprints in the database, using a set of absolute invariants constructed from the shape and/or size of minutiae triangles depending on the assumed map. Once the best candidate match is declared and the corresponding minutiae points are flagged, the query fingerprint image is warped against the candidate fingerprint image in accordance with the estimated warping map. An identification/verification cost function using a combination of distance map and global directional filterbank (DFB) features is then utilized to verify and identify a query fingerprint against candidate fingerprint(s). Performance of the algorithm yields an area of 0.99967 (perfect classification is a value of 1) under the receiver operating characteristic (ROC) curve based on a database consisting of a total of 1680 fingerprint images captured from 240 fingers. The average probability of error was found to be 0.713%. Our algorithm also yields the smallest false non-match rate (FNMR) for a comparable false match rate (FMR) when compared to the well-known technique of DFB features and triangulation-based matching integrated with modeling non-linear deformation. This work represents an advance in resolving the fingerprint identification problem beyond the state-of-the-art approaches in both performance and robustness.
  • Chidambaram CHIDAMBARAM, Hugo VIEIRA NETO, Leyza Elmeri Baldo DORINI, ...
    Article type: PAPER
    Subject area: Image Recognition, Computer Vision
    2014 Volume E97.D Issue 6 Pages 1614-1623
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Face recognition plays an important role in security applications, but in real-world conditions face images are typically subject to issues that compromise recognition performance, such as geometric transformations, occlusions and changes in illumination. Most face detection and recognition works to date deal with single face images using global features and supervised learning. Differently from that context, here we propose a multiple face recognition approach based on local features which does not rely on supervised learning. In order to deal with multiple face images under varying conditions, the extraction of invariant and discriminative local features is achieved by using the SURF (Speeded-Up Robust Features) approach, and the search for regions from which optimal features can be extracted is done by an improved ABC (Artificial Bee Colony) algorithm. Thresholds and parameters for SURF and improved ABC algorithms are determined experimentally. The approach was extensively assessed on 99 different still images - more than 400 trials were conducted using 20 target face images and still images under different acquisition conditions. Results show that our approach is promising for real-world face recognition applications concerning different acquisition conditions and transformations.
  • Eun-Seok LEE, Jin-Hee LEE, Byeong-Seok SHIN
    Article type: PAPER
    Subject area: Computer Graphics
    2014 Volume E97.D Issue 6 Pages 1624-1633
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Massive digital elevation models require a large number of geometric primitives that exceed the throughput of the existing graphics hardware. For the interactive visualization of these datasets, several adaptive reconstruction methods that reduce the number of primitives have been introduced over the decades. Quadtree triangulation, based on subdivision of the terrain into rectangular patches at different resolutions, is the most frequently used terrain reconstruction method. This usually accomplishes the triangulation using LOD (level-of-detail) selection and crack removal based on geometric errors. In this paper, we present bimodal vertex splitting, which performs LOD selection and crack removal concurrently on a GPU. The first mode splits each vertex for LOD selection and the second splits each vertex for crack removal. By performing these two operations concurrently on a GPU, we can efficiently accelerate the rendering speed by reducing the computation time and amount of transmission data in comparison with existing quadtree-based rendering methods.
  • Hyun-Jun SHIN, Hyun-Woo JANG, Hyoung-Kyu SONG
    Article type: LETTER
    Subject area: Fundamentals of Information Systems
    2014 Volume E97.D Issue 6 Pages 1634-1638
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    In this letter, a cooperative scheme is proposed for broadcasting and cellular communication systems. The proposed scheme improves bit error rate (BER) performance and throughput at the edge of a cellular base station (CBS) cooperating with another CBS in the same broadcasting coverage. To enhance BER performance, the proposed scheme employs one of two schemes according to the channel quality information (CQI) between the broadcasting base station (BBS) and the users. In the physical area, the edge of one CBS adjoins the edge of another CBS. When users are at the edge of a CBS, they simultaneously transmit the CQI to the CBSs, and then the BBS and CBSs transmit signals according to the proposed algorithm. The two schemes apply space-time cyclic delay diversity (CDD) and a combination of space-time block codes (STBC) with vertical Bell Laboratories Layered Space-Time (V-BLAST) to the signals from the BBS and CBSs. The resulting performance indicates that the proposed scheme is effective for users at the edges of CBSs.
  • Yoon Hak KIM
    Article type: LETTER
    Subject area: Fundamentals of Information Systems
    2014 Volume E97.D Issue 6 Pages 1639-1643
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    We consider the problem of optimizing the quantizer design for distributed estimation systems where all nodes located at different sites collect measurements and transmit quantized data to a fusion node, which then produces an estimate of the parameter of interest. For this problem, the goal is to minimize the amount of information that the nodes have to transmit in order to attain a certain application accuracy. We propose an iterative quantizer design algorithm that seeks to find a non-regular mapping between quantization partitions and their codewords so as to minimize a global distortion measure such as the estimation error. We apply the proposed algorithm to a system where an acoustic amplitude sensor model is employed at each node for source localization. Our experiments demonstrate that a significant performance gain can be achieved by our technique as compared with standard typical designs and even with recently published novel distributed designs.
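    A minimal numpy sketch of the classical Lloyd iteration for scalar quantizer design, the baseline that iterative designs like the one in this abstract generalize (toward non-regular mappings driven by the global estimation error rather than local distortion). The sample distribution, number of levels, and iteration count are illustrative.

```python
import numpy as np

def lloyd_quantizer(samples, levels=4, iters=50):
    """Classical Lloyd iteration: alternate nearest-codeword partitioning and
    centroid (conditional-mean) codeword updates to reduce mean squared error."""
    codewords = np.quantile(samples, np.linspace(0.1, 0.9, levels))  # initial codebook
    for _ in range(iters):
        idx = np.argmin(np.abs(samples[:, None] - codewords[None, :]), axis=1)
        for k in range(levels):
            if np.any(idx == k):
                codewords[k] = samples[idx == k].mean()
    idx = np.argmin(np.abs(samples[:, None] - codewords[None, :]), axis=1)
    mse = np.mean((samples - codewords[idx]) ** 2)
    return codewords, mse

rng = np.random.default_rng(0)
readings = rng.normal(loc=0.0, scale=1.0, size=5000)   # stand-in sensor measurements
codebook, mse = lloyd_quantizer(readings, levels=4)
print(np.round(codebook, 3), round(mse, 4))
```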
  • Seung-Jun YU, Jang-Kyun AHN, Hyoung-Kyu SONG
    Article type: LETTER
    Subject area: Fundamentals of Information Systems
    2014 Volume E97.D Issue 6 Pages 1644-1647
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    In this letter, an improved channel-adaptive detection scheme based on the condition number, combined with the QRD-M and CLLL algorithms, is presented for MIMO-OFDM systems. The proposed scheme estimates the channel state by using the condition number, and the number of layers for QRD-M is then changed according to the condition number of the channel. After the number of layers is determined, the proposed scheme performs the combined QRD-M and CLLL. Simulation results show that the BER curves of the proposed scheme and of QRD-M using CLLL are similar. However, the complexity of the proposed scheme is about 27% less than that of QRD-M detection using CLLL.
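    A minimal numpy sketch of the channel-adaptive idea only: compute the condition number of the MIMO channel matrix and choose how much QRD-M search effort to spend accordingly. The thresholds and candidate counts are invented for illustration, and the CLLL lattice reduction and the QRD-M search itself are not shown.

```python
import numpy as np

def search_effort(H, thresholds=(3.0, 10.0), candidates=(2, 4, 8)):
    """Pick a QRD-M search size from the channel condition number:
    well-conditioned channels get a smaller search (lower complexity)."""
    kappa = np.linalg.cond(H)
    if kappa < thresholds[0]:
        return candidates[0]
    if kappa < thresholds[1]:
        return candidates[1]
    return candidates[2]

rng = np.random.default_rng(0)
# A few 4x4 Rayleigh-fading channel realizations.
for _ in range(3):
    H = (rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))) / np.sqrt(2)
    print(round(np.linalg.cond(H), 2), "->", search_effort(H), "candidates")
```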
  • Donghai TIAN, Jingfeng XUE, Changzhen HU, Xuanya LI
    Article type: LETTER
    Subject area: Software System
    2014 Volume E97.D Issue 6 Pages 1648-1651
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    A whitelisting approach is a promising solution to prevent unwanted processes (e.g., malware) from being executed. However, previous solutions suffer from the following limitations: 1) most methods place the whitelist information in the kernel space, where it could be tampered with by attackers; 2) most methods cannot prevent the execution of kernel processes. In this paper, we present VAW, a novel application whitelisting system that uses virtualization technology. Our system is able to block the execution of unauthorized user and kernel processes. Compared with previous solutions, our approach achieves stronger security guarantees. The experiments show that VAW can deny the execution of unwanted processes effectively with little performance overhead.
  • Zhuo ZHANG, Xiaoguang MAO, Yan LEI, Peng ZHANG
    Article type: LETTER
    Subject area: Software Engineering
    2014 Volume E97.D Issue 6 Pages 1652-1655
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Existing fault localization approaches usually do not provide a context for developers to understand the problem. Thus, this paper proposes a novel approach using the dynamic backward slicing technique to enrich contexts for existing approaches. Our empirical results show that our approach significantly outperforms five state-of-the-art fault localization techniques.
  • Aiguo CHEN, Guangchun LUO, Jinsheng REN
    Article type: LETTER
    Subject area: Information Network
    2014 Volume E97.D Issue 6 Pages 1656-1660
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Establishing trust measurements among peer-to-peer (P2P) networks is fast becoming a de-facto standard, and a fair amount of work has been done in the area of trust aggregation and calculation algorithms. However, the area of developing secure underlying protocols to distribute and access the trust ratings in the overlay network has been relatively unexplored. We propose an elliptic curve-based trust management protocol for P2P systems, which is designed to provide authentication and signature functions to protect the processes of trust value query and rating report. Additionally, instead of using a single identity, the protocol generates two verifiable pseudonyms: one is used for transactions and the other is used when the peer acts as a trust-holding peer. A security analysis shows that the proposed protocol is secure in the face of a variety of possible attacks.
  • Ju-Ho LEE, Goo-Yeon LEE, Choong-Kyo JEONG
    Article type: LETTER
    Subject area: Information Network
    2014 Volume E97.D Issue 6 Pages 1661-1663
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Mobile Multi-hop Relay (MMR) technology is usually used to increase the transmission rate or to extend communication coverage. In this work, we show that MMR technology can also be used to raise the network capacity. Because Relay Stations (RS) are connected to the Base Station (BS) wirelessly and controlled by the BS, an MMR network can easily be deployed when necessary. High capacity MMR networks thus provide a good candidate solution for coping with temporary traffic surges. For the capacity enhancement of the MMR network, we suggest a novel scheme to parallelize cell transmissions while controlling the interference between transmissions. Using a numerical example for a typical network that is conformant to the IEEE 802.16j, we find that the network capacity increases by 88 percent.
  • Duck-Ho BAE, Jong-Min LEE, Sang-Wook KIM, Youngjoon WON, Yongsu PARK
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2014 Volume E97.D Issue 6 Pages 1664-1667
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The rapid growth of social network services increases the need for in-depth analysis of network activities. Privacy breaches for network participants are a concern in such analysis efforts. This paper investigates structural and property changes under several privacy-preserving methods (anonymization) for social networks. The anonymized social network does not follow the power-law node degree distribution that the original network does. The peak hop count for node connectivity increases by at most 1, and the clustering coefficient of neighbor nodes shows a 6.5-fold increase after anonymization. Thus, we observe inconsistencies introduced by privacy-preserving methods in social network analysis.
  • Yuhu CHENG, Xuesong WANG, Ge CAO
    Article type: LETTER
    Subject area: Artificial Intelligence, Data Mining
    2014 Volume E97.D Issue 6 Pages 1668-1672
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    A multi-source Tri-Training transfer learning algorithm is proposed by integrating transfer learning and semi-supervised learning. First, multiple weak classifiers are trained using both weighted source and target training samples. Then, based on the idea of co-training, each target test sample is labeled by the trained weak classifiers, and samples that receive the same label from the classifiers are selected as high-confidence samples and added to the target training sample set. Finally, a target domain classifier is obtained from the updated target training samples. The above steps are iterated until the high-confidence samples selected at two successive iterations are the same. At each iteration, the source training samples are tested with the target domain classifier: the samples classified correctly continue to be used for training, while the weights of the incorrectly classified samples are lowered. Experimental results on a text classification dataset demonstrate the effectiveness and superiority of the proposed algorithm.
  • Jianqiao WANG, Yuehua LI, Jianfei CHEN, Yuanjiang LI
    Article type: LETTER
    Subject area: Pattern Recognition
    2014 Volume E97.D Issue 6 Pages 1673-1676
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    The label estimation technique provides a new way to design semi-supervised learning algorithms. If the labels of the unlabeled data can be estimated correctly, the semi-supervised methods can be replaced by the corresponding supervised versions. In this paper, we propose a novel semi-supervised learning algorithm, called Geodesic Weighted Sparse Representation (GWSR), to estimate the labels of the unlabeled data. First, the geodesic distance and geodesic weight are calculated. The geodesic weight is utilized to reconstruct the labeled samples. The Euclidean distance between the reconstructed labeled sample and the unlabeled sample equals the geodesic distance between the original labeled sample and the unlabeled sample. Then, the unlabeled samples are sparsely reconstructed and the sparse reconstruction weight is obtained by minimizing the L1-norm. Finally, the sparse reconstruction weight is utilized to estimate the labels of the unlabeled samples. Experiments on synthetic data and USPS hand-written digit database demonstrate the effectiveness of our method.
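    A minimal numpy sketch of the general sparse-representation label estimation idea used in this abstract: an unlabeled sample is reconstructed from labeled samples under an L1 penalty (solved here with plain ISTA), and class scores are accumulated from the sparse weights. The geodesic weighting that distinguishes GWSR is not included, and the two synthetic classes, the penalty weight, and the iteration count are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam=0.05, iters=500):
    """ISTA for min_w 0.5*||x - D w||^2 + lam*||w||_1 (columns of D are atoms)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2      # 1 / Lipschitz constant of the gradient
    w = np.zeros(D.shape[1])
    for _ in range(iters):
        w = soft_threshold(w - step * D.T @ (D @ w - x), step * lam)
    return w

rng = np.random.default_rng(0)
d, n_per_class = 20, 15
# Two hypothetical classes of labeled samples clustered around different centers.
class0 = rng.normal(loc=0.0, scale=0.3, size=(n_per_class, d))
class1 = rng.normal(loc=1.0, scale=0.3, size=(n_per_class, d))
D = np.vstack([class0, class1]).T               # dictionary: one column per labeled sample
labels = np.array([0] * n_per_class + [1] * n_per_class)

x_unlabeled = rng.normal(loc=1.0, scale=0.3, size=d)    # resembles class 1
w = sparse_code(D, x_unlabeled)
scores = [np.abs(w[labels == c]).sum() for c in (0, 1)]
print("estimated label:", int(np.argmax(scores)))       # expected: 1
```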
  • Jaak SIMM, Ildefons MAGRANS DE ABRIL, Masashi SUGIYAMA
    Article type: LETTER
    Subject area: Pattern Recognition
    2014 Volume E97.D Issue 6 Pages 1677-1681
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Multi-task learning is an important area of machine learning that tries to learn multiple tasks simultaneously to improve the accuracy of each individual task. We propose a new tree-based ensemble multi-task learning method for classification and regression (MT-ExtraTrees), based on Extremely Randomized Trees. MT-ExtraTrees is able to share data between tasks minimizing negative transfer while keeping the ability to learn non-linear solutions and to scale well to large datasets.
  • Jung-In LEE, Jeung-Yoon CHOI, Hong-Goo KANG
    Article type: LETTER
    Subject area: Speech and Hearing
    2014 Volume E97.D Issue 6 Pages 1682-1685
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    There have been steady demands for a speech segmentation method to handle various speech applications. Conventional segmentation algorithms show reliable performance but they require a sufficient training database. This letter proposes a manner class segmentation method based on the acoustic event and landmark detection used in the knowledge-based speech recognition system. Measurements of sub-band abruptness and additional parameters are used to detect the acoustic events. Candidates of manner classes are segmented from the acoustic events and determined based on the knowledge of acoustic phonetics and acoustic parameters. Manners of vowel/glide, nasal, fricative, stop burst, stop closure, and silence are segmented in this system. In total, 71% of manner classes are correctly segmented with 20-ms error boundaries.
  • Jangwon CHOI, Yoonsik CHOE
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2014 Volume E97.D Issue 6 Pages 1686-1689
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    This letter proposes an adaptive base plane filtering algorithm for the inter-plane estimation of RGB images in HEVC RExt. Because most high-frequency components of RGB images have low inter-plane correlation, our proposed scheme adaptively removes the high-frequency components of the base plane in order to enhance the inter-plane estimation accuracy. The experimental results show that the proposed scheme provides average BD rate gains of 0.6%, 1.0%, and 1.2% in the G, B, and R planes, respectively, with slightly decreased complexity, as compared to the previous inter-plane filtering method.
  • BenJuan YANG, BenYong LIU
    Article type: LETTER
    Subject area: Image Processing and Video Processing
    2014 Volume E97.D Issue 6 Pages 1690-1693
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    Artificial blurring is a typical operation in image forging. Most existing image forgery detection methods consider only a single feature of the artificial blurring operation. In this letter, we propose to fuse multiple features of the artificial blurring operation in image tampering to improve the accuracy of forgery detection. First, three feature vectors, based on the singular values of the gray-level image matrix, the correlation coefficients of a double blurring operation, and image quality metrics (IQM), are extracted and fused using principal component analysis (PCA). A support vector machine (SVM) classifier is then trained using the fused features extracted from training images or image patches containing artificial blurring operations. Finally, the same feature extraction and fusion procedures are carried out on a suspected image or image patch, which is then classified by the trained SVM into the forged or non-forged class. Experimental results show the feasibility of the proposed method for feature fusion and forgery detection in image tampering.
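    A minimal scikit-learn sketch of the fuse-then-classify pipeline described in this abstract: concatenated feature vectors are reduced with PCA and fed to an SVM. The feature vectors here are random placeholders rather than the singular-value, re-blur correlation, and IQM features used in the letter, and the class shift is artificial so the toy problem is learnable.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, d = 200, 30
# Placeholder feature vectors (standing in for the three fused feature groups).
X_real = rng.normal(size=(n, d))
X_forged = rng.normal(loc=0.6, size=(n, d))
X = np.vstack([X_real, X_forged])
y = np.array([0] * n + [1] * n)          # 0 = non-forged, 1 = forged

clf = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```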
  • Yeon-Soo LEE, Hyoung-Gyu LEE, Hae-Chang RIM, Young-Sook HWANG
    Article type: LETTER
    Subject area: Natural Language Processing
    2014 Volume E97.D Issue 6 Pages 1694-1698
    Published: 2014/06/01
    Released: 2014/06/01
    JOURNAL FREE ACCESS
    In phrase-based statistical machine translation, the long-distance reordering problem is one of the most challenging issues when translating syntactically distant language pairs. In this paper, we propose a novel reordering model to address this problem. In our model, reordering is affected by the overall structure of the sentence, such as listings, reduplications, and modifications, as well as by the relationships of adjacent phrases. To this end, we incorporate global syntactic context, including the parts that have not yet been translated, into the decoding process.