IEICE Transactions on Information and Systems
Online ISSN : 1745-1361
Print ISSN : 0916-8532
Volume E91.D, Issue 6
Special Section on Human Communication III
  • Shunichi YONEMURA
    2008 Volume E91.D Issue 6 Pages 1593
    Published: June 01, 2008
    Released: July 01, 2018
    JOURNALS FREE ACCESS
  • Shigeo MORISHIMA
    Type: INVITED PAPER
    Subject area: INVITED
    2008 Volume E91.D Issue 6 Pages 1594-1603
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    “Dive into the Movie (DIM)” is a project that aims to realize an innovative entertainment system which provides an immersive experience of a story by giving audience members the chance to share impressions with family or friends while watching a movie in which every audience member participates as a cast member. Realizing this system requires several techniques to instantly model and capture personal characteristics of the face, body, gesture, hair, and voice by combining computer graphics, computer vision, and speech signal processing. Moreover, all of the modeling, casting, character synthesis, rendering, and compositing processes have to be performed in real time without any operator. In this paper, a novel entertainment system, the Future Cast System (FCS), is first introduced; it can create a DIM movie with audience participation by replacing the faces of the original roles in a pre-created CG movie with the audience members' own highly realistic 3D CG faces. The effects of the DIM movie on audience experience are then evaluated subjectively. The results suggest that most participants seek higher realism, impression, and satisfaction through replacement of not only the face but also the body, hair, and voice. The first experimental trial demonstration of FCS was performed at the Mitsui-Toshiba pavilion of the 2005 World Exposition in Aichi, Japan. Some 1,640,000 people experienced this event during the 6 months of the exhibition, and FCS became one of the most popular events at Expo 2005.
  • Takayuki KURODA, Takuo SUGANUMA, Norio SHIRATORI
    Type: PAPER
    Subject area: Media Communication
    2008 Volume E91.D Issue 6 Pages 1604-1612
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In this paper, we present a new three-dimensional (3D) virtual environment (3DVE) system named “QuViE/P”, which enhances, as far as possible, the quality of service (QoS) that users actually perceive when computer and network resources are limited. To realize this, we focus on the characteristics of users' perceptual quality evaluation of 3D objects. We propose an effective QoS control scheme for QuViE/P by introducing relationships between the system's internal quality parameters and the user's perceptual quality parameters. This scheme can appropriately maintain the QoS of the 3DVE system and is expected to improve convenience when using a 3DVE system where resources are insufficient. We designed and implemented a prototype of QuViE/P using a multiagent framework. The experimental results show that even when the computer resource is reduced to 20% of the required amount, the proposed scheme can maintain the quality of important objects at a certain level.
  • Kaoru NAKAZONO, Saori TANAKA
    Type: PAPER
    Subject area: Media Communication
    2008 Volume E91.D Issue 6 Pages 1613-1621
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This paper discusses the design of configurations of videophone equipment aimed at online sign interpretation. We classified interpretation services into three types of situations: on-site interpretation, partial online interpretation, and full online interpretation. For each situation, the spatial configurations of the equipment are considered keeping the issue of nonverbal signals in mind. Simulation experiments of sign interpretation were performed using these spatial configurations and the qualities of the configurations were assessed. The preferred configurations had the common characteristics that the hearing subject could see the face of his/her principal conversation partner, that is, the deaf subject. The results imply that hearing people who do not understand sign language utilize nonverbal signals for facilitating interpreter-mediated conversation.
  • Chih-Chien WANG, Shu-Chen CHANG
    Type: PAPER
    Subject area: Media Communication
    2008 Volume E91.D Issue 6 Pages 1622-1627
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Recent developments in information technology have made it easy for people to “chat” online with others in real time, and many do so regularly. “Virtual” relationships can be attractive, especially for people with social interaction problems in the “real world”. This study examines the influence on online chat dependency of three dimensions of social anxiety: general social situation fear, negative evaluation fear, and novel social situation fear. Participants in this study were 454 college students. The survey results show that negative evaluation fear and general social situation fear are related to online chat dependency, while novel social situation fear does not seem to be a relevant factor.
  • Hiroki MORI, Koh OHSHIMA
    Type: PAPER
    Subject area: Media Communication
    2008 Volume E91.D Issue 6 Pages 1628-1633
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former is represented by vectors with psychologically-defined abstract dimensions, and the latter is coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of the proposed method is verified by a subjective evaluation test. As a result, the Mean Opinion Score with respect to the suitability of generated facial expression was 3.86 for the speaker, which was close to that of hand-made facial expressions.
  • Chika NAGAOKA, Masashi KOMORI
    Type: PAPER
    Subject area: Human Information Processing
    2008 Volume E91.D Issue 6 Pages 1634-1640
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Body movement synchrony (i.e., rhythmic synchronization between the body movements of interacting partners) has been described by subjective impressions of skilled counselors and has been considered to reflect the depth of the client-counselor relationship. This study analyzed temporal changes in body movement synchrony through a video analysis of client-counselor dialogues in counseling sessions. Four 50-minute psychotherapeutic counseling sessions were analyzed, including two negatively evaluated sessions (low evaluation group) and two positively evaluated sessions (high evaluation group). In addition, two 50-minute ordinary advice sessions between two high school teachers and the clients in the high evaluation group were analyzed. All sessions were role-played. The intensity of the participants' body movement was measured using a video-based system. Temporal change in body movement synchrony was analyzed using moving correlations of the intensity between the two time series. The results revealed (1) a consistent temporal pattern among the four counseling cases, though the moving correlation coefficients were higher for the high evaluation group than for the low evaluation group, and (2) different temporal patterns for the counseling and advice sessions even when the clients were the same. These results are discussed from the perspective of the quality of the client-counselor relationship.
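The moving correlation mentioned in the abstract above can be sketched as a Pearson correlation computed over a sliding window of two movement-intensity series. This is only an illustrative sketch: the window length, the sample values, and the names `pearson` and `moving_correlation` are assumptions, not details taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a window with no movement change as uncorrelated
    return cov / sqrt(var_x * var_y)


def moving_correlation(x, y, window):
    """Correlation of x and y inside each sliding window of the given length."""
    if len(x) != len(y):
        raise ValueError("both series must have the same length")
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Hypothetical movement intensities for the two participants (arbitrary units).
client = [0.1, 0.4, 0.9, 0.3, 0.2, 0.8, 0.5]
counselor = [0.2, 0.5, 1.0, 0.4, 0.9, 0.1, 0.6]
synchrony = moving_correlation(client, counselor, window=4)
```

Each element of `synchrony` describes one window, so plotting the list against time gives the kind of temporal profile the abstract refers to.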
  • Yuki HONGOH, Shinichi KITA, Yoshiharu SOETA
    Type: PAPER
    Subject area: Human Information Processing
    2008 Volume E91.D Issue 6 Pages 1641-1648
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    We examined how spatial disparity between auditory and visual stimuli modulates the audio-visual (A-V) prior entry effect. Spatial and temporal proximity of multisensory stimuli are crucial factors for multisensory perception in most cases (e.g., [1], [2]). However, our previous research [3], [4] suggested that this well-accepted hypothesis is not applicable to the A-V prior entry effect. In order to examine the effect of spatial disparity on the A-V prior entry effect, six loudspeakers and two light emitting diodes (LEDs) were used as stimuli. The loudspeakers were located at 10, 25, and 90 degrees from the midline of the participants on both the right and left sides. A preceding sound was presented from one of these six loudspeakers. After the preceding sound, two visual targets were presented successively at a short interval, and participants judged which visual target was presented first. Two colour-changeable (‘red’ or ‘green’) LEDs were used for the visual targets, and participants judged the order of the visual targets by their colour, not by their side, in order to avoid response bias as much as possible. The visual targets were situated at 10 degrees or 25 degrees from the participants' midline on both the right and left in Experiment 1. Results showed a biased judgment that the visual target on the side where the sound was presented appeared first. The amplitude of the A-V prior entry effect was greater when the preceding sound source was farther from the midline of the participants. This effect of spatial separation indicated that the clarity of whether the preceding sound came from the right or left side enhanced the amplitude of the A-V prior entry effect (Experiment 2). These results challenge the belief that the spatial proximity of multisensory stimuli is a crucial factor for multisensory perception.
  • Tamami SUDO, Ken MOGI
    Type: PAPER
    Subject area: Human Information Processing
    2008 Volume E91.D Issue 6 Pages 1649-1655
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In this study, we conducted a series of experiments using stimuli characterized by various attributes in order to understand the categorization process in an infant's pre-linguistic development. Infants are able to assign the same label to members within the same category by focusing attention on specific features or functions common to the members. The ability to categorize is likely to play an essential role in an infant's overall cognitive development. Specifically, we investigated how infants use different strategies in the process of linguistic categorization. In one strategy, members of a single category are derived from perceptual similarities within the most representative members, i.e., the prototypical members. Alternatively, each membership is established by referring to the linguistic labels for each category provided by the caretaker, in a symbol grounding process. We found that the infant is able to employ these strategies in a flexible manner in its development. We discuss the interplay between different cognitive strategies, including the prototype effects in the infant's cognitive development and the implications for the cortical mechanisms involved.
  • Hirohisa KIGUCHI, Nobuhiko ASAKURA
    Type: PAPER
    Subject area: Human Information Processing
    2008 Volume E91.D Issue 6 Pages 1656-1663
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Many studies of on-line comprehension of semantic violations have shown that the human sentence processor rapidly constructs a higher-order semantic interpretation of the sentence. What remains unclear, however, is the amount of time required to detect semantic anomalies while concatenating two words to form a phrase with very rapid stimulus presentation. We aimed to examine the time course of semantic integration in concatenating two words in phrase structure building, using magnetoencephalography (MEG). In the MEG experiment, subjects decided whether two words (a classifier and its corresponding noun), each presented for 66 ms, form a semantically correct noun phrase. Half of the stimuli were matched pairs of classifiers and nouns. The other half were mismatched pairs of classifiers and nouns. In the analysis of MEG data, three primary peaks were found at approximately 25 ms (M1), 170 ms (M2), and 250 ms (M3) after the presentation of the target words. As a result, only the M3 latencies were significantly affected by the stimulus conditions. Thus, the present results indicate that the semantic integration in concatenating two words starts from approximately 250 ms.
  • Po-Hsun CHENG, Sao-Jie CHEN, Jin-Shin LAI, Feipei LAI
    Type: PAPER
    Subject area: Interface Design
    2008 Volume E91.D Issue 6 Pages 1664-1672
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This paper illustrates a feasible health informatics domain knowledge management process which helps gather useful technology information and reduce many knowledge misunderstandings among engineers who have participated in the IBM mainframe rightsizing project at National Taiwan University (NTU) Hospital. We design an asynchronous sharing mechanism to facilitate knowledge transfer, and our health informatics domain knowledge management process can be used to publish and retrieve documents dynamically. It effectively creates an acceptable discussion environment and even lessens the traditional meeting burden among development engineers. An overall description of the current software development status is presented. Then, the knowledge management implementation of health information systems is proposed.
  • Jeong-Sik KIM, Soo-Mi CHOI
    Type: PAPER
    Subject area: Interface Design
    2008 Volume E91.D Issue 6 Pages 1673-1680
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    We present an interactive system for cosmetic makeup of a point-based face model acquired by 3D scanners. We first enhance the texture of a face model in 3D space using low-pass Gaussian filtering, median filtering, and histogram equalization. The user is provided with a stereoscopic display and haptic feedback, and can perform simulated makeup tasks including the application of foundation, color makeup, and lip gloss. Fast rendering is achieved by processing surfels using the GPU, and we use a BSP tree data structure and a dynamic local refinement of the facial surface to provide interactive haptics. We have implemented a prototype system and evaluated its performance.
  • Hideyuki FUJITA, Masatoshi ARIKAWA
    Type: PAPER
    Subject area: Interface Design
    2008 Volume E91.D Issue 6 Pages 1681-1692
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Our research goal is to facilitate the sharing of stories with digital photographs. Some map websites now collect stories associated with people's relationships to places. Users map collections of places and include their intangible emotional associations with each location along with photographs, videos, etc. Though this framework of mapping stories is important, it is not sufficiently expressive to communicate stories in a narrative fashion. For example, when the number of the mapped collections of places is particularly large, it is neither easy for viewers to interpret the map nor is it easy for the creator to express a story as a series of events in the real world. This is because each narrative, in the form of a sequence of textual narratives, a sequence of photographs, a movie, or audio is mapped to just one point. As a result, it is up to the viewer to decide which points on the map must be read, and in what order. The conventional framework is fairly suitable for mapping and expressing fragments or snapshots of a whole story and not for conveying the whole story as a narrative using the entire map as the setting. We therefore propose a new framework, Spatial Slideshow, for mapping personal photo collections and representing them as stories such as route guides, sightseeing guides, historical topics, fieldwork records, personal diaries, and so on. It is a fusion of personal photo mapping and photo storytelling. Each story is conveyed through a sequence of mapped photographs, presented as a synchronized animation of a map and an enhanced photo slideshow. The main technical novelty of this paper is a method for creating three-dimensional animations of photographs that induce the visual effect of motion from photo to photo. We believe that the proposed framework may have considerable significance in facilitating the grassroots development of spatial content driven by visual communication concerning real-world locations or events.
  • Kiyoshi HOSHINO
    Type: PAPER
    Subject area: Interface Design
    2008 Volume E91.D Issue 6 Pages 1693-1699
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A new type of humanoid robot arm that can coexist and interact with human beings is sought. To implement smooth, fast, human-like movement on a pneumatic robot, the author used an endoskeleton-type humanoid robot arm with pneumatic agonist-antagonist actuators and a mechanism for controlling the stiffness of each joint, and examined its controllability experimentally. Using Kitamori's method to determine the control gains experimentally and an I-PD controller, three joints of the humanoid robot arm were controlled experimentally. A damping control algorithm was also applied to the wrist joint to modify the speed in accordance with the power. The results showed that, for step-wise input, the error in following the target angles was less than one degree and the time constant was less than one second. Simultaneous command input to the three joints brought about an overshoot, with an increase in error of about ten percent. The humanoid robot arm can generate calligraphic motions with high accuracy, moving quickly at some times but slowly at others, and particularly softly on some occasions but stiffly on others.
  • Shusuke OKAMOTO, Masaru KAMADA, Tatsuhiro YONEKURA
    Type: LETTER
    Subject area: Interface Design
    2008 Volume E91.D Issue 6 Pages 1700-1703
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This letter proposes a prototyping tool for Web-based Multiuser Online Role-Playing Games (MORPGs). The design goal is to make this tool simple and powerful. The tool comprises a GUI editor, a translator, and a runtime environment. The GUI editor is used to edit state-transition diagrams, each of which defines the behavior of the fictional characters. The state-transition diagrams are translated into C program code, which plays the role of a game engine in the RPG system. The runtime environment includes PHP, JavaScript with Ajax, and HTML, so the prototype system can be played in common Web browsers such as Firefox, Safari, and IE. On a click or key press by a player, the Web browser sends it to the Web server to reflect its consequence on the screens that other players are looking at. Prospective users of this tool include programming novices and schoolchildren. No knowledge of or skill in any specific programming language is required to create state-transition diagrams. The structure of these diagrams is not only suitable for defining character behavior but also intuitive enough for novices to understand. Therefore, users can easily create a Web-based MORPG system with the tool.
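A state-transition diagram of the kind described in the abstract above can be pictured as a lookup table from (state, event) pairs to next states. The sketch below is purely hypothetical: the states, events, and function names are invented for illustration, and it is written in Python even though the letter's translator emits C code.

```python
# Hypothetical behavior of a shopkeeper character, written as a transition table.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    ("idle", "player_near"): "greet",
    ("greet", "player_talks"): "trade",
    ("greet", "player_leaves"): "idle",
    ("trade", "player_leaves"): "idle",
}


def step(state, event):
    """Return the next state; events with no matching arrow leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(state, events):
    """Feed a sequence of events through the diagram and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final_state = run("idle", ["player_near", "player_talks"])
```

Keeping the diagram as data rather than as branching code is one plausible reason such a representation suits novices: adding an arrow in the editor corresponds to adding one table entry.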
Regular Section
  • Mitsuo WAKATSUKI, Etsuji TOMITA
    Type: PAPER
    Subject area: Algorithm Theory
    2008 Volume E91.D Issue 6 Pages 1704-1718
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A deterministic pushdown automaton (dpda) having just one stack symbol is called a deterministic restricted one-counter automaton (droca). When it accepts an input by empty stack, it is called strict. This paper is concerned with a subclass of real-time strict droca's, called Szilard strict droca's, and studies the problem of identifying the subclass in the limit from positive data. The class of languages accepted by Szilard strict droca's coincides with the class of Szilard languages (or, associated languages) of strict droca's and is incomparable to each of the class of regular languages and that of simple languages. After providing some properties of languages accepted by Szilard strict droca's, we show that the class of Szilard strict droca's is polynomial time identifiable in the limit from positive data in the sense of Yokomori. This identifiability is proved by giving an exact characteristic sample of polynomial size for a language accepted by a Szilard strict droca. The class of very simple languages, which is a proper subclass of simple languages, is also proved to be polynomial time identifiable in the limit from positive data by Yokomori, but it is yet unknown whether there exists a characteristic sample of polynomial size for any very simple language.
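The automaton model defined in the abstract above can be illustrated with a minimal Python sketch. It simulates a real-time (no ε-move) deterministic pushdown automaton with a single stack symbol, so the stack reduces to a counter, and it accepts by empty stack. The example language {aⁿbⁿ | n ≥ 1}, the state names, and the function `run_droca` are assumptions chosen for illustration; the sketch does not model Szilard languages or the identification algorithm.

```python
def run_droca(transitions, start_state, word):
    """Simulate a real-time strict droca and report whether it accepts `word`.

    transitions maps (state, input symbol) to (next state, pushed), where
    `pushed` is how many copies of the single stack symbol replace the top one.
    """
    state = start_state
    counter = 1  # the stack initially holds one copy of the stack symbol
    for symbol in word:
        if counter == 0:
            return False  # empty stack: no further move is possible
        move = transitions.get((state, symbol))
        if move is None:
            return False  # no transition defined for this input
        state, pushed = move
        counter += pushed - 1  # pop the top symbol, then push `pushed` copies
    return counter == 0  # strict acceptance: input consumed and stack empty


# Hypothetical transition table for {a^n b^n | n >= 1}.
ANBN = {
    ("q0", "a"): ("q1", 1),  # first a: keep the initial symbol
    ("q1", "a"): ("q1", 2),  # each further a adds one symbol
    ("q1", "b"): ("q2", 0),  # each b removes one symbol
    ("q2", "b"): ("q2", 0),
}

accepted = [w for w in ["ab", "aabb", "aab", "abb", "ba", ""] if run_droca(ANBN, "q0", w)]
```

Because the machine cannot move on an empty stack, it can never test for zero and continue, which is the "restricted" aspect of the counter in this sketch.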
  • Sung-Hyun SHIN, Yang-Sae MOON, Jinho KIM, Sang-Wook KIM
    Type: PAPER
    Subject area: Database
    2008 Volume E91.D Issue 6 Pages 1719-1729
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In recent years, horizontal tables with a large number of attributes have been widely used in OLAP or e-business applications to analyze multidimensional data efficiently. For efficient storing and querying of horizontal tables, recent works have tried to transform a horizontal table to a traditional vertical table. Existing works, however, have the drawback of not considering an optimized PIVOT operation provided (or to be provided) in recent commercial RDBMSs. In this paper we propose a formal approach that exploits the optimized PIVOT operation of commercial RDBMSs for storing and querying of horizontal tables. To achieve this goal, we first provide an overall framework that stores and queries a horizontal table using an equivalent vertical table. Under the proposed framework, we then formally define 1) a method that stores a horizontal table in an equivalent vertical table and 2) a PIVOT operation that converts a stored vertical table to an equivalent horizontal view. Next, we propose a novel method that transforms a user-specified query on horizontal tables to an equivalent PIVOT-included query on vertical tables. In particular, by providing transformation rules for all five elementary operations in relational algebra as theorems, we prove our method is theoretically applicable to commercial RDBMSs. Experimental results show that, compared with the earlier work, our method reduces storage space significantly and also improves average performance by several orders of magnitude. These results indicate that our method provides an excellent framework to maximize performance in handling horizontal tables by exploiting the optimized PIVOT operation in commercial RDBMSs.
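The relationship between a horizontal table and its vertical storage, as described in the abstract above, can be sketched in plain Python. This is an in-memory illustration only, not the paper's formal definitions or any RDBMS's PIVOT operator; the column names, the `(id, attribute, value)` layout, and the functions `to_vertical` and `pivot` are assumptions.

```python
def to_vertical(rows, key="id"):
    """Store each non-null attribute of a horizontal row as an (id, attribute, value) triple."""
    return [
        (row[key], attr, value)
        for row in rows
        for attr, value in row.items()
        if attr != key and value is not None
    ]


def pivot(triples, attributes, key="id"):
    """Rebuild a horizontal view from vertical triples; missing attributes become None."""
    view = {}
    for oid, attr, value in triples:
        row = view.setdefault(oid, {key: oid, **dict.fromkeys(attributes)})
        if attr in attributes:
            row[attr] = value
    return list(view.values())


# Hypothetical sparse horizontal table.
horizontal = [
    {"id": 1, "color": "red", "size": None, "weight": 3},
    {"id": 2, "color": None, "size": "L", "weight": None},
]
vertical = to_vertical(horizontal)
restored = pivot(vertical, ["color", "size", "weight"])
```

Only non-null cells are stored, which is where the space saving for sparse tables comes from. One limitation of this simple sketch is that a row whose attributes are all null leaves no triples and therefore does not reappear in the pivoted view.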
  • Yi YU, Kazuki JOE, J. Stephen DOWNIE
    Type: PAPER
    Subject area: Contents Technology and Web Information Systems
    2008 Volume E91.D Issue 6 Pages 1730-1739
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This paper investigates suitable indexing techniques to enable efficient content-based audio retrieval in large acoustic databases. To make an index-based retrieval mechanism applicable to audio content, we investigate the design of Locality Sensitive Hashing (LSH) and the partial sequence comparison. We propose a fast and efficient audio retrieval framework of query-by-content and develop an audio retrieval system. Based on this framework, four different audio retrieval schemes, LSH-Dynamic Programming (DP), LSH-Sparse DP (SDP), Exact Euclidean LSH (E2LSH)-DP, and E2LSH-SDP, are introduced and evaluated in order to better understand the performance of audio retrieval algorithms. The experimental results indicate that compared with the traditional DP and the other three competitive schemes, E2LSH-SDP exhibits the best tradeoff in terms of the response time, retrieval accuracy and computation cost.
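The hashing step behind Euclidean LSH, which the abstract above builds on, can be sketched as follows. Each hash projects a feature vector onto a random Gaussian direction, adds a random offset, and quantizes the result into buckets of width `w`, so nearby vectors tend to share a bucket. The feature vectors, the parameter values, and the function names are assumptions for illustration; the paper's audio features and its DP/SDP sequence comparison are not modeled.

```python
import math
import random


def make_hash(dim, w, rng):
    """Return one hash function h(v) = floor((a . v + b) / w)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # Gaussian projection direction
    b = rng.uniform(0.0, w)  # random offset between 0 and w

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def make_bucket_key(dim, k, w, rng):
    """Concatenate k hashes into one bucket key for a single hash table."""
    hashes = [make_hash(dim, w, rng) for _ in range(k)]
    return lambda v: tuple(h(v) for h in hashes)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
bucket_key = make_bucket_key(dim=3, k=4, w=4.0, rng=rng)

# Hypothetical 3-dimensional frame features; real audio features are much longer.
frames = {
    "clip_a": [0.10, 0.80, 0.30],
    "clip_b": [0.90, 0.20, 0.70],
}
index = {}
for name, vector in frames.items():
    index.setdefault(bucket_key(vector), []).append(name)

# Candidates for a query are the stored frames that share its bucket.
candidates = index.get(bucket_key([0.10, 0.80, 0.30]), [])
```

In a retrieval pipeline of this kind, only the candidates returned by the index would be passed on to the more expensive sequence comparison.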
  • Chen-Sung CHANG
    Type: PAPER
    Subject area: Artificial Intelligence and Cognitive Science
    2008 Volume E91.D Issue 6 Pages 1740-1747
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This paper presents a real-time decision support system (RDSS) based on artificial intelligence (AI) for voltage collapse avoidance (VCA) in power supply networks. The RDSS scheme employs a fuzzy hyperrectangular composite neural network (FHRCNN) to carry out voltage risk identification (VRI). In the event that a threat to the security of the power supply network is detected, an evolutionary programming (EP)-based algorithm is triggered to determine the operational settings required to restore the power supply network to a secure condition. The effectiveness of the RDSS methodology is demonstrated through its application to the American Electric Power Provider System (AEP, 30-bus system) under various heavy load conditions and contingency scenarios. In general, the numerical results confirm the ability of the RDSS scheme to minimize the risk of voltage collapse in power supply networks. In other words, RDSS provides Power Provider Enterprises (PPEs) with a viable tool for performing on-line voltage risk assessment and power system security enhancement functions.
  • Xiao-Dong WANG, Keikichi HIROSE, Jin-Song ZHANG, Nobuaki MINEMATSU
    Type: PAPER
    Subject area: Pattern Recognition
    2008 Volume E91.D Issue 6 Pages 1748-1755
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A method was developed for automatic recognition of syllable tone types in continuous Mandarin speech by integrating two techniques: tone nucleus modeling and a neural network classifier. Tone nucleus modeling considers a syllable F0 contour as consisting of three parts: onset course, tone nucleus, and offset course. The two courses are transitions from/to neighboring syllable F0 contours, while the tone nucleus is the intrinsic part of the F0 contour. By viewing only the tone nucleus, acoustic features less affected by neighboring syllables are obtained. When using tone nucleus modeling, automatic detection of the tone nucleus becomes crucial. An improvement was added to the original detection method. Distinctive acoustic features for tone types are not limited to F0 contours. Other prosodic features, such as waveform power and syllable duration, are also useful for tone recognition. These heterogeneous features are rather difficult to handle simultaneously in hidden Markov models (HMMs), but are easy to handle in neural networks. We adopted a multi-layer perceptron (MLP) as the neural network. Tone recognition experiments were conducted for speaker-dependent and speaker-independent cases. In order to show the effect of integration, experiments were also conducted for two baselines: an HMM classifier with tone nucleus modeling, and an MLP classifier viewing the entire syllable instead of the tone nucleus. The integrated method showed a tone recognition rate of 87.1% in the speaker-dependent case and 80.9% in the speaker-independent case, which was about a 10% relative error reduction compared to the baselines.
  • Seiji HOTTA
    Type: PAPER
    Subject area: Pattern Recognition
    2008 Volume E91.D Issue 6 Pages 1756-1763
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A family of linear subspace classifiers called the local subspace classifier (LSC) outperforms the k-nearest neighbor rule (kNN) and conventional subspace classifiers in handwritten digit classification. However, LSC suffers from very high sensitivity to image transformations because it uses projection and Euclidean distances for classification. In this paper, I present a combination of LSC and a tangent distance (TD) for improving the accuracy of handwritten digit recognition. In this classification rule, we can deal with transform-invariance easily because we are able to use tangent vectors for approximation of transformations. However, we cannot use tangent vectors for other types of images such as color images. Hence, kernel LSC (KLSC) is proposed for incorporating transform-invariance into LSC via kernel mapping. The performance of the proposed methods is verified with experiments on handwritten digit and color image classification.
  • Heiga ZEN, Tomoki TODA, Keiichi TOKUDA
    Type: PAPER
    Subject area: Speech and Hearing
    2008 Volume E91.D Issue 6 Pages 1764-1773
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    We describe a statistical parametric speech synthesis system developed by a joint group from the Nagoya Institute of Technology (Nitech) and the Nara Institute of Science and Technology (NAIST) for the annual open evaluation of text-to-speech synthesis systems named Blizzard Challenge 2006. To improve our 2005 system (Nitech-HTS 2005), we investigated new features such as mel-generalized cepstrum-based line spectral pairs (MGC-LSPs), maximum likelihood linear transform (MLLT), and a full covariance global variance (GV) probability density function (pdf). A combination of mel-cepstral coefficients, MLLT, and full covariance GV pdf scored highest in subjective listening tests, and the 2006 system performed significantly better than the 2005 system. The Blizzard Challenge 2006 evaluations show that Nitech-NAIST-HTS 2006 is competitive even when working with relatively large speech databases.
  • Yutaka TSUBOI, Takehiro IHARA, Kazuyuki TAKAGI, Kazuhiko OZEKI
    Type: PAPER
    Subject area: Speech and Hearing
    2008 Volume E91.D Issue 6 Pages 1774-1782
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    A solution to the problem of improving robustness to noise in automatic speech recognition is presented in the framework of multi-band, multi-SNR, and multi-path approaches. In our word recognizer, the whole frequency band is divided into seven overlapped sub-bands, and then sub-band noisy phoneme HMMs are trained on speech data mixed with filtered white Gaussian noise at multiple SNRs. The acoustic model of a word is built as a set of concatenations of clean and noisy sub-band phoneme HMMs arranged in parallel. A Viterbi decoder allows a search path to transit to another SNR condition at a phoneme boundary. The recognition scores of the sub-bands are then recombined to give the score for a word. Experiments show that the overlapped seven-band system yields the best performance under nonstationary ambient noise. It is also shown that the use of filtered white Gaussian noise is advantageous for training noisy phoneme HMMs.
  • Yoshihiko HASHIDUME, Yoshitaka MORIKAWA, Shuichi MAKI
    Type: PAPER
    Subject area: Image Processing and Video Processing
    2008 Volume E91.D Issue 6 Pages 1783-1792
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In this paper, we investigate minimum mean absolute error (mmae) predictors for lossless image coding. In some prediction-based lossless image coding systems, coding performance depends largely on the efficiency of predictors. In this case, minimum mean square error (mmse) predictors are often used. Generally speaking, these predictors have a problem that outliers departing very far from a regression line are conspicuous enough to obscure inliers. That is, in image compression, large prediction errors near edges cause the degradation of the prediction accuracy of flat areas. On the other hand, mmae predictors are less sensitive to edges and provide more accurate prediction for flat areas than mmse predictors. At the same time, the prediction accuracy of edge areas is brought down. However, the entropy of the prediction errors based on mmae predictors is reduced compared with that of mmse predictors because general images mainly consist of flat areas. In this study, we adopt the Laplacian and the Gaussian function models for prediction errors based on mmae and mmse predictors, respectively, and show that mmae predictors outperform conventional mmse-based predictors including weighted mmse predictors in terms of coding performance.
    Download PDF (4335K)
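    The contrast between the two error criteria in the abstract above can be illustrated with the simplest possible predictor, a single constant. This is only a sketch under stated assumptions, not the authors' predictor design: the pixel values are hypothetical, and the helper names are made up for illustration. It relies on the standard facts that the arithmetic mean minimizes mean square error and the median minimizes mean absolute error.

```python
import statistics

def best_constant_mse(samples):
    """Constant that minimizes mean square error: the arithmetic mean."""
    return statistics.fmean(samples)

def best_constant_mae(samples):
    """Constant that minimizes mean absolute error: the median."""
    return statistics.median(samples)

# Hypothetical neighborhood: six flat-area pixels plus one edge pixel (outlier).
pixels = [100, 101, 99, 100, 102, 100, 230]

mse_pred = best_constant_mse(pixels)   # pulled toward the edge pixel
mae_pred = best_constant_mae(pixels)   # stays with the flat area

# Mean absolute prediction error measured on the flat-area pixels only.
flat = pixels[:-1]
mse_flat_err = sum(abs(p - mse_pred) for p in flat) / len(flat)
mae_flat_err = sum(abs(p - mae_pred) for p in flat) / len(flat)

print(f"mmse-style prediction: {mse_pred:.2f}, flat-area error: {mse_flat_err:.2f}")
print(f"mmae-style prediction: {mae_pred:.2f}, flat-area error: {mae_flat_err:.2f}")
```

    In this toy case the single edge pixel drags the mean away from the flat-area values, while the median is unaffected, which mirrors the abstract's point that edges degrade mmse prediction in flat areas.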
  • Al MANSUR, Yoshinori KUNO
    Type: PAPER
    Subject area: Image Recognition, Computer Vision
    2008 Volume E91.D Issue 6 Pages 1793-1803
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Service robots need to be able to recognize and identify objects located within complex backgrounds. Since no single method may work in every situation, several methods need to be combined and robots have to select the appropriate one automatically. In this paper we propose a scheme to classify situations depending on the characteristics of the object of interest and user demand. We classify situations into four groups and employ different techniques for each. We use the scale-invariant feature transform (SIFT) and kernel principal component analysis (KPCA) in conjunction with a support vector machine (SVM), using intensity, color, and Gabor features for five object categories. We show that the choice of appropriate features is important when applying KPCA- and SVM-based techniques to different kinds of objects. Through experiments we show that by using our categorization scheme a service robot can select an appropriate feature and method, and considerably improve its recognition performance. Yet, recognition is not perfect. Thus, we propose to combine the autonomous method with an interactive method that allows the robot to recognize the user request for a specific object and class when the robot fails to recognize the object. We also propose an interactive way to update the object model that is used to recognize an object upon failure in conjunction with the user's feedback.
    Download PDF (8217K)
  • Noritaka OSAWA
    Type: PAPER
    Subject area: Computer Graphics
    2008 Volume E91.D Issue 6 Pages 1804-1812
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Three-dimensional visualization using jigsaw-puzzle-like glyphs, or shapes, is proposed as a means of representing grammatical constraints in programming. The proposed visualization uses 3D glyphs such as convex, concave, and wireframe shapes. A semantic constraint, such as a type constraint in an assignment, is represented by an inclusive match between 3D glyphs. An application of the proposed visualization method to a subset of the Java programming language is demonstrated. An experimental evaluation showed that the 3D glyphs are easier to learn and enable users to more quickly understand their relationships than 2D glyphs and 1D symbol sequences.
    Download PDF (6097K)
  • Shangce GAO, Wei WANG, Hongwei DAI, Fangjia LI, Zheng TANG
    Type: PAPER
    Subject area: Biocybernetics, Neurocomputing
    2008 Volume E91.D Issue 6 Pages 1813-1823
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Both the clonal selection algorithm (CSA) and ant colony optimization (ACO) are inspired by natural phenomena and are effective tools for solving complex problems. CSA can exploit and explore the solution space effectively and in parallel. However, it cannot make sufficient use of environmental feedback information and thus performs many redundant repetitions during search. On the other hand, ACO is based on the concept of an indirect cooperative foraging process via secreted pheromones. Its positive feedback ability is strong, but its convergence is slow because little pheromone is available initially. In this paper, we propose a pheromone-linker to combine these two algorithms. The proposed hybrid clonal selection and ant colony optimization (CSA-ACO) utilizes the strengths of both algorithms and overcomes their inherent disadvantages. Simulation results on traveling salesman problems demonstrate the merit of the proposed algorithm over some traditional techniques.
    Download PDF (2515K)
  • Inbok LEE
    Type: LETTER
    Subject area: Algorithm Theory
    2008 Volume E91.D Issue 6 Pages 1824-1826
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    Approximate pattern matching plays an important role in various applications. In this paper we focus on (δ,γ)-matching, where each character can differ by at most δ and the sum of these errors is smaller than γ. We show how to find these matches when the pattern is transformed by y=αx+β, without knowing α and β in advance.
    Download PDF (454K)
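    The (δ,γ)-matching condition in the abstract above can be sketched as a naive check over integer sequences. This is a minimal illustration, not the letter's algorithm: it does not handle the linear transformation with unknown α and β, the function names are hypothetical, and the strict comparison on the error sum simply follows the abstract's wording ("smaller than γ").

```python
def delta_gamma_match(pattern, window, delta, gamma):
    """Return True if every position differs by at most delta and
    the sum of the differences is smaller than gamma."""
    if len(pattern) != len(window):
        return False
    diffs = [abs(p - w) for p, w in zip(pattern, window)]
    return all(d <= delta for d in diffs) and sum(diffs) < gamma

def find_matches(pattern, text, delta, gamma):
    """Naive scan: test every window of the text, O(len(text) * len(pattern))."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if delta_gamma_match(pattern, text[i:i + m], delta, gamma)]

# Hypothetical numeric strings (for example, pitch values).
text = [3, 5, 8, 4, 6, 9, 1]
pattern = [4, 6, 8]

positions = find_matches(pattern, text, delta=1, gamma=3)
print(positions)
```

    Tightening γ while keeping δ fixed removes windows whose individual errors are small but add up, which is the role the second parameter plays in the definition.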
  • Keehang KWON, Dae-Seong KANG
    Type: LETTER
    Subject area: Fundamentals of Software and Theory of Programs
    2008 Volume E91.D Issue 6 Pages 1827-1829
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    We propose HHWeb, an extension to LogicWeb with hereditary Harrop formulas. HHWeb extends the LogicWeb of Loke and Davison by allowing goals of the form (∃x1…∃xn D)⊃G (or equivalently ∀x1…∀xn (D⊃G)) where D is a web page and G is a goal. This goal is intended to be solved by instantiating x1,…,xn in D by new names and then solving the resulting goal. The existential quantifications at the head of web pages are particularly flexible in controlling the visibility of names. For example, they can provide scope to functions and constants as well as to predicates. In addition, they have such simple semantics that implementation becomes more efficient. Finally, they provide a client-side interface which is useful for customizing web pages.
    Download PDF (422K)
  • Jong Kyu KIM, Nam Soo KIM
    Type: LETTER
    Subject area: Speech and Hearing
    2008 Volume E91.D Issue 6 Pages 1830-1833
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.
    Download PDF (716K)
  • Yutao DONG, Xiangzhong FANG, Jing YANG
    Type: LETTER
    Subject area: Speech and Hearing
    2008 Volume E91.D Issue 6 Pages 1834-1837
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    This letter proposes a new algorithm for refining the quantization parameter in H.264 real-time encoding. In H.264 encoding, the quantization parameter computed according to the quadratic rate model is not accurate enough to meet the target bit rate. In order to bring the actual encoded bit rate closer to the target bit rate, a ρ-domain rate model is introduced in our proposed quantization parameter refinement algorithm. Simulation results show that the proposed algorithm achieves a clear gain in PSNR and a more stable encoded bit rate compared to Jiang's algorithm.
    Download PDF (647K)
  • Xiao WU, Ming LI, Hongbin SUO, Yonghong YAN
    Type: LETTER
    Subject area: Music Information Processing
    2008 Volume E91.D Issue 6 Pages 1838-1840
    Published: June 01, 2008
    Released: March 01, 2010
    JOURNALS FREE ACCESS
    In this letter we focus on the task of selecting the melody track from a polyphonic MIDI file. Based on the intuition that music and language are similar in many aspects, we solve the selection problem by introducing an n-gram language model to learn the melody co-occurrence patterns in a statistical manner and determine the melodic degree of a given MIDI track. Furthermore, we propose the idea of using a background model and posterior probability criteria to make the modeling more discriminative. In the evaluation, the achieved correct rate of 81.6% indicates the feasibility of our approach.
    Download PDF (461K)
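    The idea in the abstract above of scoring a track with an n-gram model against a background model can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the letter: the use of bigrams over pitch intervals, add-one smoothing, a background model trained on non-melody tracks, the log-likelihood-ratio score, the toy note sequences, and all names.

```python
import math
from collections import Counter

def intervals(pitches):
    """Convert MIDI pitch numbers to successive pitch intervals."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

class BigramModel:
    """Bigram model over interval symbols with add-one smoothing."""

    def __init__(self, sequences, vocab_size):
        self.vocab_size = vocab_size
        self.bigrams = Counter()
        self.contexts = Counter()
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def avg_log_prob(self, seq):
        """Average log-probability per bigram of the given sequence."""
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, cur in pairs:
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + self.vocab_size
            total += math.log(num / den)
        return total / len(pairs)

def melodic_score(pitches, melody_model, background_model):
    """Log-likelihood ratio: positive means more melody-like than background."""
    seq = intervals(pitches)
    return melody_model.avg_log_prob(seq) - background_model.avg_log_prob(seq)

# Toy training data (hypothetical): stepwise melodies and leaping accompaniments.
melody_tracks = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
                 [67, 65, 64, 62, 60, 62, 64, 65, 67]]
accomp_tracks = [[36, 48, 36, 48, 36, 48, 36, 48],
                 [48, 36, 48, 36, 43, 55, 43, 55]]

VOCAB_SIZE = 25  # intervals from -12 to +12 semitones (a simplification)
melody_model = BigramModel([intervals(t) for t in melody_tracks], VOCAB_SIZE)
background_model = BigramModel([intervals(t) for t in accomp_tracks], VOCAB_SIZE)

score_a = melodic_score([62, 64, 65, 67, 65, 64], melody_model, background_model)
score_b = melodic_score([36, 48, 36, 48, 36], melody_model, background_model)
print(f"stepwise track: {score_a:.3f}, leaping track: {score_b:.3f}")
```

    Under these assumptions, the track with the highest score in a file would be the melody candidate; subtracting the background score is one simple way to make the decision discriminative rather than relying on the melody model alone.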