This paper addresses the challenging problem of automating the human ability to evaluate output from machine translation (MT) systems, which are subsystems of speech-to-speech MT (SSMT) systems. Conventional automatic MT evaluation methods include BLEU, which MT researchers have used frequently. BLEU is unsuitable for SSMT evaluation for two reasons. First, BLEU penalizes errors lightly at the beginning or end of a translation and heavily in the middle, although the assessment should be independent of position. Second, BLEU is intolerant of colloquial sentences with small errors, although such errors do not prevent us from continuing a conversation. In this paper, the authors report a new evaluation method called RED that automatically grades each MT output by using a decision tree (DT). The DT is learned from training examples that are encoded using multiple edit distances and their grades. The multiple edit distances are the normal edit distance (ED), defined by insertion, deletion, and replacement, together with extensions of ED. The use of multiple edit distances allows more tolerance than either ED or BLEU. Each evaluated MT output is assigned a grade by the DT. RED and BLEU were compared on the task of evaluating SSMT systems of various performance levels on a spoken language corpus, ATR's Basic Travel Expression Corpus (BTEC). Experimental results showed that RED significantly outperformed BLEU.
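As a reference point for the multiple edit distances above, the normal edit distance (ED) over word sequences can be sketched with the standard dynamic-programming recurrence over insertion, deletion, and replacement; the function name and unit costs are illustrative assumptions, not taken from the paper.

```python
def edit_distance(hyp, ref):
    """Standard edit distance between two sequences: the minimum number
    of insertions, deletions, and replacements turning `hyp` into `ref`."""
    m, n = len(hyp), len(ref)
    # d[i][j] = distance between hyp[:i] and ref[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of hyp[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of ref[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement or match
    return d[m][n]

print(edit_distance(list("kitten"), list("sitting")))  # 3
```

Extensions of ED of the kind the paper alludes to would modify the per-operation costs or the matching condition while keeping this same recurrence.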
In this paper, we propose a new summarization algorithm that targets a new kind of structured content. The structured content, which is to be created by semantic authoring, consists of sentences and rhetorical relations among them: it is represented by a graph in which a node is a sentence and an edge is a rhetorical relation. We simulate the creation of this content graph by using newspaper articles annotated with rhetorical relations using the GDA tagset. Our summarization method applies spreading activation over the content graph, followed by postprocessing to increase the readability of the resulting summary. Experimental evaluation shows that our method is at least as good as the Lead method for summarizing newspaper articles.
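Spreading activation over such a content graph can be sketched as iterative propagation of node scores along rhetorical-relation edges; the node names, decay factor, and update rule below are illustrative assumptions, not the paper's exact formulation.

```python
def spread_activation(graph, seeds, decay=0.5, iters=10):
    """Iteratively propagate activation from seed sentences along
    rhetorical-relation edges. `graph` maps each node to its neighbors."""
    act = {node: 0.0 for node in graph}
    for s in seeds:
        act[s] = 1.0
    for _ in range(iters):
        new = dict(act)
        for node, neighbors in graph.items():
            for nb in neighbors:
                # each node passes a decayed, evenly split share of
                # its activation to its neighbors
                new[nb] += decay * act[node] / max(len(neighbors), 1)
        total = sum(new.values())
        act = {k: v / total for k, v in new.items()}  # keep values bounded
    return act

# Toy three-sentence content graph (hypothetical node names)
graph = {"s1": ["s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
scores = spread_activation(graph, seeds=["s1"])
summary = sorted(graph, key=scores.get, reverse=True)[:2]
```

A summary is then read off by taking the highest-activation sentences; readability postprocessing would reorder or smooth this selection.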
Inductive Logic Programming (ILP) becomes interesting when the expressive power of first-order representations makes learning results comprehensible and enables the handling of more complex data together with their relations. Nevertheless, the bottleneck in learning first-order theories is the enormous hypothesis search space, which makes existing learning approaches inefficient compared to propositional ones. This paper introduces an improved ILP approach capable of handling multiple-part data more efficiently, i.e., data in which one instance consists of several parts as well as relations among those parts. The approach tries to find a hypothesis describing the class of each training example by using both the individual and the relational characteristics of its parts, which is similar to finding common substructures among complex relational instances. Multiple-part data can be found in various domains, especially in Structure-Activity Relationship (SAR) studies, which aim to generate hypotheses describing the activities or characteristics of chemical compounds from their structures. Each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to SAR studies by conducting experiments on two real-world datasets: mutagenicity in nitroaromatic compounds and dopamine antagonist compounds. The experimental results were compared with those of previous approaches to demonstrate the performance of the proposed approach.
In this study, we analyzed the free-description answers in the questionnaire of a 360-degree feedback used in personnel evaluation at an existing company. The aim of this analysis was to obtain a new evaluation index for the competencies of company managers, with which the company is expected to find a path to growth by reinforcing innovative managers. We analyzed and visualized the mixture of textual and categorical data from the questionnaire results in the 360-degree feedback, following the established process of chance discovery. In the case where we used KeyGraph, the importance of managers' "trustworthiness" was discovered first. Then, following the process, we focused attention on managers' trustworthiness: we executed a new 360-degree feedback with additional questions regarding managers' trustworthiness in order to confirm the validity of the measurement. As a result, it turned out that the correlations of a manager's trustworthiness with his/her actions toward the surroundings and with business performance were high. This discovery provided a useful measure for effective restructuring, leading to the growth of the company, and is currently being introduced into the company's personnel evaluations. This new process applied to the mixture of textual and categorical data was essential to this novel application of chance discovery to personnel evaluation. The presented success can be regarded as due to the human-centric process of mining scenarios from the mixed data.
The ability to acquire and manipulate symbols is one of the characteristics that distinguish human beings from other creatures. In this paper, based on the recurrent self-organizing map and a dynamics-based information processing system, we propose a dynamics-based self-organizing map (DBSOM). This method enables the design of a topological map from time-sequence data, which in turn enables the recognition and generation of robot motion. Using this method, we design a self-organizing symbol acquisition system and a motion generation system for a humanoid robot. By implementing DBSOM on the robot in the real world, we realize symbol acquisition from experimental data and investigate the spatial properties of the obtained DBSOM.
A novel approach to human-robot collaboration based on quasi-symbolic expressions is proposed. The target task is navigation, in which a person with his or her eyes covered and a humanoid robot collaborate in a context-dependent manner. The robot uses a recurrent neural network with parametric bias (RNNPB) model to acquire the behavioral primitives (sensory-motor units) composing the whole task. The robot expresses the PB dynamics of the primitives through symbolic sounds, and the person influences these dynamics through tactile sensors attached to the robot. Experiments with six participants demonstrated that the level of influence the person has on the PB dynamics is strongly related to task performance, the person's subjective impressions, and the prediction error of the RNNPB model (task stability). Simulation experiments demonstrated that the subjective impressions of the correspondence between the utterance sounds (the PB values) and the motions were well reproduced by rehearsal of the RNNPB model.
In this paper, we propose a novel algorithm for computing kernels between time-series human motion data for online action recognition. The proposed kernel is based on probabilistic models called switching linear dynamics (SLD) models, which are powerful tools for tracking, analyzing, and classifying complex time-series human motion. The proposed kernel incorporates information about the latent variables in SLD models. An empirical evaluation using real motion data shows that an SVM classifier with our proposed kernel performs much better than classifiers with conventional kernels. Another experiment using kernel principal component analysis shows that the proposed kernel excels at extracting and separating different action categories, such as walking and running.
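One simple way latent-state information can enter a kernel, in the spirit of the approach above, is a probability-product kernel over posterior distributions of the latent switching states; the two-state posteriors below are toy values and the Bhattacharyya form is a stand-in, not the paper's actual kernel.

```python
import math

def prob_product_kernel(p, q):
    """Probability-product kernel between two posterior distributions
    over latent states, in the Bhattacharyya form:
    k(p, q) = sum_z sqrt(p(z) * q(z)); equals 1 when p == q."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Toy per-sequence posteriors over two hypothetical latent SLD states
walk = [0.9, 0.1]
run = [0.2, 0.8]
same = prob_product_kernel(walk, walk)   # maximal self-similarity
cross = prob_product_kernel(walk, run)   # smaller cross-similarity
```

A Gram matrix of such values can be passed directly to a kernel SVM or to kernel PCA, which is how the evaluations described above would consume it.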
Research on human-robot interaction is attracting an increasing amount of attention. Since most research has dealt with communication between one robot and one person, few researchers have studied communication between a robot and multiple people. This paper presents a method that enables robots to communicate with multiple people using a ``selection priority of the interactive partner'' based on the concept of proxemics. In this method, a robot changes its active sensory-motor modalities based on the interaction distance between itself and a person. Our method was implemented in a humanoid robot, SIG2, which has various sensory-motor modalities for interacting with humans. A demonstration with SIG2 showed that our method selected an appropriate interaction partner during interaction with multiple people.
If a dialog system can respond to the user as naturally as a human, the interaction becomes smoother. The timing of responses such as back-channels and turn-taking plays an important role in such smooth dialogs, as it does in human-human interaction. We developed a response timing generator for such a dialog system. The generator uses a decision tree to detect the timing based on prosodic and linguistic features, deciding the system's action every 100 ms during the user's pauses. In this paper, we describe a robust spoken dialog system using this timing generator. Subjective evaluation showed that almost all of the subjects found the system friendly.
Environmental sounds are very helpful in understanding environmental situations and in signaling approaching danger, and sound-imitation words (sound-related onomatopoeia) are important expressions for conveying such sounds in human communication, especially in Japanese. In this paper, we design a method to recognize sound-imitation words (SIWs) for environmental sounds. The critical issues in recognizing SIWs are how to divide an environmental sound into recognition units and how to resolve the representational ambiguity of the sounds. To solve these problems, we designed a three-stage procedure that transforms environmental sounds into sound-imitation words, together with phoneme-group expressions that can represent ambiguous sounds. The three-stage procedure is as follows: (1) a whole waveform is divided into chunks; (2) the chunks are transformed into sound-imitation syllables by phoneme recognition; (3) a sound-imitation word is constructed from the sound-imitation syllables according to the requirements of the Japanese language. The ambiguity problem is that an environmental sound is often recognized differently by different listeners, even in the same situation. Phoneme-group expressions are new phonemes for environmental sounds that can express multiple sound-imitation words in one word. We designed two sets of phoneme groups, ``a set of basic phoneme groups'' and ``a set of articulation-based phoneme groups,'' to absorb the ambiguity. Subjective experiments showed that the set of basic phoneme groups represents environmental sounds more appropriately than the articulation-based set or a set of normal Japanese phonemes.
The careful observation of motion phenomena is important in understanding skillful human motion. However, this is a difficult task because of the timing complexities involved in the skillful control of anatomical structures. To investigate the dexterity of human motion, we concentrate on timing and propose a method to extract the peak timing synergy from multivariate motion data. The peak timing synergy is defined as a frequent ordered graph with time stamps, whose nodes are turning points in motion waveforms. Our proposed algorithm, PRESTO, automatically extracts the peak timing synergy. PRESTO comprises three processes: (1) detecting peak sequences with polygonal approximation; (2) generating peak-event sequences; and (3) finding frequent peak-event sequences using a sequential pattern mining method, generalized sequential patterns (GSP). We measured right-arm motion during the task of cello bowing and prepared a data set of right shoulder and arm motion. Using PRESTO, we successfully extracted the peak timing synergy from the cello bowing data set, which consisted of skills common among cellists and personal skill differences. To evaluate the sequential pattern mining algorithm GSP within PRESTO, we compared the peak timing synergy obtained with GSP against that obtained with a filtering-by-reciprocal-voting (FRV) algorithm as a non-time-series method. We found that the support is 95-100% for GSP versus 83-96% for FRV, and that the results of GSP reproduce human motion better than those of FRV. We therefore show that the sequential pattern mining approach is more effective for extracting the peak timing synergy than a non-time-series approach.
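The core of the GSP step above, counting how often an ordered pattern of peak events recurs across trials, can be sketched as follows; this minimal version considers only length-2 candidates, and the event labels and trial data are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order
    (events need not be contiguous)."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def frequent_pairs(sequences, min_support):
    """GSP-style candidate counting restricted to length-2 patterns:
    keep every ordered pair of peak events whose support (fraction of
    trials containing it as a subsequence) meets the threshold."""
    events = sorted({e for seq in sequences for e in seq})
    frequent = {}
    for a in events:
        for b in events:
            support = (sum(is_subsequence((a, b), s) for s in sequences)
                       / len(sequences))
            if support >= min_support:
                frequent[(a, b)] = support
    return frequent

# Toy peak-event orderings from three bowing trials (hypothetical labels)
trials = [["shoulder", "elbow", "wrist"],
          ["shoulder", "elbow", "wrist"],
          ["shoulder", "wrist", "elbow"]]
patterns = frequent_pairs(trials, min_support=0.9)
```

Full GSP extends such frequent patterns level by level (length 2, then 3, and so on), pruning candidates whose subpatterns are infrequent.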
This study is meant to contribute, from a psychological viewpoint, to the development of a ``symbiotic'' system, an intelligent system capable of ``living with'' human beings. We approach this by examining how people interact gesturally with each other, with a special focus on breathing movements. Within this general framework, the present paper reports two experiments conducted to examine the dynamics underlying intra- and inter-personal coordination of speech articulation, hand gesture movements, and breathing movements. The results reveal similarities as well as differences between intra- and inter-personal coordination, and we discuss their implications for existing theories of motor coordination, as well as for the development of human-machine symbiotic systems.
Researchers can estimate what subjects attend to by using eye tracking systems. Existing approaches for analyzing eye movements are very useful for estimating attention to still objects, but they are inadequate for estimating attention to moving objects, although paying attention to moving objects is usual human behavior. We therefore propose a novel approach and algorithm to estimate attention to moving objects more precisely. Our approach is to extract "eye tracking movements": we refer to both saccadic eye movements and smooth pursuit eye movements as "eye tracking movements". Because previous work reveals that humans often track a moving object with their eyes when paying attention to it, this approach is appropriate.
Our algorithm for extracting eye tracking movements is called the Path Stitching (PS) algorithm. The PS algorithm is a kind of Longest Common Subsequence (LCS) discovery algorithm that uses Dynamic Time Warping (DTW) as its similarity measure. Eye tracking movements can be regarded as situations in which a subsequence of the eye-movement trajectory is similar to a subsequence of the target's trajectory with little or no time difference. Because humans track a moving object with locally fluctuating time differences, the use of DTW is very appropriate. In addition, to make the outputs more comprehensible, we adopt the LCS discovery approach in the PS algorithm.
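The DTW similarity measure at the core of the PS algorithm can be sketched with the standard dynamic-programming alignment; the 1-D trajectories and function name below are illustrative assumptions, while a real eye tracker would supply 2-D gaze and target coordinates.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D trajectories,
    tolerating locally fluctuating time differences in the alignment."""
    INF = float("inf")
    m, n = len(a), len(b)
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best alignment ending at (i, j): diagonal match,
            # stretch `a`, or stretch `b`
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[m][n]

eye = [0.0, 1.0, 2.0, 3.0]
target = [0.0, 0.0, 1.0, 2.0, 3.0]  # same path, slightly delayed
print(dtw_distance(eye, target))  # 0.0: DTW absorbs the delay
```

A low DTW distance between a gaze subsequence and a target subsequence is what the PS algorithm stitches into a candidate eye tracking movement.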
The experimental results show that the PS algorithm extracts eye tracking movements better than the fixation-extraction approach, which is the most popular and strongest approach for analyzing eye movements. We therefore conclude that our novel approach will advance the analysis of human behavior through eye movements and the human-computer interaction systems that utilize users' eye movements.