This paper proposes an autonomous response generation method for chat systems aimed at intelligent conversation. An autonomous response method needs rules for fixed-pattern exchanges as well as the ability to handle new expressions. A response must be generated autonomously based on an understanding of the natural-language sentence, so a mechanism of human-like association and disambiguation grounded in grammar and conversational rules is required. This paper explains a mechanism of concept association and commonsense judgment and proposes a commonsense-based associative response method, using this mechanism, that cannot be realized by statistical processing alone. The proposed response methods are a place association method, an adjective association method, and a topic-change association method based on place information obtained from the conversational situation; they achieved accuracies of 80%, 74%, and 71%, respectively.
Suitable control of head motion in robots, synchronized with their utterances, is important for smooth human-robot interaction. Based on rules inferred from analyses of the relationship between head motion and dialog acts, this paper proposes a model for generating head tilting and evaluates the model using different types of humanoid robots. Analysis of subjective scores showed that the proposed model can generate head motion with increased naturalness compared to nodding only or to directly mapping people's original motions without gaze information. We also evaluate the proposed model in real human-robot interaction by conducting an experiment in which participants act as visitors to an information desk attended by robots. The effects of gaze control were also taken into account when mapping the original motion to the robot. Evaluation results indicated that the proposed model performs equally to directly mapping people's original motion with gaze information, in terms of perceived naturalness.
We introduce new heuristics for HTN (Hierarchical Task Network) planning for mobile robots with two arms/hands that pick and place objects among movable obstacles. Based on our new heuristics, the robot moves obstacles if necessary and picks and places the target objects without collisions. The robot chooses which hand (right or left) to use for each manipulation in order to avoid collisions and reduce the number of obstacle movements. Although we do not use detailed 2D/3D maps or arm specifications, our heuristics check for collisions approximately; therefore, we can produce task plans that are executable by lower-level modules. This paper also shows that replanning is effective not only for adapting to human behaviors but also for saving total plan execution time. For example, when a person removes obstacles for the robot, the robot can save time by replanning and omitting this obstacle removal. As another example, suppose that the robot is moving an object to place A, following the user's instruction. If the user changes his/her mind and reinstructs the robot to move the object to another place B, the robot does not have to move the object to A before moving it to B. In these cases, we show that the robot can save considerable time by task-level replanning.
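The hand-selection idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual planner: the obstacle representation, the side/distance collision test, and all names below are hypothetical stand-ins for the rough collision check the heuristics perform.

```python
# Hypothetical sketch of a hand-selection heuristic: pick the hand whose
# rough reachable region conflicts with the fewest obstacles.
def obstacles_in_the_way(hand, target, obstacles):
    """Rough collision check (illustrative): obstacles on the same side as
    the chosen hand and closer than the target are assumed to block it."""
    side = 1 if hand == "right" else -1
    return [o for o in obstacles
            if o["x"] * side > 0 and o["dist"] < target["dist"]]

def choose_hand(target, obstacles):
    """Prefer the hand that would require moving fewer obstacles."""
    costs = {h: len(obstacles_in_the_way(h, target, obstacles))
             for h in ("right", "left")}
    return min(costs, key=costs.get)
```

With obstacles clustered on the robot's right side, the heuristic picks the left hand and thereby avoids any obstacle movement, matching the abstract's goal of reducing obstacle manipulations.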
Agency identification has been one of the fundamental issues in Human-Agent Interaction studies. We carried out two experiments to examine what sort of behavior makes humans identify agency. To examine agency identification, we set up an experimental environment for observing how people interpret another's behavior. The environment, which physically mediated the interaction between human and computer, was a media system connecting the two sides of the environment through a computer network. Two persons could thus interact with each other using their own side of the environment, in which they could only touch and change the color of a grid drawn on the screen. The experimental task required participants to judge whether the other party was a human or a computer while using the system. In this study, we regard the attribution of humanlikeness to another's behavior as a sign of agency identification. The experimental results showed that people attributed humanlikeness to another's behavior when their actions were synchronized with the other's actions, such as through rhythmical patterns and relations of spatial patterns. This result suggests that human agency identification is induced by interaction between the target entity and oneself.
Nowadays, voice user interfaces using automatic speech recognition (ASR) are used for car navigation, machine translation for tourists, and information retrieval on smartphones. The tasks of these applications are well defined, and the speakers' high motivation makes them speak clearly; fully equipped, domain-limited language models and clear speech contribute to highly accurate recognition. In casual conversations, however, the dialogue domains are not limited and speakers' utterances are not grammatical, which makes them an ill-defined task for an N-gram language model. Latent semantic analysis (LSA) is a technique that can capture long-range semantic coherence through the semantic relationship between two words in the recognition results. It does not take word order into account, so it is applicable to ungrammatical utterances. It would appear effective to treat two words with high co-occurrence as correct results, but a pair of misrecognized words may also have high co-occurrence: since a language model such as an N-gram captures short-range semantic similarities, word pairs with high co-occurrence are frequently recognized together. Our method is a modified LSA: we propose using word co-occurrence analysis over utterance pairs in order to obtain the appropriate keywords. In a conversation, because the speakers talk about one common domain, a user's utterance and the system's last utterance tend to have high co-occurrence, so the system reacts to the recognized words that co-occur highly with words in the system's last utterance. We applied this word co-occurrence analysis with utterance pairs to our voice interface robot. The precision rate was 43% with the original recognition system and improved to 71% with the word co-occurrence analysis.
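The core of the utterance-pair idea can be sketched concretely. The toy corpus, word lists, and scoring below are purely illustrative (the paper builds its statistics from real conversation data and a modified LSA); this only shows how co-occurrence with the system's last utterance can rerank ASR keyword hypotheses.

```python
from collections import Counter
from itertools import product

# Toy corpus of (system utterance, user reply) word lists; in practice the
# co-occurrence statistics would come from a large conversation corpus.
pairs = [
    (["weather", "today"], ["rain", "umbrella"]),
    (["weather", "tomorrow"], ["sunny", "picnic"]),
    (["dinner", "tonight"], ["pasta", "wine"]),
]

# Count how often a system-utterance word co-occurs with a user-reply word.
cooc = Counter()
for sys_words, user_words in pairs:
    for s, u in product(sys_words, user_words):
        cooc[(s, u)] += 1

def score(system_last, asr_word):
    """Co-occurrence score of a recognized word against the system's last utterance."""
    return sum(cooc[(s, asr_word)] for s in system_last)

def pick_keyword(system_last, asr_hypotheses):
    """Keep the ASR hypothesis word that co-occurs most with the last system utterance."""
    return max(asr_hypotheses, key=lambda w: score(system_last, w))
```

Given competing recognition hypotheses "rain" and "pasta" after the system asked about the weather, the in-domain word wins, which is the mechanism the abstract credits for the precision improvement.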
In multiparty human-agent interaction, the agent should be able to respond properly to a user by determining whether an utterance is addressed to the agent or to another user. This study proposes a mechanism for identifying the addressee using nonverbal cues, including acoustic information from the user's speech and head orientation. First, we conduct a WOZ experiment to collect human-human-agent triadic conversations, in which the agent plays the role of an information provider. Then, we analyze whether the acoustic features and head orientations are correlated with addressee-hood. We found that speech features differed depending on whom the person talked to: when people talked to the agent, they spoke with a higher tone of voice and also spoke more loudly and slowly. In addition, the subjects looked at the agent 93.2% of the time while talking to the agent, whereas a speaker looked at his/her human partner only 33.5% of the time while talking to them. These results suggest that people frequently look at the addressee, but that it is difficult to estimate the addressee solely from head direction. Based on these analyses, we propose addressee estimation models that integrate speech and head direction information using an SVM; the accuracy of the best-performing model is over 80%. We then implement an addressee identification mechanism by integrating speech processing and face tracking. We also conduct an evaluation experiment of our addressee identification mechanism and report that the accuracy remains over 80% if invalid speech input can be eliminated.
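An SVM that fuses speech and head-direction cues, as described above, can be sketched as follows. The feature vectors and values here are hypothetical (the paper's actual features, units, and data are not reproduced); this only illustrates the integration of acoustic features with gaze-toward-agent ratio in one classifier.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical features per utterance:
# [mean F0 (Hz), mean power (dB), speech rate (syll/s), fraction of time
#  the head was oriented toward the agent].
# Label 1 = addressed to the agent, 0 = addressed to the human partner.
X = np.array([
    [220.0, 65.0, 3.1, 0.95],   # higher pitch, louder, slower, facing agent
    [230.0, 67.0, 2.9, 0.90],
    [180.0, 58.0, 4.5, 0.30],   # lower pitch, faster, facing partner
    [175.0, 55.0, 4.8, 0.25],
])
y = np.array([1, 1, 0, 0])

clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

# A new utterance: high pitch, loud, slow, mostly facing the agent.
pred = clf.predict(np.array([[225.0, 66.0, 3.0, 0.92]]))
```

In a real system the features would be extracted per utterance from the ASR front end and the face tracker, and the model trained on the WOZ corpus rather than on four hand-written rows.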
We propose an alternative approach, called the Immersing Discovery Method, for finding each robotic agent's unique interaction strategies. In this approach, a human manipulator behaves as if he/she were the robot and finds efficient interaction strategies based on each robot's shape and modalities. We implemented a system, including a robot with a reconfigurable body, an easy-to-use manipulation system, and a recording system, to evaluate the validity of our method. We evaluated a block-assembling task with the system while enabling and disabling the robot's head modality. In the head-fixed design, the robot's motion during the player's motion significantly increases, whereas the ratio of confirmatory behavior significantly decreases. This result suggests that in the head-fixed condition the robot leads the user and the user follows the robot, rather than following the normal turn-taking interaction style of the head-free condition.
In educational support using a partner robot, the robot's individual character is considered to play an important role. Based on this consideration, we focus on the excellence property of the robot, assuming that differences in robot excellence affect the performance of a human learner who joins a learning activity together with the robot. In this paper, we report on a field experiment conducted at an English learning school for Japanese children (4--8 years of age). We divided 19 participants into three experimental conditions, introducing a small humanoid robot with a different excellence level in each. The Condition A robot was designed with the highest excellence level: it could correctly answer any question given by a human teacher. In contrast, the Condition B robot could not answer correctly at the beginning but could learn from partner humans, who could take the robot by the hand and teach it step by step. Finally, the Condition C robot was designed with the lowest excellence level: it could neither answer any question from the beginning nor learn from partner humans. Experimental results showed that the learning performance of participants who joined a drawing-game lesson on the English names of various shapes increased more with the Condition A or Condition B robot than with the Condition C robot. In addition, the Condition B and Condition C robots were found to be more effective in motivating participants in the learning activity. These findings would be valuable for the better future design of a partner robot whose goal is to enrich and support educational activities in classrooms.
In this research, we develop an information dissemination agent, the "smart caster 24 (TWENTY FOUR)," with the goal of designing an expression model for a life-like agent that can be watched for a long time. The smart caster 24 is a life-like agent that disseminates news information on the Internet in the manner of a human newscaster. We analyzed a total of 90 news-reading motions of 18 active human newscasters and designed a nonverbal expression model for the smart caster. In news-reading experiments, we found that the smart caster 24 gave the same impression as human newscasters on some evaluation items.
The use of virtual conversational agents is anticipated in the tutoring of physical skills such as sports or dance. This paper describes an ongoing project aiming to realize a virtual instructor for ballroom dance. First, a human-human experiment was conducted to collect an interaction corpus between a professional instructor and six learners. The verbal and non-verbal behaviors of the instructor were analyzed and served as the basis of a state-transition model for ballroom dance tutoring. To achieve natural and highly interactive instruction during the multimodal interaction between the virtual instructor and the learner, it is necessary to divide the learner's motion into small but meaningful segments in real time. In ballroom dance, the smallest meaningful unit of dance steps is called a count, which should be synchronized with the beats of the accompanying music. This paper proposes a method for automatically segmenting the learner's practice motion into counts. The method is based on trajectory-similarity comparison between the motion of the learner and data collected from a professional dance instructor, using the Angular Matrix for Shape Similarity (AMSS). Another algorithm is also proposed for identifying the worst-performed portion of the learner's motion and the corresponding improvement. Finally, a comparison experiment between the prototype system and a non-interactive self-training system was conducted and is reported.
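The count segmentation and worst-portion identification can be sketched in simplified form. Note the assumptions: frame-wise cosine similarity stands in for AMSS, segments are taken at fixed template length rather than aligned to music beats, and the data below is synthetic, so this is only a shape of the idea, not the paper's algorithm.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two pose vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def segment_counts(motion, template):
    """Split `motion` (frames x joint features) into segments of the same
    length as the instructor's count `template`, scoring each segment by
    mean frame-wise cosine similarity (a stand-in for AMSS), and return
    all segments plus the worst-performed one."""
    L = len(template)
    segments, scores, i = [], [], 0
    while i + L <= len(motion):
        window = motion[i:i + L]
        scores.append(np.mean([cosine(w, t) for w, t in zip(window, template)]))
        segments.append((i, i + L))
        i += L
    worst = segments[int(np.argmin(scores))]  # portion needing improvement
    return segments, worst
```

In the full system, the segment boundaries would be synchronized with the music's beats and the per-count scores would drive the instructor's corrective feedback.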
We developed the Sociable Trash Box (STB), a child-assisting robot able to establish asynchronous child assistance through a social rapport network for the purpose of collecting trash in a public space. In this study, we explore the interactive patterns of children, optimum interpersonal spheres, and sociopetal space by considering the dynamic interaction of children (interaction time, child-robot distance, and interaction angle from the STB's perspective). An experiment was conducted in a developmental center for children to evaluate the validity and effectiveness of our approach through five action scenarios: move toward trash (MT-I); communicate (electronically) with other STBs to move and create a distance (MT-G); move to a crowded space without communicating with other STBs (MC-I); move to a crowded space while communicating with other STBs (MC-G); and STBs that neither move nor behave (NMB-I, NMB-G). We apply model-based unsupervised Gaussian mixture clustering to extract the optimum spaces when the children interacted with an individual STB and with a group of STBs.
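The Gaussian-mixture extraction of interaction spaces can be illustrated as follows. The sample data is synthetic and the two clusters (close frontal interaction vs. distant peripheral interaction) are invented for illustration; only the clustering mechanics correspond to the abstract.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical child-robot interaction samples: (distance in m, angle in deg).
close = rng.normal([0.6, 0.0], [0.1, 10.0], size=(50, 2))   # close, frontal
far = rng.normal([2.0, 60.0], [0.3, 15.0], size=(50, 2))    # distant, peripheral
X = np.vstack([close, far])

# Model-based unsupervised clustering of the interaction samples.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# The component with the smaller mean distance approximates the
# "optimum interpersonal sphere" around the STB.
near_comp = int(np.argmin(gmm.means_[:, 0]))
```

In the study, fitting such a model separately per scenario (MT-I, MT-G, MC-I, MC-G, NMB-I, NMB-G) would let the component means and covariances characterize how each behavior shapes the children's interaction space.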
This paper proposes a novel recommender system that helps users clarify their most appropriate preference by recommending items from other categories that almost meet the attributes selected by the user. This advantage is achieved through both the concretization and the change of users' preferences. To investigate the effectiveness of the proposed system, we conducted human-subject experiments and found that the proposed system helps users find their desirable items by clarifying their preferences. Concretely, the following implications were revealed: (1) the proposed recommender system with both the serendipity and decision buttons enables users to clarify their preference by comparing items classified in different categories; (2) in detail, item recommendation based on the selected item's attributes contributes to clarifying the users' preference through a change of their preference, while item recommendation based on the item's characteristics contributes to clarifying the users' preference through a concretization of their preference; and (3) the proposed recommender system with the decision button succeeds in further clarifying the preference of users who have already clarified it.
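The cross-category recommendation idea can be sketched minimally. The item data, attribute sets, and overlap scoring below are hypothetical examples, not the paper's system; they only illustrate surfacing near-matching items from other categories.

```python
# Illustrative item catalogue (hypothetical data).
items = [
    {"name": "cafe A", "category": "cafe", "attrs": {"quiet", "wifi"}},
    {"name": "library B", "category": "library", "attrs": {"quiet", "wifi"}},
    {"name": "bar C", "category": "bar", "attrs": {"lively", "music"}},
]

def recommend(selected, items):
    """Rank items from *other* categories by attribute overlap with the
    user's selected item, surfacing near-matches that may shift or
    concretize the user's preference."""
    others = [i for i in items if i["category"] != selected["category"]]
    return sorted(others,
                  key=lambda i: len(i["attrs"] & selected["attrs"]),
                  reverse=True)
```

A user who selected the quiet cafe is shown the library first, a different category that almost meets the same attributes, which is the kind of serendipitous comparison the abstract attributes to the serendipity button.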
Predicting entailment between two given texts is an important task on which the performance of numerous NLP tasks, such as question answering, text summarization, and information extraction, depends. The degree to which two texts are similar has been used extensively as a key feature in much previous work on predicting entailment. However, using similarity scores directly, without proper transformations, results in suboptimal performance. Given a set of lexical similarity measures, we propose a method that jointly learns both (a) a set of non-linear transformation functions for those similarity measures and (b) the optimal non-linear combination of those transformation functions to predict textual entailment. Our method consistently outperforms numerous baselines, reporting a micro-averaged F-score of 46.48 on the RTE-7 benchmark dataset. The proposed method is ranked 2nd among the 33 entailment systems that participated in RTE-7, demonstrating its competitiveness over numerous other entailment approaches. Although our method is statistically comparable to the current state of the art, it requires fewer external knowledge resources.
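The joint learning of per-measure transformations and their combination can be sketched in strongly simplified form. Everything here is an assumption for illustration: sigmoid transforms stand in for the paper's transformation functions, a logistic combination stands in for its non-linear combiner, and the similarity scores and labels are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(S, a, b, w, c):
    # Per-measure non-linear transforms, then a non-linear combination.
    return sigmoid(sigmoid(S * a + b) @ w + c)

# Toy data: 2 similarity measures; "entailment" holds when both are high.
rng = np.random.default_rng(1)
S = rng.random((200, 2))
y = ((S[:, 0] > 0.5) & (S[:, 1] > 0.5)).astype(float)

# Jointly train transform parameters (a, b) and combiner (w, c) by
# full-batch gradient descent on log-loss.
a, b = np.ones(2), np.zeros(2)
w, c = np.zeros(2), 0.0
lr = 0.5
for _ in range(3000):
    h = sigmoid(S * a + b)                 # transformed scores
    p = sigmoid(h @ w + c)                 # combined prediction
    g = p - y                              # gradient wrt combiner logit
    w -= lr * h.T @ g / len(y)
    c -= lr * g.mean()
    gh = np.outer(g, w) * h * (1 - h)      # gradient wrt transform logits
    a -= lr * (gh * S).mean(axis=0)
    b -= lr * gh.mean(axis=0)

acc = float(((predict(S, a, b, w, c) > 0.5) == (y > 0.5)).mean())
```

The point of the joint formulation is that raw similarity scores need not be linearly related to entailment probability: the learned transforms reshape each score before the combiner weighs them.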
We propose two Markov chain Monte Carlo (MCMC) methods for Bayesian inference via propositionalized probability computation using binary decision diagrams (BDDs). The main advantage of our methods is that they place no restriction on logical formulas. To illustrate our methods, we first formulate LDA (latent Dirichlet allocation), a well-known generative probabilistic model for bag-of-words data, as a form of statistical abduction, and compare the learning results of our methods with those of collapsed Gibbs sampling, an MCMC method specialized for LDA. We also apply our methods to two problems: diagnosing failure in a logic circuit and evaluating abductive hypotheses for a metabolic pathway. The experimental results show that Bayesian inference using the proposed methods achieves better accuracy than maximum likelihood estimation.
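The collapsed Gibbs sampling baseline mentioned above is standard enough to sketch. The corpus, number of topics, and hyperparameters below are illustrative; this shows the baseline sampler, not the paper's BDD-based methods.

```python
import numpy as np

# Minimal collapsed Gibbs sampler for LDA (the specialized baseline the
# paper compares against). Toy corpus: word ids over a vocabulary of 5.
rng = np.random.default_rng(0)
docs = [[0, 0, 1, 1], [0, 1, 1, 2], [3, 4, 4, 3], [4, 3, 3, 4]]
V, K, alpha, beta = 5, 2, 0.1, 0.1

z = [[int(rng.integers(K)) for _ in d] for d in docs]  # topic per token
ndk = np.zeros((len(docs), K)) + alpha                 # doc-topic counts
nkw = np.zeros((K, V)) + beta                          # topic-word counts
nk = np.zeros(K) + V * beta                            # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

for _ in range(200):                                   # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                                # remove token's topic
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            p = ndk[d] * nkw[:, w] / nk                # collapsed conditional
            k = int(rng.choice(K, p=p / p.sum()))      # resample topic
            z[d][i] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

theta = ndk / ndk.sum(axis=1, keepdims=True)           # doc-topic estimates
```

The paper's contribution is that its BDD-based MCMC achieves such inference without an LDA-specific derivation, since the propositionalized representation places no restriction on the logical formulas of the model.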