We regarded a dialog strategy for information retrieval as a graph search problem and proposed several novel dialog strategies that can recover from misrecognition through a spoken dialog that traverses the graph. To recover from misrecognition without seeking confirmation, our system kept multiple understanding hypotheses at each turn and searched for a globally optimal hypothesis in a graph whose nodes express understanding states across the user utterances of a whole dialog. In the search, we used a new criterion based on efficiency of information retrieval and consistency with understanding hypotheses, which is also used to select an appropriate system response. We showed that our system produces more efficient and natural dialogs than previous systems.
Estimating emotion from an uttered text is one of the most important methods for enabling natural conversation between humans and robots. Most conventional methods can deal only with simple sentences, whereas natural conversation contains many variations of colloquial expression. In order to handle such expressions, we improve a conventional emotion estimation method based on an association mechanism. First, adverbs and interjections are considered. The conventional method deals only with adjectives and nouns; however, colloquial expressions often contain adverbs and interjections, and these words convey emotion. For example, the word ``excellent'' conveys a joyful emotion. We therefore assign emotion labels to many adverbs and interjections and use them for the emotion estimation of sentences. Second, complex grammar is considered. The conventional method assumes that the input is a simple sentence containing only one ``emotional word.'' However, a colloquial expression may contain many ``emotional words'' and may be a compound or complex sentence. In the proposed method, the sentence structure is analyzed and the last ``emotional word'' is used for estimation, because the last part is the most important in a Japanese sentence. Finally, the emotion estimation rules are updated. To investigate the effectiveness of the proposed method, we carried out emotion estimation experiments. We selected 315 sentences from utterances in a Japanese movie, and 15 evaluators gave an emotion label to each sentence. The average agreement ratio among the evaluators was 42.3%. The experimental results showed that the estimation accuracy of the proposed method was 31.4%, which is 10 points higher than the conventional method and reaches 74.2% of human performance.
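The last-emotional-word rule described above can be illustrated with a minimal sketch. The lexicon, tokenization, and English example words below are our own hypothetical stand-ins; the actual method uses a large labeled Japanese dictionary covering adjectives, nouns, adverbs, and interjections.

```python
# Hypothetical emotion lexicon; the real method uses a much larger
# labeled dictionary of adjectives, nouns, adverbs and interjections.
EMOTION_LEXICON = {
    "excellent": "joy",
    "sad": "sadness",
    "wow": "surprise",
    "terrible": "anger",
}

def estimate_emotion(tokens):
    """Return the emotion of the LAST emotional word, or 'neutral'.

    In Japanese, the final clause tends to carry the main emotion,
    so later lexicon hits overwrite earlier ones.
    """
    emotion = "neutral"
    for token in tokens:
        if token in EMOTION_LEXICON:
            emotion = EMOTION_LEXICON[token]
    return emotion

# A compound sentence with two emotional words: the last one wins.
print(estimate_emotion(["it", "was", "terrible", "but", "now", "excellent"]))  # joy
```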
Recently, computerized dialogue systems have been actively studied. Non-task-oriented dialogue systems, which handle domain-free dialogues such as chat, are expected to be applied in various fields, but many challenges remain in developing them. This paper addresses the problem of utterance generation for non-task-oriented dialogue systems. We search Twitter data with topic words and acquire candidate sentences. The sentences are filtered by rules and scored on the basis of training data, and those with high scores are adopted as utterances. The results of an experiment demonstrate that the proposed method can generate appropriate utterances with a high degree of accuracy.
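The retrieve-filter-score pipeline can be sketched as follows. The filtering rules, the bigram-overlap score, and all data below are illustrative assumptions, not the paper's actual rules or trained scorer.

```python
import re

def rule_filter(sentence):
    """Reject sentences unsuitable as system utterances (illustrative rules)."""
    if len(sentence) < 5 or len(sentence) > 140:
        return False
    if re.search(r"https?://|@\w+|#", sentence):  # drop URLs, mentions, hashtags
        return False
    return True

def score(sentence, good_ngrams):
    """Toy scorer: overlap with bigrams observed in training data."""
    tokens = sentence.split()
    bigrams = set(zip(tokens, tokens[1:]))
    return len(bigrams & good_ngrams)

# Candidates retrieved for the topic word "cherry" (made-up data).
candidates = [
    "cherry info http://example.com",
    "cherry blossoms are in full bloom this week",
    "cherry blossoms are beautiful",
]
good = {("cherry", "blossoms"), ("are", "beautiful")}
kept = [s for s in candidates if "cherry" in s and rule_filter(s)]
best = max(kept, key=lambda s: score(s, good))
print(best)  # cherry blossoms are beautiful
```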
We have tackled the novel problem of predicting when a user is likely to begin speaking to a humanoid robot. The generality of the prediction model must be examined before it can be applied to various users. This paper presents the following two empirical evaluations. First, our proposed model does not depend on the specific participants whose data were used in our previous experiment. Second, the model can handle variations caused by individuality and by the instructions given to users. We collected a data set in which 25 human participants gave labels indicating whether or not they would be likely to begin speaking to the robot. We then trained a new model with the collected data and verified its performance by cross-validation and open tests. We also investigated the relationship between how likely each participant felt it was that they could begin speaking, a model parameter, and the instructions given to them. This shows that our model can potentially handle such variations.
In Japanese, onomatopoeia (i.e., imitative or mimetic words) is frequently used in daily conversation to express one's intuitive and sensitive feelings. Many onomatopoeic expressions are very similar to each other, and their meanings seem vague and ambiguous, making it hard to catch the minute semantic differences among them. Nevertheless, we use onomatopoeia, even novel onomatopoeia, to express our subjective, intuitive and sensitive feelings in daily language use. Therefore, estimating the information conveyed by onomatopoeia is indispensable in constructing a human-like intelligent communication system. In this study, we propose a system to estimate the information conveyed by onomatopoeia based on Japanese sound symbolism. The existence of synesthetic associations between sounds and sensory experiences (sound symbolism) has been demonstrated over the decades. It is also known that this sensory-sound correspondence can be found not only in words referring to visual shapes but also in those referring to tactile sensations. Our system therefore quantifies the images of an inputted onomatopoeic expression using 43 adjective-pair scales related to visual and tactile sensations. Our method hypothesizes that the impression created by an onomatopoeic expression can be predicted from the phonological characteristics of its constituent phonemes. To collect phonemic image data, we conducted a psychological experiment in which 78 participants evaluated the impressions of 312 onomatopoeic expressions, covering all kinds of Japanese phonemes, against 43 pairs of adjectives on seven-point SD scales. We applied the phonemic image data to our model and calculated the impression values of each phoneme using quantification theory class I.
This system estimates the rich information conveyed not only by conventional but also by newly created onomatopoeic expressions, and differentiates among a variety of onomatopoeic expressions, which are frequently similar to each other. We conducted another psychological experiment to confirm the effectiveness of our system. The results showed that our system successfully evaluated the information conveyed by onomatopoeia.
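The additive phoneme model underlying the approach above can be sketched in a drastically simplified form. Quantification theory class I fits per-category (here, per-phoneme) scores by least squares; the sketch below instead approximates each phoneme's score by its mean deviation from the grand mean, and all rating data are made up.

```python
# Onomatopoeia (as phoneme tuples) -> made-up mean SD-scale rating.
ratings = {
    ("s", "a", "r", "a"): 2.0,   # e.g. a smooth-feeling word
    ("z", "a", "r", "a"): -1.0,  # e.g. a rough-feeling word
    ("s", "u", "r", "u"): 1.0,
}

grand_mean = sum(ratings.values()) / len(ratings)

# Crude per-phoneme score: mean deviation of words containing the phoneme.
phoneme_scores = {}
for word, rating in ratings.items():
    for p in set(word):
        phoneme_scores.setdefault(p, []).append(rating - grand_mean)
phoneme_scores = {p: sum(v) / len(v) for p, v in phoneme_scores.items()}

def predict(word):
    """Predict the impression of a (possibly novel) onomatopoeic word
    as the sum of its distinct phonemes' contributions."""
    return grand_mean + sum(phoneme_scores.get(p, 0.0) for p in set(word))

# A novel word reusing known phonemes still gets a prediction.
print(round(predict(("z", "u", "r", "u")), 2))  # -0.67
```

Because the model is additive over phonemes, it naturally generalizes to newly created expressions, which is the property the system exploits.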
A novel text selection approach for training a language model (LM) with Web texts is proposed for automatic speech recognition (ASR) in spoken dialogue systems. Compared to the conventional approach based on the perplexity criterion, the proposed approach introduces a semantic-level relevance measure based on the back-end knowledge base used in the dialogue system. We focus on predicate-argument (P-A) structures characteristic of the domain in order to filter semantically relevant sentences in the domain. Moreover, combination with the perplexity measure is investigated. Experimental evaluations in two different domains demonstrate the effectiveness and generality of the proposed approach. The combination method realizes a significant improvement not only in ASR accuracy but also in semantic-level accuracy.
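The combination of the two criteria can be sketched as a single ranking score. The linear interpolation, the weight, and the candidate scores below are our own illustrative assumptions, not the paper's actual combination method.

```python
def selection_score(log_perplexity, pa_relevance, alpha=0.5):
    """Rank candidate Web sentences: lower perplexity under the in-domain
    LM and higher predicate-argument relevance to the knowledge base
    are both better, so combine them with a tunable weight."""
    return -alpha * log_perplexity + (1 - alpha) * pa_relevance

# Candidates with (log-perplexity, fraction of P-A structures matching
# the back-end knowledge base); all numbers are made up.
candidates = {
    "generic chat sentence": (4.0, 0.0),
    "in-domain but rare phrasing": (6.0, 0.9),
    "fluent in-domain sentence": (3.5, 0.8),
}
ranked = sorted(candidates, key=lambda s: selection_score(*candidates[s]),
                reverse=True)
print(ranked[0])  # fluent in-domain sentence
```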
When the presence and actions of an android approach those of a human, the android can elicit multi-modal actions from humans. How do human partners act with an android to organize the interaction and treat the android as a social actor? We observed the development process of the play ``Three Sisters, Android Version'' and analyzed the multi-modal interaction between the android and the human actors during this process. As a result, we found that the actors expressed their assessment of the android's human-likeness through their utterances and body movements, and that the border between human and machine was expressed differently in each modality. Moreover, these expressions were not a one-way product of the writer and director, but the product of repeated interactions between the actors and the android through practices and rehearsals. Finally, we discuss the possibility of ``media equation'' studies based on direct observation of human-machine interaction.
In order to build a conversational agent that can control multiparty conversations with multiple users, the system must be able to identify the addressee of the current floor and control the floor based on this addresseehood information. This study proposes an addressee estimation mechanism that estimates the addressee based on speech and head-direction information. We then implement an autonomous multiparty conversational agent by incorporating the addressee estimation mechanism into a multiparty conversation system consisting of ASR, language understanding, conversation management, and generation mechanisms. The results of our system evaluation experiment show that the F-measure for judging that the agent was addressed was 0.80 and that the partner was addressed was 0.56, confirming that the proposed multiparty conversational agent successfully controls conversations with users.
Answer sentence generation is one of the important building blocks for achieving natural and smooth dialog in dialog systems. In conventional answer sentence generation, the system usually responds according to the user's topic or the information required by the user. However, having a dialog using only this information is not necessarily ideal. For example, in a persuasive dialog system that guides the user to the system's goal, having a dialog based only on the user's topic of interest may not achieve the system's goal. In this situation, it is important to be able to generate answers that guide users to topics related to the system's goal. To achieve natural transitions from the current topic to the target topic, it is necessary to lead the conversation through related new topics that connect the current topic and the target topic. In this paper, we propose answer sentence generation methods for guiding users to new topics with answer templates. To effectively extract term pairs that fit the handmade templates from a term database, we take advantage of information from a concept dictionary and Web search. In addition, on the assumption that the user does not know the target topic, we prepare an explanation of each topic by hand. To evaluate whether a dialog system using the proposed method can guide users to the goal topic, we build a dialog system that leads from the input topic to the system's goal topic. We evaluate both single answer sentences and the use of the proposed method within the dialog system. The experimental results show the efficacy of the generated sentences.
While there have been many attempts to estimate the emotion of a speaker from her/his utterance, few studies have explored how her/his utterance affects the emotion of the listener. This has motivated us to investigate two novel tasks: predicting the emotion of the listener and generating a response that evokes a specific emotion in the listener's mind. We target Japanese Twitter posts as a source of dialogue data and automatically build training data for learning the predictors and generators. The feasibility of our approaches is assessed using 1,099 utterance-response pairs built by five human workers.
In this challenge, we develop and distribute an integrated environment for flexibly combining multiple text mining techniques. Text mining comprises numerous tasks such as salient sentence extraction, keyword extraction, topic extraction, textual coherence evaluation, multi-document summarization, and text clustering. Although tools that individually perform one or more of these tasks exist, it is difficult to integrate and run multiple tools for a particular task. Our proposed text mining environment provides the flexibility to integrate the numerous tools that exist in the community. Users can employ a customized version of the environment for their specific tasks, thereby concentrating solely on their creative work.
Traditionally, support in the event of a disaster was based on hardware, and rescuers were strongly required to have expertise. For this reason, when a disaster occurred, the roles of people were clearly divided into victims, rescuers, and onlookers. In the case of the Great East Japan Earthquake, however, many engineers tried to develop information support systems for victims using information technology. This fact shows that some engineers became not onlookers but information providers. One of the reasons these systems could be realized is the everyday use of information infrastructures such as the Web. In view of this situation, we started the project ``Collaborative Heterogeneous Integration of Disaster and Rescue Information (CHIDRI),'' which was adopted as a Challenge for Realizing Early Profits (CREP) by the Japanese Society for Artificial Intelligence. In this paper, we explain the framework, purpose, and activities of the challenge.
In this study, we propose a reinforcement learning method for the discernment behaviors of a robot. Discernment behavior, a type of exploratory behavior that supports object feature extraction, is a fundamental tool for a robot to orient itself, manipulate objects, and establish higher classes of knowledge. In this method, a robot learns discernment behaviors through interaction with multiple objects. During the interaction, the robot receives a reinforcement signal according to the cluster distance of the observed data. We validated the effectiveness of the model in a mobile robot simulation, in which three differently shaped objects were placed beside the robot one by one. The robot learned different behaviors corresponding to each object. We then confirmed what kinds of features are extracted from an object using the learned exploratory behaviors.
In the current office environment connected to the Internet, a user tends to receive a lot of notifications in the form of e-mails, micro-blogs, instant messages, application update alerts, and so on. A significant problem with such notifications is that they arrive as they are sent, i.e., without the system being aware of whether the user has time to receive them. If messages arrive at inopportune times, they can cause serious stress and reduce the user's productivity. One way to alleviate this problem is to control the notification timing in accordance with the user's state of activity. In other words, the system needs to estimate whether a user is interruptible and to send information only when he or she is. There are a number of studies on estimating user state (e.g., by monitoring user behaviors such as typing, mouse operation, and facial expression), but these methods have some problems. In this paper, we develop a novel method for estimating user state that uses pressure on a desk. We use a lattice-like pressure sensor sheet and distinguish between two simple user states: interruptible or not. The pressure can be measured without the user being aware of it, and changes in the pressure reflect useful information such as typing, an arm resting on the desk, mouse operation, and so on. We carefully designed features that can be extracted from the raw sensor data and used a machine learning technique to identify the user's interruptibility. We conducted experiments on two different tasks to evaluate the accuracy of the proposed method and obtained promising results.
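The sensing-to-classification pipeline can be sketched as below. The feature set, the nearest-centroid stand-in for the trained classifier, and all sensor data are our own assumptions for illustration.

```python
def extract_features(pressure_frames):
    """Summarize a window of pressure-sheet frames into simple features:
    mean total pressure, variance of total pressure, and the number of
    active cells in the latest frame."""
    totals = [sum(frame) for frame in pressure_frames]
    mean = sum(totals) / len(totals)
    var = sum((t - mean) ** 2 for t in totals) / len(totals)
    active = sum(1 for v in pressure_frames[-1] if v > 0)
    return [mean, var, active]

def nearest_centroid(features, centroids):
    """Tiny stand-in for a trained classifier: pick the closest centroid."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(features, centroids[label]))

# Made-up class centroids: busy typing shows high, fluctuating pressure.
centroids = {"busy": [30.0, 25.0, 6], "interruptible": [5.0, 1.0, 1]}
window = [[5, 5, 10, 8], [9, 4, 12, 7], [6, 6, 11, 9]]  # 4-cell toy sheet
print(nearest_centroid(extract_features(window), centroids))  # busy
```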
Methods of statistical causal discovery that use conditional independence (CI) tests are attractive due to their time efficiency and applicability to latent variable systems. However, they often suffer worse inference results than other approaches, induced by statistical errors in the CI tests. We consider part of these errors to be due to statistically weak violations of a commonly used assumption, the causal faithfulness condition. In this study, we propose a causal discovery algorithm that reduces the number of unnecessarily performed CI tests, providing accurate and fast inference without loss of theoretical correctness. We also introduce unreliable directions, which reduce orientation errors caused by the locality of CI tests in the algorithm. Further, simulations demonstrate the performance of the proposed algorithm for discrete probability systems and continuous linear structural equation models.
We present an intelligent tutoring system that teaches natural deduction to undergraduate students. An expert problem solver in the system provides basic instructional help, such as suggesting the use of a rule in the next step of solving a problem and indicating the inference drawn by applying the rule. The system provides help by using a complete problem solver as an expert instructor. Students learning with our tutoring system can vary the degree of help they receive (from low to high and vice versa). Empirical evaluation showed that the system enhanced the problem-solving performance of participants during the learning phase, and these performance gains were carried over to the post-test phase. The analysis of participants' interactions with the system revealed the between-participants adaptation of students, meaning that participants with lower scores learned using higher levels of assistance than those with higher scores. In addition, the analysis revealed the within-participants adaptation of students, meaning that they adaptively changed levels of support according to their learning progress and the degree of difficulty of the problem.
In this paper, we tackle the problem of reinforcement learning (RL) in a continuous state space. An appropriate discretization of the space can make many learning tasks tractable. A method using Gaussian state representation and the Rational Policy Making (RPM) algorithm has been proposed for this problem. This method discretizes the space by constructing a chain of states representing a path to the goal, exploiting the agent's past experiences of reaching it. Because the method exploits successful experiences strongly, it can find a rational solution quickly in an environment with little noise. In a noisy environment, however, it creates many unnecessary and distracting states and performs the task poorly. For learning in such environments, we introduce the concept of the value of a state into the above method and develop a new method. Our method uses a temporal-difference (TD) learning algorithm to learn the values of states, and the value of a state determines the size of the state. Thus, the method can quickly trim and eliminate unnecessary and distracting states and learn the task well even in a noisy environment. We show the effectiveness of our method through computer simulations of a path-finding task and a cart-pole swing-up task.
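The TD learning component can be sketched with the standard TD(0) update; the states, rewards, and the way low-value states would be trimmed are illustrative assumptions, not the paper's exact setup.

```python
values = {"start": 0.0, "mid": 0.0, "goal": 0.0}
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def td_update(s, reward, s_next):
    """TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    values[s] += alpha * (reward + gamma * values[s_next] - values[s])

# Replay a simple episode start -> mid -> goal many times.
for _ in range(200):
    td_update("mid", 1.0, "goal")    # reaching the goal yields reward 1
    td_update("start", 0.0, "mid")

# States on the path to the goal accumulate value; states that stay
# low-valued are candidates for trimming as unnecessary/distracting.
print(values["mid"] > values["start"] > 0.0)  # True
```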
Many phenomena in the real world can be represented as multinomial relations, which involve multiple and heterogeneous objects. For instance, in social media, users' various actions, such as adding annotations to web resources or sharing news with their friends, can be represented by multinomial relations involving multiple and heterogeneous objects such as users, documents, keywords and locations. Predicting multinomial relations would improve many fundamental applications in various domains such as online marketing, social media analysis and drug development. However, the high-dimensional nature of such multinomial relations poses one fundamental challenge: predicting multinomial relations with only a limited amount of data. In this paper, we propose a new multinomial relation prediction method that is robust to data sparsity. We transform each instance of a multinomial relation into a set of binomial relations between the involved objects and the multinomial relation instance itself. We then apply an extension of a low-dimensional embedding technique to these binomial relations, which results in a generalized eigenvalue problem guaranteeing globally optimal solutions. We also incorporate attribute information as side information to address the ``cold start'' problem in multinomial relation prediction. Experiments with various real-world social web service datasets demonstrate that the proposed method is more robust against data sparseness than several existing methods, which can only find sub-optimal solutions.
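The transformation step can be sketched as building a bipartite graph in which each multinomial relation instance becomes a node linked to each object it involves; the embedding/eigenproblem step is omitted, and the relation-node naming is our own assumption.

```python
def to_binomial(instances):
    """instances: list of tuples of heterogeneous objects.
    Returns edges (relation_id, object) of the resulting bipartite graph,
    i.e., one binomial relation per (instance, involved object) pair."""
    edges = []
    for i, objs in enumerate(instances):
        rel = f"r{i}"  # hypothetical node representing this instance
        edges.extend((rel, obj) for obj in objs)
    return edges

# A 'user tags document with keyword at location' relation (made up):
instances = [("user:alice", "doc:42", "kw:ai", "loc:tokyo")]
edges = to_binomial(instances)
print(edges)
```

Working over these pairwise edges rather than the full high-dimensional relation is what makes the low-dimensional embedding applicable under data sparsity.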
Social network analysis is a field of sociology in which networks, such as human relations, are investigated in order to understand the roles of persons in a social context. In this paper, we propose a method for this purpose using formal concept analysis (FCA) and ego-centric networks. FCA is a formal tool that extracts concepts from instances and their attributes. Our proposed method treats the persons in a network as instances and the graph structures appearing in a person's local network, called an ego-centric network, as attributes. It clarifies the relationship between an individual's social attributes and the structure of his or her friendship network. We examined the proposed method using data acquired by a questionnaire on friendship and social attributes given to students. Some groups characterized by the method had a statistically significant ratio of students possessing certain social attributes.
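The FCA machinery used above can be sketched with the two standard derivation operators over a binary context; the friendship context and attribute names below are made up.

```python
# Person -> set of ego-network structural attributes (made-up context).
context = {
    "alice": {"hub", "triangle"},
    "bob": {"triangle"},
    "carol": {"hub", "triangle", "bridge"},
}

def common_attrs(persons):
    """Attributes shared by all given persons (derivation on extents)."""
    sets = [context[p] for p in persons]
    return set.intersection(*sets) if sets else set()

def persons_with(attrs):
    """Persons having all given attributes (derivation on intents)."""
    return {p for p, a in context.items() if attrs <= a}

# A pair (extent, intent) is a formal concept when each derives the other.
extent = persons_with({"hub"})
intent = common_attrs(extent)
print(sorted(extent), sorted(intent))  # ['alice', 'carol'] ['hub', 'triangle']
```

Here the group {alice, carol} together with {hub, triangle} forms a formal concept; such groups are the ones tested for significant ratios of social attributes.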
SLIM is a runtime for LMNtal, a programming and modeling language based on hierarchical graph rewriting. SLIM features automata-based LTL model checking, whose core task is the accepting cycle search problem. The parallel search algorithms OWCTY and MAP used by SLIM generate a large number of states for problems containing accepting cycles, and their performance degrades seriously on particular problems. We propose a new algorithm that combines MAP and Nested DFS to reduce the number of generated states for problems including accepting cycles. We evaluated the algorithm experimentally and confirmed improvements in both performance and scalability.
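The Nested DFS component can be sketched as the classic two-phase search for an accepting cycle in a Büchi automaton: a blue DFS that, in post-order at each accepting state, launches a red DFS looking for a cycle back to that state. The graph below is a made-up example, and this sequential sketch omits the MAP combination and all parallelization.

```python
def nested_dfs(succ, init, accepting):
    """True iff some accepting state lies on a cycle reachable from init."""
    blue, red = set(), set()

    def dfs_red(s, seed):  # inner search: close a cycle back to seed?
        if s == seed:
            return True
        if s in red:
            return False
        red.add(s)
        return any(dfs_red(t, seed) for t in succ.get(s, []))

    def dfs_blue(s):  # outer search in post-order
        blue.add(s)
        for t in succ.get(s, []):
            if t not in blue and dfs_blue(t):
                return True
        if s in accepting and any(dfs_red(t, s) for t in succ.get(s, [])):
            return True
        return False

    return dfs_blue(init)

# 0 -> 1 -> 2 -> 1 with state 1 accepting: an accepting cycle exists.
graph = {0: [1], 1: [2], 2: [1]}
print(nested_dfs(graph, 0, {1}))  # True
```

The shared red set is safe precisely because red searches are started in blue post-order, which is the standard correctness argument for Nested DFS.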
In this paper, we report our attempt to realize analogical abduction as an extension of our work on a meta-level abductive framework for rule abduction and predicate invention. In our previous work, we gave a set of axioms stating object-level causalities in terms of first-order logic (FOL) clauses, which represent direct and indirect causalities by transitive rules. Here we extend our formalism of meta-level abductive reasoning by adding rules for analogical inference. We apply our analogical abduction method to the problem of explaining difficult cello playing techniques: spiccato, rapid cross-string bowing, and one-bow staccato. Our method constructs persuasive analogical explanations of how to play them. We use a model of forced vibration mechanics as the base world for spiccato, the specification of the skeletal structure of the hand as the basis for the cross-string bowing technique, and behavioral similarity for one-bow staccato. These experimental studies suggest the effectiveness of our approach for supporting skill acquisition by giving persuasive explanations, through analogical abduction, of the knacks provided by experts.
This paper presents a high-performance virtual screening method for drug design based on machine learning. In computer-aided drug discovery, drug designers often use docking software. They judge whether a compound docks with a protein based on the docking software's results, the compound's structure, and other information about the compound. Currently, the performance of docking software alone is not high. This paper presents a machine learning method that exploits the experiential knowledge of pharmaceutical researchers: it calculates the docking possibility of compounds with high performance based on the results of the docking software and chemical information about the compounds. Experiments show that our method achieves a high accuracy of 98.4% and an excellent ROC curve.
To understand various collective behaviors, we introduce multiplicity of interaction into each agent. In this study, we apply an asynchronous updating rule to each agent, which has a passive phase and an active phase; the two phases are determined by the asynchronous update timing. Using the discrepancy between learning (passive) and recalling (active), the agents in our model show flexible and multiple collective behaviors such as searching, making signs, and collective cognition. Notably, although territory making and exploration are weakly contradictory (territories restrict the agents' behavioral range, whereas exploration requires agents to spread over the whole space), we can implement both phenomena with a single interaction rule. Our approach provides a new tool for understanding internal measurement, including phenomena such as self-organization, from a collective perspective.
Multi-view video consists of multiple views shot simultaneously by multiple cameras from different viewpoints, and users can watch the video while freely choosing a viewpoint among them. In multi-view video streaming, transmitting many videos at the same time places an enormous burden on the network. Hierarchical multi-view streaming (HMVS) has been proposed as a method for reducing the transmission volume of multi-view video, and predicting viewpoint switching is necessary for bandwidth allocation in HMVS. In this paper, we propose a viewpoint switching prediction model based on viewing logs for HMVS. The proposed model divides the multi-view video sequence into scenes and learns the probability of switching viewpoints for each scene from viewing logs. In an evaluation experiment, the proposed model achieved effective viewpoint switching predictions.
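The per-scene learning step can be sketched as counting switch events in viewing logs and normalizing them into relative frequencies; the scene boundaries, log format, and view names below are illustrative assumptions.

```python
from collections import Counter, defaultdict

def learn_switch_model(logs, scene_of):
    """logs: list of (time, from_view, to_view) switch events.
    Returns P(to_view | scene) as plain relative frequencies."""
    counts = defaultdict(Counter)
    for t, _src, dst in logs:
        counts[scene_of(t)][dst] += 1
    return {
        scene: {v: c / sum(ctr.values()) for v, c in ctr.items()}
        for scene, ctr in counts.items()
    }

def scene_of(t):
    return 0 if t < 60 else 1  # two toy scenes split at t = 60 s

logs = [(10, "A", "B"), (20, "A", "B"), (30, "B", "A"), (70, "A", "C")]
model = learn_switch_model(logs, scene_of)
print(model[0]["B"])  # 2 of the 3 switches in scene 0 go to view B
```

In HMVS, such probabilities would then drive bandwidth allocation, giving likely switch targets higher-quality layers.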
Pokémon is one of the most famous video games, with more than 3.4 million players around the world. The interesting part of this game is guessing the invisible information and the character of the opponent. However, the game's existing Non-Player Character (NPC) is not a good substitute for a human opponent because the NPC does not have a variety of characteristics. In this paper, we propose a novel method to represent the reflection-impulsivity characteristics of an NPC through differences in the initial prior distribution of the Bayesian estimation used for the NPC's decision-making. In our experiment, we asked human players to play against three types of the proposed NPC and to report their impressions of those NPCs. As a result, the players formed different impressions of the three NPC types, although they could not identify the three character types (reflective, intermediate, impulsive).
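How a prior distribution can shift an NPC between reflective and impulsive behavior can be sketched in a Beta-Bernoulli setting. This setting, the prior parameters, and the mapping of sharp vs. flat priors to the two character types are our own illustrative assumptions; the game's actual estimation target is the opponent's hidden information.

```python
def posterior_mean(prior_a, prior_b, successes, failures):
    """Beta(prior_a, prior_b) prior + Bernoulli observations
    -> posterior mean of the estimated probability."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

observations = (1, 1)  # one success, one failure observed so far

# A sharp prior dominates the few observations ("reflective": the NPC
# sticks to its prior belief); a flat prior lets the observations move
# the estimate quickly ("impulsive": reacts to the latest evidence).
reflective = posterior_mean(20, 5, *observations)
impulsive = posterior_mean(1, 1, *observations)
print(round(reflective, 2), round(impulsive, 2))  # 0.78 0.5
```

Because only the first-stage prior differs, the same decision rule produces observably different play styles, which is the effect evaluated in the impression experiment.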