Crowdsourcing services are often used to collect large amounts of labeled data for machine learning. Although they provide an easy way to obtain labels quickly and at very low cost, they have serious limitations, one of which is the variable quality of crowd-generated data. There have been many attempts to increase the reliability of crowd-generated data and the quality of classifiers obtained from such data. In these problem settings, however, relatively few researchers have tried using expert-generated data to achieve further improvements. In this paper, we apply three models that address the problem of learning from crowds to this setting: a latent class model, a personal classifier model, and a data-dependent error model. We evaluate these methods against two baseline methods on a real data set to demonstrate the effectiveness of combining crowd-generated data and expert-generated data.
Eco-driving support systems (EDSSs) are divided into two types: direct and indirect. A direct EDSS intervenes directly in the driver's operation to improve fuel economy automatically, which deprives drivers of the opportunity to attempt eco-driving themselves; as a result, they cannot master the eco-driving technique. An indirect EDSS provides only information about the results of driving behavior, to encourage spontaneous effort to improve fuel economy. In the present study, we performed driving simulator experiments to examine how the two types of EDSS differ in their influence on eco-driving proficiency. We discuss the results from the viewpoint of FUBEN-EKI (FUrther BENEfit of a Kind of Inconvenience), which has been proposed as a novel system design methodology, and then clarify the relationship between how actively the device intervenes and proficiency in eco-driving skill.
We tackle a novel problem: predicting how likely a humanoid robot is to be talked to by a user. A human speaker usually takes the addressee's state into consideration when choosing when to talk to the addressee; a system can exploit this convention when it interprets its audio input. We formulate the problem as a machine learning task whose input features are the humanoid's behaviors, such as its posture, motion, and utterances. A possible application of the model is to reject environmental noise that occurs at times when a cooperative user would hardly talk to the robot.
With the spread of various input and output devices, it is necessary to establish a methodology for developing multimodal interactive systems. Previous MMI (MultiModal Interaction) description languages used a state transition model to define interaction, which results in low maintainability and extensibility. In this paper, we design and implement a new MMI description language named MrailsScript from a software engineering point of view. MrailsScript consists of a data model definition that can be inherited from a semantic web class definition, together with annotations of the interaction task and dialogue initiative. By extending an existing web application development framework, our Mrails framework can automatically generate prototype code for an MMI application based on the MVC (Model-View-Controller) model. We also developed a helper application for MMI systems, MrailsBuilder, which assists in coding MrailsScript and provides multilingual capability for the target MMI system.
This paper describes the optimization of the betting fraction parameter in compound reinforcement learning. Compound reinforcement learning maximizes the expected logarithm of compound returns in return-based MDPs. However, it introduces a new parameter, the betting fraction, to keep values from diverging to negative infinity, and this raises the problem of how to choose that parameter. In this paper, we propose a method that optimizes the betting fraction by on-line gradient ascent in compound reinforcement learning.
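The idea of tuning a betting fraction by on-line gradient ascent can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a single-step, Kelly-style objective log(1 + f r), where f is the betting fraction and r is a sampled return rate, and the function name, step size, and clipping bounds are all hypothetical choices made for illustration.

```python
def optimize_betting_fraction(returns, f_init=0.5, step_size=0.005,
                              f_min=0.0, f_max=0.99):
    """On-line gradient ascent on log(1 + f * r), one sampled return at a time.

    Assumes every return rate r satisfies r >= -1, so that 1 + f * r stays
    positive for any f in [f_min, f_max] (an assumption of this sketch).
    """
    f = f_init
    for r in returns:
        # d/df log(1 + f * r) = r / (1 + f * r)
        gradient = r / (1.0 + f * r)
        f = f + step_size * gradient
        # Keep the fraction inside the range where the logarithm is defined.
        f = min(max(f, f_min), f_max)
    return f


# Even-money bet that wins 6 times out of 10. The Kelly-optimal fraction for
# such a bet is p - q = 0.6 - 0.4 = 0.2, so the estimate should end near 0.2.
cycle = [1.0, 1.0, 1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
f_star = optimize_betting_fraction(cycle * 200)
```

With a constant step size the estimate keeps fluctuating slightly around the optimum; a decaying step size would be one way to reduce that fluctuation.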
Research on human interaction has been active in recent years. We believe that mutual understanding between a user and a system effectively improves the user's sense of intimacy with the system. In this paper, we propose a system that promotes mutual understanding by acquiring the user's evaluation tendency from the interaction between the user and the system and reflecting it in the system's behavior. The user and the system take turns expressing a simple symbol, and the user evaluates the interaction at every evaluation phase. The proposed system learns the relation between the interaction sequence and the user's evaluation. We propose a method for learning interaction sequences using All-Combinatorial N-grams. In this way, the system acquires interaction rules that raise the user's evaluation. We believe that intimacy with the system improves when the system offers the user better interaction using the acquired interaction rules. The validity of the proposed system was shown by a sensitivity evaluation.
We have developed a method for systematically generating very hard graph 3-colorability (3COL) instances in order to clarify the phase transition mechanism of combinatorial search problems. In our method, 3COL instances are generated by repeatedly embedding minimal unsolvable graphs (MUGs). In this paper, we use genetic algorithms (GAs) to find larger-scale MUGs, which our generation method requires. We also demonstrate experimentally that the computational cost of solving the 3COL instances generated from the MUGs found by GAs grows exponentially with the number of vertices, and that our instances are extraordinarily harder than randomly generated instances.
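To make the notion of a minimal unsolvable graph concrete, the following sketch checks a small graph by brute force. It assumes that "minimal unsolvable" means "not 3-colorable, but 3-colorable after removing any single edge"; the abstract does not spell out the definition, so this reading and the function names are assumptions. The exhaustive search is only practical for very small graphs and is unrelated to the GA-based search of the paper.

```python
from itertools import product


def is_3_colorable(num_vertices, edges):
    """Try every assignment of 3 colors to the vertices (exponential time)."""
    for coloring in product(range(3), repeat=num_vertices):
        if all(coloring[u] != coloring[v] for u, v in edges):
            return True
    return False


def is_minimal_unsolvable(num_vertices, edges):
    """Assumed definition: unsolvable, yet solvable once any one edge is removed."""
    if is_3_colorable(num_vertices, edges):
        return False
    return all(
        is_3_colorable(num_vertices, edges[:i] + edges[i + 1:])
        for i in range(len(edges))
    )


# The complete graph on 4 vertices needs 4 colors, and dropping any edge
# makes it 3-colorable, so it satisfies the assumed definition.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# A 4-cycle is 2-colorable, so it is not unsolvable at all.
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
# K5 is unsolvable but not minimal: removing one edge still leaves a K4 inside.
K5 = [(u, v) for u in range(5) for v in range(u + 1, 5)]
```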
In recent years, the rapid spread of information and communication technology has multiplexed our communication space. As a result, the way products diffuse in the market is changing, because communication affects our behavior and states of mind. Markets for products with ``network externality'' are known to be greatly influenced by this change. Network externality is the effect whereby the value of a product increases as more people have it. For companies competing with other companies, the strategy for selling products that have network externality is important. One solution to this problem is the ``present strategy''. Because a company using the present strategy diffuses its products in the market, this strategy is an effective marketing activity. However, the present strategy carries a risk of reduced profit, so its effectiveness in the multiplexed communication space needs to be verified. In this paper, we analyze the effectiveness of the present strategy in the multiplexed communication space using multi-agent simulation.
In this study, we developed a new method for long-term market analysis based on text mining of news articles. Using our method, we conducted extrapolation tests over about 10 years to predict stock price averages for 19 industry sectors and two market averages, TOPIX and Nikkei225. As a result, 8 of the 21 sectors (about 40%) showed accuracy above about 60%, and 15 of the 21 sectors (over 70%) showed accuracy above about 55%. We also developed a web-based financial text-mining system, built on our method, for financial professionals.
A ``hub'' is an object closely surrounded by, or very similar to, many other objects in the dataset. Recent studies by Radovanović et al. demonstrated that in high-dimensional spaces, objects close to the data centroid tend to become hubs. In this paper, we show that the family of kernels based on the graph Laplacian makes all objects in the dataset equally similar to the centroid, and thus these kernels are expected to produce fewer hubs when used as a similarity measure. We investigate this hypothesis using both synthetic and real-world data. It turns out that these kernels suppress hubs in some cases but not always, and the results seem to be affected by the size of the data, a factor not discussed previously. However, for the datasets in which hubs are indeed reduced by the Laplacian-based kernels, these kernels work well in classification and information retrieval tasks. This result suggests that the number of hubs, which can be readily computed in an unsupervised fashion, can serve as a yardstick of whether Laplacian-based kernels work effectively for a given dataset.
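One common way to quantify hubs without labels is to count how often each object appears among the k most similar objects of the others, and then measure how skewed those counts are. The sketch below follows that idea under stated assumptions: the abstract does not say which hubness measure the paper uses, and the function names and the toy similarity matrix are hypothetical.

```python
def k_occurrences(similarity, k):
    """Count how often each object is among the k most similar objects of another.

    `similarity` is a square list of lists; larger values mean more similar.
    """
    n = len(similarity)
    counts = [0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: similarity[i][j], reverse=True)
        for j in others[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Population skewness; a large positive value suggests a few strong hubs."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        return 0.0
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Toy data: four points on a line, with similarity = negative distance.
points = [0.0, 1.0, 3.0, 10.0]
similarity = [[-abs(a - b) for b in points] for a in points]
counts = k_occurrences(similarity, k=1)
```

Any kernel matrix, including a Laplacian-based one, could be passed as `similarity`, so the same counts can be compared across similarity measures.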
This paper describes h-ASE (hesitation-based Artificial Subtle Expression), a novel implementation of ASE in which a robot expresses its confidence in the advice it gives to a human. Confidence in advice is one of a robot's useful internal states, and developing a practical and inexpensive methodology for expressing it correctly is an important goal. To achieve this goal, we propose h-ASE, in which a robot hesitates by turning slowly toward a human before giving advice with low confidence. We conducted experiments with participants to verify the effectiveness of h-ASE and to investigate the influence of two independent variables, time delay and slow motion. We obtained promising results showing that the time-delay factor contributed significantly more to h-ASE than the slow-motion factor.
Identifying correspondences between languages in two different sets of language data, each provided independently by different researchers, is one of the main problems to be addressed in research on matching the world's languages. If every language were assigned a language code as a unique identifier, identifying the same language would be straightforward, but such an assignment is unavailable in most cases. A method proposed by Wu and Matsuno enabled this identification by using two measures, language name similarity and language classification similarity, and succeeded in finding 88% of the languages in one set of language data that correspond to languages in the other set. The aim of this paper is to improve the accuracy of this identification by taking into account brother (sibling) information in a language classification tree. After giving an overview of the method by Wu and Matsuno, we point out that language name similarity and language classification similarity are not utilized effectively: their method can give an inappropriate decision even when one of the two similarities is a complete match. To address this problem, we define two new measures: one is a language similarity based on brother information, and the other is a general language similarity that integrates language name similarity and language classification similarity. Our experimental results show that our new method is more effective than the previous one.
Metonymy is a figure of speech in which the name of one item stands for another item that usually has a close relation to the first. Metonymic expressions need to be correctly detected and interpreted because sentences containing them have meanings that differ from their literal ones; otherwise, natural language processing systems may output inappropriate results.
In previous studies, metonymy detection has mainly taken one of two approaches: rule-based or statistical. The former uses semantic networks and rules to interpret metonymy. The latter performs corpus-based metonymy resolution with machine learning techniques.
One problem with current metonymy detection is that relying mainly on syntactic and semantic information may not be enough to detect metonymic expressions, because it has been pointed out that metonymic expressions are related to associative relations between words.
In this paper, we propose an associative approach to detecting metonymic expressions. Using associative information between the words in a sentence, we train a decision tree to detect metonymic expressions in that sentence. We evaluated our method by comparing it with four baseline methods based on previous studies that use a thesaurus or co-occurrence information. Experimental results show that our method judges metonymic expressions with significantly higher accuracy (0.83) than the baselines. It also achieves better recall (0.73), precision (0.85), and F-measure (0.79) in detecting Japanese metonymic expressions, achieving state-of-the-art performance.
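The reported scores can be cross-checked with a short calculation. The sketch assumes that the F-measure is the balanced harmonic mean of precision and recall, which the abstract does not state explicitly; the function name is hypothetical.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 0.85 and recall 0.73, as reported above.
reported = round(f_measure(0.85, 0.73), 2)
print(reported)  # prints 0.79
```

Under that assumption, the computed value agrees with the reported F-measure of 0.79.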
This paper proposes a fast and simple unsupervised word segmentation algorithm that utilizes the local predictability of adjacent character sequences while searching for a least-effort representation of the data. The model uses branching entropy to constrain the hypothesis space, so that it can efficiently obtain a solution that minimizes the length of a two-part MDL code. An evaluation on corpora in Japanese, Thai, and English, as well as the ``CHILDES'' corpus for research in language development, reveals that the algorithm achieves an F-score comparable to that of state-of-the-art methods in unsupervised word segmentation, in significantly reduced computational time. In view of its capability to induce the vocabulary of large-scale corpora of domain-specific text, the method has the potential to improve the coverage of morphological analyzers for languages without explicit word boundary markers. A semi-supervised word segmentation approach is also proposed, in which the word boundaries obtained through the unsupervised model are used as features for a state-of-the-art word segmentation method.
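Branching entropy measures how unpredictable the next character is after a given context; a high value is often taken as a hint that a word boundary follows. The sketch below only computes this quantity for fixed-length contexts. How the paper combines it with the MDL search is not described in the abstract, and the function name and the `order` parameter are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def branching_entropy(text, order):
    """Entropy (in bits) of the next character after each context of length `order`."""
    followers = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        followers[context][text[i + order]] += 1

    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# In "abac", the character "a" is followed once by "b" and once by "c"
# (two equally likely continuations), while "b" is always followed by "a".
h = branching_entropy("abac", order=1)
```

Here `h["a"]` is 1 bit and `h["b"]` is 0 bits, so under this heuristic a boundary would be more plausible after "a" than after "b".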