The success of a software development project depends on the skills of the developers on the team; however, predicting such skills is not a trivial problem. In crowdsourcing services, the level of these skills is rated by users. This paper aims to predict such ratings by integrating open source software (OSS) communities and crowdsourcing services. We show that the problem reduces to a feature construction problem over OSS communities and propose the s-index, which abstracts the skill level of developers based on the projects they have developed. Specifically, we integrate oDesk (a crowdsourcing service) and GitHub (an OSS community), and construct a prediction model using the ratings from oDesk as training data. The experimental results show that our method outperforms models without the s-index in terms of nDCG.
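The nDCG measure used in the evaluation above can be sketched as follows. This is a minimal, generic implementation with logarithmic discounting; the function name and cutoff handling are our own illustration, not taken from the paper:

```python
import math

def ndcg_at_k(relevances, k):
    """Normalized discounted cumulative gain at rank k.

    relevances: graded relevance scores in the predicted ranking order.
    """
    def dcg(scores):
        # Each item's gain is discounted by log2 of its (1-based) rank + 1.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(scores[:k]))

    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered ranking scores 1.0; any misordering of graded items scores strictly less.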
Estimating the relevance or proximity between vertices in a network is a fundamental building block of network analysis and is useful in a wide range of important applications such as network-aware searches and network structure prediction. In this paper, we (1) propose to use the top-k shortest-path distance as a relevance measure, and (2) design an efficient indexing scheme for answering top-k distance queries. Although many indexing methods have been developed for standard (top-1) distance queries, none can be directly applied to top-k distances. Therefore, we develop a new framework for top-k distance queries based on 2-hop cover and then present an efficient indexing algorithm based on the recently proposed pruned landmark labeling scheme. The scalability, efficiency, and robustness of our method are demonstrated in extensive experimental results. It can construct indices from large graphs comprising millions of vertices and tens of millions of edges within a reasonable running time. Having obtained the indices, we can compute top-k distances within a few microseconds, six orders of magnitude faster than existing methods, which require a few seconds to compute these distances. Moreover, we demonstrate the usefulness of the top-k distance as a relevance measure by applying it to link prediction, the most fundamental problem in graph data mining. We emphasize that the proposed indexing method enables the first use of the top-k distance for such tasks.
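The query that the index answers can be illustrated with a naive baseline: a Dijkstra variant that settles each vertex up to k times returns the k smallest path lengths between two vertices (allowing paths that revisit vertices). This is only the slow reference computation the existing methods perform, not the proposed 2-hop-cover index:

```python
import heapq
from collections import defaultdict

def top_k_distances(adj, s, t, k):
    """Return up to k smallest path lengths from s to t.

    adj: dict mapping vertex -> list of (neighbor, weight) pairs.
    Paths may revisit vertices, so the 2nd, 3rd, ... distances exist
    even in small graphs.
    """
    settled = defaultdict(int)  # how many times each vertex was popped
    result = []
    heap = [(0, s)]
    while heap and len(result) < k:
        d, v = heapq.heappop(heap)
        if settled[v] >= k:
            continue  # v already contributed its k best distances
        settled[v] += 1
        if v == t:
            result.append(d)
        for u, w in adj[v]:
            if settled[u] < k:
                heapq.heappush(heap, (d + w, u))
    return result
```

Because every vertex can be settled up to k times, the query cost grows with graph size, which is exactly what a precomputed 2-hop labeling avoids.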
The Generalized Mutual Assignment Problem (GMAP) is a maximization problem in distributed environments, in which multiple agents select goods under resource constraints. Distributed Lagrangian Relaxation Protocols (DisLRPs) are peer-to-peer communication protocols for solving GMAP instances. In DisLRPs, agents seek a good-quality upper bound on the optimal value by solving the Lagrangian dual problem, which is a convex minimization problem. Existing DisLRPs exploit a subgradient method to explore a better upper bound by updating the Lagrange multipliers (prices) of the goods. While the computational complexity of the subgradient method is very low, it cannot detect whether an upper bound has converged to the minimum. Moreover, solution oscillation sometimes occurs, which is one of the critical issues for the subgradient method. In this paper, we present a new DisLRP with a bundle method and refer to it as Bundle DisLRP (BDisLRP). The bundle method, also called the stabilized cutting-planes method, has recently attracted much attention as a way to solve Lagrangian dual problems in centralized environments. We show that this method can also work in distributed environments. We experimentally compared BDisLRP with Adaptive DisLRP (ADisLRP), a previous protocol that exploits the subgradient method, and demonstrate that BDisLRP converges faster and with better-quality upper bounds than ADisLRP.
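The price update that existing DisLRPs perform (and that BDisLRP replaces with a bundle method) is a plain subgradient step on the convex dual. The sketch below shows only that generic step; the step-size rule and variable names are illustrative assumptions, and the paper's distributed protocol and bundle machinery are not reproduced:

```python
def subgradient_step(prices, subgrad, step):
    """One iteration of the subgradient method on the (convex)
    Lagrangian dual: move the multipliers (prices of goods) opposite
    to a subgradient to decrease the dual value, i.e., tighten the
    upper bound. With a fixed step size, the iterates can oscillate
    around the minimizer rather than converge -- the issue the
    bundle method addresses.
    """
    return [p - step * g for p, g in zip(prices, subgrad)]
```

For example, on the dual f(p) = |p - 1| with subgradient sign(p - 1), a step of 0.5 from p = 3 moves the price toward the minimizer at 1.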
In this paper, we introduce an intelligent vehicle into traffic flow in which phantom traffic jams occur, in order to ensure traffic-flow stability. The intelligent vehicle shares information on the speed and gap of the leading vehicles. Furthermore, the intelligent vehicle can foresee changes in the leading vehicles through the shared information and can start accelerating sooner than human-driven vehicles can. We propose an intelligent vehicle model that generalizes the Nagel-Schreckenberg model and can arbitrarily set both the number of leading vehicles with which to share information and the maximum distance of inter-vehicle communication. We found that phantom traffic jams are suppressed by an intelligent vehicle that can share information with two or more vehicles ahead over a communication range of at least 30 meters.
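The base model being generalized is the standard Nagel-Schreckenberg cellular automaton. A minimal sketch of one update step on a circular road is shown below (this is the textbook NaSch rule set only; the paper's extension to shared multi-vehicle information is not included):

```python
import random

def nasch_step(positions, speeds, road_len, v_max, p_slow, rng=random):
    """One synchronous update of the Nagel-Schreckenberg model.

    positions: cell indices on a circular road, sorted ascending.
    Returns (new_positions, new_speeds); positions may wrap and
    become unsorted after the move, which a full simulation would
    re-sort.
    """
    n = len(positions)
    new_speeds = []
    for i in range(n):
        # Empty cells to the next car ahead (circular road).
        gap = (positions[(i + 1) % n] - positions[i] - 1) % road_len
        v = min(speeds[i] + 1, v_max)   # 1. accelerate
        v = min(v, gap)                 # 2. brake to avoid collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                      # 3. random slowdown (jam seed)
        new_speeds.append(v)
    new_positions = [(positions[i] + new_speeds[i]) % road_len
                     for i in range(n)]
    return new_positions, new_speeds
```

Rule 3 is what spontaneously creates phantom jams; the intelligent vehicle in the paper counteracts them by reacting to leaders farther ahead than rule 2 allows.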
Most existing approaches to bilingual lexicon extraction (BLE) first map words in the source and target languages into a single vector space, and then measure the similarity of words across the two languages in this space. We point out that existing BLE methods suffer from the so-called hubness phenomenon; i.e., a small number of translation candidates (hub candidates) are chosen by the systems as likely translations of many source words, which consequently degrades the accuracy of the extracted translations. We show that this phenomenon can be alleviated by centering the data or by using the mutual proximity measure, two known techniques that effectively reduce hubness in standard nearest-neighbor search settings. Our empirical evaluation shows that naive nearest-neighbor search combined with these methods outperforms a recently proposed BLE method based on label propagation.
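Of the two hubness-reduction techniques mentioned, centering is the simpler: subtract the data centroid from every vector before running cosine-based nearest-neighbor search. A minimal sketch (plain-Python, for illustration only):

```python
import math

def center(vectors):
    """Subtract the centroid from every vector. Centering reduces the
    tendency of vectors near the centroid to become hubs in
    cosine-based nearest-neighbor search."""
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / len(vectors)
                for d in range(dim)]
    return [[x - c for x, c in zip(v, centroid)] for v in vectors]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

In a BLE pipeline, target-language candidate vectors would be centered once, after which nearest-neighbor translation retrieval proceeds unchanged.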
Controlling the trade-off between disposal costs and stock-shortage losses has long been one of the major management issues in retail stores, especially in convenience stores that deal in many products with short expiration dates. In this study, we aim to evaluate the maximum gross profit and to analyze the transition of waste costs and opportunity losses in a retail store. We developed a model of a retail store that performs three simple types of demand estimation and three types of ordering, and also created a customer model that reflects consumer reactions to out-of-shelf situations, such as brand switching and purchase cancellation, using actual point-of-sale and stock data. Through agent-based simulation experiments, we obtained the order quantity that maximizes gross profit when the retail store uses quantitative ordering. Moreover, we found that setting a minimum-order-amount threshold increased gross profit and reduced shortage losses. Finally, the combination of demand estimation by exponential smoothing and threshold-based ordering performed better than any other method.
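The best-performing combination above pairs exponential-smoothing demand estimation with threshold ordering. A sketch of both pieces follows; the smoothing constant, function names, and the exact threshold semantics (order the shortfall only when it exceeds a minimum amount) are our reading of the abstract, not the paper's specification:

```python
def exp_smooth_forecast(demands, alpha=0.3):
    """Single exponential smoothing: the forecast is a weighted
    average of the latest observation and the previous forecast."""
    forecast = demands[0]
    for d in demands[1:]:
        forecast = alpha * d + (1 - alpha) * forecast
    return forecast

def order_quantity(forecast, stock, min_order):
    """Threshold ordering: order up to the forecast level, but only
    when the shortfall reaches the minimum order amount; otherwise
    skip the order (avoiding small orders that end as waste)."""
    shortfall = max(0, forecast - stock)
    return shortfall if shortfall >= min_order else 0
```

Raising `min_order` trades a higher risk of shelf shortage for lower disposal cost, which is exactly the trade-off the simulation explores.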
Applications of product recommendation virtual agents (PRVAs) for online shopping are increasing, and various studies have examined how to design PRVAs that effectively raise users' buying motivation. In this paper, we analyze the recommendation effects of PRVAs, focusing on the consistency between an agent's appearance and behavior in online shopping. First, we set up a hypothesis based on the consistency of PRVAs' appearances and behaviors, and prepared typical PRVA appearances together with behaviors suited to each. Then we conducted a within-participant experiment in which participants evaluated their buying motivation for all combinations of PRVA appearances and behaviors and reported their impressions of the agents. Eye-tracking data were also recorded throughout the experiment. By investigating statistics on the participants' buying motivations and applying statistical tests to the experimental results, our hypothesis on the consistency between PRVAs' appearances and behaviors was supported. Furthermore, the eye-tracking results revealed some useful insights into users' attention.
Teleoperation enables us to act in remote locations through operated entities such as robots or virtual agents. This advantage allows us to work in places dangerous for humans or places not designed for humans, such as volcano disaster sites or narrow maintenance pipes. However, teleoperation also has a weakness, namely, several gaps (in operation interface, environment, appearance, and intentionality) between ourselves and the teleoperated entities at the remote site. As teleoperated robots have physical bodies different from ours, teleoperation requires special interfacing systems that are usually not very intuitive. Such systems require a rather long period of training before one becomes familiar with them. One possible solution to this issue is to implement a semi-autonomous teleoperation (SAT) facility that combines manual operation and autonomous action. With SAT, we can teleoperate remote entities in a way suited to the teleoperated entity and to the remote situation with the help of autonomous action. However, there is a concern that this autonomous part of SAT may decrease operators' feeling of agency; consequently, operators may lose concentration and become less efficient. To deal with this issue, we hypothesized that if an autonomously generated action is sufficiently suited to the situation, operators will feel a sense of agency over the generated action. In this paper, we examined this hypothesis through an experiment in which participants joined a conversation using a teleoperated android robot. Here we focused on the autonomous generation of nodding actions and evaluated operators' agency toward the generated actions. Participants listened to a speech through the teleoperated robot under different degrees of autonomous motion generation.
After a series of listening sessions, they watched video recordings taken from the speaker's viewpoint that showed the teleoperated robot in action, and evaluated their agency toward the action and the appropriateness of the robot's motion toward the speaker. As a result, we found that when the timing of nodding was appropriate to the conversational context, operators maintained a sense of agency over the robot's motion, even when automatically generated motion and the operators' own motion were mixed. Furthermore, the automatically generated robot motion was evaluated as more appropriate to the conversation than the operators' own motion. In conclusion, when using semi-autonomous teleoperation, if the autonomous action is appropriate to the situation, operators are able to maintain agency over the operated entity's actions.
Recently, prediction markets have also been used for estimating preferences, for which the correct answer will not be revealed even after the market closes; when used for this purpose, they are called preference markets. In order to utilize a preference market for estimating the attractiveness of product concepts expressed as combinations of various attributes, two technical questions remain to be answered. First, how can we estimate the preference for every possible combination of the attributes under consideration based on the results of a preference market that compares only a limited number of concepts? Second, how can we incentivize the participants in the preference market to report their estimates truthfully? This paper therefore develops a new product concept evaluation system by combining preference markets with conjoint analysis, and proposes three guidelines for determining the payoff of prediction securities: (1) determine the payoff not directly from the security prices but indirectly through a model; (2) run multiple markets in parallel if possible and use the results from all of them when determining the payoff; and (3) use smoothed values instead of the final values as the security prices for determining the payoff. Moreover, the proposed system is tested with a simple evolutionary game simulation.
The spinal tree adjoining grammar (TAG) parsing model of [Carreras 08] achieves the current state-of-the-art constituent parsing accuracy in the commonly used English Penn Treebank evaluation setting. Unfortunately, the model has the serious drawback of low parsing efficiency, since its Eisner-CKY style parsing algorithm needs O(n^4) computation time for input length n. This paper investigates a more practical solution and presents a beam-search shift-reduce algorithm for spinal TAG parsing. Since the algorithm works in O(bn) time (where b is the beam width), it can be expected to provide a significant improvement in parsing speed. However, to achieve faster parsing, it needs to prune a large number of candidates in an exponentially large search space and often suffers from severe search errors. In fact, our experiments show that the basic beam-search shift-reduce parser does not work well for spinal TAGs. To alleviate this problem, we extend the proposed shift-reduce algorithm with two techniques: the dynamic programming of [Huang 10a] and supertagging. The extended parsing algorithm is about 8 times faster than the Berkeley parser, a well-known fast constituent parser, while offering state-of-the-art performance. Moreover, we conduct experiments on the Keyaki Treebank for Japanese to show that the good performance of our parser is language-independent.
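The beam-search pruning responsible for the O(bn) running time can be sketched generically: after each shift-reduce step, only the b highest-scoring partial analyses survive. The sketch below uses toy state, action, and scoring callbacks; the real parser's spinal-TAG actions, features, and dynamic-programming state merging are omitted:

```python
def beam_search(init_state, actions, step, score, is_final, beam_width):
    """Generic beam search over action sequences.

    Keeps only the beam_width best-scoring partial analyses after each
    step, trading exact search for O(beam_width * n) decoding; aggressive
    pruning is what causes the search errors discussed above.
    """
    beam = [(0.0, init_state)]
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for sc, s in beam:
            if is_final(s):
                candidates.append((sc, s))  # finished analyses survive as-is
                continue
            for a in actions(s):
                candidates.append((sc + score(s, a), step(s, a)))
        # Prune: keep only the beam_width best partial analyses.
        beam = sorted(candidates, key=lambda x: -x[0])[:beam_width]
    return max(beam, key=lambda x: x[0])
```

With a beam of width 1 this degenerates to greedy decoding; widening the beam recovers accuracy at a linear cost in b.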
Crowdsourcing allows human intelligence tasks to be outsourced to a large number of unspecified people at low cost. However, because of the uneven ability and diligence of crowd workers, the quality of their work is also uneven and sometimes quite low. Therefore, quality control is one of the central issues in crowdsourcing research. In this paper, we address a quality control problem for enumeration tasks, in which workers are asked to enumerate as many answers satisfying certain conditions as possible. As examples of enumeration tasks, we consider text collection tasks in addition to POI collection tasks. Since workers do not necessarily provide correct answers, and answers indicating the same object may not match exactly because of orthographic or numerical variations, we propose a two-stage quality control method consisting of an answer clustering stage and a reliability estimation stage. The answer clustering stage, implemented with a new constrained exemplar clustering method, groups answers indicating the same object into a cluster and extracts a representative answer from each cluster; the reliability estimation stage, implemented with a modified HITS algorithm, then estimates the reliability of the representative answers and removes unreliable ones. The effectiveness of our method is demonstrated in comparison with baseline methods on several real crowdsourcing datasets of POI collection tasks and text collection tasks.
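The HITS-style mutual reinforcement behind the reliability estimation stage can be sketched as follows. This is plain HITS adapted to a worker-answer bipartite graph (a reliable answer is given by able workers; an able worker gives reliable answers); the paper's specific modifications are not reproduced:

```python
import math

def hits_reliability(worker_answers, iters=50):
    """Estimate answer reliability by HITS-style iteration.

    worker_answers: dict mapping worker id -> set of answer ids.
    Returns dict mapping answer id -> reliability score (L2-normalized).
    """
    answers = {a for ans in worker_answers.values() for a in ans}
    rel = {a: 1.0 for a in answers}
    for _ in range(iters):
        # Worker ability = sum of reliabilities of answers they gave.
        ability = {w: sum(rel[a] for a in ans)
                   for w, ans in worker_answers.items()}
        # Answer reliability = sum of abilities of workers who gave it.
        rel = {a: sum(ab for w, ab in ability.items()
                      if a in worker_answers[w])
               for a in answers}
        norm = math.sqrt(sum(r * r for r in rel.values())) or 1.0
        rel = {a: r / norm for a, r in rel.items()}
    return rel
```

Answers supported by many (and more able) workers accumulate higher scores, and low-scoring representatives can then be filtered out.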
This paper describes a product field-failure prediction method that applies the Bayesian network algorithm to manufacturing parametric data. Two technical subjects must be solved in Bayesian network model estimation. First, manufacturing process data often do not follow a normal distribution. Second, the failure ratio can be less than 1%, so the amount of failure data is much smaller than that of pass data. To solve these subjects, we propose a binning method and a parameter selection method. The binning method divides the pass data into bins of equal size, which reduces the impact of the pass data on the network estimation. The parameter selection method is based on the probability of observing the failure data under the pass-data distribution, and matches the Bayesian network structure estimation algorithm known as the K2 algorithm. Our method is compared with an alternative binning method that divides the data into equal-width intervals, and with a parameter selection method based on the U test. In experiments using actual data, our method shows higher prediction accuracy than these alternatives.
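The proposed binning, which gives each bin an equal share of the pass data, is essentially equal-frequency (quantile) binning, in contrast to the equal-width baseline. A minimal sketch of computing the inner bin edges (illustrative only; edge placement at ties is a simplification):

```python
def equal_frequency_bins(values, n_bins):
    """Return the inner boundaries that split the values into n_bins
    bins of (nearly) equal count. Unlike equal-width binning, skewed
    or non-normal data still populates every bin evenly."""
    s = sorted(values)
    n = len(s)
    # The i-th inner edge is the value at the i/n_bins quantile position.
    return [s[(i * n) // n_bins] for i in range(1, n_bins)]
```

With heavily skewed process data, equal-width bins would leave most bins nearly empty, whereas equal-frequency bins keep every bin informative for the network estimation.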
The Dynamic Stacked Topic Model (DSTM) proposed here is a topic model for analyzing the hierarchical structure and temporal evolution of topics in document collections. Document collections such as news articles and scientific papers are hierarchically organized. In a newspaper, for instance, an article about soccer is published in the sports section and one about an election in the politics section. Furthermore, both topics and sections naturally evolve on a certain timescale. In the proposed model, to capture correlations between topics and the temporal sequence of topics within sections, a section is modeled as a multinomial distribution over topics based on the previous topic distribution, and a topic is assumed to be generated based on the word distribution of the previous epoch. Inference and parameter estimation are achieved by a stochastic EM algorithm, in which maximum a posteriori estimation of the hyperparameters and collapsed Gibbs sampling of latent topics and sections are executed alternately. Experiments on real document collections demonstrate the effectiveness of the proposed model.
In order to foster innovative solutions in science and technology, the Japan Science and Technology Agency (JST) has built J-GLOBAL knowledge (JGk), which provides, as Linked Data, papers, patents, researcher information, a technological thesaurus, and scientific data accumulated by JST since 1957. The total size of all datasets is about 15.7 billion triples, and the JGk website provides a SPARQL endpoint for accessing part of the datasets. This paper describes several issues in schema design for constructing large-scale Linked Data, along with construction methods, especially for linking to external datasets such as DBpedia Japanese. Finally, we describe performance problems and future work.
The notion of semantic similarity between text data (e.g., words, phrases, sentences, and documents) plays an important role in natural language processing (NLP) applications such as information retrieval, classification, and extraction. Recently, word vector spaces using distributional and distributed models have become popular. Although word vectors provide good similarity measures between words, phrasal and sentential similarities derived from the composition of individual words remain a difficult problem. To solve this problem, we focus on representing and learning the semantic similarity of sentences in a space that has higher representational power than the underlying word vector space. In this paper, we propose a new method of non-linear similarity learning for compositionality. With this method, word representations are learned through the similarity learning of sentences in a high-dimensional space with implicit kernel functions, and new word representations can be obtained inexpensively, without explicit computation of sentence vectors in the high-dimensional space. In addition, our approach differs from deep learning methods such as recursive neural networks (RNNs) and long short-term memory (LSTM). Our aim is to design word representation learning that combines the embedding of sentence structures in a low-dimensional space (i.e., neural networks) with non-linear similarity learning for sentence semantics in a high-dimensional space (i.e., kernel methods). On the task of predicting the semantic similarity of two sentences (SemEval 2014, Task 1), our method outperforms linear baselines, feature engineering approaches, and RNNs, and achieves competitive results with various LSTM models.
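The core idea of comparing sentences in an implicit high-dimensional space can be illustrated with the kernel trick: an RBF kernel over composed sentence representations measures similarity in an infinite-dimensional feature space without ever constructing the mapped vectors. The sketch below uses simple averaging as a toy stand-in for the paper's learned compositions:

```python
import math

def avg_vec(word_vecs):
    """Toy sentence composition: average the word vectors
    (a placeholder for the learned composition in the paper)."""
    dim = len(word_vecs[0])
    return [sum(v[d] for v in word_vecs) / len(word_vecs)
            for d in range(dim)]

def rbf_similarity(u, v, gamma=1.0):
    """Similarity in the implicit feature space of the RBF kernel,
    computed without explicitly mapping u and v into that space."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)
```

Identical sentence vectors score 1.0, and similarity decays smoothly and non-linearly with distance, which a linear inner product in the original space cannot express.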