Structured overlay networks that support range queries cannot hash data IDs for load balancing, because hashing would destroy the total order on the IDs. Since data and queries are not evenly distributed over the ID space without hashing, range-based overlay networks impose uneven loads on their nodes. Existing load balancing techniques for range-based overlay networks distribute the loads through data reallocation or node migration, which makes the networks very unstable due to heavy data reallocation or frequent churn. This paper proposes a novel scheme that distributes the loads fairly, without node migration and with little data reallocation, by sharing some ID-space regions between neighboring nodes. Our “overlapping” ID-space management scheme derives the optimal overlap from kernel density estimates of the query load, which are used to calculate the best overlap regions. This calculation is executed in a distributed manner with no central coordinator. We conduct thorough computer simulations and show that our scheme reduces the worst-case node load by 20-90% compared with existing techniques, without node migration and with minimal data reallocation.
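To make the kernel-density step concrete, the following is a minimal sketch, assuming a Gaussian kernel and hypothetical helper names (gaussian_kde, balanced_boundary): a node estimates the query-load density over an ID-space span from observed query positions and picks the point that splits the estimated load evenly between two neighbors. The paper's actual distributed calculation and overlap formula are not reproduced here.

```python
import numpy as np

def gaussian_kde(samples, bandwidth):
    """Return a density estimate f(x) built from observed query positions."""
    samples = np.asarray(samples, dtype=float)
    def f(x):
        z = (np.asarray(x) - samples[:, None]) / bandwidth
        return np.exp(-0.5 * z * z).sum(axis=0) / (len(samples) * bandwidth * np.sqrt(2 * np.pi))
    return f

def balanced_boundary(queries, lo, hi, grid=1000):
    """Pick the ID-space point that splits the estimated query load in half.

    'queries' are query positions observed over the span [lo, hi] that two
    neighboring nodes are responsible for; the returned point could serve as
    the midpoint of an overlap region shared by the two nodes.
    """
    f = gaussian_kde(queries, bandwidth=(hi - lo) / 50)
    xs = np.linspace(lo, hi, grid)
    cdf = np.cumsum(f(xs))
    cdf /= cdf[-1]
    return xs[np.searchsorted(cdf, 0.5)]

# Example: queries cluster near the low end, so the boundary shifts left.
rng = np.random.default_rng(0)
queries = rng.beta(2, 5, size=500) * 1000   # skewed query positions in [0, 1000)
print(balanced_boundary(queries, 0, 1000))
```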
One of the properties of context-oriented programming languages is the composition of partial module definitions. While most such language extensions treat the state and behavior introduced by partial definitions equally at the module level, we propose a refinement of that approach that, in our experimental language L, allows both public and restricted visibility for methods and both local and shared visibility for fields. Furthermore, we propose a new lookup mechanism to reduce the risk of name captures.
Live virtual machine (VM) migration (live migration for short) is a powerful tool for managing data center resources. Live migration moves a running VM between different physical machines without losing any state, such as network conditions and CPU status. Live migration has attracted the attention of academic and industrial researchers, since relocating running VMs inside data centers by live migration makes it easier to manage data center resources. This paper summarizes the basics of live migration and techniques for improving it. Specifically, this survey focuses on software mechanisms for realizing basic live migration, improving its performance, and expanding its applicability. It also identifies research opportunities that state-of-the-art live migration techniques have not yet covered.
Parallel corpora are crucial for statistical machine translation (SMT); however, they are quite scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have been conducted to extract parallel sentences from them for SMT. Parallel sentence extraction relies heavily on bilingual lexicons, which are also very scarce. We propose a parallel sentence extraction system based on unsupervised bilingual lexicon extraction: it first extracts bilingual lexicons from comparable corpora and then extracts parallel sentences using those lexicons. Our bilingual lexicon extraction method combines topic-model-based and context-based methods in an iterative process. The proposed method does not rely on any prior knowledge, and its performance improves iteratively. The parallel sentence extraction method uses a binary classifier for parallel sentence identification, and the extracted bilingual lexicons are used by the classifier to improve the performance of parallel sentence extraction. Experiments conducted on Wikipedia data indicate that the proposed bilingual lexicon extraction method greatly outperforms existing methods, and the extracted bilingual lexicons significantly improve the performance of parallel sentence extraction for SMT.
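As a rough illustration of how extracted lexicon entries could feed a sentence-pair classifier, the sketch below computes simple coverage features (the fraction of tokens on each side that have a lexicon translation on the other side) and trains a binary classifier on them. The feature set and the scikit-learn model are illustrative assumptions, not the paper's exact design.

```python
from sklearn.linear_model import LogisticRegression

def lexicon_features(src_tokens, tgt_tokens, lexicon):
    """lexicon: set of (src_word, tgt_word) translation pairs.
    Returns coverage features for one candidate sentence pair."""
    src_set, tgt_set = set(src_tokens), set(tgt_tokens)
    src_cov = sum(any((s, t) in lexicon for t in tgt_set) for s in src_tokens)
    tgt_cov = sum(any((s, t) in lexicon for s in src_set) for t in tgt_tokens)
    return [src_cov / max(len(src_tokens), 1),
            tgt_cov / max(len(tgt_tokens), 1),
            abs(len(src_tokens) - len(tgt_tokens)) / max(len(src_tokens), 1)]

def train_classifier(pairs, labels, lexicon):
    """pairs: list of (src_tokens, tgt_tokens); labels: 1 = parallel, 0 = not."""
    X = [lexicon_features(s, t, lexicon) for s, t in pairs]
    return LogisticRegression().fit(X, labels)
```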
Traditional machine-learning-based approaches to temporal relation classification use only local features, i.e., those relating to a specific pair of temporal entities (events and temporal expressions), and thus fail to incorporate useful information that could be inferred from nearby entities. In this paper, we use timegraphs and stacked learning to perform temporal inference in the temporal relation classification task. In our model, we predict a temporal relation by considering the consistency of possible relations between nearby entities. Performing 10-fold cross-validation on the Timebank corpus, we achieve an F1 score of 60.25% under a graph-based evaluation, which is 0.90 percentage points higher than that of the local approach and outperforms other proposed systems.
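The two-stage structure of stacked learning can be sketched as follows, under stated assumptions: a first-stage classifier predicts relations from pair-local features only, and a second-stage classifier sees, in addition, the first-stage predictions for nearby pairs. The feature layout, the neighbor definition, and the use of scikit-learn estimators are assumptions for illustration; the timegraph construction itself is not shown.

```python
def stacked_predict(local_features, neighbors, stage1, stage2):
    """local_features: one feature vector per entity pair.
    neighbors[k]: indices of pairs considered 'nearby' to pair k.
    stage1, stage2: fitted classifiers with predict() and classes_
    (e.g. sklearn.linear_model.LogisticRegression)."""
    local_preds = stage1.predict(local_features)      # e.g. BEFORE / AFTER / ...
    stacked_features = []
    for k, feats in enumerate(local_features):
        # Append the distribution of stage-1 relations among nearby pairs, so
        # the second stage can prefer predictions consistent with the context.
        context = [local_preds[n] for n in neighbors[k]]
        counts = [context.count(rel) for rel in stage1.classes_]
        stacked_features.append(list(feats) + counts)
    return stage2.predict(stacked_features)
```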
Sharing information can be easy, but sharing emotion is much more difficult. One approach to sharing emotion effectively is to present the receiver with similar past experiences of his/her own. This study proposes an emotion-sharing model that identifies the receiver's experience that best matches the sender's experience as indicated by the message being sent. Experiences are found in life log data and structured to permit quantitative comparisons. In this report, the target that the sender wants to share with the receiver consists of “contents”, “experience”, and “emotion”. This paper describes how to structure experiences and how to identify equivalent experiences. To achieve this, it elucidates the cross-correlation between the equivalence of experiences and the equivalence of emotions. Trials confirmed the accuracy of the model in terms of experience matching.
The spinal tree adjoining grammar (TAG) parsing model of [Carreras 08] achieves the current state-of-the-art constituent parsing accuracy in the commonly used English Penn Treebank evaluation setting. Unfortunately, the model has the serious drawback of low parsing efficiency, since its Eisner-CKY style parsing algorithm needs O(n^4) computation time for input length n. This paper investigates a more practical solution and presents a beam search shift-reduce algorithm for spinal TAG parsing. Since the algorithm works in O(bn) time (where b is the beam width), it can be expected to provide a significant improvement in parsing speed. However, to achieve faster parsing, it must prune a large number of candidates in an exponentially large search space and often suffers from severe search errors. In fact, our experiments show that the basic beam search shift-reduce parser does not work well for spinal TAGs. To alleviate this problem, we extend the proposed shift-reduce algorithm with two techniques: the dynamic programming of [Huang 10a] and supertagging. The extended parsing algorithm is about 8 times faster than the Berkeley parser, which is well known as fast constituent parsing software, while offering state-of-the-art performance. Moreover, we conduct experiments on the Keyaki Treebank for Japanese to show that the good performance of our proposed parser is language-independent.
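The O(bn) behavior comes from the generic shape of beam-search shift-reduce decoding, sketched below under stated assumptions: the transition system, scoring model, and state representation are supplied by the caller, and the names (legal_actions, apply_action, score_action, is_final) are hypothetical. The sketch is the plain beam loop only, without the spinal-TAG transition system, the dynamic programming of [Huang 10a], or supertagging.

```python
def beam_parse(init_state, legal_actions, apply_action, score_action, is_final, b=8):
    """Generic beam-search shift-reduce loop.

    After each step only the b highest-scoring states survive, so a sentence
    of length n is parsed with O(b * n) transitions instead of exploring the
    exponentially large space of action sequences.
    """
    beam = [(0.0, init_state)]
    while not all(is_final(state) for _, state in beam):
        candidates = []
        for score, state in beam:
            if is_final(state):
                candidates.append((score, state))
                continue
            for action in legal_actions(state):
                new_state = apply_action(state, action)
                candidates.append((score + score_action(state, action), new_state))
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:b]          # prune: keep the b best partial parses
    return max(beam, key=lambda item: item[0])[1]
```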
In this article, we present an incremental dependency parsing algorithm with an arc-eager variant of the left-corner parsing strategy. Our algorithm's stack depth captures the center-embeddedness of the recognized dependency structure: greater stack depth is required only when processing more deeply center-embedded sentences, which people find difficult to comprehend. We examine whether our algorithm can capture the syntactic regularity that exists universally across languages through two kinds of experiments on treebanks of 19 languages. We first show, through oracle parsing experiments, that our parsing algorithm consistently requires less stack depth than other algorithms to recognize annotated trees across languages. This result also suggests a syntactic universal by which deeper center-embedding is a rare construction across languages, a property that has not previously been examined quantitatively across languages. We further investigate this claim through supervised parsing experiments and show that our proposed parser is consistently less sensitive to bounds on stack depth when decoding across languages, while the performance of other parsers, such as the arc-eager parser, is largely affected by such constraints. We thus conclude that the stack depth of our parser is a more meaningful measure for capturing syntactic regularity in languages than those of existing parsers.
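The measurement used in the oracle experiments can be pictured with the small harness below, a sketch under stated assumptions: it replays a transition sequence produced by any parser and reports the maximum stack depth reached, given a caller-supplied stack_effect function (e.g. +1 for a shift, -1 for a reduce in a typical transition system). It does not implement the arc-eager left-corner transition system itself.

```python
def max_stack_depth(transitions, stack_effect):
    """Replay a transition sequence and report the maximum stack depth reached.

    The paper's claim is that, for its arc-eager left-corner strategy, this
    maximum grows only with the depth of center-embedding in the sentence.
    """
    depth = deepest = 0
    for action in transitions:
        depth += stack_effect(action)
        deepest = max(deepest, depth)
    return deepest
```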
Proximity of query keyword occurrences is an important source of evidence for effective query-biased document scoring. If a query keyword occurs close to another in a document, this suggests that the document is highly relevant to the query. The simplest way to measure proximity between keyword occurrences is to use the distance between them, i.e., the difference of their positions. However, most web pages have a hierarchical structure composed of nested logical blocks with their headings, and this structure affects logical proximity. For example, if one keyword occurs in a block and another occurs in the heading of that block, we should not simply measure their proximity by their distance. This is because a heading describes the topic of the entire corresponding block, so term occurrences in a heading are strongly connected with any term occurrences in the associated block, largely regardless of the distance between them. Based on these observations, we developed a heading-aware proximity measure and applied it to three existing proximity-aware document scoring methods: MinDist, P6, and Span. We evaluated these existing methods and our modified methods on data sets from the TREC web tracks. The results indicate that our heading-aware proximity measure is better than the simple distance in all cases, and the method combining it with the Span method achieved the best performance.
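The idea can be illustrated on MinDist, the simplest of the three measures. The sketch below shows plain MinDist and a heading-aware variant; the specific rule used here (a heading occurrence and any occurrence inside that heading's block count as distance 1) is an illustrative assumption, not necessarily the exact definition in the paper, and the data structures (heading_of, block_of) are hypothetical.

```python
def min_dist(occurrences):
    """occurrences: list of (position, term) for all query-term hits in the document.
    Returns the smallest distance between occurrences of two *different* terms."""
    best = float("inf")
    for p1, t1 in occurrences:
        for p2, t2 in occurrences:
            if t1 != t2:
                best = min(best, abs(p1 - p2))
    return best

def heading_aware_min_dist(occurrences, heading_of, block_of):
    """heading_of[pos] -> id of the block whose heading contains pos (absent otherwise)
    block_of[pos]   -> id of the block containing pos (absent if outside any block)
    A heading hit and a hit anywhere inside that heading's block count as distance 1."""
    best = min_dist(occurrences)
    for p1, t1 in occurrences:
        for p2, t2 in occurrences:
            if (t1 != t2 and heading_of.get(p1) is not None
                    and heading_of.get(p1) == block_of.get(p2)):
                best = min(best, 1)
    return best
```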