Different XML formats are widely used for data exchange and processing, and it is often necessary to convert between them in both directions. Standard XML transformation languages, such as XSLT or XQuery, are unsatisfactory for this purpose because they require writing a separate transformation for each direction. Existing bidirectional transformation languages aim to close this gap by allowing programmers to write a single program that denotes both transformations. However, they often 1) induce a more cumbersome programming style than their traditional unidirectional relatives in order to establish the link between source and target formats, and 2) offer limited configurability, because they make implicit assumptions about how modifications to both formats should be translated, and these assumptions may not be easy to predict. This paper proposes a bidirectional XML update language called BiFluX (BIdirectional FunctionaL Updates for XML), inspired by the Flux XML update language. Our language adopts a novel bidirectional programming by update paradigm, in which a program succinctly and precisely describes how to update a source document with a target document in an intuitive way, such that there is a unique “inverse” source query for each update program. BiFluX extends Flux with bidirectional actions that describe the connection between source and target formats. We introduce a core BiFluX language and translate it into a formally verified bidirectional update language, BiGUL, to guarantee that a BiFluX program is well-behaved.
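As a rough illustration of the guarantee mentioned above: in the bidirectional transformation literature, well-behavedness is commonly stated as two round-trip laws, GetPut and PutGet. The Python sketch below checks both laws for a toy update. It is not BiFluX or BiGUL syntax, and the record fields and function names are hypothetical.

```python
# Toy illustration only: a "put" (update) that writes a view into a source
# record, and the "get" (query) that reads the view back out.
# The field names ("name", "email") are hypothetical.

def put(source: dict, view: str) -> dict:
    """Update the source with a new view value, keeping all other fields."""
    updated = dict(source)
    updated["name"] = view
    return updated


def get(source: dict) -> str:
    """Query the view from the source."""
    return source["name"]


def satisfies_get_put(source: dict) -> bool:
    """GetPut: putting back an unmodified view leaves the source unchanged."""
    return put(source, get(source)) == source


def satisfies_put_get(source: dict, view: str) -> bool:
    """PutGet: after an update, querying returns exactly the view that was put."""
    return get(put(source, view)) == view


example_source = {"name": "Ada", "email": "ada@example.org"}
```

In the update-based style described above, the programmer writes only the update direction; the query is the one that makes both checks hold.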
Development teams benefit from version control systems, which manage shared access to code repositories and persist entire project histories for analysis or recovery. Such systems are most effective when developers commit coherent and complete change sets. These best practices, however, are difficult to follow because multiple activities often interleave without notice and existing tools make it hard to untangle changes before committing them. We propose an interactive, graphical tool called Thresher that employs adaptable scripts to help developers group and commit changes, especially under fine-grained change tracking, where numerous changes are logged even in short programming sessions. We implemented our tool in Squeak/Smalltalk and derived a foundation of scripts from five refactoring sessions. We evaluated the precision and recall of those scripts; the results indicate reduced manual effort because developers can focus on project-specific adjustments. With such an interactive approach, developers can easily intervene to reconstruct activities accurately and thus follow best practices.
The steering law is a robust model for expressing the relationship between movement time and task difficulty. Recently, a corrected model was proposed to account for the difference in steering time between narrowing and widening tunnels. However, the previous work evaluated it only in a user study with straight paths. This paper investigates steering performance in narrowing and widening circular tunnels to determine whether the corrected model is adequate or limited in this setting. The results show that the steering law achieves a good fit (R² > .98) without the corrected model, indicating that employing the corrected model offers limited benefit.
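For reference, the steering law in its usual form predicts movement time as MT = a + b × ID, where the index of difficulty ID is the integral of ds / W(s) along the path. The Python sketch below evaluates ID for a constant-width circular tunnel and for a tunnel whose width changes linearly. It is a minimal sketch of the uncorrected law only; the function names are hypothetical, and a and b are empirical constants that must be fitted to data.

```python
import math


def id_circular(radius: float, width: float) -> float:
    """ID of a constant-width circular tunnel: circumference / width."""
    return 2 * math.pi * radius / width


def id_linear_taper(length: float, w_start: float, w_end: float) -> float:
    """ID of a tunnel whose width varies linearly from w_start to w_end.

    This is the closed form of the integral of ds / W(s) over the path.
    """
    if math.isclose(w_start, w_end):
        return length / w_start
    return length / (w_end - w_start) * math.log(w_end / w_start)


def movement_time(a: float, b: float, index_of_difficulty: float) -> float:
    """Steering law: MT = a + b * ID, with empirically fitted a and b."""
    return a + b * index_of_difficulty
```

Note that the closed form is symmetric: swapping `w_start` and `w_end` gives the same ID, so the uncorrected law assigns equal difficulty to a narrowing tunnel and to the widening tunnel of the same shape.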
Generative word alignment models, such as the IBM Models, are restricted to one-to-many alignment and cannot explicitly represent many-to-many relationships in bilingual texts. The problem is partially solved either by introducing heuristics or by agreement constraints that force the two directional word alignments to agree with each other. However, such agreement constraints cannot take into account the grammatical differences between the languages of a pair. In particular, function words are not trivial to align for grammatically different language pairs, such as Japanese and English. In this paper, we focus on the posterior regularization framework (Ganchev, Graca, Gillenwater, and Taskar 2010), which can force two directional word alignment models to agree with each other during training, and propose new constraints that take into account the difference between function words and content words. We distinguish function words from content words using word frequency, in the same way as Setiawan, Kan, and Li (2007). Experimental results show that our proposed constraints achieved better alignment quality, measured by AER and F-measure, on the French-English Hansard task and the Japanese-English Kyoto free translation task (KFTT). In translation evaluations, we achieved statistically significant gains in BLEU scores on the Japanese-English NTCIR10 task and the Spanish-English WMT06 task.
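The alignment metrics named above can be computed as in the following Python sketch, which uses the standard definition of alignment error rate (AER) over sure and possible gold links. The function name and the toy links are hypothetical, and the exact F-measure variant used in the experiments is not stated here, so the balanced F1 below is an assumption.

```python
def alignment_scores(predicted, sure, possible):
    """Return (precision, recall, f1, aer) for a set of predicted links.

    Each link is a (source_index, target_index) pair. Sure links are
    treated as a subset of the possible links, as in the standard AER
    definition. Assumes predicted and sure are non-empty.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # every sure link is also a possible link

    precision = len(a & p) / len(a)
    recall = len(a & s) / len(s)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    aer = 1 - (len(a & s) + len(a & p)) / (len(a) + len(s))
    return precision, recall, f1, aer


# Toy example: three predicted links, two sure links, one extra possible link.
example_scores = alignment_scores(
    predicted={(0, 0), (1, 1), (2, 3)},
    sure={(0, 0), (1, 1)},
    possible={(2, 2)},
)
```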
This paper proposes a method for searching cooking recipes by a procedure, such as “a tomato is fried.” Most methods for cooking recipe search treat recipe text as a Bag-of-Words (BoW), which falsely matches recipes such as “fry an onion deeply and serve it with a tomato cube,” in which the tomato is not heated. Our method automatically converts a procedural text to a flow-graph in advance using a dependency parsing technique. In the flow-graph, the sequence of actions performed on an ingredient is easily extracted by tracing the path from the node corresponding to the ingredient to the root node, which corresponds to the last action. We evaluate our method against a task-adapted BoW model as a baseline; the proposed method achieved a precision of 68.8%, while the baseline achieved 61.5%.
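The path-tracing step described above can be sketched as follows in Python. The flow-graph is simplified here to a tree stored as a child-to-parent map, hand-written for the onion and tomato example; the node names and the `actions_applied` helper are hypothetical, and the dependency-parsing step that builds the graph is not shown.

```python
# Hand-built flow-graph for "fry an onion deeply and serve it with a tomato cube".
# Each node points to the action that consumes it; the root ("serve") has no parent.
PARENT = {
    "onion": "fry",
    "fry": "serve",
    "tomato": "cut",
    "cut": "serve",
}


def actions_applied(parent, ingredient):
    """Collect the actions on the path from an ingredient node to the root."""
    actions = []
    seen = set()
    node = parent.get(ingredient)
    while node is not None and node not in seen:  # guard against cycles
        seen.add(node)
        actions.append(node)
        node = parent.get(node)
    return actions


def matches_query(parent, ingredient, action):
    """True if the action appears on the ingredient's path to the root."""
    return action in actions_applied(parent, ingredient)
```

With this graph, the query “a tomato is fried” is rejected because “fry” is not on the tomato's path, even though both words occur in the recipe text and a BoW match would succeed.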
Recognition of sarcasm in microblogging is important in a range of NLP applications, such as opinion mining. However, this is a challenging task, as the real meaning of a sarcastic sentence is the opposite of the literal meaning. Furthermore, microblogging messages are short and usually written in a free style that may include misspellings, grammatical errors, and complex sentence structures. This paper proposes a novel method for identifying sarcasm in tweets. It combines two supervised classifiers, a Support Vector Machine (SVM) using N-gram features and an SVM using our proposed features. Our features represent the intensity and contradictions of sentiment in a tweet, derived by sentiment analysis. The sentiment contradiction feature also considers coherence among multiple sentences in the tweet, and this is automatically identified by our proposed method using unsupervised clustering and an adaptive genetic algorithm. Furthermore, a method for identifying the concepts of unknown sentiment words is used to compensate for gaps in the sentiment lexicon. Our method also considers punctuation and the special symbols that are frequently used in Twitter messaging. Experiments using two datasets demonstrated that our proposed system outperformed baseline systems on one dataset, while producing comparable results on the other. Accuracy of 82% and 76% was achieved in sarcasm identification on the two datasets.
Schemas of XML documents may be updated for various reasons. When a schema is updated, the XSLT stylesheets written against it are also affected. To keep XSLT stylesheets consistent with updated schemas, we have to detect the XSLT rules affected by a schema update in order to determine whether those rules need to be updated accordingly. However, detecting such XSLT rules manually is a difficult and time-consuming task, since users do not always fully understand the dependencies between XSLT stylesheets and DTDs. In this paper, we consider three classes of tree transducers as subsets of XSLT, and investigate whether XSLT rules affected by DTD updates can be detected efficiently for these classes.
The surge of social media use, such as Twitter, introduces new opportunities for understanding and gauging public mood across different cultures. However, the diversity of expression in social media presents a considerable challenge to this task of opinion mining, given the limited accuracy of sentiment classification and a lack of intercultural comparisons. Previous Twitter sentiment corpora have only global polarities attached to them, which prevents deeper investigation of the mechanism underlying the expression of feelings in social media, especially the role and influence of rhetorical phenomena. To this end, we construct an annotated corpus for multilingual Twitter sentiment understanding that encompasses three languages (English, Japanese, and Chinese) and four international topics (iPhone 6, Windows 8, Vladimir Putin, and Scottish Independence); our corpus incorporates 5,422 tweets. Further, we propose a novel annotation scheme that embodies the idea of separating emotional signals and rhetorical context, which, in addition to global polarity, identifies rhetoric devices, emotional signals, degree modifiers, and subtopics. Next, to address low inter-annotator agreement in previous corpora, we propose a pivot dataset comparison method to effectively improve the agreement rate. With manually annotated rich information, our corpus can serve as a valuable resource for the development and evaluation of automated sentiment classification, intercultural comparison, rhetoric detection, etc. Finally, based on observations and our analysis of our corpus, we present three key conclusions. First, languages differ in terms of emotional signals and rhetoric devices, and the idea that cultures have different opinions regarding the same objects is reconfirmed. Second, each rhetoric device maintains its own characteristics, influences global polarity in its own way, and has an inherent structure that helps to model the sentiment that it represents. Third, the models of the expression of feelings in different languages are rather similar, suggesting the possibility of unifying multilingual opinion mining at the sentiment level.
Ideally, tree-to-tree machine translation (MT) that utilizes syntactic parse trees on both source and target sides could preserve non-local structure, and thus generate fluent and accurate translations. In practice, however, high quality parsers for both source and target languages are difficult to obtain. Moreover, even if we have high quality parsers on both sides, the resulting parse trees can still be non-isomorphic because the annotation criteria differ between the two languages. The lack of isomorphism between the parse trees makes it difficult to extract translation rules, which severely limits the performance of tree-to-tree MT. In this article, we present an approach that projects dependency parse trees from the language side that has a high quality parser to the side that has a low quality parser, to improve the isomorphism of the parse trees. We first project the dependencies that have high confidence to make a partial parse tree, and then fill in the remaining dependencies by partial parsing constrained by the already projected dependencies. Experiments conducted on the Japanese-Chinese and English-Chinese language pairs show that our proposed method significantly improves performance on both language pairs.