In this paper, we derive Integer Linear Programming (ILP) formulations for obtaining extractive oracle summaries, which reveal the upper-bound automatic scores of the extractive summarization paradigm. We then manually evaluate the oracle summaries with the pyramid method and Quality Questions to assess their validity. We evaluated three kinds of extractive oracle summaries, based on sentence extraction, Elementary Discourse Unit (EDU) extraction, and subtree extraction, in terms of ROUGE and Basic Elements (BE) on the Text Analysis Conference (TAC) 2009/2011 data sets. The results demonstrated that both the pyramid scores and the automatic scores of the oracle summaries are quite high, but that their linguistic quality is poor. This implies that we can generate informative summaries by extraction, but that the linguistic quality of the summaries still needs to be improved.
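As a minimal illustration of the idea (not the paper's actual ILP formulation), an oracle extractor maximizes overlap with a reference summary under a length budget. The sketch below solves that objective by exhaustive search over sentence subsets, using bigram recall as the score; an ILP solver would optimize the same objective with binary selection variables. All function names and the scoring choice here are assumptions for illustration.

```python
from itertools import combinations

def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

def oracle_extract(sentences, reference, budget):
    """Pick the subset of sentences (total length <= budget words) whose
    bigrams best cover the reference bigrams -- the objective an ILP
    solver would maximize; solved here by exhaustive search for clarity."""
    ref_bi = bigrams(reference)
    best, best_score = (), -1
    for r in range(len(sentences) + 1):
        for subset in combinations(range(len(sentences)), r):
            length = sum(len(sentences[i]) for i in subset)
            if length > budget:
                continue
            covered = set()
            for i in subset:
                covered |= bigrams(sentences[i])
            score = len(covered & ref_bi)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score

sentences = [["the", "cat", "sat"], ["dogs", "bark", "loud"], ["the", "cat", "ran"]]
reference = ["the", "cat", "sat", "and", "ran"]
selected, score = oracle_extract(sentences, reference, budget=6)
```

Exhaustive search is exponential in the number of sentences, which is exactly why the paper casts the problem as an ILP and hands it to a dedicated solver.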
We propose a method for controlling sentence complexity in text simplification. Our method translates an input sentence into a specific grade level by considering the complexity of both sentences and words. Sentence complexity is handled by appending the target grade level to the input. Word complexity, in contrast, is handled by one of three methods: (a) extending word embeddings with the grade level as a feature, (b) imposing a hard constraint that prohibits outputting complex words, and (c) imposing a soft constraint that encourages outputting simple words. Experimental results indicate that the soft constraint improves the performance of text simplification. Although an existing model that considers only sentence complexity can control syntactic aspects such as omission, it tends to generate words beyond the target complexity. Our method controls both syntactic structure and lexical complexity.
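The contrast between the hard and soft constraints can be sketched as a rescoring step at decoding time: instead of forbidding complex words outright, a soft constraint adds a bonus to the scores of words at or below the target grade. This is a hypothetical sketch; the vocabulary, grade table, and bonus value are invented for illustration.

```python
def rescore_with_soft_constraint(logits, vocab_grade, target_grade, bonus=2.0):
    """Soft lexical constraint: reward words whose grade level is at or
    below the target, rather than prohibiting complex words (the hard
    constraint would drop them from the candidate set entirely)."""
    return {w: s + (bonus if vocab_grade[w] <= target_grade else 0.0)
            for w, s in logits.items()}

# Toy decoder step: "utilize" has the highest raw score, but after the
# soft constraint the simpler synonym "use" wins for a grade-4 target.
logits = {"utilize": 1.5, "use": 1.0, "employ": 1.2}
vocab_grade = {"utilize": 9, "use": 2, "employ": 8}
rescored = rescore_with_soft_constraint(logits, vocab_grade, target_grade=4)
best_word = max(rescored, key=rescored.get)
```

Because the bonus only shifts scores, a complex word can still be chosen when no simple alternative fits, which is the flexibility the abstract credits for the soft constraint's better performance.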
This paper proposes a masking mechanism that learns the importance of each input token and masks unnecessary tokens for relation extraction. The shortest path between target entities in the dependency tree of an input sentence is often employed as a feature of relation classification models, since it is known to capture important information for relation classification well. However, this heuristic rule is inapplicable to exceptional relation expressions, such as relations that require tokens outside the path (e.g., the possessive “s”). We handle this inflexibility with a novel masking mechanism that learns a masking rule over important tokens. Training is performed in an end-to-end manner using the loss of the relation classification task, without the need for additional annotations. The experimental results show that our proposed method achieves better classification performance than models based on the shortest-path heuristic. Furthermore, the learned masks closely correspond to the shortest paths while also capturing important tokens outside them, such as the possessive “s”.
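A learned token mask of this kind can be pictured as a per-token gate in (0, 1) that scales each token's representation, trained end to end through the classification loss. The sketch below is a minimal stand-in, assuming the gate scores come from some learned scoring layer; the threshold and all values are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def mask_tokens(token_vecs, gate_scores, threshold=0.5):
    """Scale each token vector by a gate in (0, 1); tokens whose gate
    falls below the threshold are zeroed out (masked). In the paper's
    setting the gate scores would be produced by a learned layer and
    trained via the relation classification loss."""
    gates = [sigmoid(s) for s in gate_scores]
    masked = [[g * x for x in vec] if g >= threshold else [0.0] * len(vec)
              for g, vec in zip(gates, token_vecs)]
    return masked, gates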
Recognizing lexical semantic relations of word pairs, especially noun pairs, is an important task for the automatic completion and expansion of lexical knowledge bases, such as WordNet, which can be used for natural language understanding. One of the promising approaches to this task is the utilization of lexico-syntactic patterns co-occurring with target word pairs, which reflect their lexical semantic relations. These pattern-based methods require co-occurrences of the target word pairs. However, this requirement is hardly satisfied because of Zipf’s law, which implies that most content words occur very rarely. To solve this problem, we propose a novel unsupervised learning method to obtain word-pair embeddings that reflect co-occurring lexico-syntactic patterns. In recognizing lexical semantic relations, our method provides relational information for word pairs that do not co-occur in a corpus, because the neural network generalizes the co-occurrence between word pairs and lexico-syntactic patterns. The experimental results show that our word-pair embeddings improved the performance of a state-of-the-art neural pattern-based method on the noun pairs of four datasets and successfully alleviated the co-occurrence issue.
This paper proposes the new problem of generating recipes from photo sequences, together with a method to address it, aiming to help users obtain multimedia recipes simply by taking photographs. For this purpose, the output texts should include expressions with the important terms that make sense as instructions. However, traditional methods proposed for “visual storytelling” do not consider such expressions. To select expressions with important terms for describing a photo, the proposed method incorporates a retrieval method alongside a generation model. The proposed method was implemented and tested on Japanese cooking recipes. Experimental results confirmed that the new method outperforms standard baselines.
In this paper, we propose a novel model for Transformer neural machine translation that incorporates the syntactic distance between two source words into the relative position representations of the self-attention mechanism. In particular, the proposed model encodes pair-wise relative depths on a source dependency tree, i.e., the differences between the depths of two source words, in the encoder’s self-attention. Experiments show that the proposed model outperformed non-syntactic Transformer NMT baselines on the Asian Scientific Paper Excerpt Corpus Japanese-to-English and English-to-Japanese translation tasks. In particular, it achieved a 0.37-point gain in BLEU on the Japanese-to-English task.
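Injecting pair-wise relative depths into self-attention can be sketched as adding a bias, looked up by the (clipped) depth difference of the two words, to the raw attention scores before the softmax. This is a schematic sketch only, assuming precomputed dependency-tree depths and a hypothetical learned bias table; it is not the paper's actual parameterization.

```python
import math

def attention_with_depth_bias(scores, depths, depth_bias, max_dist=2):
    """Add a bias indexed by the clipped relative tree depth
    (depths[j] - depths[i]) to each raw attention score, then
    row-normalize with softmax. depth_bias stands in for a learned
    table of length 2 * max_dist + 1."""
    n = len(scores)
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            d = max(-max_dist, min(max_dist, depths[j] - depths[i]))
            row.append(scores[i][j] + depth_bias[d + max_dist])
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out

# Two tokens at tree depths 0 and 1; a positive bias for relative
# depth +1 shifts attention toward the deeper word.
attn = attention_with_depth_bias([[0.0, 0.0], [0.0, 0.0]], [0, 1],
                                 depth_bias=[0.0, 0.0, 0.0, 1.0, 0.0])
```

Clipping the depth difference mirrors how relative position representations cap the distance range so that one shared parameter covers all farther pairs.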
This study tackles the task of generating market comments from stock prices. Market comments not only describe increases and decreases in the price but also describe how the price has changed compared with the previous period, and they contain expressions that depend on their delivery time. Additionally, market comments typically mention numerical values, such as closing prices and differences in stock prices, that must be derived by arithmetic operations such as subtraction and rounding off. To capture these characteristics, we propose a novel encoder–decoder model that automatically generates market comments from stock prices. The model first encodes both short- and long-term series of stock prices so that it can capture short- and long-term changes in stock prices. Thereafter, we feed the delivery time of the market comment into our model in the decoding phase to generate time-dependent expressions. Moreover, our model can generate a numerical value by selecting an appropriate arithmetic operation, such as subtraction or rounding off, and applying it to the input stock prices. As empirical experiments show, our model generates market comments that are more fluent and informative than those of the baselines.
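The operation-selection idea can be illustrated with a small dispatcher: the decoder picks an operation label, and the corresponding arithmetic is applied to the input prices to produce the number that appears in the comment (e.g., a 25-yen drop, or a price "around 19,000 yen"). The operation names and the two-price interface below are assumptions for illustration, not the model's actual output space.

```python
def generate_value(closing, previous, op):
    """Apply the selected arithmetic operation to the input prices to
    derive the numerical value mentioned in the comment."""
    if op == "subtract":
        return previous - closing   # price difference vs. previous period
    if op == "round":
        return round(closing, -2)   # round off to the nearest hundred
    if op == "identity":
        return closing              # copy the closing price verbatim
    raise ValueError(f"unknown operation: {op}")

# A closing price of 19,000 against a previous close of 19,025
# yields the 25-yen drop a comment like "fell 25 yen" would report.
drop = generate_value(19000, 19025, "subtract")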
In this paper, we propose a novel ensemble approach for event nugget detection that consists of heterogeneous encoding models, to handle the diverse linguistic expressions of events in text, and a dynamic ensemble method, to obtain an ensemble of reliable models for each input token dynamically. Through a set of comparative evaluations on subtasks, we show that our proposed method exceeds each encoding model and soft voting in F1-score. Moreover, we demonstrate the effectiveness of our proposal by comparing our system with the results of the NIST TAC KBP2016 and KBP2017 participants in terms of F1-score. Lastly, we discuss the usefulness of the proposed method for event nugget detection, including how it can be applied to recent neural network models.
The aim of this paper is to investigate how grammatical information regarding information structure affects word order. We report the results of modeling the distances between NPs and the predicates on which they depend with a Bayesian linear mixed model, using “BCCWJ-InfoStr,” in which NPs in the text of the “Balanced Corpus of Contemporary Written Japanese” are annotated with information-structure tags. The estimated word orders of noun phrases in a sentence are as follows: (I) discourse-old NPs precede discourse-new NPs, (II) hearer-old NPs precede hearer-new NPs, (III) definite NPs precede indefinite NPs, and (IV) animate NPs precede inanimate NPs. These results support “Communicative Dynamism,” the “From-Old-To-New Principle,” and the “Nominal Hierarchy,” as referred to in the field of functional linguistics.
Time is an important concept in human cognition, fundamental to a wide range of reasoning tasks in the clinical domain. Results of the Clinical TempEval 2016 challenge, a set of shared tasks that evaluate temporal information extraction systems in the clinical domain, indicate that current state-of-the-art systems do well at event and time expression identification but perform poorly at temporal relation extraction. This study aims to identify and analyze the reasons for this uneven performance. It adapts a general-domain tree-based bidirectional long short-term memory recurrent neural network model for semantic relation extraction to the task of temporal relation extraction in the clinical domain, and tests the system in binary and multi-class classification settings, experimenting with both general and in-domain word embeddings. Its results outperform the best Clinical TempEval 2016 system and the current state-of-the-art model. However, there is still a significant gap between system and human performance. Consequently, this study delivers a deep analysis of the results, identifying the high incidence of nouns as events and class overlap as the major challenges in this task.
This paper describes a domain-specific language, HaoriBricks3 (HB3), for writing programs that compose Japanese sentences. In HB3, we write Ruby code, called brick code, that specifies how to compose a sentence. Evaluating a brick code produces a Ruby object called a brick structure, from which the surface sentence string is generated by the method to_ss. This paper presents the design philosophy and implementation innovations of HB3 and demonstrates its applications.
The Transformer (Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, and Polosukhin 2017), which depends purely on the attention mechanism, has achieved state-of-the-art performance on machine translation (MT). However, syntactic information, which has improved many previous MT models, has not been explicitly utilized by the Transformer. We propose a syntax-based Transformer for MT, which incorporates source-side syntactic structures generated by a parser into the self-attention and positional encoding of the encoder. Our method is general in that it is applicable to both constituent trees and packed forests. Evaluations on two language pairs show that our syntax-based Transformer outperforms the conventional (non-syntactic) Transformer. The improvements in BLEU on the English-Japanese, English-Chinese, and English-German translation tasks are up to 2.32, 2.91, and 1.03 points, respectively. Furthermore, our ablation study and qualitative analysis demonstrate that the syntax-based self-attention is good at learning local structural information, while the syntax-based positional encoding is good at learning global structural information.
When people verbalize what they have perceived with different sensory functions, they often express different meanings with the same word (e.g., different temperature ranges with cold) or the same meaning with different words (e.g., hazy and cloudy). Such interpersonal variation in word meanings not only prevents people from communicating efficiently with each other but also causes problems for natural language processing (NLP). Accordingly, to highlight interpersonal semantic variation in word meanings, we propose a method for inducing personalized word embeddings. The method learns word embeddings from an NLP task while distinguishing each word used by different individuals. Review-target identification was adopted as the task to prevent irrelevant biases from contaminating the word embeddings. The scalability and stability of inducing personalized word embeddings were improved using a residual network and independent fine-tuning for each individual through multi-task learning with target-attribute prediction. Experiments on two large-scale review datasets confirmed that the proposed method is effective for estimating target items, and that the resulting word embeddings are also effective for sentiment analysis. The acquired personalized word embeddings made it possible to reveal tendencies in the semantic variation of word meanings.
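One simple way to realize "distinguishing each word used by different individuals" is to annotate tokens with a user identifier before the embedding lookup, so that the model learns a separate vector per (word, user) pair. The `word#user` tokenization and the shared-vocabulary escape hatch below are assumptions for illustration, not the paper's actual mechanism.

```python
def personalize(tokens, user_id, shared_vocab=None):
    """Tag each token with the user who wrote it, so a downstream
    embedding layer assigns a distinct vector per (word, user) pair.
    Words in shared_vocab (e.g., function words) stay user-independent
    to keep the vocabulary from exploding."""
    shared_vocab = shared_vocab or set()
    return [t if t in shared_vocab else f"{t}#{user_id}" for t in tokens]

# The same review text yields different vocabulary entries per user,
# which is what lets "cold" acquire user-specific embeddings.
tagged = personalize(["cold", "is", "nice"], "u1", shared_vocab={"is"})
```

Comparing the learned vectors for `cold#u1` and `cold#u2` would then expose exactly the interpersonal semantic variation the abstract sets out to reveal.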