Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Preface
Paper
  • Akira Miyazawa, Yusuke Miyao
    2019 Volume 26 Issue 2 Pages 277-300
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    The automatic generation of metaphorical expressions helps us write imaginative texts such as poems or novels. This paper proposes a new metaphor generation task, evaluation metrics, and a method to solve the task. Our task is formalized as a problem of finding metaphorical paraphrases for a literal Japanese phrase consisting of a subject, an object, and a verb. We use four evaluation metrics: synonymousness, metaphoricity, novelty, and comprehensibility. Our proposed method generates metaphorical expressions by using three automatically computable scores (similarity, figurativeness, and rarity), each corresponding to one of the evaluation metrics. By crowdsourcing, we show how these scores are related to those given by humans in terms of the evaluation metrics and how useful they are in finding the expressions that humans prefer in pairwise comparisons.

    Download PDF (1281K)
  • Masayuki Asahara
    2019 Volume 26 Issue 2 Pages 301-327
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    This paper presents a contrastive analysis between reading time and clause boundary categories in the Japanese language in order to estimate text readability. We overlaid the reading time data of BCCWJ EyeTrack and the clause boundary category annotations on the Balanced Corpus of Contemporary Written Japanese. Statistical analysis based on a Bayesian linear mixed model shows that reading time behaviours differ among the clause boundary categories. The result does not support the wrap-up effects of clause-final words. Another result is that predicate-argument relations increase the reading speed of native Japanese speakers.

    Download PDF (2080K)
  • Ritsuko Iwai, Daisuke Kawahara, Takatsune Kumada, Sadao Kurohashi
    2019 Volume 26 Issue 2 Pages 329-359
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    To communicate with humans in a human-like manner, systems need to understand behavior and psychological states in situations of human-machine interaction, such as autonomous driving and nursing robots. We focus on driving situations as they are part of our daily lives and concern safety. To develop such systems, a corpus annotated with behavior and subjectivity in driving situations is necessary. In this study, subjectivity includes emotions, polarity, sentiments, human judgments, perceptions, and cognitions. We construct a driving experience corpus (DEC) (261 blog articles, 8,080 sentences) with four manually annotated tags. First, we annotate spans with driving experience tags (DE). Then, three tags, other’s behavior (OB), self-behavior (SB), and subjectivity (SJ), are annotated within DE spans. In addition to describing the guidelines, we present corpus specifications, agreement between annotators, and three major difficulties during the development: the extended self, important information, and voice in mind. Automatic annotation experiments were conducted on the DEC using Conditional Random Fields-based methods. On the test set, the F-scores were about .55 for both OB and SB and approximately .75 for SJ. We provide an error analysis that reveals difficulties in interpreting nominatives and differentiating behavior from subjectivity.

    Download PDF (1002K)
  • Rui Suzuki, Kanako Komiya, Masayuki Asahara, Minoru Sasaki, Hiroyuki S ...
    2019 Volume 26 Issue 2 Pages 361-379
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    All-words word-sense disambiguation (all-words WSD) involves identifying the senses of all words in a document. Since a word’s sense depends on the context, such as surrounding words, similar words are believed to have similar sets of surrounding words. Therefore, we predict target word senses by calculating Euclidean distances between the target words’ surrounding word vectors and those of their synonyms using word embeddings. In addition, using the prediction results, we replace word tokens in the corpus with their concept tags, that is, article numbers of the Word List by Semantic Principles. After that, we create concept embeddings from the concept tag sequence and predict the senses of the target words using the distances between surrounding word vectors, which consist of the word and concept embeddings. This paper shows that concept embeddings improved the performance of Japanese all-words WSD.

    Download PDF (808K)
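
The distance-based sense prediction described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional embeddings, the English example words, the sense labels, and the choice to average surrounding word vectors are all assumptions made for illustration only.

```python
import math

# Toy 2-d word embeddings (hypothetical values, not taken from the paper).
EMB = {
    "river": (0.9, 0.1),
    "water": (0.8, 0.2),
    "shore": (0.85, 0.15),
    "money": (0.1, 0.9),
    "loan": (0.2, 0.8),
    "deposit": (0.15, 0.85),
}


def mean_vector(words):
    """Average the embeddings of the given words (assumed context representation)."""
    vecs = [EMB[w] for w in words]
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_sense(context_words, sense_contexts):
    """Pick the sense whose synonym-context vector is nearest to the target's context vector."""
    target = mean_vector(context_words)
    return min(
        sense_contexts,
        key=lambda sense: euclidean(target, mean_vector(sense_contexts[sense])),
    )


# Hypothetical sense inventory: each sense is represented by words that
# typically surround its synonyms.
SENSE_CONTEXTS = {
    "bank/riverside": ["river", "shore"],
    "bank/finance": ["money", "deposit"],
}

predicted = predict_sense(["water", "river"], SENSE_CONTEXTS)
print(predicted)
```

In this sketch a sense is chosen purely by nearest distance; the concept-embedding step from the abstract would correspond to extending the vectors with additional dimensions learned from concept tag sequences.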
  • Tatsuya Aoki, Ryohei Sasano, Hiroya Takamura, Manabu Okumura
    2019 Volume 26 Issue 2 Pages 381-406
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    We focus on nonstandard usages of common words on social media, where words are sometimes used in a manner totally different from their original or standard usage. In this work, we attempt to distinguish nonstandard usages on social media from standard ones in an unsupervised manner. We also constructed a new Twitter dataset consisting of 40 words with nonstandard usages and used the dataset for evaluation in an experiment. Our basic idea for this task is that nonstandard usage can be measured by the inconsistency between the target word’s expected meaning and the given context. For this purpose, we use context embeddings derived from word embeddings. Our experimental results show that the model leveraging the context embedding outperforms other methods, and they also provide findings on, for example, how to construct context embeddings and which corpus to use.

    Download PDF (590K)
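
The idea of measuring inconsistency between a word's expected meaning and its context can be sketched as follows. This is only an illustrative assumption about one way to do it: the toy embeddings, the averaging used to build the context embedding, and the use of one minus cosine similarity as the inconsistency score are not taken from the paper.

```python
import math

# Toy 3-d word embeddings (hypothetical values for illustration only).
EMB = {
    "fire": (0.9, 0.1, 0.0),
    "smoke": (0.8, 0.2, 0.1),
    "burn": (0.85, 0.1, 0.05),
    "song": (0.05, 0.9, 0.2),
    "album": (0.1, 0.85, 0.3),
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def context_embedding(words):
    """Assumed context embedding: the mean of the context words' embeddings."""
    vecs = [EMB[w] for w in words]
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0])))


def inconsistency(target, context_words):
    """Higher values mean the context fits the word's usual meaning less well."""
    return 1.0 - cosine(EMB[target], context_embedding(context_words))


# "fire" next to typical neighbours versus music-related neighbours.
standard_score = inconsistency("fire", ["smoke", "burn"])
nonstandard_score = inconsistency("fire", ["song", "album"])
print(standard_score, nonstandard_score)
```

Because the score needs no labelled examples, ranking or thresholding it is compatible with the unsupervised setting the abstract describes; how the threshold would be chosen is left open here.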
  • Chiaki Miyazaki, Satoshi Sato
    2019 Volume 26 Issue 2 Pages 407-440
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    Phonological changes reflected in text can be powerful in characterizing utterances of dialogue agents or characters’ lines in narratives. To use phonological changes to automatically characterize utterances, (i) we collected phonologically changed expressions from characters’ written utterances and (ii) formalized the knowledge required to generate phonologically changed expressions. In particular, we categorized the expressions into 137 patterns by analyzing them from the viewpoints of the phenomena involved and the environments in which they occur. We experimentally confirmed that the patterns cover more than 80% of the phonologically changed expressions used in novels and comics. Furthermore, (iii) to investigate whether phonological change patterns can be effective in characterization, we conducted an experiment that estimated the speakers (characters) of utterances and confirmed that the information on phonological changes improved the performance of speaker estimation for several characters.

    Download PDF (1390K)
  • Takaaki Tanaka, Masaaki Nagata
    2019 Volume 26 Issue 2 Pages 441-481
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    We present a novel scheme for word-based Japanese typed dependency parsing that integrates syntactic structure analysis and grammatical function analysis, such as predicate-argument structure analysis. Compared to bunsetsu-based dependency parsing, which is predominantly used in Japanese NLP, it provides a natural way of extracting syntactic constituents. This makes it possible to jointly decide dependency and predicate-argument structure, which are usually handled in two separate steps. By using grammatical functions as dependency types, we can obtain detailed syntactic information from parsing results while keeping the converted bunsetsu-based dependency accuracy as high as that of CaboCha, one of the state-of-the-art dependency parsers.

    Download PDF (1754K)
  • Mizuki Sango, Hitoshi Nishikawa, Takenobu Tokunaga
    2019 Volume 26 Issue 2 Pages 483-508
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    This paper proposes introducing domain adaptation into Japanese predicate-argument structure (PAS) analysis. Our investigation of a Japanese balanced corpus revealed that the distribution of argument types differs across text media. The difference is particularly significant when the argument is exophoric. Previous Japanese PAS analysis research has disregarded this tendency, as studies have targeted mono-media corpora. This investigation begins with a PAS analyzer based on a recurrent neural network as its baseline and extends it by introducing three kinds of domain-adaptation techniques and their combinations. Evaluation experiments using a Japanese balanced corpus (BCCWJ-PAS) confirmed the domain dependency of PAS analysis. Domain adaptation is effective in improving the performance of Japanese PAS analysis, especially in the nominative case. The largest improvement in F1 score over the baseline (0.030) was obtained in the QA text analysis.

    Download PDF (565K)
  • Souta Yamashiro, Hitoshi Nishikawa, Takenobu Tokunaga
    2019 Volume 26 Issue 2 Pages 509-536
    Published: June 15, 2019
    Released: September 15, 2019
    JOURNALS FREE ACCESS

    This paper presents a model for Japanese zero anaphora resolution that deals with both intra- and inter-sentential zero anaphora. Our model resolves anaphora for multiple cases simultaneously by utilising and comparing information from other cases. This simultaneous resolution requires the consideration of many combinations of antecedent candidates, which could be a crucial obstacle in both the training and resolving phases. To cope with this problem, we propose an effective candidate pruning method using case frame information. On a Japanese balanced corpus, we compared our model, which estimates multiple cases simultaneously using the proposed candidate pruning method, with a model that estimates each case independently without a candidate reduction method. The results confirmed a 0.056-point increase in accuracy. Furthermore, we confirmed that introducing a local-attention recurrent neural network increases the accuracy of inter-sentential anaphora resolution.

    Download PDF (648K)