Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 28, Issue 1
Preface
General Paper
  • Masatoshi Suzuki, Koji Matsuda, Hiroki Ouchi, Jun Suzuki, Kentaro Inui
    2021 Volume 28 Issue 1 Pages 3-25
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Recent progress in language modeling has promoted research on a question answering (QA) task performed without reading comprehension, called closed-book QA. While previous studies have focused on enlarging and refining models to address this task, we take a data-oriented approach to efficiently teach a model diverse factual knowledge. We utilize Wikipedia as an additional source of knowledge to create an augmented dataset. We empirically show that our model trained with data augmentation correctly answers questions unseen in the training data, suggesting that the model learns new knowledge from the augmented data. Accordingly, our model outperforms the previously reported best performance on Quizbowl and performs on par with a strong baseline on TriviaQA, although it has about 20 times fewer parameters.

    Download PDF (759K)
  • Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
    2021 Volume 28 Issue 1 Pages 26-59
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Dialogue continuity is a dialogue-system evaluation metric that indicates how well the system engages its users. Response coherence is considered an important factor in determining dialogue continuity. Herein, we propose novel methods of selecting coherent responses for a given dialogue context. These methods improve coherence and dialogue continuity using related-event pairs such as “be stressed out” and “relieve stress.” Two re-ranking methods are proposed. The first method estimates the coherence of event pairs in a dialogue by matching event-causality pairs that are statistically extracted from web texts. The second method estimates the coherence of response candidates for the dialogue context using a coherence model. The results of an automatic evaluation show that objective coherence at the word level was improved by these re-ranking methods. In contrast, in the human evaluation, subjective coherence at the response level was not improved by the first method; however, dialogue continuity was improved. These results seem contradictory. We conducted a correlation analysis and a case analysis to clarify the relationship between subjective coherence and dialogue continuity. The results indicate that subjective coherence does not have a strong correlation with dialogue continuity. Further, dialogue continuity may be improved if an event related to the dialogue context is selected.
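    A minimal sketch, with toy data and hypothetical association scores, of the first re-ranking idea described above: scoring response candidates by matching event expressions in the context and the candidate against event-causality pairs mined from web text. This illustrates the general mechanism only, not the authors' implementation.

      # Toy table of event-causality pairs with association scores (hypothetical values).
      EVENT_PAIRS = {
          ("stressed out", "relieve stress"): 0.9,
          ("caught a cold", "take medicine"): 0.8,
      }

      def extract_events(utterance, event_vocab):
          """Return the known event expressions found in an utterance (simple string match)."""
          return [e for e in event_vocab if e in utterance]

      def coherence_score(context, candidate):
          """Sum association scores over (context event, candidate event) pairs."""
          vocab = {e for pair in EVENT_PAIRS for e in pair}
          ctx_events = extract_events(context, vocab)
          cand_events = extract_events(candidate, vocab)
          return sum(EVENT_PAIRS.get((c, r), 0.0) for c in ctx_events for r in cand_events)

      def rerank(context, candidates):
          """Order response candidates so that event-coherent ones come first."""
          return sorted(candidates, key=lambda c: coherence_score(context, c), reverse=True)

      context = "I have been completely stressed out at work lately."
      candidates = ["What did you eat for lunch?",
                    "Taking a short walk might help you relieve stress."]
      print(rerank(context, candidates))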

    Download PDF (791K)
  • Sachi Kato, Masayuki Asahara, Nanami Moriyama, Asami Ogiwara, Makoto Y ...
    2021 Volume 28 Issue 1 Pages 60-81
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    This paper presents research on annotating opposite (antonym) information in the thesaurus ‘Word List by Semantic Principles (WLSP)’. Words categorised under the same label include antonym pairs. We extracted opposite word pairs from the categorised groups and classified them. First, annotators manually extracted opposite word pair candidates. Second, we used Yahoo! crowdsourcing to evaluate how many people recognised each candidate pair as opposites, and defined ‘opposites’ as pairs judged to be opposites by at least 50% of the respondents. Third, we annotated the word pairs with the opposite types defined by Muraki. We analysed the resulting opposite word lists in terms of asymmetry, WLSP labels, opposite types, frequencies, and word embeddings. From a linguistic point of view, closed opposite word pairs tend to be regarded as ‘opposites’. From a natural language processing point of view, the distance between the two words in an opposite pair correlates with their replaceability as judged by humans.
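    A toy sketch of the two quantitative steps mentioned above: keeping candidate pairs judged as opposites by at least 50% of crowd workers, and relating embedding distance to human replaceability. All words, vote fractions, replaceability scores, and vectors below are made-up placeholders.

      import numpy as np

      # Fraction of crowd workers who judged each candidate pair as opposites (made-up values).
      crowd_votes = {("open", "close"): 0.92, ("hot", "cold"): 0.88,
                     ("long", "short"): 0.81, ("hot", "warm"): 0.35}
      # Keep pairs judged as opposites by at least 50% of the respondents.
      opposites = [pair for pair, frac in crowd_votes.items() if frac >= 0.5]

      # Toy word vectors; in practice these would be pre-trained embeddings.
      rng = np.random.default_rng(0)
      emb = {w: rng.normal(size=50) for pair in crowd_votes for w in pair}

      def cosine_distance(u, v):
          return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

      # Hypothetical human replaceability scores for the accepted pairs.
      replaceability = {("open", "close"): 0.70, ("hot", "cold"): 0.65, ("long", "short"): 0.75}

      dists = [cosine_distance(emb[a], emb[b]) for a, b in opposites]
      human = [replaceability[p] for p in opposites]
      print(np.corrcoef(dists, human)[0, 1])  # correlation between distance and replaceability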

    Download PDF (506K)
  • Ander Martinez, Katsuhito Sudoh, Yuji Matsumoto
    2021 Volume 28 Issue 1 Pages 82-103
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Neural machine translation (NMT) systems often use subword segmentation to limit vocabulary sizes. This type of segmentation is particularly useful for morphologically complex languages, whose vocabularies can grow prohibitively large, and it also replaces infrequent tokens with more frequent subwords. Fine segmentation with short subword units has been shown to produce better results for smaller training datasets. Character-level NMT, which can be considered an extreme case of subword segmentation in which each subword consists of a single character, can provide enhanced transliteration results but also tends to produce grammatical errors. We propose a novel approach to this problem that combines subword-level segmentation with character-level information in the form of character n-gram features, which are used to construct the embedding matrices and softmax output projections of a standard encoder-decoder model. We use a custom algorithm to select a small number of effective binary character n-gram features. Through four sets of experiments, we demonstrate the advantages of the proposed approach for resource-limited language pairs. Our proposed approach yields better BLEU scores than subword- and character-based baseline methods under low-resource conditions. In particular, the proposed approach increases the vocabulary size for small training datasets without reducing translation quality.
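    A minimal sketch, with a hypothetical feature set and dimensions, of the general form of this approach: representing each subword token by binary character n-gram features and deriving its embedding from them with a learned projection. The paper's feature-selection algorithm and model details are not reproduced here.

      import numpy as np

      def char_ngrams(token, n_values=(1, 2, 3)):
          """Character n-grams of a token, including boundary markers."""
          s = f"<{token}>"
          return {s[i:i + n] for n in n_values for i in range(len(s) - n + 1)}

      # A small selected feature set (the paper selects features with a custom algorithm).
      feature_set = sorted(char_ngrams("play") | char_ngrams("played") | char_ngrams("playing"))
      feature_index = {f: i for i, f in enumerate(feature_set)}

      def binary_features(token):
          """Binary indicator vector over the selected character n-gram features."""
          vec = np.zeros(len(feature_set))
          for g in char_ngrams(token):
              if g in feature_index:
                  vec[feature_index[g]] = 1.0
          return vec

      # A token embedding is its feature vector times a projection matrix that would be
      # learned jointly with the encoder-decoder (random here, for illustration only).
      rng = np.random.default_rng(0)
      W = rng.normal(scale=0.1, size=(len(feature_set), 64))
      print(binary_features("plays").shape, (binary_features("plays") @ W).shape)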

    Download PDF (182K)
  • Takashi Kodama, Ribeka Tanaka, Sadao Kurohashi
    2021 Volume 28 Issue 1 Pages 104-135
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Intelligent dialogue systems are expected to become a new interface between humans and machines. An ideal intelligent dialogue system should estimate the user’s internal states and incorporate the estimation results into its responses appropriately. In this paper, we focus on movie recommendation dialogues and propose a dialogue system that considers the user’s internal state. First, we build a movie recommendation dialogue system and collect dialogue data. Based on an analysis of the collected data, we model and annotate the user’s internal state in three aspects: knowledge, interest, and engagement. Second, we train internal state estimators on the dialogue corpus annotated with the user’s internal states. The trained estimators achieved high accuracy on the annotated corpus. Further, we design a set of rules that modify the system’s responses according to each aspect of the user’s internal state. We confirmed that response modifications based on the results of the internal state estimators improve the naturalness of the system utterances in both dialogue evaluation and utterance evaluation.

    Download PDF (895K)
  • Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yush ...
    2021 Volume 28 Issue 1 Pages 136-159
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    This paper concerns the problem of realizing consistent personalities in neural conversational models by using user-generated question-answer pairs as training data. Using the framework of role play-based question answering, we collected single-turn question-answer pairs for particular characters from online users, together with meta information such as the emotion and intimacy associated with each pair. We verified the quality of the collected data and, through subjective evaluation, also verified its usefulness for training neural conversational models that generate responses reflecting the meta information, especially emotion.

    Download PDF (825K)
  • Masato Mita, Tomoya Mizumoto, Masahiro Kaneko, Ryo Nagata, Kentaro Inu ...
    2021 Volume 28 Issue 1 Pages 160-182
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Grammatical error correction (GEC) systems have typically been evaluated on a single corpus: the CoNLL-2014 benchmark. However, such evaluation is incomplete because task difficulty varies with test corpus properties such as the writers’ proficiency levels and the essay topics. This study explores the necessity of cross-corpora evaluation for GEC systems, based on the hypothesis that single-corpus evaluation is insufficient. Specifically, we evaluated the performance of four GEC models (based on LSTM, CNN, Transformer, and SMT) on six corpora (CoNLL-2013, CoNLL-2014, FCE, JFLEG, KJ, and BEA-2019). The evaluation results revealed that model rankings vary considerably depending on the corpus, indicating that single-corpus evaluation is insufficient for GEC models. Moreover, cross-sectional evaluation is useful not only as a meta-evaluation method but also for practical applications. As a case study, we investigated cross-sectional evaluation using the writer’s proficiency level, one of the typical conditions of GEC input, as the unit of evaluation. The results showed a large divergence in evaluation between the beginner-intermediate and advanced proficiency levels.
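    A minimal sketch, using the model and corpus names from the abstract and a placeholder scorer, of what cross-corpus evaluation looks like in practice: score every model on every test corpus and compare the per-corpus rankings. The study's actual systems, splits, and metrics are not reproduced.

      models = ["lstm", "cnn", "transformer", "smt"]
      corpora = ["conll2013", "conll2014", "fce", "jfleg", "kj", "bea2019"]

      def evaluate(model, corpus):
          """Placeholder: returns a dummy score; a real setup would run the GEC system and
          compute a metric such as M2 or GLEU on the corpus."""
          return sum(map(ord, model + corpus)) % 100 / 100.0

      scores = {c: {m: evaluate(m, c) for m in models} for c in corpora}

      # Rank the models within each corpus (best first) to see how stable the ranking is.
      for corpus in corpora:
          ranking = sorted(models, key=lambda m: scores[corpus][m], reverse=True)
          print(corpus, ranking)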

    Download PDF (572K)
  • Tianqi Wang, Hiroaki Funayama, Hiroki Ouchi, Kentaro Inui
    2021 Volume 28 Issue 1 Pages 183-205
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Short Answer Grading (SAG) is the task of scoring students’ answers in applications such as examinations or e-learning. Most existing SAG systems predict scores based only on the answers and ignore evaluation criteria such as rubrics, which play a crucial role in evaluating answers in real-world situations. In this paper, we propose a semi-supervised method for training a neural SAG model. We extract keyphrases that are highly related to answer scores from rubrics. Based on span-wise alignments between answers and keyphrases, we calculate weights for the words of each answer and use them as attention labels in place of manually annotated ones; only answers containing highly weighted words are used as attention supervision. We evaluate the proposed model on two analytic assessment tasks: analytic score prediction and justification identification. Analytic score prediction is the task of predicting the score of a given answer for a prompt, and justification identification involves identifying a justification cue in a given student answer for each analytic score. Our experimental results demonstrate that the performance of both grading and justification identification is improved by the semi-supervised attention training, especially in low-resource settings.
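    A minimal sketch, using a toy answer and exact string matching, of deriving word-level attention labels from rubric keyphrases instead of manual annotation. The paper's span-wise alignment and weighting scheme are more elaborate; this only illustrates the idea.

      def attention_labels(answer_tokens, keyphrases):
          """Weight answer words inside a matched keyphrase span; normalize to sum to one."""
          weights = [0.0] * len(answer_tokens)
          for phrase in keyphrases:
              p = phrase.split()
              for i in range(len(answer_tokens) - len(p) + 1):
                  if answer_tokens[i:i + len(p)] == p:
                      for j in range(i, i + len(p)):
                          weights[j] = 1.0
          total = sum(weights)
          # Answers with no matched words provide no attention supervision (None).
          return [w / total for w in weights] if total > 0 else None

      answer = "the plant grows faster because photosynthesis produces more energy".split()
      rubric_keyphrases = ["photosynthesis produces", "grows faster"]  # hypothetical keyphrases
      print(list(zip(answer, attention_labels(answer, rubric_keyphrases))))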

    Download PDF (1862K)
  • Yoichi Ishibashi, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura
    2021 Volume 28 Issue 1 Pages 206-234
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    Word embeddings, which often capture analogic relations such as king − man + woman ≈ queen, can be used to change an attribute of a word, including its gender. To transfer the gender attribute of king and obtain queen in this analogy, we subtract the difference vector man − woman from king, based on the knowledge that king is male. However, preparing such knowledge for many words and attributes is significantly costly. In this work, we propose a novel method for word attribute transfer based on reflection mapping, without an analogy-based operation. Experimental results show that our proposed method can transfer the attributes of the given words without changing the words that are invariant with respect to the target attributes.
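    A minimal sketch, with toy vectors, contrasting the two operations mentioned above: the analogy-based transfer, which needs to know that king is male, and a plain reflection across a hyperplane, which is an involution (applying it twice restores the input). The learned, word-dependent parameterization proposed in the paper is not reproduced here.

      import numpy as np

      rng = np.random.default_rng(0)
      emb = {w: rng.normal(size=8) for w in ["king", "man", "woman"]}  # toy embeddings

      # Analogy-based transfer: queen ≈ king - (man - woman); requires knowing "king" is male.
      queen_like = emb["king"] - (emb["man"] - emb["woman"])

      def reflect(x, a, c=0.0):
          """Reflect vector x across the hyperplane {y : a·y = c}."""
          a = np.asarray(a, dtype=float)
          return x - 2.0 * (np.dot(a, x) - c) / np.dot(a, a) * a

      a = emb["man"] - emb["woman"]        # a toy choice of mirror normal (gender direction)
      flipped = reflect(emb["king"], a)    # attribute-transferred vector
      restored = reflect(flipped, a)       # reflecting again restores the original
      print(np.allclose(restored, emb["king"]))  # True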

    Download PDF (938K)
  • Sora Ohashi, Mao Isogawa, Tomoyuki Kajiwara, Yuki Arase
    2021 Volume 28 Issue 1 Pages 235-252
    Published: 2021
    Released on J-STAGE: March 15, 2021
    JOURNAL FREE ACCESS

    We reduce the model size of word embeddings while preserving their quality. Previous studies composed word embeddings from those of subwords and mimicked pre-trained word embeddings. Although such methods can reduce the vocabulary size, it is difficult to reduce the model size drastically while preserving quality. Inspired by the observation that words with similar meanings have similar embeddings, we propose a multitask learning method that mimics not only the pre-trained word embeddings but also the similarity distribution between words. Experimental results on word similarity estimation tasks show that the proposed method improves upon existing methods and reduces the model size by a factor of 30 while preserving the quality of the original word embeddings. In addition, experimental results on text classification tasks show that the model size can be reduced by a factor of 200 while preserving 90% of the quality of the original word embeddings.
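    A minimal sketch, with toy tensors, of a two-term objective in the spirit described above: mimicking the pre-trained ("teacher") vectors directly and mimicking the distribution of pairwise similarities within a batch. The exact losses, subword composition model, and task weighting in the paper may differ.

      import numpy as np

      def softmax(x, axis=-1):
          x = x - x.max(axis=axis, keepdims=True)
          e = np.exp(x)
          return e / e.sum(axis=axis, keepdims=True)

      rng = np.random.default_rng(0)
      dim = 300
      teacher = rng.normal(size=(5, dim))                       # pre-trained embeddings (5 words)
      student = teacher + rng.normal(scale=0.1, size=(5, dim))  # stand-in for composed embeddings

      # Task 1: mimic the teacher vectors directly.
      mimic_loss = np.mean((student - teacher) ** 2)

      # Task 2: mimic the distribution of pairwise similarities within the batch
      # (scaled dot-product similarity; the temperature is a design choice).
      p_teacher = softmax(teacher @ teacher.T / np.sqrt(dim), axis=1)
      p_student = softmax(student @ student.T / np.sqrt(dim), axis=1)
      similarity_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=1).mean()

      total_loss = mimic_loss + similarity_loss  # the task weighting is a hyperparameter
      print(float(mimic_loss), float(similarity_loss), float(total_loss))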

    Download PDF (417K)
Society Column
Supporting Member Column
Information