Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 25, Issue 5
Displaying 1-7 of 7 articles from this issue
Preface
Paper
  • Hao Wang, Yves Lepage
    2018 Volume 25 Issue 5 Pages 487-509
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    Preordering has proven useful in improving the translation quality of statistical machine translation (SMT), especially for language pairs with different syntax. The top-down bracketing transduction grammar (BTG)-based preordering method (Nakagawa 2015) has achieved state-of-the-art performance since it relies on aligned parallel text only and does not require any linguistic annotations. Although the online learning algorithm it adopts is efficient and effective, it is very susceptible to alignment errors. In a production environment, in particular, such a preorderer is commonly trained on noisy word alignments obtained using an automatic word aligner, resulting in worse performance than preorderers trained on manually annotated datasets. To achieve better preordering using automatically aligned datasets, this paper improves the top-down BTG-based preordering method using various parameter mixing techniques to increase the accuracy of the preorderer and to speed up training via parallelisation. The parameter mixing methods and the original online training method (Nakagawa 2015) were empirically compared, and the experimental results show that such parallel parameter averaging methods can dramatically reduce the training time and improve the quality of preordering. (An illustrative sketch of parameter averaging follows this entry.)

    Download PDF (797K)
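
    As a rough illustration of the parallel parameter averaging referred to in the abstract above, the following Python sketch trains a simple perceptron on data shards and averages the per-shard weights after each epoch (iterative parameter mixing). The perceptron update rule and the toy data are assumptions for illustration; they are not the actual top-down BTG preorderer or its features.

    import numpy as np

    def train_on_shard(w_init, shard, lr=0.1):
        # One online pass over a shard. The mistake-driven perceptron update
        # is a stand-in for the updates used by the BTG preorderer.
        w = w_init.copy()
        for x, y in shard:              # x: feature vector, y: label in {-1, +1}
            if y * np.dot(w, x) <= 0:
                w += lr * y * x
        return w

    def iterative_parameter_mixing(shards, dim, epochs=3):
        # After each epoch, the per-shard weight vectors are averaged and the
        # average is redistributed as the starting point of the next epoch.
        # The per-shard passes are independent and can run in parallel.
        w = np.zeros(dim)
        for _ in range(epochs):
            per_shard = [train_on_shard(w, s) for s in shards]
            w = np.mean(per_shard, axis=0)
        return w

    # Toy usage: random linearly separable data split into 4 shards.
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=8)
    data = [(x, 1 if x @ true_w > 0 else -1) for x in rng.normal(size=(400, 8))]
    shards = [data[i::4] for i in range(4)]
    mixed_w = iterative_parameter_mixing(shards, dim=8)
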
  • Saki Ibe, Yoshitatsu Matsuda, Kazunori Yamaguchi
    2018 Volume 25 Issue 5 Pages 511-525
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    It is well known that machine translation using recurrent neural networks often composes fluent sentences but may include many unknown words. Although many studies have addressed the unknown word problem, they are ineffective for Japanese-to-English translation. In this study, we propose a hybrid method that builds an alignment table from the attention weight matrix, detects the input words aligned with each unknown word, and finally replaces those unknown words with translations obtained from a statistical machine translation method. We evaluated our approach using two corpora, ASPEC and NTCIR-10. The results showed that the proposed method generated no unknown words and improved the BLEU (BiLingual Evaluation Understudy) score. (A sketch of the attention-based replacement step follows this entry.)

    Download PDF (484K)
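
    The sketch below illustrates the attention-based replacement step described in the abstract above: each <unk> in the output is mapped to the most-attended source word and looked up in an external lexicon. The tokens, attention values, and lexicon are invented; the actual method builds an alignment table from the attention weight matrix and obtains the replacement translations from an SMT system.

    import numpy as np

    def replace_unknown_words(output_tokens, source_tokens, attention, lexicon):
        # attention: array of shape (len(output_tokens), len(source_tokens)),
        # one row of attention weights per generated target token.
        # lexicon: source word -> translation, assumed to come from an SMT
        # lexical table (hypothetical contents here).
        result = []
        for t, token in enumerate(output_tokens):
            if token == "<unk>":
                src_idx = int(np.argmax(attention[t]))          # most-attended source word
                src_word = source_tokens[src_idx]
                result.append(lexicon.get(src_word, src_word))  # copy if no entry
            else:
                result.append(token)
        return result

    # Toy usage: the third output token is unknown and gets replaced.
    source = ["kare", "wa", "kenkyuusha", "da"]
    output = ["he", "is", "<unk>"]
    attn = np.array([[0.70, 0.10, 0.10, 0.10],    # "he"    -> "kare"
                     [0.10, 0.20, 0.10, 0.60],    # "is"    -> "da"
                     [0.05, 0.05, 0.85, 0.05]])   # "<unk>" -> "kenkyuusha"
    lexicon = {"kenkyuusha": "researcher"}
    print(replace_unknown_words(output, source, attn, lexicon))
    # -> ['he', 'is', 'researcher']
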
  • Masayuki Asahara
    2018 Volume 25 Issue 5 Pages 527-554
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    Japanese noun phrases are not marked by articles, so their information status is not overt. Because of limited contextual information and world knowledge, it is difficult to estimate this information status, which is analyzed in terms of given/new or indefinite/definite status. In Japanese language processing, however, the notion of information status is still poorly understood. In this paper, we explain the information status of Japanese noun phrases and then explore how it can be estimated through reading time. As a first step, we investigate the correlation between reading time and the information status of Japanese noun phrases. The statistical evaluation shows that the information status of noun phrases affects readers’ reading time in Japanese. (A toy illustration of such a correlation analysis follows this entry.)

    Download PDF (1316K)
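
    As a toy illustration of the correlation analysis mentioned in the abstract above, the sketch below computes a point-biserial correlation between a binary given/new status and per-phrase reading times. All figures and variable names are invented; the paper's actual data and statistical modelling are not reproduced here.

    import numpy as np

    # Hypothetical per-noun-phrase records: (reading_time_ms, is_given), where
    # is_given = 1 marks a discourse-given noun phrase and 0 a discourse-new one.
    records = [(310, 1), (295, 1), (288, 1), (301, 1),
               (410, 0), (355, 0), (430, 0), (392, 0)]

    times = np.array([t for t, _ in records], dtype=float)
    given = np.array([g for _, g in records], dtype=float)

    mean_given = times[given == 1].mean()
    mean_new = times[given == 0].mean()
    r = np.corrcoef(given, times)[0, 1]   # point-biserial correlation
    print(f"given: {mean_given:.1f} ms  new: {mean_new:.1f} ms  r = {r:.2f}")
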
  • Hiroki Asano, Tomoya Mizumoto, Kentaro Inui
    2018 Volume 25 Issue 5 Pages 555-576
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    In grammatical error correction (GEC), the automatic evaluation of system performance is thought to be an essential driving force. Previous methods for automated system assessment require gold-standard references, which must be created manually and thus tend to be both expensive and limited in coverage. To address this problem, a reference-less approach has recently emerged; however, previous reference-less metrics, which consider only the grammaticality of system outputs, have not performed as well as reference-based metrics. In this study, we explore the potential of extending a prior grammaticality-based method to establish a reference-less evaluation method for GEC systems. We empirically show that a reference-less metric combining fluency and meaning preservation with grammaticality estimates manual scores better than commonly used reference-based metrics do. Additionally, we show that the reference-less metric provides appropriate evaluation at the sentence level and that it can be applied to GEC systems. (A sketch of the score combination follows this entry.)

    Download PDF (721K)
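
    The sketch below illustrates the combination step described in the abstract above: a sentence-level score mixing grammaticality, fluency, and meaning-preservation sub-scores. The normalisation to [0, 1], the equal default weights, and the example sub-scores are assumptions for illustration; the sub-models themselves and the paper's tuned estimator are not implemented here.

    def reference_less_score(grammaticality, fluency, meaning, weights=(1.0, 1.0, 1.0)):
        # Weighted combination of three sub-scores, each assumed to lie in [0, 1]
        # (e.g. from a grammaticality classifier, a language-model-based fluency
        # score, and a similarity-based meaning-preservation score).
        wg, wf, wm = weights
        return (wg * grammaticality + wf * fluency + wm * meaning) / (wg + wf + wm)

    # Toy usage: an output that is grammatical and fluent but drifts in meaning.
    print(reference_less_score(grammaticality=0.9, fluency=0.8, meaning=0.4))  # 0.7
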
  • Isao Goto, Hideki Tanaka
    2018 Volume 25 Issue 5 Pages 577-597
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    Despite its promise, neural machine translation (NMT) presents a serious problem: source content may be mistakenly left untranslated. The ability to detect untranslated content is important for the practical use of NMT. We evaluated two types of probability for identifying untranslated content: the cumulative attention probability and the back-translation probability from the target sentence to the source sentence. Experiments were conducted to discover missing content in Japanese-to-English patent translations. The results revealed that both types of probability were effective, that back translation was more effective than attention, and that combining the two resulted in further improvements. Furthermore, we confirmed that the detection of untranslated content was effective for selecting sentences for human post-editing of machine translation results. (A sketch of the cumulative-attention check follows this entry.)

    Download PDF (935K)
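
    The sketch below illustrates the cumulative-attention side of the approach described above: source positions whose total attention over the whole decode falls below a threshold are flagged as possibly untranslated. The threshold and the matrix are invented, and the back-translation probability and the combination of the two signals are not shown.

    import numpy as np

    def low_attention_positions(attention, threshold=0.3):
        # attention: (target_len, source_len) matrix of decoder attention weights.
        # Summing each column gives the cumulative attention a source word
        # received over the whole decode; a small total suggests it may have
        # been left untranslated. The threshold is illustrative.
        cumulative = attention.sum(axis=0)
        return [i for i, c in enumerate(cumulative) if c < threshold]

    # Toy usage: the third source word receives almost no attention.
    attn = np.array([[0.8, 0.1, 0.0, 0.1],
                     [0.1, 0.7, 0.1, 0.1],
                     [0.1, 0.1, 0.0, 0.8]])
    print(low_attention_positions(attn))   # -> [2]
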
  • Akiva Miura, Graham Neubig, Katsuhito Sudoh, Satoshi Nakamura
    2018 Volume 25 Issue 5 Pages 599-629
    Published: December 15, 2018
    Released on J-STAGE: March 15, 2019
    JOURNAL FREE ACCESS

    Pivot translation is a useful method for translating between languages with little or no parallel data, as it exploits equivalents in an intermediate language such as English. Commonly, phrase-based or tree-based pivot translation methods merge source–pivot and pivot–target translation models into a source–target model, a tactic known as triangulation. However, because the combination is based on the surface forms of constituent words, it often produces incorrect source–target phrase pairs owing to interlingual differences and semantic ambiguities in the pivot language, which degrades translation accuracy. This paper proposes a triangulation approach that uses syntactic subtrees in the pivot language to avoid incorrect phrase combinations by distinguishing pivot-language words by their syntactic roles. The results of experiments conducted on the United Nations Parallel Corpus demonstrate that the proposed method is superior to other pivot translation approaches in all tested language combinations. (A sketch of baseline surface-form triangulation follows this entry.)

    Download PDF (331K)
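
    The sketch below shows the baseline surface-form triangulation that the paper improves upon: source–pivot and pivot–target phrase probabilities are multiplied and summed over shared pivot phrases, p(t|s) = sum_p p(t|p) p(p|s). The phrase tables are invented, and the paper's contribution of distinguishing pivot entries by their syntactic subtrees is only noted in a comment.

    from collections import defaultdict

    def triangulate(src_pivot, pivot_tgt):
        # src_pivot: {source_phrase: {pivot_phrase: p(pivot | source)}}
        # pivot_tgt: {pivot_phrase: {target_phrase: p(target | pivot)}}
        # Surface-form triangulation marginalises over pivot phrases shared by
        # the two tables. The proposed method would additionally key pivot
        # entries by their syntactic subtree, which this sketch does not model.
        src_tgt = defaultdict(lambda: defaultdict(float))
        for s, pivots in src_pivot.items():
            for p, p_ps in pivots.items():
                for t, p_tp in pivot_tgt.get(p, {}).items():
                    src_tgt[s][t] += p_ps * p_tp
        return {s: dict(ts) for s, ts in src_tgt.items()}

    # Toy usage with an ambiguous pivot phrase "bank".
    src_pivot = {"banco": {"bank": 1.0}}
    pivot_tgt = {"bank": {"Bank": 0.6, "Ufer": 0.4}}   # financial vs. river sense
    print(triangulate(src_pivot, pivot_tgt))
    # -> {'banco': {'Bank': 0.6, 'Ufer': 0.4}}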