Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 24, Issue 3
Displaying 1-9 of 9 articles from this issue
Preface
Paper
  • Madori Ikeda, Akihiro Yamamoto
    2017 Volume 24 Issue 3 Pages 323-349
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    In this paper, we solve the problem of extending various thesauri with a single method. A thesaurus should be extended when unregistered terms are identified. Many thesauri are available, each constructed according to its own design principle. We formalise the extension of each thesaurus as a classification problem in machine learning, so that extending multiple thesauri amounts to solving multiple classification problems. Applying existing classification methods to each thesaurus separately is time consuming, particularly when many thesauri must be extended. We therefore propose a method that reduces the time required to extend multiple thesauri. The proposed method first generates clusters of terms, independently of the thesauri, as candidate synonym sets, based on formal concept analysis over the syntactic information of terms in a corpus. Reliable syntactic parsers are readily available; thus, syntactic information can be obtained for more terms than semantic information. With this syntactic information, for each thesaurus and all unregistered terms, the candidate clusters can be searched quickly for a correct synonym set, enabling fast classification. Experimental results demonstrate that the proposed method is faster than existing methods while achieving comparable classification accuracy.
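    The clustering step described above can be illustrated with a minimal sketch of formal concept analysis: terms are objects, the syntactic contexts they appear in are attributes, and each formal concept's extent is a candidate synonym set. The toy terms, contexts, and weights below are ours, not the paper's data.

```python
# term -> set of syntactic contexts (e.g. verb-object slots from a parser)
contexts = {
    "car":   {"drive_obj", "park_obj", "buy_obj"},
    "auto":  {"drive_obj", "park_obj"},
    "apple": {"eat_obj", "buy_obj"},
    "pear":  {"eat_obj"},
}

def extent(intent):
    """All terms whose contexts include every attribute in `intent`."""
    return {t for t, cs in contexts.items() if intent <= cs}

def intent(ext):
    """All attributes shared by every term in `ext`."""
    sets = [contexts[t] for t in ext]
    return set.intersection(*sets) if sets else set()

# Enumerate formal concepts by closing each term's context set.
concepts = set()
for t, cs in contexts.items():
    ext = frozenset(extent(cs))
    concepts.add((ext, frozenset(intent(ext))))

for ext, att in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(ext), "share", sorted(att))
```

Here {car, auto} and {apple, pear} emerge as candidate synonym sets because they share syntactic contexts, without consulting any thesaurus.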

    Download PDF (866K)
  • Kazuto Goto, Seiji Tsuchiya, Hirokazu Watabe
    2017 Volume 24 Issue 3 Pages 351-369
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    Japan is considered open to loanwords, which are often used in daily life. In particular, English expressions written in their original notation are increasingly common, as are alphabetical abbreviations formed from the initials of English words. However, polysemy is a major concern for such abbreviations. In this paper, we propose a scheme for estimating the meaning of an alphabetical abbreviation. The proposed scheme treats this task as the meaning estimation of an unknown word. It uses a concept base together with either the Calculation of Degree of Association or the Earth Mover's Distance, which allows a word to be conceptualized and the semantic association between conceptualized words to be evaluated. In addition, Wikipedia is used to compensate for the lack of information inherent in alphabetical abbreviations. We use 129 articles to evaluate the proposed scheme. Experiments show that the accuracy of the proposed scheme is nearly 80% and that it is more effective than alternative schemes.
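    The core comparison can be sketched in simplified form: each word (the abbreviation, expanded via Wikipedia text, and each candidate sense) is conceptualized as a weighted set of related concepts, and the association between the two distributions is scored. The toy weights and the histogram-intersection score below are illustrative stand-ins, not the paper's concept base or its exact Degree of Association formula.

```python
# Hypothetical concept vectors: abbreviation context vs. one candidate sense.
abbrev_concept = {"network": 0.4, "computer": 0.3, "protocol": 0.3}
sense_concept  = {"network": 0.5, "address": 0.3, "computer": 0.2}

def association(p, q):
    """Score overlap of two weighted concept sets (histogram intersection)."""
    return sum(min(p[c], q.get(c, 0.0)) for c in p)

# The candidate sense with the highest association is chosen as the meaning.
print(round(association(abbrev_concept, sense_concept), 2))
```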

    Download PDF (905K)
  • Yoshihide Kato, Shigeki Matsubara
    2017 Volume 24 Issue 3 Pages 371-394
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    Nonlocal dependencies represent syntactic phenomena such as wh-movement, A-movement in passives, topicalization, raising, control, and right node raising, and they play an important role in semantic interpretation. This paper proposes a left-corner parser that identifies nonlocal dependencies. Our parser integrates nonlocal dependency identification into a transition-based system. We adopt a left-corner strategy to exploit the c-command relation, which plays an important role in nonlocal dependency identification. To utilize the global features captured by nonlocal dependencies, our parser uses a structured perceptron. In experimental evaluations, our parser achieved a good balance between constituent parsing and nonlocal dependency identification.

    Download PDF (853K)
  • Xiaoyi Wu, Kevin Duh, Yuji Matsumoto
    2017 Volume 24 Issue 3 Pages 395-419
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    Language modeling is a fundamental research problem with wide application across NLP tasks. To estimate the probabilities of natural language sentences, most research on language modeling uses n-gram approaches to factor sentence probabilities. However, the Markov assumption underlying n-gram models is not robust enough to cope with data sparseness, which limits the final performance of language models. In this paper, we propose a generalized hierarchical word sequence framework in which different word association scores can be adopted to rearrange word sequences in a fully unsupervised fashion. Unlike n-gram models, which factor sentence probabilities from left to right, our model factors them using a more flexible strategy. For evaluation, we compare our rearranged word sequences with normal n-gram word sequences. Both intrinsic and extrinsic experiments verify that our language model achieves better performance, suggesting that our method is a viable alternative to n-gram language models.
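    The rearrangement idea can be sketched with frequency as the word association score: each span is split at its highest-scoring word, so each word is predicted from its hierarchical parent rather than its left neighbor. This is a simplified illustration under our own toy counts, not the authors' exact model.

```python
# Toy unigram frequencies standing in for a word association score.
freq = {"the": 100, "cat": 10, "sat": 8, "on": 50, "mat": 9}

def hws(span, parent="<root>"):
    """Yield (parent, child) prediction pairs for a hierarchical word sequence."""
    if not span:
        return
    # Split the span at its most frequent word; it becomes the local head.
    i = max(range(len(span)), key=lambda j: freq.get(span[j], 0))
    head = span[i]
    yield (parent, head)
    yield from hws(span[:i], head)     # left sub-span, conditioned on head
    yield from hws(span[i + 1:], head) # right sub-span, conditioned on head

pairs = list(hws(["the", "cat", "sat", "on", "the", "mat"]))
for p, c in pairs:
    print(f"P({c} | {p})")
```

Each word still receives exactly one conditioning context, so the product of these probabilities factors the sentence just as an n-gram chain rule would, only in a different order.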

    Download PDF (863K)
  • Masaru Fuji, Masao Utiyama, Eiichiro Sumita, Yuji Matsumoto
    2017 Volume 24 Issue 3 Pages 421-445
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    When translating formal documents, capturing the sentence structure specific to the sublanguage is essential for obtaining high-quality translations. This paper proposes a novel global reordering method that focuses on long-distance reordering to capture the global sentence structure of a sublanguage. The proposed method learns global reordering models from a non-annotated parallel corpus without syntactic parsing and works in conjunction with conventional syntactic reordering. Experimental results on the patent abstract sublanguage show concrete improvements in translation quality for both Japanese-to-English and English-to-Japanese translation.

    Download PDF (753K)
  • Suzushi Tomori, Hirotaka Kameko, Takashi Ninomiya, Shinsuke Mori, Yosh ...
    2017 Volume 24 Issue 3 Pages 447-461
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    We propose a novel framework for improving a word segmenter using information acquired through symbol grounding. The framework uses a dataset consisting of pairs of non-textual information and commentary text. We generate a pseudo-stochastically segmented corpus from the commentaries and then build a neural network that predicts relationships between the non-textual information and words. Using this neural network, we generate a domain-specific term dictionary for the word segmenter. We applied our method to game records of Japanese chess accompanied by commentaries. The experimental results show that the accuracy of a word segmenter can be improved by incorporating the generated dictionary.

    Download PDF (481K)
  • Akiva Miura, Graham Neubig, Michael Paul, Satoshi Nakamura
    2017 Volume 24 Issue 3 Pages 463-489
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    Active learning is a framework that makes it possible to efficiently train statistical models by selecting informative examples from a pool of unlabeled data. Previous work has found this framework effective for machine translation (MT), making it possible to train better translation models with less effort, particularly when annotators translate short phrases instead of full sentences. However, previous methods for phrase-based active learning in MT fail to consider whether the selected units are coherent and easy for human translators to translate, and they also tend to select redundant phrases with similar content. In this paper, we tackle these problems by proposing two new methods for selecting more syntactically coherent and less redundant segments in active learning for MT. Experiments using both simulation and extensive manual translation by professional translators find the proposed methods effective, achieving greater BLEU gains for the same number of translated words and allowing translators to be more confident in their translations.
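    A common baseline for the selection step described above, which the paper improves on, picks the most frequent n-grams in the unlabeled pool that the labeled data does not yet cover. A minimal sketch, with toy sentences and function names of our own:

```python
from collections import Counter

def ngrams(tokens, n_max=2):
    """Yield all 1..n_max-grams of a token list."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

pool = ["the new model works", "the new parser works well"]
labeled = ["the parser works"]

# Segments already covered by translated data need no further annotation.
covered = {g for s in labeled for g in ngrams(s.split())}
counts = Counter(g for s in pool for g in ngrams(s.split())
                 if g not in covered)

# Highest-frequency uncovered segments would be sent to translators first.
for gram, c in counts.most_common(3):
    print(" ".join(gram), c)
```

The paper's contribution is to constrain such candidates to syntactically coherent, non-redundant segments rather than raw frequent n-grams.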

    Download PDF (861K)
  • Keisuke Sakaguchi, Ryo Nagata
    2017 Volume 24 Issue 3 Pages 491-514
    Published: June 15, 2017
    Released on J-STAGE: September 15, 2017
    JOURNAL FREE ACCESS

    Learner English often contains grammatical errors with structural characteristics such as omissions, insertions, substitutions, and word-order errors. These errors are not covered by existing context-free grammar (CFG) rules; consequently, it is not straightforward to annotate learner English with phrase structures. Because of this limitation, there has been almost no work on phrase structure annotation for learner corpora despite its importance and usefulness. To address this issue, we propose a phrase structure annotation scheme for learner English that consists of five principles. We apply the annotation scheme to two different learner corpora and show (i) its effectiveness in consistently annotating learner English with phrase structure (i.e., high inter-annotator agreement); (ii) the structural characteristics (CFG rules) of learner English obtained from the annotated corpora; and (iii) the first phrase structure parsing results on learner English. We also release the annotation guidelines, the annotated data, and the parser model to the public.
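    Point (ii), reading CFG rules off annotated trees, can be illustrated with a minimal sketch: parse a bracketed phrase-structure string and emit one rule per internal node. The tiny tree and labels below are our own example, not the paper's annotation scheme.

```python
import re

def parse(s):
    """Parse a bracketed tree like (S (NP I) (VP (V go))) into nested tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    def walk(it):
        label = next(it)
        children = []
        for tok in it:
            if tok == "(":
                children.append(walk(it))
            elif tok == ")":
                return (label, children)
            else:
                children.append(tok)  # terminal word
    it = iter(tokens)
    assert next(it) == "("
    return walk(it)

def rules(tree):
    """Yield one CFG rule per node, parent -> child labels/words."""
    label, children = tree
    kids = [c if isinstance(c, str) else c[0] for c in children]
    yield f"{label} -> {' '.join(kids)}"
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

tree = parse("(S (NP I) (VP (V go) (NP school)))")
for r in rules(tree):
    print(r)
```

Aggregating such rules over an annotated learner corpus exposes the rule inventory, including productions that standard-English grammars lack.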

    Download PDF (273K)