Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 21, Issue 5
Preface
Paper
  • Chenchen Ding, Mikio Yamamoto
    2014 Volume 21 Issue 5 Pages 981-1009
    Published: September 16, 2014
    Released on J-STAGE: December 16, 2014
    JOURNAL FREE ACCESS
    We design a language model based on a generative dependency structure for sentences. The parameters of the model are the probabilities of dependency N-grams, which are composed of lexical words with four types of extra tags used to model the dependency relation and valence. We further propose an unsupervised expectation-maximization algorithm for parameter estimation, in which all possible dependency structures of a sentence are considered. As the algorithm is language-independent, it can be used on a raw corpus from any language, without part-of-speech annotation, a treebank, or a trained parser. We conducted experiments on four languages, namely English, German, Spanish, and Japanese, to illustrate the applicability and the properties of the proposed approach. We further apply the approach to a Chinese microblog data set to extract and investigate Internet-based, non-standard lexical dependency features of user-generated content.
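    As a rough illustration of the dependency N-gram idea, the sketch below scores one fixed dependency tree with head-dependent probabilities; the L/R direction tags and the toy probability table are assumptions for illustration only and do not reproduce the paper's four-tag parameterization or its EM training over all possible structures.

```python
# Hypothetical sketch: scoring a sentence under a toy dependency bigram model.
# Tag names (L/R attachment direction) and the probability table are
# illustrative assumptions, not the paper's actual parameterization.
import math

# P(dependent, direction | head), from made-up counts.
DEP_BIGRAM_PROB = {
    ("<root>", ("saw", "R")): 0.6,
    ("saw", ("she", "L")): 0.4,    # "she" attaches to "saw" from the left
    ("saw", ("dog", "R")): 0.3,
    ("dog", ("a", "L")): 0.5,
}

def tree_log_prob(arcs):
    """arcs: list of (head, (dependent, direction)) pairs covering one tree."""
    logp = 0.0
    for head, dep in arcs:
        logp += math.log(DEP_BIGRAM_PROB.get((head, dep), 1e-6))  # smoothing floor
    return logp

arcs = [("<root>", ("saw", "R")), ("saw", ("she", "L")),
        ("saw", ("dog", "R")), ("dog", ("a", "L"))]
print(tree_log_prob(arcs))
```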
  • Hiroyuki Shinnou, Minoru Sasaki
    2014 Volume 21 Issue 5 Pages 1011-1035
    Published: September 16, 2014
    Released on J-STAGE: December 16, 2014
    JOURNAL FREE ACCESS
    In this paper, we apply learning under covariate shift to the problem of unsupervised domain adaptation for word sense disambiguation (WSD). This is a type of weighted learning in which the probability density ratio w(x) = PT(x)/PS(x) is used as the weight of each instance. However, w(x) tends to be small in WSD tasks. To address this problem, we calculate w(x) by estimating PT(x) and PS(x), where PS(x) is estimated by regarding the corpus that combines the source domain corpus and the target domain corpus as the source domain corpus. In the experiments, we use three domains in the BCCWJ corpus, OC (Yahoo! Chiebukuro), PB (books), and PN (newspapers), and 16 target words provided by the Japanese WSD task of SemEval-2. To calculate w(x), we also use uLSIF, which estimates w(x) directly without estimating PT(x) or PS(x). Moreover, we use the "p-power" method and the "relative probability density ratio" method to boost the obtained probability density ratio. The experiments show that our method is effective.
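    A rough sketch of this density-ratio weighting follows, assuming one-dimensional features and Gaussian kernel density estimates in place of the paper's estimators (uLSIF is not reproduced); the alpha and p arguments stand in for the relative-ratio and p-power adjustments mentioned above.

```python
# Illustrative sketch of covariate-shift instance weighting via a probability
# density ratio. The 1-D Gaussian features are stand-ins for real WSD features.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 500)     # source-domain feature values
target = rng.normal(0.5, 1.2, 500)     # target-domain feature values

p_t = gaussian_kde(target)
# Following the abstract, P_S is estimated from the combined source+target data.
p_s = gaussian_kde(np.concatenate([source, target]))

def weights(x, alpha=0.0, p=1.0):
    """w(x) = P_T(x) / P_S(x), optionally relative (alpha) and p-powered."""
    pt, ps = p_t(x), p_s(x)
    w = pt / (alpha * pt + (1.0 - alpha) * ps)   # alpha = 0 gives the plain ratio
    return w ** p

print(weights(source[:5]))                    # plain density ratio
print(weights(source[:5], alpha=0.5, p=0.5))  # relative ratio, damped by p-power
```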
  • Katsuhiko Hayashi, Katsuhito Sudoh, Hajime Tsukada, Jun Suzuki, Masaak ...
    2014 Volume 21 Issue 5 Pages 1037-1057
    Published: September 16, 2014
    Released on J-STAGE: December 16, 2014
    JOURNAL FREE ACCESS
    This paper introduces a novel word reordering model for statistical machine translation that employs a shift-reduce parser for inversion transduction grammars. The proposed model also solves article generation problems simultaneously with word reordering. We applied it to the post-ordering of phrase-based machine translation (PBMT) for Japanese-to-English patent translation tasks. Our experimental results show that our method achieves a significant improvement of +3.15 BLEU points over the baseline PBMT system's 29.99 BLEU.
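    A toy sketch of the shift-reduce reordering idea with inversion transduction grammar actions is given below; the hand-written action sequence and token example are assumptions for illustration, the actions would in practice be predicted by a trained parser, and the joint article generation of the proposed model is omitted.

```python
# Minimal sketch of ITG-style reordering with shift-reduce actions.

def itg_reorder(tokens, actions):
    """Apply SHIFT / REDUCE-STRAIGHT / REDUCE-INVERTED actions to reorder tokens."""
    stack, queue = [], list(tokens)
    for act in actions:
        if act == "SHIFT":
            stack.append([queue.pop(0)])
        elif act == "REDUCE-STRAIGHT":          # concatenate in original order
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif act == "REDUCE-INVERTED":          # concatenate in swapped order
            right, left = stack.pop(), stack.pop()
            stack.append(right + left)
    return stack[-1]

# Toy head-final (Japanese-like) order moved toward an English-like order.
tokens = ["kare", "wa", "hon", "o", "yonda"]
actions = ["SHIFT", "SHIFT", "REDUCE-STRAIGHT",   # [kare wa]
           "SHIFT", "SHIFT", "REDUCE-STRAIGHT",   # [hon o]
           "SHIFT", "REDUCE-INVERTED",            # [yonda hon o]
           "REDUCE-STRAIGHT"]                     # [kare wa yonda hon o]
print(itg_reorder(tokens, actions))
```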
Survey paper
  • Shunji Umetani
    2014 Volume 21 Issue 5 Pages 1059-1090
    Published: September 16, 2014
    Released on J-STAGE: December 16, 2014
    JOURNAL FREE ACCESS
    The integer programming (IP) model is a general-purpose optimization model that can formulate a surprisingly wide class of real applications by using integer variables in linear programming (LP) models. Recent developments in IP software systems have significantly improved our ability to solve large-scale instances. However, it is still difficult for most non-expert users to formulate real applications as IP models, because all conditions need to be written as linear inequalities. This paper demonstrates how to use IP software systems and how to formulate real applications as IP models.
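    As a small illustration of writing conditions as linear inequalities over integer variables, the sketch below formulates a toy 0-1 knapsack problem with the open-source PuLP modeller; the choice of software and the data are assumptions for illustration, not ones made by the survey.

```python
# A 0-1 knapsack written as an integer program with PuLP (CBC as default solver).
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, value

items = {"a": (6, 5), "b": (5, 4), "c": (4, 3), "d": (3, 3)}  # name: (value, weight)
capacity = 8

prob = LpProblem("knapsack", LpMaximize)
x = {i: LpVariable(f"x_{i}", cat="Binary") for i in items}    # 0/1 integer variables

prob += lpSum(v * x[i] for i, (v, w) in items.items())              # objective: total value
prob += lpSum(w * x[i] for i, (v, w) in items.items()) <= capacity  # linear capacity constraint

prob.solve()
print([i for i in items if x[i].value() == 1], value(prob.objective))
```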