Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 16, Issue 1
Preface
Paper
  • Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, Hitoshi Isahara
    2009 Volume 16 Issue 1 Pages 1_3-1_23
    Published: 2009
    Released on J-STAGE: September 14, 2011
    Japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. One of the biggest problems with dependency structure analysis in spontaneous speech is that clause boundaries are ambiguous. This paper describes a method for detecting the boundaries of quotations and inserted clauses, and a method for improving dependency accuracy by applying the detected boundaries to dependency structure analysis. The quotations and inserted clauses are identified with an SVM-based text chunking method that considers information on morphemes, pauses, and so on. Information from automatically analyzed dependency structure is also used to detect the beginnings of clauses. Our evaluation experiment using the Corpus of Spontaneous Japanese (CSJ) showed that the automatically estimated boundaries of quotations and inserted clauses improved the accuracy of dependency structure analysis from 77.7% to 78.7%.
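    A minimal sketch, in Python with scikit-learn, of the SVM-based chunking step the abstract describes; the morpheme features, IOB-style labels, and toy data are illustrative assumptions, not the authors' actual setup.

      # Hypothetical SVM-based IOB chunking over morphemes for detecting
      # quotation/inserted-clause boundaries; features and labels are made up.
      from sklearn.feature_extraction import DictVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # One feature dict per morpheme; "pause" marks a preceding pause in speech.
      X = [
          {"surface": "to", "pos": "particle", "pause": True},
          {"surface": "omou", "pos": "verb", "pause": False},
          {"surface": "kedo", "pos": "particle", "pause": False},
          {"surface": "ne", "pos": "particle", "pause": True},
      ]
      y = ["B-QUOTE", "I-QUOTE", "O", "O"]  # IOB tags for quoted-clause spans

      chunker = make_pipeline(DictVectorizer(), LinearSVC())
      chunker.fit(X, y)
      print(chunker.predict(X))  # predicted tags for the toy morpheme sequence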
  • Yoichi Tomiura, Sayaka Aoki, Masahiro Shibata, Kensei Yukino
    2009 Volume 16 Issue 1 Pages 1_25-1_46
    Published: 2009
    Released on J-STAGE: September 14, 2011
    This paper proposes a method for discerning the nativeness of English documents with high precision, based on Bayes decision theory and statistical hypothesis testing. Regarding a document as a sequence of parts of speech, the proposed method compares the probability of the document under a statistical language model of native English with its probability under a model of non-native English. The language model used here is an n-gram model. An n-gram model with a large n can be expected to capture the differences between native and non-native English well, and thus has the potential to discern nativeness with high precision. However, with a large n the zero-frequency and sparseness problems become acute, and the maximum likelihood estimates of the n-gram probabilities cannot be relied on. The proposed method therefore estimates the ratio of the document's probability under the native English model to that under the non-native English model using a statistical hypothesis test. Experimental results show that the proposed method discerns nativeness with a precision of 92.5%, significantly higher than that of traditional methods.
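    A rough Python sketch of the Bayes-decision step: train two smoothed POS-bigram models and label a document by the sign of its log-likelihood ratio. The toy corpora and add-one smoothing stand in for the paper's actual estimation and hypothesis-testing machinery.

      # Classify a POS-tag sequence as native or non-native by comparing its
      # probability under two bigram models (add-one smoothing; toy data).
      from collections import Counter
      from math import log

      def train(corpus):
          bigrams, contexts = Counter(), Counter()
          for sent in corpus:
              bigrams.update(zip(sent, sent[1:]))
              contexts.update(sent[:-1])
          return bigrams, contexts

      def logprob(tags, bigrams, contexts, vocab):
          # sum of log P(cur | prev) with add-one smoothing
          return sum(log((bigrams[(p, c)] + 1) / (contexts[p] + vocab))
                     for p, c in zip(tags, tags[1:]))

      native = [["DT", "NN", "VBZ", "JJ"], ["PRP", "VBD", "DT", "NN"]]
      nonnative = [["DT", "JJ", "VBZ", "NN"], ["NN", "DT", "VBD", "PRP"]]
      vocab = len({t for s in native + nonnative for t in s})

      doc = ["DT", "NN", "VBZ", "JJ"]
      ratio = (logprob(doc, *train(native), vocab)
               - logprob(doc, *train(nonnative), vocab))
      print("native" if ratio > 0 else "non-native")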
  • Vinh Van Nguyen, Minh Le Nguyen, Akira Shimazu
    2009 Volume 16 Issue 1 Pages 1_47-1_65
    Published: 2009
    Released on J-STAGE: September 14, 2011
    In this paper, we present a Conditional Random Fields (CRFs) framework for the clause splitting problem. We adapt the CRF model to this problem in order to use very large sets of arbitrary, overlapping, and non-independent features. We also extend the N-best list approach by using Joint-CRFs (Shi and Wang 2007). In addition, we propose the use of rich linguistic information along with a new bottom-up dynamic programming algorithm for decoding, which splits a sentence into clauses. Experiments show that our results are competitive with state-of-the-art results.
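    As an illustration of clause splitting cast as CRF sequence labeling, here is a minimal Python sketch using the third-party sklearn-crfsuite library as a stand-in for the authors' framework; the features and the toy S/E labeling scheme are invented for the example, and the Joint-CRFs extension and dynamic-programming decoder are not shown.

      # Linear-chain CRF labeling clause boundaries; data and features are toy.
      import sklearn_crfsuite

      def features(sent, i):
          return {
              "word": sent[i].lower(),
              "is_comma": sent[i] == ",",
              "prev": sent[i - 1].lower() if i > 0 else "BOS",
              "next": sent[i + 1].lower() if i + 1 < len(sent) else "EOS",
          }

      sent = ["He", "said", "that", "she", "left", "."]
      X = [[features(sent, i) for i in range(len(sent))]]
      y = [["O", "O", "S", "O", "O", "E"]]  # S/E mark a clause start/end

      crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
      crf.fit(X, y)
      print(crf.predict(X))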
  • Madoka Ishioroshi, Tatsunori Mori
    2009 Volume 16 Issue 1 Pages 1_67-1_100
    Published: 2009
    Released on J-STAGE: September 14, 2011
    In this paper, we propose a method for list-type question answering, the task in which a system is asked to enumerate all correct answers to a given question. The method exploits the distribution of the scores that an existing question-answering system assigns to answer candidates. Answer candidates are separated into clusters according to their scores, under the assumption that each cluster is generated by a probabilistic model. The parameters of these probabilistic models are estimated with the EM algorithm, and the method then judges whether each distribution is a source of correct answers or of incorrect answers. Answer candidates originating from the distributions corresponding to correct answers are returned as the final answers. Moreover, by comparing the model parameters, the method can also judge whether or not the question-answering system found correct answers at all. Experimental results show that using the score distribution is effective for list-type question answering.
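    A minimal Python sketch of the score-distribution idea: fit a two-component mixture to candidate scores with EM (here scikit-learn's GaussianMixture) and keep the candidates assigned to the higher-mean component. The scores, candidate strings, and the Gaussian assumption are illustrative only.

      # Separate answer candidates into "correct" and "incorrect" score clusters.
      import numpy as np
      from sklearn.mixture import GaussianMixture

      scores = np.array([0.91, 0.87, 0.85, 0.32, 0.28, 0.25]).reshape(-1, 1)
      candidates = ["Tokyo", "Osaka", "Kyoto", "noise1", "noise2", "noise3"]

      gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)
      correct = int(np.argmax(gmm.means_))  # component with the higher mean
      answers = [c for c, k in zip(candidates, gmm.predict(scores))
                 if k == correct]
      print(answers)  # candidates judged to come from the "correct" model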
  • Satoshi Suzuki
    2009 Volume 16 Issue 1 Pages 1_101-1_116
    Published: 2009
    Released on J-STAGE: September 14, 2011
    This paper proposes a method for extracting hypernym information from dictionaries and presents results of automatically constructing a word ontology from the extracted information. The method recursively expands word definitions to obtain much larger word sets, which serve as hypernym candidates for the headwords. At the same time, the method assigns each candidate a likelihood of being a hypernym, which is useful for selecting hypernyms from among the candidates. Computational experiments showed that the proposed method gives better results than an existing method that parses the explanatory notes and regards the HEAD as the hypernym. Additionally, we tried to build a word ontology from the resulting hypernyms. This part of the method is still under construction, but the results demonstrated the usefulness of the extracted hypernyms and the possibility of entirely automatic construction of a word ontology.
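    A toy Python sketch of the recursive-expansion idea: a headword's definition is expanded repeatedly through a small made-up dictionary, and normalized occurrence counts serve as a crude likelihood that a word is a hypernym of the headword (the dictionary and the scoring are assumptions, not the paper's exact method).

      # Recursively expand definitions and score hypernym candidates by count.
      from collections import Counter

      dictionary = {
          "dog": ["domestic", "animal"],
          "animal": ["living", "organism"],
          "organism": ["living", "thing"],
      }

      def hypernym_candidates(headword, depth=3):
          counts, frontier = Counter(), [headword]
          for _ in range(depth):
              expanded = []
              for word in frontier:
                  for definer in dictionary.get(word, []):
                      counts[definer] += 1
                      expanded.append(definer)
              frontier = expanded
          total = sum(counts.values()) or 1
          # normalized counts as a rough likelihood of being a hypernym
          return {w: c / total for w, c in counts.items()}

      print(hypernym_candidates("dog"))  # e.g. scores for "animal", "living", ...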