Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 14, Issue 4
  • [in Japanese]
    2007 Volume 14 Issue 4 Pages 1-2
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • TAIICHI HASHIMOTO, KYOSUKE YOSHIDA, MASAKI NOGUCHI, TAKENOBU TOKUNAGA, ...
    2007 Volume 14 Issue 4 Pages 3-22
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents a method for retrieving, from a treebank, sentences that contain the same subtree as a given query. Our system stores the treebank in a relational database. One problem with previous work on structure retrieval is its inefficiency for large queries. The proposed method divides a large query into several subtrees and incrementally narrows down the result by using these subtrees as queries. The number of subtrees and their order are determined automatically based on treebank statistics. We conducted experiments to evaluate the proposed method with seven treebanks and found that it significantly improved retrieval efficiency in four of the seven treebanks.
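The idea of splitting a query into pieces and narrowing the result set step by step can be illustrated with a minimal sketch. This is not the authors' relational-database implementation: the in-memory index, the choice of "local trees" (a node plus its child labels) as the pieces, the rarest-first ordering, and all function names are assumptions made for illustration. The abstract says only that the number and order of subtrees are derived from treebank statistics. Note also that the sketch returns candidate sentences that contain every piece; a final structural check would still be needed to confirm that the pieces form the query subtree.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.
TREEBANK = {
    "s1": ("S", ("NP", "she"), ("VP", ("V", "reads"), ("NP", "books"))),
    "s2": ("S", ("NP", "he"), ("VP", ("V", "sleeps"))),
    "s3": ("S", ("NP", "they"), ("VP", ("V", "read"), ("NP", "papers"))),
}


def label(node):
    """Return the label of a node (a leaf is its own label)."""
    return node if isinstance(node, str) else node[0]


def local_trees(tree):
    """Yield (parent, child labels...) for every internal node of the tree."""
    if isinstance(tree, str):
        return
    yield (tree[0],) + tuple(label(child) for child in tree[1:])
    for child in tree[1:]:
        yield from local_trees(child)


def build_index(treebank):
    """Map each local tree to the set of sentence ids that contain it."""
    index = {}
    for sentence_id, tree in treebank.items():
        for piece in set(local_trees(tree)):
            index.setdefault(piece, set()).add(sentence_id)
    return index


def retrieve(query, index):
    """Narrow the candidate set piece by piece, rarest piece first (assumed order)."""
    pieces = sorted(set(local_trees(query)),
                    key=lambda piece: len(index.get(piece, ())))
    candidates = None
    for piece in pieces:
        hits = index.get(piece, set())
        candidates = hits if candidates is None else candidates & hits
        if not candidates:
            break  # nothing left to narrow down
    return set(candidates) if candidates else set()
```

With this toy data, `retrieve(("VP", ("V", "reads"), ("NP", "books")), build_index(TREEBANK))` keeps only `s1`, because the rare piece `("V", "reads")` removes the other sentences before the more frequent pieces are examined.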
  • SAYORI SHIMOHATA, HITOSHI ISAHARA
    2007 Volume 14 Issue 4 Pages 23-41
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper describes a method for retrieving technical terms and finding their translations from bilingual patent corpora. The method extracts terms from each monolingual corpus and finds their translations by using a list of bilingual word pairs called “seed words”. In the term extraction process, we quantify the unithood and termhood of word sequences to determine whether they are technical terms. In the translation alignment process, we select seed words whose contexts are similar in the target corpora. We conducted experiments in term extraction and translation alignment with patent abstracts from Japan and the United States. In term extraction, the proposed method achieved a precision of 90% for Japanese and 93% for English. In translation alignment, the accuracy was 53% (Japanese to English) and 66% (English to Japanese) for the top candidate, and 83% (Japanese to English) and 90% (English to Japanese) for the top 10 candidates. A comparison of the results between parallel corpora and comparable corpora is also described.
  • YOSHIYUKI UMEMURA, SHIGERU MASUYAMA
    2007 Volume 14 Issue 4 Pages 43-65
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Previous work on elaboration has mainly focused on expression-level and structure-level technologies, such as correcting typing errors and detecting and indicating complex syntactic structures and fluctuations in expression. In contrast, this paper addresses the detection of portions of a sentence where readers have difficulty inferring the context because information is missing. We restrict our scope to sentences in business writing, which serves as a communication medium for conveying information correctly. This problem belongs to semantic-level elaboration, which has not been studied sufficiently. According to the “cooperative principle” in pragmatics, the “maxims of quantity” govern both information deficiency and information overload. This paper deals only with information deficiency: information overload merely burdens readers with having to disregard redundant information, whereas information deficiency causes serious problems by making the text difficult for readers to understand. The process from experiment preparation to analysis is as follows. First, we generate sentences from which adnominal regions have been removed. Second, we prepare gold-standard data sets from subjective judgments by human subjects of whether the explanation feels insufficient. Finally, we apply machine learning to make automatic decisions on this data. We used n-gram statistics and other features to evaluate the smoothness of the connection between the regions on either side of the removed adnominal clause. Using SVMs on about 1,000 decision tasks, we obtained a correct decision rate of 67%, against a baseline of 50% and an upper limit of 76% (determined by the dispersion of decisions among human subjects).
  • DAISUKE KAWAHARA, SADAO KUROHASHI
    2007 Volume 14 Issue 4 Pages 67-81
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper presents an integrated probabilistic model for Japanese syntactic and case structure analysis. Syntactic and case structure are simultaneously analyzed based on wide-coverage case frames that are constructed from a huge raw corpus in an unsupervised manner. This model selects the syntactic and case structure that has the highest generative probability. We evaluate both syntactic structure and case structure. In particular, the experimental results for syntactic analysis on web sentences show that the proposed model significantly outperforms known syntactic analyzers.
  • Nobuo Sato, Yasunari Obuchi
    2007 Volume 14 Issue 4 Pages 83-96
    Published: July 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a new approach to emotion recognition. Prosodic features are currently used in most emotion recognition algorithms. However, emotion recognition algorithms using prosodic features are not sufficiently accurate. Therefore, we focused on the phonetic features of speech for emotion recognition. In particular, we describe the effectiveness of Mel-frequency Cepstral Coefficients (MFCCs) as the feature for emotion recognition. We focus on the precise classification of MFCC feature vectors, rather than their dynamic nature over an utterance. To realize such an approach, the proposed algorithm employs multi-template emotion classification of the analysis frames. Experimental evaluations show that the proposed algorithm produces 66.4% recognition accuracy in speaker-independent emotion recognition experiments for four specific emotions. This recognition accuracy is higher than the accuracy obtained by the conventional prosody-based and MFCC-based emotion recognition algorithms, which confirms the potential of the proposed algorithm.
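As a rough illustration of classifying individual analysis frames against several templates per emotion, the sketch below assigns each frame to the emotion of its nearest template and then takes a majority vote over the utterance. The abstract does not specify the distance measure, how the templates are obtained, or how frame decisions are combined, so the Euclidean distance, the majority vote, the toy two-dimensional vectors (real MFCC vectors have more dimensions), and all names here are assumptions.

```python
import math
from collections import Counter


def nearest_emotion(frame, templates):
    """Return the emotion whose closest template is nearest to this frame.

    `templates` maps an emotion label to a list of template vectors
    (several per emotion, hence "multi-template").
    """
    best_label, best_distance = None, math.inf
    for emotion, vectors in templates.items():
        for template in vectors:
            distance = math.dist(frame, template)  # Euclidean distance (assumed)
            if distance < best_distance:
                best_label, best_distance = emotion, distance
    return best_label


def classify_utterance(frames, templates):
    """Label every frame independently, then take a majority vote (assumed)."""
    if not frames:
        raise ValueError("an utterance needs at least one frame")
    votes = Counter(nearest_emotion(frame, templates) for frame in frames)
    return votes.most_common(1)[0][0]


# Hypothetical toy templates and frames standing in for MFCC feature vectors.
TEMPLATES = {
    "anger": [(0.0, 0.0), (1.0, 1.0)],
    "joy": [(5.0, 5.0), (6.0, 4.0)],
}
FRAMES = [(0.2, 0.1), (0.9, 1.2), (5.1, 4.8)]
```

Because each frame is labeled on its own, the order of frames within the utterance plays no role in this sketch, which matches the abstract's emphasis on classifying feature vectors rather than modeling their dynamics.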