Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 13, Issue 3
Displaying 1-11 of 11 articles from this issue
  • [in Japanese]
    2006 Volume 13 Issue 3 Pages 1-2
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • EIJI ARAMAKI, SADAO KUROHASHI, HIDEKI KASHIOKA, NAOTO KATO
    2006 Volume 13 Issue 3 Pages 3-19
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Example-based machine translation (EBMT) systems have so far relied on heuristic measures to retrieve translation examples. Such heuristic measures take time to tune and can obscure the algorithm. This paper presents a probabilistic model for EBMT. Under the proposed model, the system searches for the combination of translation examples that has the highest probability. The proposed model clearly formalizes the EBMT process. In addition, the model can naturally incorporate the context similarity of translation examples. The experimental results demonstrate that the proposed model achieves slightly better translation quality than state-of-the-art EBMT systems.
  • MANABU SASSANO
    2006 Volume 13 Issue 3 Pages 21-35
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We explore how virtual examples (artificially created examples) improve the performance of text classification with Support Vector Machines (SVMs). We propose techniques for creating virtual examples for text classification, based on the assumption that the category of a document is unchanged even if a small number of words are added or deleted. We evaluate the proposed methods on the Reuters-21578 test collection. Experimental results show that virtual examples improve the performance of text classification with SVMs, especially for small training sets.
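    The assumption stated in this abstract (a document keeps its category when a few words are added or deleted) can be illustrated with a minimal sketch. This is not the paper's actual procedure: the function names `make_virtual_example` and `augment`, the 50/50 choice between deletion and addition, and the random sampling of added words from a vocabulary are all assumptions made for illustration.

```python
import random


def make_virtual_example(tokens, vocabulary, n_changes, rng):
    """Return a copy of `tokens` with `n_changes` random word deletions or additions.

    Hypothetical helper: the add/delete policy here is an assumption,
    not the technique described in the paper.
    """
    doc = list(tokens)
    for _ in range(n_changes):
        if doc and rng.random() < 0.5:
            # Delete one randomly chosen word.
            doc.pop(rng.randrange(len(doc)))
        else:
            # Add one word drawn from the vocabulary at a random position.
            doc.insert(rng.randrange(len(doc) + 1), rng.choice(vocabulary))
    return doc


def augment(labeled_docs, vocabulary, n_changes=1, n_virtual=2, seed=0):
    """Return the original (tokens, label) pairs plus virtual examples.

    Each virtual example inherits the label of the document it was made from.
    """
    rng = random.Random(seed)
    augmented = []
    for tokens, label in labeled_docs:
        augmented.append((list(tokens), label))
        for _ in range(n_virtual):
            virtual = make_virtual_example(tokens, vocabulary, n_changes, rng)
            augmented.append((virtual, label))
    return augmented


if __name__ == "__main__":
    docs = [(["wheat", "price", "rises"], "grain")]
    vocab = ["wheat", "price", "rises", "export", "harvest"]
    for tokens, label in augment(docs, vocab):
        print(label, tokens)
```

    The augmented list can then be vectorized and passed to any SVM trainer in place of the original training set; keeping `n_changes` small is what keeps the unchanged-category assumption plausible.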
  • HIDEKI HIRAKAWA
    2006 Volume 13 Issue 3 Pages 37-90
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Preference Dependency Grammar (PDG) is a framework for the morphological, syntactic and semantic analysis of natural language sentences. PDG provides packed shared data structures that hold the various ambiguities at each level of sentence analysis together with preference scores, and a method for calculating the most plausible interpretation of a sentence. This paper describes the sentence analysis model adopted in PDG, named the “Multi-level Packed Shared Data Connection Model”, and outlines the PDG framework. It also describes the packed shared data structures adopted in PDG, such as the Headed Parse Forest and the Dependency Forest, and shows the completeness and soundness of the mapping between the Parse Forest and the Dependency Forest.
  • TOMOHIDE SHIBATA, SADAO KUROHASHI
    2006 Volume 13 Issue 3 Pages 91-111
    Published: July 10, 2006
    Released on J-STAGE: June 07, 2011
    JOURNAL FREE ACCESS
    In this paper, we describe a method for automatically generating summary slides from a text. A slide consists of itemized text extracts, and to determine their indentation we need to analyze relations between sentences and clauses, such as contrast and elaboration. We first analyze the discourse structure of the text by considering three types of information: cue phrases, identification of word chains, and similarity between two sentences. Then, we extract topic and non-topic parts from the text and generate the slide by placing the extracted texts, with their indentation controlled according to the discourse structure. Our experiments demonstrate that the generated slides are far easier to read than the original texts.
  • TORU HIRANO, RYU IIDA, ATSUSHI FUJITA, KENTARO INUI, YUJI MATSUMOTO
    2006 Volume 13 Issue 3 Pages 113-132
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a method for reducing the cost of annotating examples with argument structure, in order to increase the accuracy of argument structure analysis. First, a large raw corpus is parsed, and a large-scale collection of example sentences is constructed from the predicate-argument examples in the parsing results. Second, the collection of example sentences is clustered using two kinds of verb similarity. Finally, the acquired clusters are annotated with argument structure by human annotators. We report preliminary experiments using the proposed method and show that it is effective in reducing the cost of annotation.
  • ATSUSHI FUJITA, KENTARO INUI
    2006 Volume 13 Issue 3 Pages 133-150
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Several classes of paraphrases have the potential to be explained compositionally by referring to the syntactic and semantic properties of their constituent words: e.g., composing/decomposing compounds, voice/case alternation, various verb alternations, and lexical derivation. Toward analyzing the compositionality underlying these paraphrase classes, we have examined a class-oriented framework for collecting paraphrase examples, in which sentential paraphrases are collected for each paraphrase class separately by means of automatic candidate generation based on morpho-syntactic paraphrasing patterns, followed by manual judgement. Our preliminary experiments on building two paraphrase sub-corpora have so far produced promising results with regard to cost-efficiency, exhaustiveness, and reliability.
  • YASUHIRO SASAKI, SATOSHI SATO, TAKEHITO UTSURO
    2006 Volume 13 Issue 3 Pages 151-175
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes the related term collection problem and a solution to it. The related term collection problem is defined as collecting a dozen technical terms that are closely related to a given seed term. To solve this problem, we use the Jaccard coefficient or the χ² statistic on the Web, calculated from search engine hit counts, to measure the relatedness between the given seed term and a candidate term. These measures also verify that the candidate term is a technical term. We have implemented a related term collection system, which consists of two modules. The first module collects candidate terms from the web pages retrieved by a search engine. The second module selects the terms that are closely related to the given term by using one of the above two measures. Experimental results show that the system can collect a dozen terms closely related to the given term.
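    The two relatedness measures named in this abstract can be sketched from hit counts as follows. This uses the standard definitions of the Jaccard coefficient and of the χ² statistic for a 2x2 contingency table; the exact formulation used in the paper may differ. The function names and the `total_pages` parameter (an estimate of the number of indexed pages, needed for χ²) are assumptions, and no real search engine API is called here.

```python
def jaccard(hits_seed, hits_cand, hits_both):
    """Jaccard coefficient from hit counts: |S and C| / |S or C|."""
    union = hits_seed + hits_cand - hits_both
    return hits_both / union if union > 0 else 0.0


def chi_square(hits_seed, hits_cand, hits_both, total_pages):
    """Chi-square statistic of the 2x2 table built from hit counts.

    `total_pages` is an assumed estimate of the total number of indexed pages.
    """
    a = hits_both                  # pages containing both terms
    b = hits_seed - hits_both      # seed term only
    c = hits_cand - hits_both      # candidate term only
    d = total_pages - a - b - c    # neither term
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return total_pages * (a * d - b * c) ** 2 / denom


def rank_candidates(hits_seed, candidates, top_n=12):
    """Sort candidates by Jaccard score, highest first.

    `candidates` maps each candidate term to (hits_cand, hits_both).
    """
    scored = [
        (term, jaccard(hits_seed, hits_cand, hits_both))
        for term, (hits_cand, hits_both) in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Hit counts below are made-up illustration values, not measured data.
    candidates = {"term_a": (50, 25), "term_b": (400, 10)}
    print(rank_candidates(100, candidates))
```

    In a full system the hit counts would come from search engine queries for the seed term, the candidate term, and their conjunction; here they are passed in directly so the scoring step can be read on its own.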
  • TADAHIRO MATSUMOTO, DAIKI HARADA, DAISUKE HARA, TAKASHI IKEDA
    2006 Volume 13 Issue 3 Pages 177-200
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a notation system for Japanese Sign Language (JSL). This notation system aims to help modularize the Japanese-JSL machine translation process and to bring the JSL generation problem closer to that of traditional oral languages. Accordingly, the main concern of this notation is not the detailed motions of the signs themselves but the linguistic structures (i.e., lexical and grammatical information) expressed through such motions. JSL sentences in our notation include signs, compounds of signs, punctuation marks, and non-manual syntactic markers. A sign is represented by a sign identifier (a Japanese word or phrase) and its inflection parameters. JSL sentences are transcribed in text format using JIS characters. This makes existing text tools available for reading, writing and processing JSL sentences. We conducted a transcription experiment to evaluate our notation system with 720 JSL sentences performed by native JSL signers, and found that 51 JSL expressions in 49 of the sentences could not be sufficiently transcribed. We classify and investigate those expressions.
  • TAKASHI INUI, MANABU OKUMURA
    2006 Volume 13 Issue 3 Pages 201-241
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    These days, people can easily disseminate information, including their personal evaluative opinions of products and services, on the Internet. This massive amount of information is beneficial both to product companies and to users who are planning to purchase and use those products. Because the information is mainly presented in textual form, many researchers in natural language processing have devoted themselves to developing techniques for exploring, extracting, mining, and aggregating opinions and sentiments. Such techniques are commonly called sentiment analysis. In this paper, we survey research on sentiment analysis, from its fundamentals to state-of-the-art methods.
  • TAMOTSU SHIRADO, SATOKO MARUMOTO, MASAKI MURATA, HITOSHI ISAHARA
    2006 Volume 13 Issue 3 Pages 243-260
    Published: July 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In Japan, politeness plays an important role in social activities, especially in conversations. However, honorific Japanese expressions are increasingly being misused. Such misuse is a failure to use honorific expressions in a way appropriate to the relative social positions assumed in a conversation. One cause of this misuse may be a lack of education on honorific conversation. Because honorific expressions take a long time to learn, computer-assisted language learning systems for honorific expressions should be developed. We developed a computational system that checks the usage of honorific expressions in Japanese spoken sentences. The system can point out misused words and phrases, and can also indicate how they have been misused. The validity of the system was tested using “correct” sentences containing no misused expressions and “incorrect” sentences containing misused expressions. The system was able to point out all the misuses in the incorrect sentences. It also judged the correct sentences as “correct” in all but a few cases.