Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

2006 Volume 13 Issue 3 Pages 1-2
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_1

JOURNAL FREE ACCESS

Probabilistic Formalization for Example-based Machine Translation

EIJI ARAMAKI, SADAO KUROHASHI, HIDEKI KASHIOKA, NAOTO KATO

2006 Volume 13 Issue 3 Pages 3-19
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_3

JOURNAL FREE ACCESS

Example-based machine translation (EBMT) systems have so far relied on heuristic measures to retrieve translation examples. Such heuristic measures take time to tune and can obscure the underlying algorithm. This paper presents a probabilistic model for EBMT. Under the proposed model, the system searches for the combination of translation examples that has the highest probability. The proposed model clearly formalizes the EBMT process. In addition, the model can naturally incorporate the context similarity of translation examples. The experimental results demonstrate that the proposed model achieves slightly better translation quality than state-of-the-art EBMT systems.

Using Virtual Examples for Text Classification with Support Vector Machines

MANABU SASSANO

2006 Volume 13 Issue 3 Pages 21-35
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_21

JOURNAL FREE ACCESS

We explore how virtual examples (artificially created examples) improve the performance of text classification with Support Vector Machines (SVMs). We propose techniques for creating virtual examples for text classification, based on the assumption that the category of a document is unchanged even if a small number of words are added or deleted. We evaluate the proposed methods on the Reuters-21578 test collection. Experimental results show that virtual examples improve the performance of text classification with SVMs, especially for small training sets.
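
The assumption stated in the abstract (a document keeps its category when a few words are added or deleted) can be illustrated with a minimal Python sketch. This is not the author's procedure: the function names, the `rate` parameter, the random sampling scheme, and the bag-of-tokens document representation are all assumptions made for illustration.

```python
import random

def virtual_by_deletion(tokens, rate, rng):
    """Drop each token independently with probability `rate` (hypothetical scheme)."""
    kept = [t for t in tokens if rng.random() >= rate]
    # Fall back to the original tokens so a document never becomes empty.
    return kept if kept else list(tokens)

def virtual_by_addition(tokens, vocabulary, rate, rng):
    """Append about `rate * len(tokens)` words drawn at random from `vocabulary`."""
    n_new = max(1, round(rate * len(tokens)))
    return list(tokens) + [rng.choice(vocabulary) for _ in range(n_new)]

def augment(dataset, vocabulary, rate=0.05, seed=0):
    """Return the original (tokens, label) pairs plus two virtual examples per
    document. Each virtual example inherits the label of its source document."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for tokens, label in dataset:
        augmented.append((virtual_by_deletion(tokens, rate, rng), label))
        augmented.append((virtual_by_addition(tokens, vocabulary, rate, rng), label))
    return augmented

if __name__ == "__main__":
    train = [(["stocks", "rise", "sharply", "today"], "finance")]
    vocab = ["market", "shares", "trade"]
    for tokens, label in augment(train, vocab, rate=0.25, seed=1):
        print(label, tokens)
```

The augmented set would then be vectorized and passed to an SVM trainer in the usual way. Keeping `rate` small matters, since the label-preservation assumption only holds for a small number of changed words.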

Preference Dependency Grammar and its Packed Shared Data Structure “Dependency Forest”

HIDEKI HIRAKAWA

2006 Volume 13 Issue 3 Pages 37-90
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_37

JOURNAL FREE ACCESS

Preference Dependency Grammar (PDG) is a framework for the morphological, syntactic, and semantic analysis of natural language sentences. PDG provides packed shared data structures that hold the various ambiguities at each level of sentence analysis together with preference scores, and a method for calculating the most plausible interpretation of a sentence. This paper describes the sentence analysis model adopted in PDG, named the “Multi-level Packed Shared Data Connection Model”, and outlines the PDG framework. It also describes the packed shared data structures adopted in PDG, such as the Headed Parse Forest and the Dependency Forest, and shows the completeness and soundness of the mapping between the Parse Forest and the Dependency Forest.

Automatic Slide Generation Based on Discourse Structure Analysis

TOMOHIDE SHIBATA, SADAO KUROHASHI

2006 Volume 13 Issue 3 Pages 91-111
Published: July 10, 2006
Released on J-STAGE: June 07, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_91

JOURNAL FREE ACCESS

In this paper, we describe a method for automatically generating summary slides from a text. A slide consists of itemizations of extracted texts, and to determine their indentation we need to analyze relations between sentences and clauses, such as contrast and elaboration. We first analyze the discourse structure of the text by considering three types of information: cue phrases, identification of word chains, and similarity between two sentences. Then, we extract topic and non-topic parts from the text and generate the slide by placing the extracted texts, whose indentation is controlled according to the discourse structure. Our experiments demonstrate that the generated slides are far easier to read than the original texts.

Augmenting a Semantic Verb Lexicon with a Large Scale Collection of Example Sentences

TORU HIRANO, RYU IIDA, ATSUSHI FUJITA, KENTARO INUI, YUJI MATSUMOTO

2006 Volume 13 Issue 3 Pages 113-132
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_113

JOURNAL FREE ACCESS

In this paper, we propose a method for reducing the cost of annotating examples with argument structure, in order to increase the accuracy of argument structure analysis. First, a large raw corpus is parsed, and a large-scale collection of example sentences is constructed from the predicate-argument examples in the parsing results. Second, the collection of example sentences is clustered using two measures of verb similarity. Finally, the acquired clusters are annotated with argument structure by human annotators. We report preliminary experiments using the proposed method and show that it is effective in reducing the annotation cost.

Building a Paraphrase Corpus Based on Class-oriented Candidate Generation

ATSUSHI FUJITA, KENTARO INUI

2006 Volume 13 Issue 3 Pages 133-150
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_133

JOURNAL FREE ACCESS

Several classes of paraphrases have the potential to be explained compositionally by referring to syntactic and semantic properties of constituent words: e.g., composing/decomposing compounds, voice/case alternation, various verb alternations, and lexical derivation. Toward analyzing the compositionality underlying these paraphrase classes, we have examined a class-oriented framework for collecting paraphrase examples, in which sentential paraphrases are collected for each paraphrase class separately by means of automatic candidate generation based on morpho-syntactic paraphrasing patterns, followed by manual judgement. Our preliminary experiments on building two paraphrase sub-corpora have so far produced promising results with regard to cost-efficiency, exhaustiveness, and reliability.

Related Term Collection

YASUHIRO SASAKI, SATOSHI SATO, TAKEHITO UTSURO

2006 Volume 13 Issue 3 Pages 151-175
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_151

JOURNAL FREE ACCESS

This paper proposes the related term collection problem and a solution to it. The related term collection problem is defined as collecting about a dozen technical terms that are closely related to a given seed term. To solve this problem, we measure the relatedness between the given seed term and a candidate term using either the Jaccard coefficient or the χ² statistic on the Web, both of which are calculated from search engine hit counts. These measures also verify that the candidate term is a technical term. We have implemented a related term collection system, which consists of two modules. The first module collects candidate terms from the web pages retrieved by a search engine. The second module selects the terms that are closely related to the given term by using one of the above two measures. Experimental results show that the system can collect about a dozen terms closely related to the given term.
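
As a rough illustration of the two relatedness measures named in the abstract, the following Python sketch computes them from hit counts. The formulas are the standard Jaccard coefficient and the standard χ² statistic for a 2×2 contingency table; the abstract does not state which exact variants the authors used. The function names, the `total_pages` estimate of the index size, and the sample counts are hypothetical.

```python
def jaccard(hits_seed, hits_cand, hits_both):
    """Jaccard coefficient from hit counts: |S and C| / |S or C|."""
    union = hits_seed + hits_cand - hits_both
    return hits_both / union if union > 0 else 0.0

def chi_square(hits_seed, hits_cand, hits_both, total_pages):
    """Chi-square statistic of the 2x2 table (seed present/absent x candidate
    present/absent). `total_pages` is an assumed size of the search index."""
    a = hits_both                                        # both terms
    b = hits_seed - hits_both                            # seed only
    c = hits_cand - hits_both                            # candidate only
    d = total_pages - hits_seed - hits_cand + hits_both  # neither term
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return total_pages * (a * d - b * c) ** 2 / denom

def rank_by_jaccard(hits_seed, candidates):
    """Sort candidate terms by Jaccard score, highest first.
    `candidates` maps term -> (hits_cand, hits_both)."""
    scored = [(jaccard(hits_seed, h_c, h_both), term)
              for term, (h_c, h_both) in candidates.items()]
    return [term for _, term in sorted(scored, reverse=True)]

if __name__ == "__main__":
    # Made-up hit counts for illustration only.
    candidates = {"term_a": (50, 25), "term_b": (1000, 10)}
    print(rank_by_jaccard(100, candidates))
```

In a real system the three counts would come from search engine queries for the seed term, the candidate term, and their conjunction. Those queries are outside the scope of this sketch.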

A Japanese Gloss-based Written Notation for Japanese Sign Language

TADAHIRO MATSUMOTO, DAIKI HARADA, DAISUKE HARA, TAKASHI IKEDA

2006 Volume 13 Issue 3 Pages 177-200
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_177

JOURNAL FREE ACCESS

In this paper, we propose a notation system for Japanese Sign Language (JSL). This notation system is intended to help modularize the Japanese-JSL machine translation process and to bring the JSL generation problem closer to that of traditional oral languages. Accordingly, the main concern of this notation is not the detailed motions of the signs themselves but the linguistic structures (i.e., lexical and grammatical information) expressed through such motions. JSL sentences in our notation include signs, compounds of signs, punctuation marks, and non-manual syntactic markers. A sign is represented by its sign identifier (a Japanese word or phrase) and its inflection parameters. JSL sentences are transcribed in text format with JIS characters. This makes existing text tools available for reading, writing, and processing JSL sentences. We conducted a transcription experiment to evaluate our notation system with 720 JSL sentences performed by native JSL signers, and found that 51 JSL expressions in 49 of the sentences could not be sufficiently transcribed. We classify and investigate those expressions.

A Survey of Sentiment Analysis

TAKASHI INUI, MANABU OKUMURA

2006 Volume 13 Issue 3 Pages 201-241
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_201

JOURNAL FREE ACCESS

These days, people can easily share information on the Internet, including their personal evaluative opinions of products and services. This massive amount of information is beneficial both to the companies behind the products and to users who are planning to purchase and use them. Because this information is mainly presented in textual form, many researchers in natural language processing have devoted themselves to developing techniques for exploring, extracting, mining, and aggregating opinions and sentiments. Such techniques are commonly called sentiment analysis. In this paper, we survey research on sentiment analysis, from its fundamentals to state-of-the-art methods.

System for Pointing Out Honorific Misusages in Japanese Speech Sentences

TAMOTSU SHIRADO, SATOKO MARUMOTO, MASAKI MURATA, HITOSHI ISAHARA

2006 Volume 13 Issue 3 Pages 243-260
Published: July 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.3_243

JOURNAL FREE ACCESS

In Japan, politeness plays an important role in social activities, especially in conversations. However, honorific Japanese expressions are increasingly being misused. This misuse is a failure to use honorific expressions in a way appropriate to the relative social positions assumed in a conversation. One cause of this misuse may be a lack of education in honorific conversation. Because honorific expressions take a long time to learn, computer-assisted language learning systems for honorific expressions should be developed. We developed a computational system that checks the usage of honorific expressions in Japanese speech sentences. The system can point out misused words and phrases, and can also indicate how they have been misused. The validity of the system was tested using “correct” sentences containing no misused expressions and “incorrect” sentences containing misused expressions. The system was able to point out all the misuses in the incorrect sentences. It also judged the correct sentences as “correct”, with a few exceptions.
