Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

2004 Volume 11 Issue 4 Pages 1-2
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_1

JOURNAL FREE ACCESS

A Maximum Entropy Tagging Model with Unsupervised Hidden Markov Models

JUN'ICHI KAZAMA, YUSUKE MIYAO, JUN'ICHI TSUJII

2004 Volume 11 Issue 4 Pages 3-23
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_3

JOURNAL FREE ACCESS

We describe a new tagging model in which the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as features in a maximum entropy model. Our method for exploiting unsupervised learning of a probabilistic model can reduce the cost of building taggers with a small annotated corpus. Experimental results on English POS tagging and Japanese word segmentation show that our method greatly improves the tagging accuracy when the model is trained with a small annotated corpus. Furthermore, our English POS tagger achieved a state-of-the-art POS tagging accuracy (96.84%) when a large annotated corpus is available.

A Description of Core Concepts for Basic Verbs in Japanese and English based on their Recognition Primitives

Kenji Watanabe, Masahiro Miyazaki

2004 Volume 11 Issue 4 Pages 25-66
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_25

JOURNAL FREE ACCESS

This paper reports on how a new system of semantic processing could generate a breakthrough in concepts, free from the limitations of conventional semantic processing based on existing case patterns in existing thesauri. We also discuss what kind of linguistic knowledge is needed to realize a more advanced system of semantic processing. Finally, we examine how to collect and structure this knowledge.
Our assumptions are as follows: 1. A polysemous word has one basic semantic core, and its many meanings are derived from this semantic core, depending on how it is interpreted. 2. When dealing with abstract concepts, we replace them with more concrete entities that can be directly perceived with the five senses. Within the framework of basic Japanese and English verbs, from which basic words are derived and through which we recognize external objects, their core concepts will be analyzed. We will analyze “recognition primitives,” from which we acquire meanings and usages for concrete objects. We will try to describe perceptible notions of these core concepts by analyzing a number of important polysemous verbs.

Verb Sense Disambiguation Based on Pairwise Alignment

KOICHI YAMASHITA, KEIICHI YOSHIDA, YUKIHIRO ITOH

2004 Volume 11 Issue 4 Pages 67-88
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_67

JOURNAL FREE ACCESS

In this paper, we propose a new method for verb sense disambiguation. Word sense disambiguation (WSD) has been recognized as one of the most important subjects in natural language processing, and there have been several reports on the subject. Most previous work can be classified into two approaches according to how the context including the target word is treated: an approach using some words around a target word (n-word window) and one using syntactic relations (selectional restriction). However, the two approaches treat context differently from each other, and consequently their accuracy is limited. Our method has the merits of both previous approaches, because it uses the whole dependency structure of a sentence. We find a similarity between contexts based on a pairwise alignment technique that is generally used to measure similarity between DNA sequences. Using our method, we can achieve WSD more flexibly and robustly than with previously proposed methods. In our experiment, we obtained an accuracy of 81.1% on average by the new method with supervised learning by hand.
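
The pairwise alignment idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain global (Needleman-Wunsch style) alignment over flat token sequences with hypothetical match, mismatch, and gap scores; the paper itself aligns dependency structures, and its actual scoring scheme is not given in the abstract.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two token sequences.

    The scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per token.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill the table: each cell takes the best of match/mismatch or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]
```

A higher score means the two contexts are more similar under the assumed scoring, so a target verb's context could be compared against labeled example contexts and assigned the sense of the best-scoring one.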

Integrated Use of Internal and External Evidence in the Alignment of Compound Words

TAKEHIKO YOSHIMI, TAKESHI KUTSUMI, KATSUNORI KOTANI, ICHIKO SATA, HITO ...

2004 Volume 11 Issue 4 Pages 89-103
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_89

JOURNAL FREE ACCESS

This paper proposes a method of extracting English compound words and their Japanese equivalents from a parallel corpus. The aim of our research is to extract compound words that are not listed in the dictionary of an English-to-Japanese MT system and that appear infrequently in a parallel corpus. Our method performs alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each kind of evidence is accompanied by a score, and the aggregate score is computed as a weighted sum of the scores. The appropriate weights are estimated by logistic regression analysis. An experiment using a parallel corpus of Yomiuri Shimbun and The Daily Yomiuri showed that 86.36% of the extracted bilingual pairs with the highest scores and 95.08% of those with the top two scores were judged to be correct.
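
As a rough illustration of the scoring step described in this abstract, the sketch below combines four evidence scores into a weighted sum and ranks candidate pairs by it. The function names, the bias term, and the example weights are assumptions made for illustration; the abstract does not give the actual evidence definitions or the estimated weights.

```python
import math


def aggregate_score(evidence, weights, bias=0.0):
    """Weighted sum of evidence scores (the bias term is an assumption)."""
    return bias + sum(w * e for w, e in zip(weights, evidence))


def probability_correct(evidence, weights, bias=0.0):
    """Logistic transform of the aggregate score, as in logistic regression."""
    return 1.0 / (1.0 + math.exp(-aggregate_score(evidence, weights, bias)))


def rank_candidates(candidates, weights, bias=0.0):
    """Sort (pair, evidence) tuples by aggregate score, best first."""
    return sorted(
        candidates,
        key=lambda item: aggregate_score(item[1], weights, bias),
        reverse=True,
    )
```

In practice the weights would be fitted on pairs labeled as correct or incorrect; here they are simply passed in.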

A Method for Retrieving a Similar Sentence and Its Application to Speech Translation

MITSUO SHIMOHATA, EIICHIRO SUMITA, YUJI MATSUMOTO

2004 Volume 11 Issue 4 Pages 105-126
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_105

JOURNAL FREE ACCESS

When spoken-language input sentences are fed to a machine translation system, we sometimes cannot get proper translations due to the characteristics of spoken language. In this paper, we propose a method for recovering proper translations by combining similar sentence retrieval with machine translation when it is difficult to get a proper translation of the input sentence. If a given input sentence is found to be difficult to translate properly, a sentence similar to the input sentence is retrieved from a corpus of translatable sentences. The similarity between the candidate and the input sentence is determined from the ratio of the N-gram overlap. In addition, we use two further conditions to improve the retrieval performance: excluding candidate sentences with a content word that does not exist in the input sentence, and decreasing the weight of function words. In a retrieval experiment in Japanese, our method output retrieved sentences for 87.2% of all input sentences, and 60.4% of them were similar sentences. In an experiment combining our method with machine translation, in which untranslatable input sentences are replaced with similar sentences from a translatable corpus, our method recovered proper translations for 25.9% of the untranslatable input sentences.
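
The retrieval criterion in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the similarity is taken to be the weighted share of the input's N-grams that also occur in the candidate, an N-gram made up only of function words gets a reduced weight, and the function-word list, the weight 0.5, and all names are hypothetical. The paper's exact formula is not given in the abstract.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def similarity(input_tokens, cand_tokens, function_words, n=2, function_weight=0.5):
    """Weighted N-gram overlap ratio between an input and a candidate sentence."""
    # Condition 1: reject candidates containing a content word absent from the input.
    input_vocab = set(input_tokens)
    for token in cand_tokens:
        if token not in function_words and token not in input_vocab:
            return 0.0

    # Condition 2: n-grams consisting only of function words count for less.
    def weight(gram):
        return function_weight if all(t in function_words for t in gram) else 1.0

    input_grams = ngrams(input_tokens, n)
    cand_grams = set(ngrams(cand_tokens, n))
    total = sum(weight(g) for g in input_grams)
    if total == 0:
        return 0.0
    shared = sum(weight(g) for g in input_grams if g in cand_grams)
    return shared / total
```

A retrieval step would compute this value for every sentence in the translatable corpus and return the best candidate only if its score exceeds some threshold, which would explain why not every input receives a retrieved sentence.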

Resolution of Modifier-Head Relation Gaps using Automatically Extracted Metonymic Expressions

YOJI KIYOTA, SADAO KUROHASHI, FUYUKO KIDO

2004 Volume 11 Issue 4 Pages 127-145
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_127

JOURNAL FREE ACCESS

This paper proposes a method of extracting metonymic expressions and their interpretative expressions from corpora, and its application to the full-parsing-based matching method of the QA system Dialog Navigator. Namely, our method resolves modifier-head relation gaps between user questions and texts by registering pairs of metonymic expressions (e.g. “display a GIF”) and interpretative expressions (e.g. “display a GIF file”) in the synonymous expression dictionary of Dialog Navigator. An evaluation showed that most of the extracted interpretations were correct, and an experiment using test sets indicated that introducing the metonymic expressions significantly improved the performance of our system.

Study of Cover Ratio of Syntactic Sentence Patterns for Japanese Complex Sentences

SATORU IKEHARA, MASATO TOKUHISA, NAO TAKEUCHI (MURAMOTO), JIN'ICHI MUR ...

2004 Volume 11 Issue 4 Pages 147-178
Published: October 10, 2004
Released on J-STAGE: March 01, 2011

DOI https://doi.org/10.5715/jnlp.11.4_147

JOURNAL FREE ACCESS

Pattern-based MT has drawn attention for a long time since it yields good translations for matched sentences. However, it has been difficult to build pattern pair dictionaries that contain a huge number of semantically independent patterns in order to obtain a high cover ratio. This paper experimentally evaluates the cover ratio of a pattern pair dictionary recently developed for Japanese complex and compound sentences and studies the feasibility of the pattern-based MT method. This dictionary contains syntactic sentence patterns at the Word Level (121,000 patterns), Phrase Level (88,000 patterns) and Clause Level (11,000 patterns), which were generated from 150,000 example sentence pairs for Japanese to English. Evaluation was conducted using four parameters: “Sentence Recall Ratio,” “Sentence Coincide Ratio,” “Semantic Precision Ratio,” and “Matched Pattern Precision Ratio.” The results are as follows. The “Sentence Recall Ratios” are 70%, 89% and 78% for Word Level, Phrase Level and Clause Level sentence patterns, respectively, and the “Matched Pattern Precision Ratio” of Word Level sentence patterns is 21%. Although the “Matched Pattern Precision Ratio” was low, it was clarified that many ways remain to increase the number of matched patterns.
