Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 11, Issue 4
  • [in Japanese]
    2004 Volume 11 Issue 4 Pages 1-2
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
  • JUN'ICHI KAZAMA, YUSUKE MIYAO, JUN'ICHI TSUJII
    2004 Volume 11 Issue 4 Pages 3-23
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We describe a new tagging model where the states of a hidden Markov model (HMM) estimated by unsupervised learning are incorporated as features in a maximum entropy model. Our method for exploiting unsupervised learning of a probabilistic model can reduce the cost of building taggers with a small annotated corpus. Experimental results on English POS tagging and Japanese word segmentation show that our method greatly improves the tagging accuracy when the model is trained with a small annotated corpus. Furthermore, our English POS tagger achieved a state-of-the-art POS tagging accuracy (96.84%) when a large annotated corpus is available.
  • Kenji Watanabe, Masahiro Miyazaki
    2004 Volume 11 Issue 4 Pages 25-66
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper will report on how a new system of semantic processing could generate a breakthrough in concepts free from the limitations of conventional semantic processing based on existing case patterns in existing thesauri. We will also discuss what kind of linguistic knowledge is needed in order to realize a more advanced system of semantic processing. Finally, we will examine how to collect and structure this knowledge.
    Our assumptions are as follows: 1. A polysemous word has one basic semantic core, and many meanings are derived from this semantic core, depending on how it is interpreted. 2. When dealing with abstract concepts, we replace them with more concrete entities that can be directly felt with the five senses. Within the framework of basic Japanese and English verbs from which basic words are derived and through which we recognize external objects, their core concepts will be analyzed. We will analyze "recognition primitives," from which we acquire meanings and usages for concrete objects. We will try to describe perceptible notions of these core concepts by analyzing a number of important polysemous verbs.
  • KOICHI YAMASHITA, KEIICHI YOSHIDA, YUKIHIRO ITOH
    2004 Volume 11 Issue 4 Pages 67-88
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a new method for verb sense disambiguation. Word sense disambiguation (WSD) has been recognized as one of the most important subjects in natural language processing, and there have been several reports on the subject. Most previous work can be classified into two approaches according to how the context including the target word is treated: an approach using some words around the target word (n-word window) and one using syntactic relations (selectional restriction). However, the two approaches treat context differently, and consequently the accuracy of each is limited. Our method has the merits of both previous approaches because it uses the whole dependency structure of a sentence. We find the similarity between contexts with a pairwise alignment technique that is generally used to measure similarity between DNA sequences. Using our method, we can achieve WSD more flexibly and robustly than with the methods proposed previously. In our experiment, we obtained an accuracy of 81.1% on average by the new method with supervised learning by hand.
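The pairwise alignment idea in the abstract above can be illustrated with a minimal sketch. It assumes a standard global alignment (Needleman-Wunsch style) over flat token sequences, whereas the paper aligns whole dependency structures; the scoring values, the function names `alignment_score` and `disambiguate`, and the sense labels are hypothetical and not taken from the paper.

```python
def alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Global alignment score of two token sequences (dynamic programming).

    The match/mismatch/gap values are illustrative assumptions.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per token.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = table[i - 1][j - 1] + (match if same else mismatch)
            table[i][j] = max(diagonal,
                              table[i - 1][j] + gap,   # gap in seq_b
                              table[i][j - 1] + gap)   # gap in seq_a
    return table[-1][-1]


def disambiguate(context, labelled_examples):
    """Return the sense of the labelled context that aligns best.

    labelled_examples is a list of (token_list, sense_label) pairs;
    this nearest-example rule is an assumption of the sketch.
    """
    best = max(labelled_examples,
               key=lambda example: alignment_score(context, example[0]))
    return best[1]


# Made-up labelled contexts for the verb "runs".
examples = [
    (["he", "runs", "a", "company"], "manage"),
    (["he", "runs", "a", "marathon"], "race"),
]
sense = disambiguate(["she", "runs", "a", "marathon"], examples)
```

The sketch only shows how an alignment score can serve as a context similarity; it does not reproduce the dependency-structure alignment or the accuracy reported in the abstract.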
  • TAKEHIKO YOSHIMI, TAKESHI KUTSUMI, KATSUNORI KOTANI, ICHIKO SATA, HITO ...
    2004 Volume 11 Issue 4 Pages 89-103
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method of extracting English compound words and their Japanese equivalents from a parallel corpus. The aim of our research is to extract compound words which are not listed in the dictionary of an English-to-Japanese MT system and appear infrequently in a parallel corpus. Our method makes its alignment on the basis of two kinds of external evidence provided by the context in which a bilingual pair appears, as well as two kinds of internal evidence within the pair. Each kind of evidence is accompanied by a score, and the aggregate score is computed as a weighted sum of the scores. The appropriate weights are estimated by logistic regression analysis. An experiment using a parallel corpus of the Yomiuri Shimbun and The Daily Yomiuri gave satisfactory results: 86.36% of the extracted bilingual pairs with the highest scores and 95.08% of those with the top two scores were judged to be correct.
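The weighted-sum scoring described in the abstract above can be sketched as follows. This is a minimal illustration, assuming four evidence scores per candidate pair and plain gradient-descent logistic regression; the training values are made-up toy numbers, and the names `aggregate_score` and `fit_logistic` are hypothetical rather than the authors' implementation.

```python
import math


def aggregate_score(scores, weights, bias=0.0):
    """Weighted sum of the evidence scores for one candidate pair."""
    return bias + sum(w * s for w, s in zip(weights, scores))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(samples, labels, learning_rate=0.5, epochs=2000):
    """Estimate weights by batch gradient descent on the logistic loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(aggregate_score(x, weights, bias)) - y
            for j, x_j in enumerate(x):
                grad_w[j] += error * x_j
            grad_b += error
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(samples)
    return weights, bias


# Toy data (invented for illustration): each row holds
# [external_1, external_2, internal_1, internal_2]; label 1 = correct pair.
train_x = [[0.9, 0.8, 0.7, 0.9], [0.8, 0.6, 0.9, 0.7],
           [0.2, 0.3, 0.1, 0.4], [0.1, 0.2, 0.3, 0.1]]
train_y = [1, 1, 0, 0]
weights, bias = fit_logistic(train_x, train_y)

# Candidate pairs can then be ranked by their aggregate score.
ranked = sorted(train_x,
                key=lambda x: aggregate_score(x, weights, bias),
                reverse=True)
```

Ranking candidates by the aggregate score corresponds to the "highest scores" and "top two scores" evaluation mentioned in the abstract, although the actual evidence types and weights are defined only in the paper.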
  • MITSUO SHIMOHATA, EIICHIRO SUMITA, YUJI MATSUMOTO
    2004 Volume 11 Issue 4 Pages 105-126
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    When we feed spoken-language input sentences to a machine translation system, we sometimes cannot get proper translations due to the characteristics of spoken language. In this paper, we propose a method for recovering proper translations by combining similar sentence retrieval with machine translation when it is difficult to get a proper translation of the input sentence. If a given input sentence is found to be difficult to translate properly, a sentence similar to the input sentence is retrieved from a corpus of translatable sentences. The similarity between the candidate and the input sentence is determined from the ratio of the N-gram overlap. In addition, we use two conditions to improve the retrieval performance: excluding candidate sentences with a content word that does not exist in the input sentence, and decreasing the weight of function words. In a retrieval experiment in Japanese, our method output retrieved sentences for 87.2% of all input sentences, and 60.4% of them were similar sentences. In an experiment combining our method with machine translation, in which untranslatable input sentences are replaced with similar sentences from a translatable corpus, our method recovered proper translations for 25.9% of the untranslatable input sentences.
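The retrieval step in the abstract above can be sketched roughly as below. The abstract does not give the exact overlap formula, so this assumes a weighted ratio over unigrams and bigrams of the input; the function-word list, the weight of 0.5, the threshold, and the English example sentences (the paper's experiment is on Japanese) are all hypothetical.

```python
from collections import Counter

# Hypothetical function-word list and down-weighting factor.
FUNCTION_WORDS = {"a", "the", "to", "please"}
FUNCTION_WORD_WEIGHT = 0.5


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_weight(gram):
    """Average token weight; function words count less than content words."""
    return sum(FUNCTION_WORD_WEIGHT if t in FUNCTION_WORDS else 1.0
               for t in gram) / len(gram)


def overlap_ratio(input_tokens, candidate_tokens, max_n=2):
    """Weighted share of the input's n-grams that also occur in the candidate."""
    matched = 0.0
    total = 0.0
    for n in range(1, max_n + 1):
        input_counts = Counter(ngrams(input_tokens, n))
        candidate_counts = Counter(ngrams(candidate_tokens, n))
        for gram, count in input_counts.items():
            weight = ngram_weight(gram)
            total += weight * count
            matched += weight * min(count, candidate_counts[gram])
    return matched / total if total else 0.0


def retrieve(input_tokens, corpus, threshold=0.5):
    """Return the most similar translatable sentence, or None."""
    input_content = {t for t in input_tokens if t not in FUNCTION_WORDS}
    best, best_score = None, threshold
    for candidate in corpus:
        candidate_content = {t for t in candidate if t not in FUNCTION_WORDS}
        # Exclude candidates with a content word absent from the input.
        if not candidate_content <= input_content:
            continue
        score = overlap_ratio(input_tokens, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


corpus = [["book", "a", "room"], ["book", "a", "table"]]
result = retrieve(["please", "book", "a", "room"], corpus)
```

Returning `None` when no candidate passes both conditions mirrors the abstract's observation that retrieved sentences are output for only part of the input sentences.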
  • YOJI KIYOTA, SADAO KUROHASHI, FUYUKO KIDO
    2004 Volume 11 Issue 4 Pages 127-145
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method of extracting metonymic expressions and their interpretative expressions from corpora and its application to the full-parsing-based matching method of a QA system, Dialog Navigator. Namely, our method resolves modifier-head relation gaps between user questions and texts by registering pairs of metonymic expressions (e.g. "display a GIF") and interpretative expressions (e.g. "display a GIF file") into the synonymous expression dictionary of Dialog Navigator. An evaluation showed that most of the extracted interpretations were correct, and an experiment using test sets indicated that introducing the metonymic expressions significantly improved the performance of our system.
  • SATORU IKEHARA, MASATO TOKUHISA, NAO TAKEUCHI (MURAMOTO), JIN'ICHI MUR ...
    2004 Volume 11 Issue 4 Pages 147-178
    Published: October 10, 2004
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Pattern-based MT has drawn attention for a long time since it yields good translations for matched sentences. However, it has been difficult to build pattern pair dictionaries that contain the huge number of semantically independent patterns needed to obtain a high cover ratio. This paper experimentally evaluates the cover ratio of a pattern pair dictionary recently developed for Japanese complex and compound sentences and studies the feasibility of the pattern-based MT method. This dictionary contains syntactic sentence patterns at the Word Level (121,000 patterns), Phrase Level (88,000 patterns) and Clause Level (11,000 patterns), which were generated from 150,000 example sentence pairs for Japanese to English. The evaluation was conducted using four parameters: "Sentence Recall Ratio," "Sentence Coincide Ratio," "Semantic Precision Ratio," and "Matched Pattern Precision Ratio." The results are as follows. The "Sentence Recall Ratios" are 70%, 89% and 78% for Word Level, Phrase Level and Clause Level sentence patterns, respectively, and the "Matched Pattern Precision Ratio" of Word Level sentence patterns is 21%. Though the "Matched Pattern Precision Ratio" was low, it was clarified that there are many ways left to increase the matched patterns.