Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 13, Issue 2
Displaying 1-10 of 10 articles from this issue
  • [in Japanese]
    2006 Volume 13 Issue 2 Pages 1
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (130K)
  • SHIGENORI NAKANO, AKIRA ADACHI, TAKENORI MAKINO
    2006 Volume 13 Issue 2 Pages 3-26
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes a method for detecting topic boundaries using a topical cohesion profile measured by term repetition distances. A set of term repetitions composes a topical potential. The sum of the topical potentials composes a topical cohesion profile, whose hills correspond to dominant topics and whose valleys correspond to segment boundaries. The end lines of newspaper articles that are connected sequentially indicate topical segment boundaries. In the experiment, the method is tested on how many segment boundaries are detected at article end lines. The results showed 67.8% recall and 61.8% precision. The method is applicable to long essays and effective for short texts.
    Download PDF (2282K)
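The idea in the abstract above (term repetitions build up a cohesion profile, and valleys in that profile suggest boundaries) can be illustrated with a toy sketch. This is not the authors' algorithm: the function names `cohesion_profile` and `valleys`, the `max_distance` parameter, and the rule of adding 1 to every gap spanned by a repetition are all assumptions made only for illustration.

```python
def cohesion_profile(tokens, max_distance=20):
    """Toy cohesion profile (illustrative assumption, not the paper's formula).

    profile[i] is the cohesion across the gap between tokens[i] and tokens[i + 1].
    Each pair of consecutive occurrences of the same term, at most
    max_distance tokens apart, adds 1 to every gap it spans.
    """
    profile = [0] * max(len(tokens) - 1, 0)
    last_seen = {}
    for pos, term in enumerate(tokens):
        prev = last_seen.get(term)
        if prev is not None and pos - prev <= max_distance:
            for gap in range(prev, pos):
                profile[gap] += 1
        last_seen[term] = pos
    return profile


def valleys(profile):
    """Return interior gap indices that are local minima of the profile."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
    ]


# Two tiny "topics": the first repeats a/b, the second repeats c/d.
tokens = "a b a b c d c d".split()
profile = cohesion_profile(tokens)
print(profile)
print(valleys(profile))
```

In this toy input the only valley falls at the gap between the last `b` and the first `c`, which is where the vocabulary changes.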
  • MANABU SASSANO
    2006 Volume 13 Issue 2 Pages 27-41
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We explore how well active learning with Support Vector Machines works for a nontrivial task in natural language processing. We use Japanese word segmentation as a test case. In particular, we discuss how the size of a pool affects the learning curve. It is found that in the early stage of training with a larger pool, more labeled examples are required to achieve a given level of accuracy than with a smaller pool. In addition, we propose a novel technique to use a large number of unlabeled examples effectively by adding them gradually to a pool. The experimental results show that our technique requires fewer labeled examples than the technique in previous research. To achieve 97.0% accuracy, the proposed technique needs 59.3% of the labeled examples required by the previous technique and only 17.4% of the labeled examples required with random sampling.
    Download PDF (1341K)
  • TAKESHI ABEKAWA, MANABU OKUMURA
    2006 Volume 13 Issue 2 Pages 43-62
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We present a method of improving Japanese dependency parsing by using large-scale statistical information. Our method takes into account two types of information that have not been considered in previous statistical (machine learning based) parsing methods. One is dependency relations among the case elements of a verb, and the other is co-occurrence relations between a verb and its case element. We can collect the information for these relations from the results of automatic dependency parsing of large-scale corpora. To show the effectiveness of our method, we conducted a dependency parsing experiment in which our method reranks the outputs of an existing machine learning based parsing method. From the results, we found that our method can improve the accuracy of the existing method. Furthermore, we pointed out that the relation between a verb and its modifying noun in a relative clause affects dependency parsing, and integrated our relative clause analysis method with the proposed parsing method.
    Download PDF (2120K)
  • SETSUO YAMADA, KENJI IMAMURA, KAZUHIDE YAMAMOTO
    2006 Volume 13 Issue 2 Pages 63-78
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Since the expansion of MT rules is currently performed by humans, it takes too long and is too expensive. This paper proposes a new procedure that expands MT rules efficiently by supporting human judgements with linguistic information automatically collected from monolingual corpora. An MT rule consists of source knowledge and target knowledge. The new procedure uses the source knowledge present in an MT system as the key to retrieve source language information from corpora. It also uses the partial translations provided by the MT system to acquire target language information. These two techniques can reduce labor costs without lowering translation quality in comparison with the conventional method. Experimental results confirm this benefit.
    Download PDF (1701K)
  • KOTARO FUNAKOSHI, SATORU WATANABE, NAOKO KURIYAMA, TAKENOBU TOKUNAGA
    2006 Volume 13 Issue 2 Pages 79-97
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Past work on generating referring expressions mainly utilized attributes of objects and binary relations between objects to distinguish the referent from other objects. However, such an approach does not work well when there is no distinctive attribute among objects. To overcome this limitation, this paper proposes a novel generation method utilizing the perceptual groups of objects and n-ary relations among them. With the proposed method, an expression like “the rightmost ball in the left cluster of three balls” can be generated. The key is to identify groups of objects that are naturally recognized by humans. We conducted psychological experiments with 42 subjects to collect referring expressions in such situations, and built a generation algorithm based on the results. The evaluation using another 23 subjects showed that the proposed method could effectively generate proper referring expressions.
    Download PDF (7550K)
  • HIROYUKI SAKAI, SHOUJI UMEMURA, SHIGERU MASUYAMA
    2006 Volume 13 Issue 2 Pages 99-123
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    We propose a method for extracting expressions concerning accident cause (e.g., “mishandling of the steering wheel control”) contained in traffic accident articles from a newspaper corpus. Analyzing the causes of the traffic accident cases obtained by our method is useful for developing traffic accident prevention devices. In a preprocessing step, traffic accident articles are extracted from the newspaper corpus by using SVMs, and our method then extracts expressions concerning accident cause from these articles. Here, we define an expression modified by expressions concerning accident cause as “a seed expression”. Our method acquires expressions concerning accident cause from an initial seed expression provided manually. Moreover, our method acquires seed expressions from the expressions concerning accident cause and acquires new expressions concerning accident cause from the acquired seed expressions. By iterating these processes, expressions concerning accident cause and seed expressions are acquired. Experimental results showed that our method attained 77.2% precision and 38.6% recall. We also define a cause sentence as a sentence containing both an expression concerning accident cause and a seed expression, or a sentence containing an expression that adds “rashii (seem to)” to an expression concerning accident cause. The precision and the recall of the extraction of cause sentences were 87.2% and 40.8%, respectively.
    Download PDF (12819K)
  • KENTARO TORISAWA
    2006 Volume 13 Issue 2 Pages 125-144
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    This paper proposes an automatic acquisition method for preparation roles and utilization roles, which are analogues of agentive and telic roles in Generative Lexicon Theory. Utilization roles roughly express the purpose and function of a given object, and are defined as paraphrases of expressions such as “using an object” or “enjoying an object.” A preparation role of an object is defined as an expression referring to a part of the preparation process to achieve the utilization roles of an object. We regard “reading a book” as a utilization role of the book, and regard “buying a book” or “opening a book” as preparation roles. We developed a method to acquire these roles for Japanese according to the assumptions that utilization roles and preparation roles can be characterized in terms of co-occurrence frequencies and that preparation roles and utilization roles of an object are likely to be preparation roles and utilization roles of other objects. We expect that the acquired roles are useful in various inferences, such as plan recognition, by intelligent agents.
    Download PDF (2360K)
  • NGUYEN MY CHAU, YUKI TANAKA, TAKASHI IKEDA
    2006 Volume 13 Issue 2 Pages 145-168
    Published: April 10, 2006
    Released on J-STAGE: June 07, 2011
    JOURNAL FREE ACCESS
    This paper presents a method to translate the Japanese nominal modification structure [N1 no N2] in a Japanese-Vietnamese machine translation system. In Japanese, when a noun is modified by another noun, the particle [no] is almost always used to link the two nouns together. The particle [no] conveys various dependencies of meaning between the two nouns: nominalization of a verbal complement, nominalization of adnominal modification of a predicative noun, indication of possession or of the whole or a portion, and so on. In Vietnamese, depending on the semantic relationship between the two nouns, this structure uses various prepositions (ở, có, của etc.) and is divided into many expression forms with different word orders. Almost the same problems also arise when Japanese is translated into English (various prepositions (at, in, with etc.) are used). So far there has been a considerable amount of research on [N1 no N2] from the viewpoints of both linguistics and machine translation, but that research focused only on the Japanese-English language pair. With Vietnamese as the target language, this paper can be seen as the first to tackle this problem in Japanese-Vietnamese machine translation. In this paper, we analyzed the problems of a noun modified by another noun in Vietnamese, classified them into 6 types in comparison with Japanese [N1 no N2], and proposed translation rules for [N1 no N2] in a Japanese-Vietnamese machine translation system. The rules were applied to 270 phrases including the structure [N1 no N2] on our machine translation system jaw/Vietnamese, and our rules achieved about 70% accuracy. As for the methodology to disambiguate the translation of Japanese [N1 no N2], we think that Vietnamese is not much different from English, since syntactic and semantic features can be similarly used as clues to disambiguate the translation of this structure. The point is the accumulation of Vietnamese linguistic phenomena and their analysis, as well as the analysis of the corresponding Japanese expressions. This paper is a study of those matters.
    Download PDF (2513K)
  • KATSUO TAMAOKA
    2006 Volume 13 Issue 2 Pages 169-179
    Published: April 10, 2006
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Using a ‘decision tree’ drawn by the statistical algorithm CHAID in SPSS AnswerTree 3.0J, the present study investigated the collocation frequencies of three conjunctive particles (kara, node and noni) appearing in the middle or at the end of sentences, with seven selected adverbs (nanishiro, nanise, sekkaku, gen'ni, doose, jissai, and hontooni). Collocation frequencies were taken from the corpus of the Shinchoo Bunko Collection of 100 Novels. Analysis results depicted in the decision tree predict two different particle positions (middle and end) of three conjunctive particles appearing with the seven adverbs. Five noteworthy collocation tendencies were observed. First, the conjunctive particles node and kara showed distinctive differences in the middle and ending positions when appearing with the adverbs: node was seldom seen at the end of sentences (5 times, or 4.59%), while kara was often seen at the end (220 times, or 31.56%). Second, the combination of the adverb nanishiro and the conjunctive particle kara occurred most frequently at the end of sentences (140 out of 324 times, or 43.21%). Third, although kara occasionally appeared with the adverb sekkaku, this combination was seldom observed at the end of sentences (6 out of 67 times, or 8.96%). Fourth, the conjunctive particle kara showed a similar pattern of collocation frequencies in the middle and at the end of sentences when combined with the five adverbs nanise, gen'ni, doose, jissai and hontooni. Fifth, the conjunctive particle noni appeared in the middle and at the end of sentences (78.82% in the middle and 21.18% at the end), similar to the overall percentages of the conjunctive particles (72.73% in the middle and 27.27% at the end). As such, in structurally depicting the collocation frequencies of conjunctive particles and adverbs, ‘decision tree’ analysis has considerable potential as a statistical approach in future collocation studies.
    Download PDF (2279K)
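The sentence-final percentages quoted in the abstract above follow directly from the reported counts. The short check below recomputes the two cases where both the count and the total are given. The helper name `percent` is an assumption introduced here, and nothing beyond the figures stated in the abstract is used.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# nanishiro + kara at the end of sentences: 140 out of 324
print(percent(140, 324))  # 43.21

# sekkaku + kara at the end of sentences: 6 out of 67
print(percent(6, 67))  # 8.96
```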