Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

2006 Volume 13 Issue 2 Pages 1
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_1

JOURNAL FREE ACCESS

Text segmentation based on term repetition distances

SHIGENORI NAKANO, AKIRA ADACHI, TAKENORI MAKINO

2006 Volume 13 Issue 2 Pages 3-26
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_3

JOURNAL FREE ACCESS

This paper proposes a method for detecting topic boundaries using a topical cohesion profile measured by term repetition distances. A set of term repetitions composes a topical potential. The total of the topical potentials composes a topical cohesion profile, whose hills correspond to dominant topics and whose valleys correspond to segment boundaries. When newspaper articles are concatenated sequentially, the end lines of the articles indicate topical segment boundaries. In the experiment, the method is tested on how many segment boundaries it detects at article end lines. The results showed 67.8% recall and 61.8% precision. The method is applicable to long essays and effective for short texts.
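
The idea of a cohesion profile with hills and valleys can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: the function names `cohesion_profile` and `valleys`, the `max_distance` cutoff, and the rule that each repetition of a term adds `1 / distance` to every sentence gap it spans.

```python
from collections import defaultdict


def cohesion_profile(sentences, max_distance=5):
    """Return one cohesion score per gap between adjacent sentences.

    Assumption (not from the paper): two consecutive occurrences of a term
    in sentences i < j add 1 / (j - i) to every gap between i and j.
    """
    positions = defaultdict(list)
    for index, sentence in enumerate(sentences):
        for term in set(sentence.lower().split()):
            positions[term].append(index)

    # Gap k lies between sentence k and sentence k + 1.
    profile = [0.0] * (len(sentences) - 1)
    for occurrences in positions.values():
        for i, j in zip(occurrences, occurrences[1:]):
            distance = j - i
            if distance <= max_distance:
                for gap in range(i, j):
                    profile[gap] += 1.0 / distance
    return profile


def valleys(profile):
    """Return interior gaps whose score is lower than both neighbours."""
    return [
        k
        for k in range(1, len(profile) - 1)
        if profile[k] < profile[k - 1] and profile[k] < profile[k + 1]
    ]


# Toy input: two "articles" of three sentences each, joined end to end.
toy_text = [
    "cats purr",
    "cats sleep",
    "cats purr loudly",
    "stocks fell",
    "stocks rose",
    "stocks fell again",
]
toy_profile = cohesion_profile(toy_text)
toy_boundaries = valleys(toy_profile)
```

In this toy input no term is shared across the two halves, so the gap after the third sentence receives no contribution and is reported as the only valley.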

Improving SVM Active Learning: An Empirical Study in Japanese Word Segmentation

MANABU SASSANO

2006 Volume 13 Issue 2 Pages 27-41
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_27

JOURNAL FREE ACCESS

We explore how well active learning with Support Vector Machines works for a nontrivial task in natural language processing. We use Japanese word segmentation as a test case. In particular, we discuss how the size of a pool affects the learning curve. We find that in the early stage of training, a larger pool requires more labeled examples to achieve a given level of accuracy than a smaller pool. In addition, we propose a novel technique that uses a large number of unlabeled examples effectively by adding them gradually to a pool. The experimental results show that our technique requires fewer labeled examples than the technique in previous research. To achieve 97.0% accuracy, the proposed technique needs 59.3% of the labeled examples required by the previous technique and only 17.4% of those required by random sampling.
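
A pool that grows gradually can be sketched as a short loop. This is an illustration under stated assumptions, not the paper's procedure: the names `active_learning`, `train`, `margin`, `oracle`, and `pool_step` are hypothetical, the growth schedule (a fixed number of examples per round) is assumed, and the selection rule (smallest absolute margin, a common heuristic for SVM active learning) is not confirmed by the abstract.

```python
def active_learning(unlabeled, oracle, train, margin, rounds, batch=1, pool_step=10):
    """Pool-based active learning with a pool that grows every round.

    Hypothetical interface (an assumption, not from the paper):
      train(labeled)   -> model, where labeled is a list of (x, label) pairs
      margin(model, x) -> signed distance of x from the decision boundary
      oracle(x)        -> the true label of x (the human annotator)
    """
    remaining = list(unlabeled)
    pool, labeled = [], []
    model = None
    for _ in range(rounds):
        # Grow the pool gradually instead of exposing all unlabeled data at once.
        pool.extend(remaining[:pool_step])
        del remaining[:pool_step]
        if not pool:
            break
        if model is None:
            # No model yet: label the first examples in the pool.
            chosen = pool[:batch]
        else:
            # Uncertainty sampling: the smallest |margin| is closest to the boundary.
            chosen = sorted(pool, key=lambda x: abs(margin(model, x)))[:batch]
        for x in chosen:
            pool.remove(x)
            labeled.append((x, oracle(x)))
        model = train(labeled)
    return model, labeled
```

With an SVM, `margin` would typically be the value of the decision function for an example; any classifier that exposes such a score can be plugged in.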

Japanese dependency parsing using co-occurrence information and combination of case-slots

TAKESHI ABEKAWA, MANABU OKUMURA

2006 Volume 13 Issue 2 Pages 43-62
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_43

JOURNAL FREE ACCESS

We present a method of improving Japanese dependency parsing by using large-scale statistical information. Our method takes into account two types of information that have not been considered in previous statistical (machine-learning-based) parsing methods. One is dependency relations among the case elements of a verb, and the other is co-occurrence relations between a verb and its case element. We can collect the information for these relations from the results of automatic dependency parsing of large-scale corpora. To show the effectiveness of our method, we conducted a dependency parsing experiment in which our method reranks the outputs of an existing machine-learning-based parsing method. From the results, we found that our method can improve the accuracy of the existing method. Furthermore, we pointed out that the relation between a verb and its modifying noun in a relative clause affects dependency parsing, and integrated our relative clause analysis method with the proposed parsing method.

Efficient Expansion of MT Rules using Corpora

SETSUO YAMADA, KENJI IMAMURA, KAZUHIDE YAMAMOTO

2006 Volume 13 Issue 2 Pages 63-78
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_63

JOURNAL FREE ACCESS

Since the expansion of MT rules is currently performed by humans, it takes too long and is too expensive. This paper proposes a new procedure that expands MT rules efficiently by supporting human judgements with linguistic information automatically collected from monolingual corpora. An MT rule consists of source knowledge and target knowledge. The new procedure uses the source knowledge present in an MT system as the key to retrieve source language information from corpora. It also uses the partial translations provided by the MT system to acquire target language information. These two techniques can reduce labor costs without lowering translation quality in comparison with the conventional method. Experimental results confirm this benefit.

Generating Referring Expressions based-on Perceptual Grouping

KOTARO FUNAKOSHI, SATORU WATANABE, NAOKO KURIYAMA, TAKENOBU TOKUNAGA

2006 Volume 13 Issue 2 Pages 79-97
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_79

JOURNAL FREE ACCESS

Past work on generating referring expressions mainly utilized attributes of objects and binary relations between objects to distinguish the referent from other objects. However, such an approach does not work well when there is no distinctive attribute among objects. To overcome this limitation, this paper proposes a novel generation method utilizing the perceptual groups of objects and n-ary relations among them. With the proposed method, an expression like “the rightmost ball in the left cluster of three balls” can be generated. The key is to identify groups of objects that are naturally recognized by humans. We conducted psychological experiments with 42 subjects to collect referring expressions in such situations, and built a generation algorithm based on the results. The evaluation using another 23 subjects showed that the proposed method could effectively generate proper referring expressions.

Extraction of Expressions concerning Accident Cause contained in Articles on Traffic Accidents

HIROYUKI SAKAI, SHOUJI UMEMURA, SHIGERU MASUYAMA

2006 Volume 13 Issue 2 Pages 99-123
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_99

JOURNAL FREE ACCESS

We propose a method for extracting expressions concerning accident cause (e.g., “mishandling of the steering wheel control”) contained in articles on traffic accidents from a newspaper corpus. Analyzing the causes of the traffic accident cases obtained by our method is useful for developing traffic accident prevention devices. Our method extracts expressions concerning accident cause from articles on traffic accidents, which are extracted from a newspaper corpus by SVMs as a preprocessing step. Here, we define an expression modified by expressions concerning accident cause as “a seed expression”. Our method acquires expressions concerning accident cause from an initial seed expression provided manually. Moreover, our method acquires seed expressions from the expressions concerning accident cause and acquires new expressions concerning accident cause from the acquired seed expressions. By iterating these processes, expressions concerning accident cause and seed expressions are acquired. Experimental results showed that our method attained 77.2% precision and 38.6% recall. We also define a cause sentence as a sentence containing both an expression concerning accident cause and a seed expression, or a sentence containing an expression that adds “rashii (seem to)” to an expression concerning accident cause. The precision and recall of cause sentence extraction were 87.2% and 40.8%, respectively.
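
The alternation between seed expressions and cause expressions is a bootstrapping loop, which can be sketched as follows. This is a simplified illustration, not the paper's method: the function name `bootstrap`, the `(modifier, head)` input format, and the iteration cap are assumptions, and the sketch applies no scoring or filtering to the acquired expressions.

```python
def bootstrap(pairs, initial_seed, max_iterations=10):
    """Alternately acquire cause expressions and seed expressions.

    `pairs` is a hypothetical list of (modifier, head) tuples, where the
    modifier expression modifies the head expression in some sentence.
    Simplification: every matching expression is accepted without scoring.
    """
    seeds = {initial_seed}
    causes = set()
    for _ in range(max_iterations):
        # Expressions that modify a known seed become cause expressions.
        new_causes = {m for m, h in pairs if h in seeds} - causes
        causes |= new_causes
        # Expressions modified by a known cause become seed expressions.
        new_seeds = {h for m, h in pairs if m in causes} - seeds
        seeds |= new_seeds
        if not new_causes and not new_seeds:
            break  # nothing new was acquired, so the process has converged
    return causes, seeds


# Toy data: invented pairs, used only to show the control flow.
toy_pairs = [
    ("speeding", "cause"),
    ("speeding", "factor"),
    ("drowsy driving", "factor"),
    ("rain", "weather"),
]
toy_causes, toy_seeds = bootstrap(toy_pairs, "cause")
```

Starting from the single seed "cause", the toy run first acquires "speeding", then the new seed "factor", and through it "drowsy driving"; the unrelated pair is never reached.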

Automatically Acquiring Natural Language Expressions Representing Preparation and Utilization of an Object

KENTARO TORISAWA

2006 Volume 13 Issue 2 Pages 125-144
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_125

JOURNAL FREE ACCESS

This paper proposes an automatic acquisition method for preparation roles and utilization roles, which are analogues of agentive and telic roles in Generative Lexicon Theory. Utilization roles roughly express the purpose and function of a given object, and are defined as paraphrases of expressions such as “using an object” or “enjoying an object.” A preparation role of an object is defined as an expression referring to a part of the preparation process to achieve the utilization roles of an object. We regard “reading a book” as a utilization role of the book, and regard “buying a book” or “opening a book” as preparation roles. We developed a method to acquire these roles for Japanese according to the assumptions that utilization roles and preparation roles can be characterized in terms of co-occurrence frequencies and that preparation roles and utilization roles of an object are likely to be preparation roles and utilization roles of other objects. We expect that the acquired roles will be useful in various inferences, such as plan recognition, by intelligent agents.

Translation of Structure [N₁ no N₂] in Japanese-Vietnamese Machine Translation

NGUYEN MY CHAU, YUKI TANAKA, TAKASHI IKEDA

2006 Volume 13 Issue 2 Pages 145-168
Published: April 10, 2006
Released on J-STAGE: June 07, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_145

JOURNAL FREE ACCESS

This paper presents a method to translate the Japanese nominal modification structure [N₁ no N₂] in a Japanese-Vietnamese machine translation system. In Japanese, when a noun is modified by another noun, the particle [no] is almost always used to link the two nouns together. The particle [no] conveys various semantic dependencies between the two nouns: nominalization of a verbal complement, nominalization of adnominal modification of a predicative noun, indication of possession or of a whole or a portion, and so on. In Vietnamese, depending on the semantic relationship between the two nouns, this structure uses various prepositions (ở, có, của, etc.) and is divided into many expression forms with different word orders. Almost the same problems arise when Japanese is translated into English (various prepositions such as at, in, and with are used). So far there have been many studies on [N₁ no N₂] from the viewpoints of both linguistics and machine translation, but those studies focused only on the Japanese-English language pair. With Vietnamese as the target language, this paper can be seen as the first to tackle this problem in Japanese-Vietnamese machine translation. In this paper, we analyzed the problems of a noun modified by another noun in Vietnamese, classified them into 6 types in comparison with Japanese [N₁ no N₂], and proposed translation rules for [N₁ no N₂] in a Japanese-Vietnamese machine translation system. The rules were applied to 270 phrases including the structure [N₁ no N₂] on our machine translation system jaw/Vietnamese, and they achieved about 70% accuracy. As for the methodology to disambiguate the translation of Japanese [N₁ no N₂], we think that Vietnamese is not much different from English, since syntactic and semantic features can be used similarly as clues to disambiguate the translation of this structure. The point is the accumulation of Vietnamese linguistic phenomena and their analysis, as well as the analysis of the corresponding Japanese expressions. This paper is a study of those matters.

Possibility of ‘Decision Tree’ Analysis on Collocation Frequencies: In the Case of Conjunctive Particles kara, node and noni Co-occurring with Adverbs at the Middle and End of Sentences

KATSUO TAMAOKA

2006 Volume 13 Issue 2 Pages 169-179
Published: April 10, 2006
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.13.2_169

JOURNAL FREE ACCESS

Using a ‘decision tree’ drawn by the statistical algorithm CHAID in SPSS AnswerTree 3.0J, the present study investigated the collocation frequencies of three conjunctive particles (kara, node and noni) appearing in the middle or at the end of sentences, with seven selected adverbs (nanishiro, nanise, sekkaku, gen'ni, doose, jissai, and hontooni). Collocation frequencies were taken from the corpus of the Shinchoo Bunko Collection of 100 Novels. Analysis results depicted in the decision tree predict two different particle positions (middle and end) of the three conjunctive particles appearing with the seven adverbs. Five noteworthy collocation tendencies were observed. First, the conjunctive particles node and kara showed distinctive differences in the middle and ending positions when appearing with the adverbs: node was seldom seen at the end of sentences (5 times, or 4.59%), while kara was often seen at the end (220 times, or 31.56%). Second, the combination of the adverb nanishiro and the conjunctive particle kara occurred most frequently at the end of sentences (140 out of 324 times, or 43.21%). Third, although kara occasionally appeared with the adverb sekkaku, this combination was seldom observed at the end of sentences (6 out of 67 times, or 8.96%). Fourth, the conjunctive particle kara showed a similar pattern of collocation frequencies in the middle and at the end of sentences when combined with the five adverbs nanise, gen'ni, doose, jissai and hontooni. Fifth, the conjunctive particle noni appeared in the middle and at the end of sentences (78.82% in the middle and 21.18% at the end), similar to the overall percentages of the conjunctive particles (72.73% in the middle and 27.27% at the end). As such, in structurally depicting the collocation frequencies of conjunctive particles and adverbs, ‘decision tree’ analysis has considerable potential as a statistical approach in future collocation studies.
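
The two percentages that the abstract reports together with both a count and a total can be reproduced with a short calculation. The helper name `percent` and the variable names are illustrative; only the counts 140 of 324 and 6 of 67 come from the abstract.

```python
def percent(count, total):
    """Return count / total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


# Counts quoted in the abstract for sentence-final kara.
nanishiro_kara_at_end = percent(140, 324)  # nanishiro + kara at sentence end
sekkaku_kara_at_end = percent(6, 67)       # sekkaku + kara at sentence end
```

Both values round to the figures given in the abstract, 43.21% and 8.96%.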
