Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

2002 Volume 9 Issue 5 Pages 1-2
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_1

JOURNAL FREE ACCESS

Chunking with Support Vector Machines

TAKU KUDO, YUJI MATSUMOTO

2002 Volume 9 Issue 5 Pages 3-21
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_3

JOURNAL FREE ACCESS

In this paper, we apply Support Vector Machines (SVMs) to identify English base phrases (chunks). It is well known that SVMs achieve high generalization performance even with input data in a high-dimensional feature space. Furthermore, by introducing the kernel principle, SVMs can carry out training at a smaller computational cost, independent of the dimensionality of the feature space. In order to improve accuracy, we also apply majority voting with 8 SVMs, each trained using a distinct chunk representation. Experimental results show that our approach achieves better accuracy than other conventional frameworks.
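
The voting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each classifier's output has already been converted to one common tag set (the tag names below are placeholders) and that ties go to the earliest classifier; neither detail is stated in the abstract, and the function name `majority_vote` is hypothetical. It shows plain per-token majority voting only, not the authors' system.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tag sequences from several classifiers.

    predictions: list of tag sequences, one per classifier, all the same
    length and (by assumption) expressed in the same tag set.
    Ties are broken in favor of the earliest classifier in the list.
    """
    if not predictions:
        raise ValueError("need at least one tag sequence")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all tag sequences must have the same length")

    voted = []
    for position_tags in zip(*predictions):
        counts = Counter(position_tags)
        best = max(counts.values())
        # Pick the first tag (in classifier order) that reaches the top count.
        winner = next(tag for tag in position_tags if counts[tag] == best)
        voted.append(winner)
    return voted


if __name__ == "__main__":
    outputs = [
        ["B", "I", "O"],
        ["B", "O", "O"],
        ["B", "I", "I"],
    ]
    print(majority_vote(outputs))
```

With three classifiers, a tag wins a position as soon as two of them agree on it; with 8 classifiers, as in the abstract, ties become possible and the tie-breaking rule matters.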

Hierarchical Phrase Alignment Harmonized with Parsing

KENJI IMAMURA

2002 Volume 9 Issue 5 Pages 23-42
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_23

JOURNAL FREE ACCESS

In this paper, we propose a hierarchical phrase alignment method that aims to acquire translation knowledge. Previous methods utilize the correspondence of sub-trees between bilingual parsing trees after determining the parsing result. The method described in this paper combines partial tree candidates and selects the best sequence of partial trees. A structural similarity measure (called a `phrase score') is then used for evaluation. A forward DP, backward A* search algorithm is applied in order to combine partial trees. Using this method, about twice as many equivalent phrases were extracted experimentally, and almost no deterioration was observed.
This method employs word alignment. The accuracy of the phrase alignment increases when we consider the word correspondences between not only content words but also function words. In addition, we found that a word alignment method with a high recall rate is suitable for this method.

Counting documents that contain substrings more than κ times

KYOJI UMEMURA, AKIKO SANADA

2002 Volume 9 Issue 5 Pages 43-70
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_43

JOURNAL FREE ACCESS

The statistic we compute is dfκ: the number of documents that contain a given string more than κ times. We can hardly keep the statistics for all substrings, because that requires O(N²) space, where N is the size of the corpus. Yamamoto et al. show that it is possible to produce a table for κ=1 in O(N) space using a suffix array and the concept of “class of string”. However, this method cannot solve the problem where κ≥2. We present an algorithm that can be used for κ≥2, and the statistics can be computed using the table. In this report, we explain dfκ and compare the proposed algorithm with simple methods. This algorithm takes O(N log N) time and O(N) space to produce the table, and O(log N) time to obtain statistics from the table.
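
As an illustration of what dfκ measures, here is a naive sketch of the kind of simple per-query baseline such an algorithm might be compared against. It scans every document for each query and is not the suffix-array algorithm of the paper. Two assumptions are made here and are not taken from the abstract: occurrences may overlap, and the threshold is read as "k or more times", so that k=1 gives the ordinary document frequency. The names `count_occurrences` and `df_k` are hypothetical.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps (an assumption)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one character so overlapping matches are also counted.
        start = text.find(pattern, start + 1)
    return count


def df_k(documents, pattern, k):
    """Number of documents containing pattern at least k times.

    Assumes the threshold is inclusive, so df_k(..., k=1) is the usual
    document frequency.
    """
    return sum(1 for doc in documents if count_occurrences(doc, pattern) >= k)


if __name__ == "__main__":
    corpus = ["abab", "ab", "ba"]
    print(df_k(corpus, "ab", 1), df_k(corpus, "ab", 2))
```

Each query in this sketch rescans the whole corpus, which is the cost that the precomputed table described in the abstract avoids.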

A Method of Metaphoricity Detection using Probabilistic Measurements

FUMITO MASUI, JUN'ICHI FUKUMOTO, TSUTOMU SHIINO, ATSUO KAWAI

2002 Volume 9 Issue 5 Pages 71-92
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_71

JOURNAL FREE ACCESS

We propose a method to detect metaphoricity between words with probabilistic measurements. In order to detect metaphoricity, we have introduced two probabilistic measurements: “salience gap” and “novelty.” The salience gap measures the strength of the closed-up property set between a concept pair and helps separate concept pairs into anomalous pairs and others. This measurement can be computed from the probabilities of properties in each concept representation. The novelty measures how surprising a concept combination is, and helps extract anomalous relations from concept pairs. This measurement can be calculated using word similarity. Using both measurements, concept pairs can be classified as metaphorical, literal, or anomalous. To evaluate our metaphoricity detection model, we used one year of newspaper articles and 100 sets of word combinations covering three kinds of relations: metaphorical, literal, and anomalous. In the experimental results, precision reached 70 percent in separating metaphorical word pairs from others. These results suggest that our method is useful.

A Method of a Concept-base Construction for an Association System

Deciding Attribute Weights Based on the Degree of Attribute Reliability

KAZUHIDE KOJIMA, HIROKAZU WATABE, TSUKASA KAWAOKA

2002 Volume 9 Issue 5 Pages 93-110
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_93

JOURNAL FREE ACCESS

Realizing computers that understand natural language requires an association system that outputs words strongly related to the input words. This study aims to construct a concept-base, which is a main element of the association system. In the concept-base, the meaning of a word is defined by a set of attributes expressing the features of the word, each with a weight representing its importance to the word. In our study, we model concepts as a chain of words defined by the concept-base. The first automatically constructed concept-base contains quite a few unsuitable attributes, and therefore the reliability of its weights is also questionable. Taking the automatically constructed concept-base as a starting point, we aim to achieve a new refining method based on the reliability of attributes, so that noise is removed and more appropriate weights are obtained. Moreover, this paper shows the effects of the proposed method by presenting an evaluation by human judgment and an experiment that uses the degree of association in test data.

A Translation Method of Expressions Containing the “Toritate” Words “mo, sae, demo” in Japanese-Chinese Machine Translation

ZHAOHUI BU, JUN XIE, TAKASHI IKEDA

2002 Volume 9 Issue 5 Pages 111-130
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_111

JOURNAL FREE ACCESS

Words such as “mo”, “sae” and “demo” are particular function words in Japanese, and are known as “toritate” words. They have a variety of syntactic and semantic uses, and their complicated correspondence with Chinese leads to ambiguities in Japanese-Chinese machine translation. As a result, current commercially available machine translation software produces numerous mistranslations of these words, in terms of both vocabulary selection and word order determination.
In this paper, we propose a method for disambiguating the meaning of expressions containing the “toritate” words “mo”, “sae” and “demo” by referring to the following syntactic and semantic features: (1) the features of the scope of the “toritate” word (it may be NP or VP), (2) the features of the predicate that are related to the “toritate” word, and (3) the features of the corresponding Chinese word for the Japanese scope of the “toritate” word. The positions of these “toritate” words in Chinese are determined according to their syntactic rules as well as the grammatical role of the scope of the “toritate” words in Chinese.
We evaluated our translation algorithm manually using 100 example sentences each for “mo”, “sae” and “demo”. The translation accuracy for each of these words was over 80%, indicating that our method provides a more accurate Japanese-Chinese translation than currently commercially available translation software.

Supporting Conference Program Production Using Natural Language Processing Technologies

HIROMI ITOH OZAKU, MASAO UTIYAMA, MASAKI MURATA, KIYOTAKA UCHIMOTO, HI ...

2002 Volume 9 Issue 5 Pages 131-148
Published: October 10, 2002
Released on J-STAGE: March 01, 2011

DOI: https://doi.org/10.5715/jnlp.9.5_131

JOURNAL FREE ACCESS

We applied natural language processing technologies to automatically produce a program for the sixth annual meeting of the Association for Natural Language Processing. In this paper, we describe experiments on automatically generating the program using the data from the fifth annual meeting. We produced the sixth annual meeting program on the basis of these experiments. We report the process of making the sixth annual meeting program in practice and show to what extent natural language processing technologies are effective for this task. Furthermore, we show the results of a questionnaire targeting the participants of the sixth annual meeting.
