Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Volume 14, Issue 5
Displaying 1-9 of 9 articles from this issue
  • [in Japanese]
    2007 Volume 14 Issue 5 Pages 1-2
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    Download PDF (223K)
  • Xinyu Deng, Jun-ichi Nakamura
    2007 Volume 14 Issue 5 Pages 3-40
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
As an international language, English has become increasingly important for non-native speakers. Writers therefore ought to consider the needs of non-native speakers, i.e., write English in a way that non-native audiences can understand well. In this paper, we investigate the position of six discourse markers within texts whose target audience is intermediate non-native speakers of English: because and since, which signal a “reason” relation; if and when, which signal a “condition” relation; and although and while, which signal a “concession”/“contrast” relation. First, we created a 200,000-word corpus of texts (domain: natural and pure science) targeted at intermediate non-native speakers, selected 1,072 examples of the six discourse markers from it, and annotated them. Second, we applied the machine learning program C4.5 to induce classification models of discourse-marker position, and then used a Support Vector Machine (SVM) to verify the C4.5 results. To our knowledge, this is the first study to explore the position of discourse markers in texts targeted at intermediate non-native speakers. The results can be applied to text generation and web-page creation for intermediate non-native speakers of English.
    Download PDF (3828K)
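The core of the C4.5 induction step described in the abstract above can be sketched as choosing the split with the highest information gain. Below is a minimal single-split (decision-stump) version over toy annotated examples; the feature names, values, and labels are illustrative stand-ins, not the paper's actual annotation scheme.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label multiset."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(examples, labels, n_features):
    """Pick the feature whose partition of the data maximizes information gain."""
    base = entropy(labels)
    best = None
    for f in range(n_features):
        # Partition the labels by the value of feature f.
        parts = {}
        for x, y in zip(examples, labels):
            parts.setdefault(x[f], []).append(y)
        remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
        gain = base - remainder
        if best is None or gain > best[1]:
            best = (f, gain)
    return best

# Toy data: [marker, clause_length_class]; label = position of the marker clause.
X = [["because", "long"], ["because", "short"],
     ["if", "long"], ["if", "short"],
     ["although", "long"], ["although", "short"]]
y = ["medial", "initial", "medial", "initial", "medial", "initial"]

feature, gain = best_split(X, y, n_features=2)
# Feature 1 perfectly predicts the label in this toy set,
# so it yields the maximal information gain of 1 bit.
```

In the full C4.5 algorithm this split selection is applied recursively to each partition; the SVM verification step in the paper would train a separate classifier on the same features.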
  • NORIYUKI OKUMURA, SEIJI TSUCHIYA, HIROKAZU WATABE, TSUKASA KAWAOKA
    2007 Volume 14 Issue 5 Pages 41-64
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We human beings associate various words in daily conversation: for example, we naturally associate ‘Tire’, ‘Engine’, ‘Accident’, and so on with ‘Automobile’, and expand the content of a conversation by association. A Concept-base plays the key role in achieving such an association mechanism on computers: it defines the meanings of words (concepts) by attributes and their weights. One proposed construction method extracts concepts (about 40,000 words) and their attributes from the descriptive texts of electronic dictionaries. However, the numbers of concepts and attributes that can be extracted from dictionaries are small, and the resulting Concept-base has problems with association accuracy.
In this paper, a Concept-base built from the descriptive texts of electronic dictionaries is expanded using co-occurrence information from general texts such as electronic newspapers, and a construction method for a Concept-base on the scale of 120,000 words is proposed. In the expansion, we first obtain basic concepts from the dictionary definitions of each headword and extract highly reliable attributes. Candidate attributes are then gathered from electronic newspapers as words that co-occur with each concept, using the dictionary-derived Concept-base. After this step, improper attributes (noise attributes) are filtered out using the Degree of Association between attributes, raising attribute quality. In addition, each attribute is given a weight of the kind commonly used in information retrieval and text mining, by treating each concept in the Concept-base as a virtual document. Finally, an experiment on the Degree of Association shows that a Concept-base built by the proposed method is more accurate than one built from dictionaries alone.
    Download PDF (5112K)
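The expansion pipeline in the abstract above can be sketched as: collect co-occurring words as candidate attributes, drop low-association candidates as noise, then weight survivors with an idf-style score by treating concepts as virtual documents. The corpus, threshold, and association score here are toy stand-ins for the paper's Degree of Association.

```python
from collections import Counter
from math import log

# Toy "newspaper" corpus: each sentence is a bag of words.
corpus = [
    ["automobile", "engine", "tire", "road"],
    ["automobile", "engine", "accident"],
    ["automobile", "tire", "road"],
    ["computer", "keyboard", "engine"],  # "engine" here co-occurs with an unrelated concept
]

def candidate_attributes(concept, sentences):
    """Words co-occurring with the concept, with their co-occurrence counts."""
    counts = Counter()
    for s in sentences:
        if concept in s:
            counts.update(w for w in s if w != concept)
    return counts

def association(concept, word, sentences):
    """Toy association score P(word | concept) -- a stand-in for the paper's measure."""
    with_c = [s for s in sentences if concept in s]
    return sum(word in s for s in with_c) / len(with_c)

cands = candidate_attributes("automobile", corpus)
# Noise filtering: keep attributes whose association exceeds a threshold.
attrs = {w: c for w, c in cands.items() if association("automobile", w, corpus) >= 0.5}
# Weighting: co-occurrence count times an idf-like factor over "virtual documents".
n_docs = len(corpus)
weights = {w: c * log(n_docs / sum(w in s for s in corpus)) for w, c in attrs.items()}
```

On this toy data, ‘accident’ is filtered out as noise, and ‘tire’ receives a higher weight than ‘engine’ because ‘engine’ also appears outside the concept's context.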
  • YIOU WANG, TAKASHI IKEDA
    2007 Volume 14 Issue 5 Pages 65-105
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Existential sentences, as one of the primitive sentence patterns, are important in every language, and each language realizes them in its own way. The variety of syntactic and semantic uses of existential expressions, and their complicated correspondence to Chinese, lead to ambiguities in Japanese-Chinese machine translation; currently available commercial translation software produces numerous mistranslations of existential expressions, for example in vocabulary selection and word-order determination. In this paper, we propose a method for translating existential verbs based on constraints from Japanese syntactic and semantic features, Chinese syntactic features, the attributes of the related nouns, and so on. We implement the translation rules in Jaw/Chinese, the Japanese-to-Chinese translation system developed in our laboratory, and evaluate them. We also conducted a manual experiment on over 700 existential sentences, obtaining an accuracy of about 90%, which is considerably higher than that of currently available commercial translation software. Both evaluations indicate that our method is accurate and practical.
    Download PDF (7909K)
  • HIROYUKI SHINNOU, MINORU SASAKI
    2007 Volume 14 Issue 5 Pages 107-122
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
    In this paper, we propose a new ensemble clustering method using Non-negative Matrix Factorization (NMF).
NMF is a dimensionality reduction method that is effective for high-dimensional, sparse data such as documents. However, its result depends on the initial values of the iteration. The standard countermeasure is to generate multiple clustering results from different initial values and then select the best one as estimated by the NMF decomposition error. This selection does not work well, however, because the decomposition error does not always reflect clustering accuracy.
To improve the clustering result of NMF, we propose a new ensemble clustering method. Our method generates multiple clustering results using random initializations of NMF, and integrates them through a weighted hypergraph, which can be constructed directly from the NMF output, instead of the traditional binary hypergraph.
In experiments, we compared k-means, NMF, an ensemble method using the standard hypergraph, and our ensemble method using the weighted hypergraph. Our method achieved the best results.
    Download PDF (5130K)
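The ensemble idea above can be sketched as: run NMF from several random initializations, stack the (normalized) coefficient matrices as a real-valued weighted hypergraph incidence matrix, and derive a consensus clustering from it. The NMF below uses plain multiplicative updates, the data is a toy term-document matrix, and the final consensus step is a simple similarity comparison standing in for proper hypergraph partitioning.

```python
import numpy as np

def nmf(V, k, iters=200, seed=0):
    """Plain multiplicative-update NMF: V ~= W @ H, all factors non-negative."""
    r = np.random.default_rng(seed)
    n, m = V.shape
    W = r.random((n, k)) + 1e-3
    H = r.random((k, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy term-document matrix: documents 0-2 share terms, as do documents 3-5.
V = np.array([[3, 2, 3, 0, 0, 0],
              [2, 3, 2, 0, 0, 0],
              [0, 0, 0, 3, 2, 3],
              [0, 0, 0, 2, 3, 2]], dtype=float)

k, runs = 2, 5
# Weighted hypergraph: stack the column-normalized H of each random restart,
# so each hyperedge keeps real-valued membership weights instead of being binarized.
blocks = []
for seed in range(runs):
    _, H = nmf(V, k, seed=seed)
    blocks.append(H / (H.sum(axis=0, keepdims=True) + 1e-9))
G = np.vstack(blocks)            # shape: (runs * k, n_documents)

# Consensus step (a simple stand-in for hypergraph partitioning): documents with
# similar hyperedge-weight columns belong to the same final cluster.
sim = G.T @ G
labels = (sim[0] < sim[5]).astype(int)  # assign by similarity to doc 0 vs doc 5
```

On this block-structured toy matrix, every restart recovers the two document groups, so the stacked weighted hypergraph separates documents 0-2 from 3-5.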
  • SUGURU MATSUYOSHI, SATOSHI SATO, TAKEHITO UTSURO
    2007 Volume 14 Issue 5 Pages 123-146
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
The Japanese language has many functional expressions, each of which consists of more than one word but behaves like a single function word. A remarkable characteristic of Japanese functional expressions is that each has many different surface forms. This paper proposes a hierarchically organized dictionary of Japanese functional expressions. We use a hierarchy with nine abstraction levels: the root node is a dummy node that governs all entries, a node at the first level is a headword in the dictionary, and a leaf node corresponds to a surface form of a functional expression. We have compiled a dictionary with 341 headwords and 16,771 surface forms, which covers almost all surface forms of each headword.
    Download PDF (2477K)
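The hierarchy described above can be sketched as a tree: a dummy root, headwords at the first level, and surface forms at the leaves. The entries below are illustrative romanized examples with the intermediate levels elided (the dictionary itself uses nine abstraction levels), not taken from the actual dictionary.

```python
class Node:
    """One node in the functional-expression hierarchy."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def leaves(self):
        """All surface forms (leaf labels) under this node."""
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]

    def depth(self):
        """Number of levels in the subtree rooted here."""
        return 1 + max((c.depth() for c in self.children), default=0)

# Root -> headword -> style level -> surface-form leaves.
root = Node("ROOT", [
    Node("nakerebanaranai", [                       # hypothetical headword
        Node("polite", [Node("nakerebanarimasen")]),
        Node("plain", [Node("nakerebanaranai"), Node("nakyanaranai")]),
    ]),
])

surface_forms = root.leaves()
```

Enumerating leaves per headword is how such a dictionary yields the full set of surface forms for lookup or generation.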
  • Sandeva Goonetilleke, Yoshihiko Hayashi, Yuichi Itoh, Fumio Kishino
    2007 Volume 14 Issue 5 Pages 147-166
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
We propose an application-independent Sinhala character input method, called Sri Shell, with a principled key assignment based on the phonetic transcription of Sinhala characters. A good character input method should fulfill two criteria: efficiency and user-friendliness. We introduce several methods to quantify the efficiency and user-friendliness of Sinhala character input methods, and experimental results demonstrate both for our proposed method.
    Download PDF (1652K)
  • TAKAO SHIME, MASATOSHI TSUCHIYA, SUGURU MATSUYOSHI, TAKEHITO UTSURO, S ...
    2007 Volume 14 Issue 5 Pages 167-197
    Published: October 10, 2007
    Released on J-STAGE: June 07, 2011
    JOURNAL FREE ACCESS
The Japanese language has many compound functional expressions, each consisting of more than one word and including both content words and function words, e.g., “_??_” and “_??_”. Recognizing and semantically interpreting compound functional expressions is especially difficult because one compound expression often has both a literal content-word usage and a non-literal functional usage. This paper proposes an approach to processing Japanese compound functional expressions that identifies them and analyzes their dependency relations with a machine learning technique. First, we formalize the identification of Japanese compound functional expressions in a text as a machine-learning-based chunking problem. Next, to the results of that identification, we apply dependency analysis based on the cascaded chunking model. In the experimental evaluation, we first show that the proposed method of chunking compound functional expressions significantly outperforms existing Japanese text processing tools. We then show that, for many types of functional expressions, the cascaded chunking model applied after identifying compound functional expressions outperforms the same model applied without that identification.
    Download PDF (8186K)
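The chunking formulation in the abstract above can be sketched as per-token B/I/O tagging, the standard encoding for machine-learning chunkers: a compound functional expression becomes one B tag followed by I tags. The tokens below are romanized placeholders (the real system tags morphemes), and the encode/decode pair is a generic illustration, not the paper's learner.

```python
def bio_encode(tokens, spans):
    """Tag tokens with B (begin), I (inside), O (outside) for the given chunk spans."""
    tags = ["O"] * len(tokens)
    for start, end in spans:              # end is exclusive
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags

def bio_decode(tags):
    """Recover chunk spans from a B/I/O tag sequence."""
    spans, start = [], None
    for i, t in enumerate(tags + ["O"]):  # sentinel O flushes a trailing chunk
        if t == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif t == "O" and start is not None:
            spans.append((start, i))
            start = None
    return spans

# "ni tsui te" treated as one compound functional expression ("about").
tokens = ["kare", "ni", "tsui", "te", "hanasu"]
tags = bio_encode(tokens, [(1, 4)])
```

A chunker is then trained to predict these tags per token; decoding the predicted tags back into spans gives the identified compound functional expressions, which the cascaded chunking model treats as single units for dependency analysis.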
  • SHUN SHIRAMATSU, KAZUNORI KOMATANI, KOITI HASIDA, TETSUYA OGATA, HIROS ...
    2007 Volume 14 Issue 5 Pages 199-239
    Published: October 10, 2007
    Released on J-STAGE: March 01, 2011
    JOURNAL FREE ACCESS
Referential coherence is the smoothness of discourse that results from topic continuity and pronominalization. By what principle do we select coherent expressions and interpretations? Centering theory, the standard theory of referential coherence, does not model this selection mechanism. Our goals are as follows: (1) to verify, using corpora of multiple languages, the hypothesis that models the selection of expressions and interpretations on the basis of game theory (Hasida et al. 1995; Shiramatsu et al. 2005); and (2) to investigate whether expected utility can serve as the selection criterion, and to develop a mechanism for selecting expressions and interpretations in discourse processing systems for various languages.
For these purposes, we improved the meaning-game-based centering model (MGCM). Our improvement, a statistical design of the language-dependent parameters, makes it possible to acquire the parameters from a corpus of the target language and thus to verify MGCM with corpora of various languages. We verified MGCM using Japanese and English corpora, and found statistical evidence supporting the hypothesis that referential coherence arises from selecting the alternative with the higher expected utility. This result indicates the language universality of MGCM and of the hypothesis.
    Download PDF (5744K)