Journal of Natural Language Processing

[title in Japanese]

[in Japanese]

2007 Volume 14 Issue 1 Pages 1-2
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.1

JOURNAL FREE ACCESS

Download PDF (265K)
A Linear-Time Algorithm for Dependency Analysis of Japanese

MANABU SASSANO

2007 Volume 14 Issue 1 Pages 3-18
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.3

JOURNAL FREE ACCESS

Show abstractHide abstract

We present a novel algorithm for Japanese dependency analysis. The algorithm allows us to analyze dependency structures of a sentence in linear-time while keeping a stateof-the-art accuracy. In this paper, we show a formal description of the algorithm and discuss it theoretically with respect to time complexity. In addition, we evaluate its efficiency and performance empirically against the Kyoto University Corpus Version 2. The proposed algorithm with improved models for dependency yields the best accuracy in the previously published results on the Kyoto University Corpus Version 2.

View full abstract

Download PDF (1582K)
Japanese Parser Generating Suitable Syntactic Structures for Meaning

YUTAKA TAKEMOTO, MASAHIRO MIYAZAKI

2007 Volume 14 Issue 1 Pages 19-42
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.19

JOURNAL FREE ACCESS

Show abstractHide abstract

Japanese is a free word order language and ellipsis frequently occur compared with English. Japanese sentences are not suitable to be parsed by the phrase structure analysis and is generally parsed by the dependency analysis. Tree structures by the phrase structure analysis does not describe well in expressions with crossed dependencies and in words with duality of parts of speech. The major syntactic analysis based on Japanese school grammar does not generate suitable syntactic structures for meaning. In contrast with this situation, a syntactic analysis based on Tokieda grammar can explain meaning of Japanese sentences well. Miura grammar is a Japanese grammar on the Constructive Process Theory proposed by M. Tokieda, and developed by T. Miura. This paper proposes a solution to the above problems in Japanese Syntactic Analysis by grammar descriptions based on Miura language model and Miura grammar, and by controls of grammar application. A trial parser based on the ideas generates suitable syntactic structures for meaning, in one-to-N or N-to-one dependency relations, in local nest in sentences, in distinction between topic marker “ha” and contrast marker “ha” in Japanese particle “ha”, and in words with duality of parts of speech.

View full abstract

Download PDF (3839K)
Semantic Role Labeling based on Japanese FrameNet

SHINSUKE HIZUKA, HIROYUKI OKAMOTO, HIROAKI SAITO, KYOKOHIROSE OHARA

2007 Volume 14 Issue 1 Pages 43-66
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.43

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper proposes a stochastic model for semantic role labeling based on Japanese FrameNet and suggests a method to acquire it by machine learning. The model distinguishes semantic roles which cannot be separated by surface cases. The model receives a sentence and its predicate, identifies its predicate argument structure, then identifies the arguments to be labeled, and finally labels them with adequate semantic roles. The system based on the model achieved 77% precision and 68% recall in identifying the semantic roles of the pre-segmented arguments under the condition that the system labels the role whose certainty is more than the threshold. For more difficult tasks of identifying the arguments which should be labeled and their roles, the system attained 63% precision and 43% recall under the same condition. The system also achieved to label different semantic roles to the arguments whose surface cases are identical.

View full abstract

Download PDF (2291K)
A Keyword Extraction Method by Document Expansion

KENTA NAGAMACHI, YOSHIYUKI TAKEDA, KYOJI UMEMURA

2007 Volume 14 Issue 1 Pages 67-86
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.67

JOURNAL FREE ACCESS

Show abstractHide abstract

Keyword extraction is one of the technology essential for information retrieval. There are methods based on the statistical information of words and features in syntax of the documents as existing keyword extraction methods. Among these methods, there is a keyword extraction method based on only a statistics which is called adaptation without using the dictionary. This method has a problem that it is not possible to extract a keyword as a long character string when the number of documents is limited. In this research, we introduce an idea of repetition occurrence in two ormore documents based on the idea of query expansion. And, we propose a novel keyword extraction method based on this idea. The F measure of the proposed keyword extraction method is improved. And, it becomes possible to extract the keyword that was not able to be extract before. In conclusion, we report the usefulness of the document expansion in the keyword extraction.

View full abstract

Download PDF (3123K)
Generating Referring Expressions Using Perceptual Grouping

KOTARO FUNAKOSHI, SATORU WATANABE, TAKENOBU TOKUNAGA

2007 Volume 14 Issue 1 Pages 87-110
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.87

JOURNAL FREE ACCESS

Show abstractHide abstract

Referring expressions are the linguistic representations that identify the referent among objects. Past work of generating referring expressions mainly utilized attributes of objects and binary relations between objects. However, such an approach does not work well when there is no distinctive attribute among objects. To overcome this limitation, we proposed a generation method using perceptual grouping of objects. However, this method can deal with very limited situations. This paper proposes an extended method using perceptual grouping that can deal with more general situations. The psychological experiments with 18 subjects showed that the extented method could effectively generate proper referring expressions.

View full abstract

Download PDF (11269K)
Chunking Japanese Compound Functional Expressions by Machine Learning

MASATOSHI TSUCHIYA, TAKAO SHIME, TOSHIHIRO TAKAGI, KIYOTAKA UCHIMOTO, ...

2007 Volume 14 Issue 1 Pages 111-138
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.111

JOURNAL FREE ACCESS

Show abstractHide abstract

The Japanese language has many compound functional expressions which consist of more than one word including both content words and functional words. They are very important for recognizing the syntactic structures of Japanese sentences and for understanding their semantic contents. We formalize detection of Japanese compound functional expressions as a chunking problem against a morpheme sequence, and propose to learn a detector of them using a machine learning method. The chunker YamCha based on Support Vector Machines (SVMs) is applied to this task. Through experimental evaluation, we achieve the cross validation result of the F-measure as 92, when the number of morphemes constituting a compound functional expression, and the position of each morpheme within a functional expression are considered as features of SVM.

View full abstract

Download PDF (3247K)
Text Generation for Intermediate Non-native Speakers of English

Xinyu Deng, Jun-ichi Nakamura

2007 Volume 14 Issue 1 Pages 139-161
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.139

JOURNAL FREE ACCESS

Show abstractHide abstract

This paper describes the microplanner of the SILK system which can generate texts appropriate for intermediate non-native users on discourse level. Four factors (i.e. nucleus position, between-text-span punctuation, embedded discourse markers and punctuation pattern) are regarded to affect the readability at discourse level. It is the preferences among these factors that decide the readability. Since the number of possible combinations of the preferences is huge, we use Genetic Algorithm to solve such a problem. We adopt two methods to evaluate the system: one is evaluating the reliability of the SILK system by analysing how often it re-generates corpus texts, another is judging readability by human subjects. The evaluation results show that the system is reliable and the generation results are appropriate for intermediate non-native speakers on discourse level.

View full abstract

Download PDF (2097K)
Automatic F-term Classification of Japanese Patent Documents Using the k-Nearest Neighborhood Method and the SMART Weighting

Masaki Murata, Toshiyuki Kanamaru, Tamotsu Shirado, Hitoshi Isahara

2007 Volume 14 Issue 1 Pages 163-189
Published: January 10, 2007
Released on J-STAGE: March 01, 2011

DOIhttps://doi.org/10.5715/jnlp.14.163

JOURNAL FREE ACCESS

Show abstractHide abstract

Patent processing is important in various fields such as industry, business, and law. We used F-terms (Schellner 2002) to classify patent documents using the k-nearest neighborhood method. Because the F-term categories are fine-grained, they are useful when we classify patent documents. We clarified the following three points using experiments: i) which variations of the k-nearest neighborhood method are the best for patent classification, ii) which methods of calculating similarity are the best for patent classification, and iii) from which regions of a patent terms should be extracted. In our experiments, we used the patent data used in the F-term categorization task in the NTCIR-5 Patent Workshop (NTCIR committee 2005; Iwayama, Fujii, and Kando 2005). We found that the method of adding the scores of k extracted documents to classify patent documents was the most effective among the variations of the k-nearest neighborhood method used in this study. We also found that SMART (Singhal, Buckley, and Mitra 1996; Singhal, Choi, Hindle, and Pereira 1997), which is known to be effective in information retrieval, was the most effective method of calculating similarity. Finally, when extracting terms, we found that using the abstract and claim regions together was the best method among all the combinations of using abstract, claim, and description regions. The results were confirmed using a statistical test. Moreover, we experimented with changing the amount of training data and found that we obtained better performance when we used more data, which was limited to that provided in the NTCIR-5 Patent Workshop.

View full abstract

Download PDF (2535K)

Register with J-STAGE for free!