Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Text Segmentation with Multiple Surface Linguistic Cues
HAJIME MOCHIZUKI, TAKEO HONDA, MANABU OKUMURA

1999 Volume 6 Issue 3 Pages 43-58

Abstract
In general, a text consists of multiple sentences, and there are semantic relations among them. A contiguous range of sentences in a text is widely assumed to form a coherent unit, usually called a discourse segment. Just as the sentences within a segment are semantically related to each other, the segments in a discourse are also related to each other. The global discourse structure of a text can be constructed by relating its segments to one another, so identifying segment boundaries is a first step toward recognizing the structure of a text. Many surface linguistic cues help identify segment boundaries in a text. In this paper, we describe a method for identifying the segment boundaries of a Japanese text with the aid of multiple surface linguistic cues, though our experiments are admittedly small in scale. We calculate a weighted sum of the scores for all cues, where each weight reflects how much the corresponding cue contributes to identifying the correct segment boundaries. We also present a method for training the weights of multiple linguistic cues automatically while avoiding overfitting.
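As a rough illustration of the weighted-sum scoring described in the abstract, the following Python sketch scores each gap between adjacent sentences and keeps the k highest-scoring gaps as segment boundaries. Everything concrete here is a hypothetical assumption: the cue names, their per-gap scores, the weights, and the top-k selection rule. The abstract does not specify which cues are used, how each cue is scored, how many boundaries are selected, or how the weights are trained, so the weights below are simply supplied by hand rather than learned.

```python
from typing import Dict, List, Sequence


def boundary_scores(
    cue_scores: Sequence[Dict[str, float]], weights: Dict[str, float]
) -> List[float]:
    """Weighted sum of cue scores at each candidate boundary.

    cue_scores[i] maps a (hypothetical) cue name to its score at the gap
    after sentence i. Cues absent from a gap, or without a weight, add 0.
    """
    return [
        sum(weights.get(cue, 0.0) * score for cue, score in gap.items())
        for gap in cue_scores
    ]


def top_boundaries(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring gaps, in text order.

    Selecting a fixed number k of boundaries is an assumed decision rule.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical cues and hand-set weights, for illustration only.
weights = {"conjunctive": 0.5, "topic_marker": 0.8, "lexical_chain_end": 0.3}
gaps = [
    {"conjunctive": 1.0},                              # gap after sentence 0
    {"topic_marker": 1.0, "lexical_chain_end": 2.0},   # gap after sentence 1
    {},                                                # gap after sentence 2
    {"conjunctive": 1.0, "lexical_chain_end": 1.0},    # gap after sentence 3
]

scores = boundary_scores(gaps, weights)
print(top_boundaries(scores, 2))  # → [1, 3]
```

Keeping the scoring function separate from the selection rule means a learned weight vector, or a threshold instead of a fixed k, could be substituted without changing how the cue scores are combined.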
© The Association for Natural Language Processing