Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Text segmentation based on term repetition distances
SHIGENORI NAKANO, AKIRA ADACHI, TAKENORI MAKINO

2006 Volume 13 Issue 2 Pages 3-26

Abstract

This paper proposes a method for detecting topic boundaries using a topical cohesion profile measured from term repetition distances. A set of term repetitions composes a topical potential. The sum of all topical potentials composes the topical cohesion profile, whose hills correspond to dominant topics and whose valleys correspond to segment boundaries. When newspaper articles are concatenated sequentially, the end line of each article marks a topical segment boundary. In the experiment, the method is tested on how many segment boundaries it detects at article end lines. The experiment yielded a recall of 67.8% and a precision of 61.8%. The method is applicable to long essays and effective for short texts.
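
The idea of reading boundaries off the valleys of a cohesion profile can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not define the potential function, so the sketch assumes that each pair of consecutive occurrences of a term, no farther apart than a hypothetical `max_distance`, adds one unit of cohesion to every gap it spans. The function names, the strict local-minimum rule for valleys, and the exact-match scoring are likewise assumptions made for illustration.

```python
def cohesion_profile(tokens, max_distance=50):
    """Return one cohesion value per gap between adjacent tokens.

    Assumption (not from the paper): a repetition of a term within
    max_distance tokens adds 1 to every gap between the two occurrences.
    """
    profile = [0] * max(len(tokens) - 1, 0)
    last_seen = {}  # term -> position of its most recent occurrence
    for pos, term in enumerate(tokens):
        prev = last_seen.get(term)
        if prev is not None and pos - prev <= max_distance:
            for gap in range(prev, pos):
                profile[gap] += 1
        last_seen[term] = pos
    return profile


def valleys(profile):
    """Return indices of strict local minima (candidate boundaries).

    Simplification: plateaus and the two end gaps are ignored.
    """
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]
    ]


def precision_recall(predicted, reference):
    """Score predicted boundaries against reference boundaries (exact match)."""
    hits = len(set(predicted) & set(reference))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall


# Toy text: two "topics", each marked by its own repeated terms.
tokens = "a b a b c d c d".split()
profile = cohesion_profile(tokens)   # [1, 2, 1, 0, 1, 2, 1]
boundaries = valleys(profile)        # [3], the gap between tokens 3 and 4
```

In this toy input the only valley falls between the last `b` and the first `c`, where no term repetition bridges the gap. Scoring against reference boundaries, such as article end lines, then reduces to calling `precision_recall(boundaries, reference)`.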

© The Association for Natural Language Processing