Abstract
The difficulty in processing long documents stems from the variety of topics they contain. Long documents such as technical papers and reports cover more topics than short documents such as news articles. Because each topic in a long document tends to be relevant to only a small portion of it, conventional text categorization, which tries to assign predefined topics to the document as a whole, has limited effectiveness. In this paper we study probabilistic passage categorization, which assigns predefined topics to each passage in a document. We show that passage categorization outperforms conventional text categorization, especially for long documents. We also discuss the possibility of applying passage categorization to topic-dependent text summarization and present some preliminary experimental results.