TV news texts are known to contain fewer and longer sentences than newspaper articles. When such texts are summarized by selecting important sentences, omitting a whole sentence discards a good deal of information because each sentence is long. We therefore partition long sentences into shorter ones before summarization. To evaluate how partitioning affects summarization, we select two basic measures and examine how they change before and after long sentences are partitioned. The first measure is the ranking of the sentences in a text by importance; the second is the number of characters removed from the text by applying a fixed set of shortening and deletion rules. All sentences in each text are ranked by importance both by hand and by a sentence extraction system. First, we examine how the ranks of the sentences that humans judged important change with partitioning. Among the partitioned important sentences, more show a rank difference of three or more than a rank difference of one, which suggests that partitioning benefits sentence extraction. Next, we compare the human and system rankings of all sentences in a text using Spearman's rank correlation coefficient. The coefficient increases by 0.0318 to 0.065, which means that the human and system rankings are more similar for the partitioned texts than for the original texts. Finally, we investigate how partitioning affects text shortening. The number of deleted characters increases for the partitioned texts, and the compaction ratio (the number of characters in the shortened text divided by the number of characters in the original text) decreases by 2.71 to 2.78 percent. This shows that partitioning long sentences makes the shortening method work better.
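
The two quantities used in the evaluation, Spearman's rank correlation coefficient and the compaction ratio, can be illustrated with a minimal Python sketch. It assumes tie-free rankings, so the closed form 1 - 6Σd²/(n(n² - 1)) applies; the abstract does not say how ties were handled. The function names and the sample rankings and strings are hypothetical and are not data from the study.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman's rank correlation for two tie-free rankings of the same items.

    Assumption: neither ranking contains ties, so the closed form
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) is valid.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("rankings must cover the same number of sentences")
    n = len(rank_a)
    if n < 2:
        raise ValueError("at least two sentences are required")
    # Sum of squared rank differences, sentence by sentence.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def compaction_ratio(original, shortened):
    """Characters in the shortened text divided by characters in the original."""
    if not original:
        raise ValueError("original text must not be empty")
    return len(shortened) / len(original)


# Hypothetical importance ranks for five sentences (1 = most important).
human_ranks = [1, 2, 3, 4, 5]
system_ranks = [2, 1, 3, 5, 4]
rho = spearman_rho(human_ranks, system_ranks)

# Hypothetical texts: 10 characters shortened to 7.
ratio = compaction_ratio("abcdefghij", "abcdefg")
```

A higher coefficient means the system ranking agrees more closely with the human ranking, and a lower compaction ratio means more characters were removed by the shortening rules.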