2010 Volume 2010 Issue SWO-022 Pages 03-
We address the task of summarizing large numbers of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports game or a TV drama. Two characteristics of microblog data stand out: the documents are often highly redundant, and they are aligned on a timeline. There can be dozens of documents about a single event within a topic, yet two very similar documents refer to two distinct events when they are temporally distant. We examine microblog data to better understand these characteristics, and propose a summarization model for numerous short documents on a timeline, along with a fast approximate algorithm for generating a summary. We show empirically that our model generates a good summary on a dataset of microblog documents about sports games.
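
The observation that near-identical posts can describe different events when they are far apart in time can be illustrated with a small sketch. This is not the model or the algorithm proposed in the paper, whose details the abstract does not give. It is a simple greedy grouping under stated assumptions: word-level Jaccard similarity, a fixed time window, and one representative post per group. The names `Post`, `jaccard`, `group_events`, `summarize`, and the parameters `sim_threshold` and `window` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    timestamp: float  # assumed unit: seconds since the start of the game
    text: str


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two short texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def group_events(posts, sim_threshold=0.5, window=120.0):
    """Greedily group posts in time order.

    A post joins an existing group only if it is BOTH textually similar to
    and temporally close to that group's most recent post. Similar posts
    that are far apart in time therefore start a new group.
    """
    groups = []
    for post in sorted(posts, key=lambda p: p.timestamp):
        for group in groups:
            last = group[-1]
            if (post.timestamp - last.timestamp <= window
                    and jaccard(post.text, last.text) >= sim_threshold):
                group.append(post)
                break
        else:
            # No group matched on both text and time: treat as a new event.
            groups.append([post])
    return groups


def summarize(posts, k=3, **kwargs):
    """Return the earliest post of each of the k largest groups, in time order."""
    groups = group_events(posts, **kwargs)
    top = sorted(groups, key=len, reverse=True)[:k]
    top.sort(key=lambda g: g[0].timestamp)
    return [g[0].text for g in top]
```

With a finite `window`, two bursts of the text "goal by tanaka" posted roughly fifty minutes apart form two groups and both appear in the summary. With a very large `window`, the grouping ignores time and merges them into one, which is the failure mode a purely similarity-based redundancy filter would show on timeline data.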