Although speech is easy to record, audio recordings are difficult to refer to afterwards. If recordings could be indexed or summarized, referring to them would become much easier. In this paper, we aim at automatically extracting summaries of spoken lectures. For this purpose, we first compared summaries extracted by human subjects and found large differences among the subjects. We then investigated the relations between surface linguistic information and the human summaries, and identified useful surface linguistic features. Next, we summarized spoken lectures based on this information and compared the results with the human summaries. In addition, we focused on prosodic features, namely F0 and power, and conducted the same experiments with them. Finally, we combined the surface linguistic information with the prosodic information. As a result, we obtained an improved F-measure (0.599) and κ-value (0.420), comparable with the agreement among the human subjects.