Journal of Natural Language Processing
Online ISSN : 2185-8314
Print ISSN : 1340-7619
ISSN-L : 1340-7619
Paper
Effect of Singular Value Decomposition and Weighting by Singular Value of Document-Term Matrix, for Large-scale Data Perspective and Targeted Data Extraction
Mariko Hirano, Takeshi S. Kobayakawa
JOURNAL FREE ACCESS

2013 Volume 20 Issue 3 Pages 335-365

Abstract
We analyzed tweets posted up to four days after the Great East Japan Earthquake, provided by Project 311. After obtaining a general view of the data through tweet clustering, we defined a set of target categories for extraction and constructed a tweet extractor tailored to those targets. In this sequence of processes, improving the clustering, which is used to discover the target categories, is very important. We propose a method that uses singular values as feature weights, whereas the conventional use of Singular Value Decomposition is limited to dimensionality reduction. In addition, we propose an evaluation criterion for a human-aided clustering task and conduct experiments comparing several criteria, including commonly used ones, with the actual time humans spent performing the task. The experiments show the effectiveness of the proposed weighting method and the adequacy of our criterion, mainly in terms of the time efficiency of the task. For the targeted data-extraction task, which is a classification problem, some improvement in accuracy is also observed, even though the training process itself involves a weighting mechanism.
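The contrast between using the decomposition only for dimensionality reduction and additionally weighting the reduced features by singular values can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below is an assumption: it treats rows of a small made-up document-term matrix as documents, takes the left singular vectors as reduced features, and optionally scales each dimension by its singular value. The function name `svd_document_features` and the toy matrix are hypothetical and not taken from the paper.

```python
import numpy as np

def svd_document_features(doc_term, k, weight_by_singular_values=True):
    """Return k-dimensional document features from a document-term matrix.

    Assumed formulation (not confirmed by the abstract): rows are documents,
    and the features are the first k left singular vectors, optionally
    scaled by the corresponding singular values.
    """
    # Thin SVD: doc_term = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    if weight_by_singular_values:
        # Scale each retained dimension by its singular value.
        return U_k * s_k
    # Conventional use: keep the reduced dimensions without weighting.
    return U_k

# Toy document-term matrix (4 documents x 4 terms), invented for illustration.
doc_term = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

plain = svd_document_features(doc_term, k=2, weight_by_singular_values=False)
weighted = svd_document_features(doc_term, k=2)
```

In the unweighted case every retained dimension has unit norm across documents, so a distance-based clustering treats all dimensions equally. With weighting, dimensions associated with larger singular values contribute more to distances between documents.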
© 2013 The Association for Natural Language Processing