SCIS & ISIS
SCIS & ISIS 2008
Session ID : FR-E2-1

Online Text Mining System based on M2VSM
*Yasufumi Takama, Takashi Okada, Toru Ishibashi
CONFERENCE PROCEEDINGS FREE ACCESS

Abstract
This paper proposes an online text mining system based on M2VSM (Meta keyword-based Modified VSM). When the conventional vector space model (VSM) is applied to document clustering, it is difficult to adjust the topical granularity of the clusters. To solve this problem, M2VSM extends VSM so that meta keywords, such as adjectives and adverbs, are attached to index terms as additional values. The similarity between documents is calculated by taking into account how well the meta keywords of each shared index term match, which makes it possible to cluster documents at various levels of topical granularity. The online text mining system is built on MUSASHI, one of the most popular open source data mining tools. With the system, users can perform a series of text mining processes online, including preprocessing, feature selection, clustering, and visualization of results. Experimental results show that the clustering results obtained with M2VSM match those produced by test subjects in both rough and detailed clustering. They also show that the system can process a database containing 5,000 documents within 7 minutes.
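The abstract does not give the M2VSM similarity formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each document maps an index term to a weight and a set of meta keywords, the match between two meta keyword sets is measured by Jaccard overlap, and a hypothetical parameter `alpha` controls how strongly a mismatch discounts a shared term. The names `meta_match`, `similarity`, and `alpha` are illustrative and are not taken from the paper or from MUSASHI.

```python
import math

# Assumed representation (not from the paper):
#   document = {index_term: (weight, set_of_meta_keywords)}

def meta_match(a, b):
    """Jaccard overlap of two meta keyword sets; 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similarity(doc_a, doc_b, alpha=0.0):
    """Cosine-style similarity with a meta keyword discount.

    alpha is a hypothetical granularity knob in [0, 1]:
    alpha = 0 ignores meta keywords (plain VSM cosine),
    alpha = 1 scales each shared term fully by its meta keyword overlap.
    """
    dot = 0.0
    for term, (wa, ma) in doc_a.items():
        if term in doc_b:
            wb, mb = doc_b[term]
            factor = (1.0 - alpha) + alpha * meta_match(ma, mb)
            dot += wa * wb * factor
    norm_a = math.sqrt(sum(w * w for w, _ in doc_a.values()))
    norm_b = math.sqrt(sum(w * w for w, _ in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two toy documents sharing the same index terms but differing in one meta keyword.
d1 = {"camera": (1.0, {"small"}), "battery": (1.0, {"long"})}
d2 = {"camera": (1.0, {"large"}), "battery": (1.0, {"long"})}

rough = similarity(d1, d2, alpha=0.0)     # meta keywords ignored
detailed = similarity(d1, d2, alpha=1.0)  # meta keyword mismatch penalized
```

In this toy case the two documents look nearly identical when meta keywords are ignored (`rough` is about 1.0) and only half as similar when they are fully considered (`detailed` is about 0.5), which illustrates how a single parameter could move a clustering between rough and detailed topical granularity.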
© 2008 Japan Society for Fuzzy Theory and Intelligent Informatics