IEEJ Transactions on Electronics, Information and Systems
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
<Information Processing, Software>
Text Filtering for Harmful Document Classification Method Using Three words Co-occurrence and Large-scale Data Processing
Takanobu Otsuka, Deyue Deng, Takayuki Ito
JOURNAL FREE ACCESS

2014 Volume 134 Issue 1 Pages 168-175

Abstract

In recent years, young people have been using the Internet more and more. However, some of the information they receive can affect them adversely. We therefore propose a method for automatically classifying harmful sentences. Recent research on information filtering has improved filter performance by introducing co-occurrence information. In this study, we extend the commonly studied two-word co-occurrence information and create training data from three-word co-occurrence information. Compared with two-word co-occurrence, however, the amount of training data increases, and processing time becomes a problem. In addition, we found that increasing the number of co-occurrences introduces noise, because the computed values exceed the range of double-precision floating-point arithmetic. We achieved practical processing speed by implementing a text filtering system with three-word co-occurrence using a Bayesian filter, with parallelized processing on a fast MyISAM database. In addition, by using BigDecimal to remove the noise caused by the increase in the number of co-occurrences, we achieved a high F value.
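
The two technical points in the abstract, three-word co-occurrence features and the floating-point limit, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how co-occurrence is counted, so the sketch assumes every unordered set of three distinct words in a sentence forms one feature. Python's `decimal.Decimal` stands in for Java's BigDecimal, and the function names and probability values are hypothetical.

```python
from decimal import Decimal, getcontext
from itertools import combinations

getcontext().prec = 50  # working precision in significant digits (assumed value)


def cooccurrence_triples(tokens):
    """Return every unordered set of three distinct words in one sentence.

    Assumption: co-occurrence means "appears in the same sentence".
    """
    vocab = sorted(set(tokens))
    return list(combinations(vocab, 3))


def product_float(probs):
    """Multiply probabilities in double precision."""
    result = 1.0
    for p in probs:
        result *= p
    return result


def product_decimal(probs):
    """Multiply the same probabilities with arbitrary-precision decimals."""
    result = Decimal(1)
    for p in probs:
        result *= Decimal(str(p))
    return result


# Four distinct words yield C(4, 3) = 4 three-word features.
tokens = ["a", "b", "c", "d"]
triples = cooccurrence_triples(tokens)

# Hypothetical per-feature likelihoods: 100 features at 1e-5 each.
probs = [1e-5] * 100
float_score = product_float(probs)      # underflows to 0.0 in double precision
decimal_score = product_decimal(probs)  # stays representable as 1E-500
```

Because the number of three-word features grows much faster than the number of two-word features, a naive Bayesian product over them multiplies many more small probabilities, which is one way such a score can fall outside the double-precision range. Summing log-probabilities is a common alternative to arbitrary-precision arithmetic, although the abstract reports using BigDecimal.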

© 2014 by the Institute of Electrical Engineers of Japan