In recent years, young people have been using the Internet more and more. However, some of the information they receive there can adversely affect them. We therefore propose a method for automatically classifying harmful sentences. Recent research on information filtering has improved filter performance by introducing co-occurrence information. In this study, we extend the commonly studied two-word co-occurrence information and create training data using three-word co-occurrence information. Compared with two-word co-occurrence, however, this increases the amount of training data, and processing time becomes a problem. In addition, we found that the growing number of co-occurrences introduces noise, because the computed values exceed the range of double-precision floating-point arithmetic. We achieved fast processing by implementing a text filtering system with three-word co-occurrence using a Bayesian filter and parallelizing it with a fast MyISAM database. Furthermore, by using BigDecimal to remove the noise caused by the increased number of co-occurrences, we achieved a high F value.
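The abstract gives no formulas, so what follows is only a minimal Python sketch under stated assumptions. It reads "three-word co-occurrence" as every unordered triple of distinct words in one sentence. It reads the double-precision problem as underflow, which happens when a naive Bayes filter multiplies many small per-feature likelihoods. Python's `decimal.Decimal` stands in for Java's `BigDecimal`, because it has a far wider exponent range (by default down to about 10^-999999). The function names and the likelihood values are hypothetical and are not taken from the paper.

```python
import math
from decimal import Decimal
from itertools import combinations


def cooccurrence_features(words, n=3):
    """Return unordered n-word co-occurrence features for one sentence.

    Assumption (not from the paper): every combination of n distinct words
    in the same sentence counts as one co-occurrence feature.
    """
    unique_words = sorted(set(words))
    return list(combinations(unique_words, n))


def posterior_float(lik_harmful, lik_harmless, prior_harmful=0.5):
    """Naive Bayes P(harmful | features) computed with double-precision floats."""
    p_harmful = prior_harmful
    p_harmless = 1.0 - prior_harmful
    for p in lik_harmful:
        p_harmful *= p  # underflows to 0.0 after enough small factors
    for p in lik_harmless:
        p_harmless *= p
    total = p_harmful + p_harmless
    # If both products underflowed, the ratio is undefined, so return NaN.
    return p_harmful / total if total > 0.0 else math.nan


def posterior_decimal(lik_harmful, lik_harmless, prior_harmful="0.5"):
    """Same computation with decimal.Decimal (stand-in for Java's BigDecimal)."""
    p_harmful = Decimal(prior_harmful)
    p_harmless = Decimal(1) - p_harmful
    for p in lik_harmful:
        p_harmful *= Decimal(str(p))  # str() keeps the short decimal form
    for p in lik_harmless:
        p_harmless *= Decimal(str(p))
    return p_harmful / (p_harmful + p_harmless)


if __name__ == "__main__":
    sentence = ["you", "are", "so", "stupid"]
    print(cooccurrence_features(sentence))
    # Hypothetical per-feature likelihoods for an input with 300 features.
    harmful = [0.001] * 300
    harmless = [0.001] * 299 + [0.003]
    print(posterior_float(harmful, harmless))    # both products underflow
    print(posterior_decimal(harmful, harmless))
```

For a sentence with k distinct words there are k(k-1)(k-2)/6 triples against k(k-1)/2 pairs, which is one way to see why the training data grows. Summing log-probabilities is the more common remedy for the underflow. The arbitrary-precision route shown here mirrors the BigDecimal approach the abstract describes, at the cost of slower arithmetic.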