IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C)
Online ISSN : 1348-8155
Print ISSN : 0385-4221
ISSN-L : 0385-4221
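<Information Processing, Software>
A Harmful Document Classification Method Using Three-Word Co-occurrence Filtering and Large-Scale Data Processing
大塚 孝信 (Takanobu Otsuka), Deyue Deng, 伊藤 孝行 (Takayuki Ito)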
Journal, free access

2014, Vol. 134, No. 1, pp. 168-175

Abstract

In recent years, young people have been using the Internet more and more. However, some of the information they receive can adversely affect them. We therefore propose a method for automatically classifying harmful sentences. Recent research on information filtering has improved filter performance by introducing co-occurrence information. In this study, we extend the commonly studied two-word co-occurrence information and create training data from three-word co-occurrence information. Compared with two-word co-occurrence, however, the amount of training data increases, so processing time becomes a problem. In addition, we found that increasing the number of co-occurrences causes noise, because the calculation exceeds what double-precision floating-point arithmetic can represent. We achieved fast processing by implementing a text filtering system with three-word co-occurrence based on a Bayesian filter and parallelizing it with a fast MyISAM database. Furthermore, by using BigDecimal to remove the noise caused by the increased number of co-occurrences, we achieved a high F-measure.
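
The floating-point problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: Python's `decimal.Decimal` stands in for BigDecimal, the function names are hypothetical, the likelihood values are made up, and treating a three-word co-occurrence as an unordered triple of distinct words within one sentence is an assumption.

```python
from decimal import Decimal, localcontext
from itertools import combinations


def three_word_cooccurrences(tokens):
    """Return every unordered triple of distinct words in one sentence (assumed definition)."""
    return list(combinations(sorted(set(tokens)), 3))


def product_float(probabilities):
    """Multiply likelihoods using ordinary double-precision floats."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def product_decimal(probabilities, digits=50):
    """Multiply the same likelihoods using arbitrary-precision decimals."""
    with localcontext() as ctx:
        ctx.prec = digits
        result = Decimal(1)
        for p in probabilities:
            result *= Decimal(str(p))
        return result


# Placeholder tokens; a real system would use morphologically analysed words.
tokens = ["a", "b", "c", "d"]
triples = three_word_cooccurrences(tokens)  # 4 choose 3 = 4 triples

# Hypothetical per-feature likelihoods: many small factors, as in a naive Bayes product.
likelihoods = [1e-5] * 100
float_score = product_float(likelihoods)      # underflows below the double range
decimal_score = product_decimal(likelihoods)  # remains a nonzero value
```

With doubles, the product falls below the smallest representable positive value and collapses to zero, so two documents with different evidence become indistinguishable. The arbitrary-precision product keeps a nonzero score. Summing log-probabilities is another common way to avoid this underflow, though the abstract names BigDecimal.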

© 2014 The Institute of Electrical Engineers of Japan (電気学会)