The growth of unsolicited bulk e-mail (spam) is a serious problem for Internet e-mail. Many anti-spam tools rely on automatic classification by machine learning, such as Bayesian filters. These tools depend on the language of the e-mail because they use a lexical analyzer to extract words from the message. Spam, however, is written in many languages, including English, Japanese, and Chinese. This paper proposes a language-independent method for filtering spam. The method classifies e-mails as spam or non-spam using a support vector machine (SVM) whose features are the frequencies of substrings extracted from the e-mails. The paper also reports the results of testing the method on sample e-mails written in English, Japanese, Chinese, and several other languages, and discusses those results and future work.
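To illustrate what a language-independent substring feature might look like, the following is a minimal sketch, not the authors' implementation. The abstract does not say how substrings are chosen, so the fixed length `n = 3` and the helper name `substring_frequencies` are assumptions made here for illustration. The sketch counts overlapping character substrings without any lexical analysis; the resulting frequency vectors are the kind of input that could then be passed to an SVM.

```python
from collections import Counter


def substring_frequencies(text: str, n: int = 3) -> dict:
    """Return relative frequencies of all length-n character substrings.

    Hypothetical helper: the fixed length n is an assumption, since the
    abstract does not specify how substrings are extracted.
    """
    # Slide a window over raw characters. No tokenizer or dictionary is
    # used, so the same code handles English, Japanese, Chinese, etc.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}  # text shorter than n yields no substrings
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Usage: the same function applies to text with or without word boundaries.
english = substring_frequencies("buy now buy now", n=3)
japanese = substring_frequencies("無料無料", n=2)
```

Because the features are built from raw characters, no word segmentation step is needed for languages such as Japanese or Chinese that do not separate words with spaces.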