Abstract
Spam email detection systems use a blacklist of keywords to detect spam emails. However, spammers write these keywords as misspellings, for example "baank", "ba-nk" and "bankk" instead of "bank", and change the misspellings from time to time. The system must therefore constantly update the blacklist to detect spam emails containing such misspellings. However, it is impossible to predict all possible misspellings of a given keyword and add them to the blacklist. We propose a framework to solve this problem. A keyword is represented as a Markov chain in which letters are states, and a Markov model is then built for the keyword. To decide whether an unknown word is a misspelling of a given keyword, a statistical hypothesis test and a fuzzy normalization method are used. Experiments showed that the proposed method achieved a detection error rate of 0.2%.
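
To make the idea of a keyword as a Markov chain over letters concrete, the following is a minimal sketch and not the method evaluated in the paper. It estimates first-order letter-transition probabilities from a single keyword and scores a candidate word by its average log-likelihood under that chain. The function names, the start and end markers, the probability floor for unseen transitions, and the stripping of non-letter characters are all assumptions made for illustration. The hypothesis test and fuzzy normalization mentioned above are not reproduced here.

```python
import math
from collections import defaultdict

# Hypothetical boundary markers so that the first and last letters
# of a word also take part in a transition.
START, END = "^", "$"


def build_chain(keyword):
    """Estimate first-order letter-transition probabilities from one keyword."""
    counts = defaultdict(lambda: defaultdict(int))
    symbols = [START] + list(keyword) + [END]
    for a, b in zip(symbols, symbols[1:]):
        counts[a][b] += 1
    # Normalize the counts of each state into a probability distribution.
    return {
        a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
        for a, nxt in counts.items()
    }


def avg_log_likelihood(chain, word, floor=1e-3):
    """Average log-probability per transition of `word` under `chain`.

    Non-letter characters are dropped first (an assumption), and unseen
    transitions receive the small probability `floor` (also an assumption).
    """
    letters = [c for c in word if c.isalpha()]
    symbols = [START] + letters + [END]
    total = 0.0
    for a, b in zip(symbols, symbols[1:]):
        total += math.log(chain.get(a, {}).get(b, floor))
    return total / (len(symbols) - 1)


if __name__ == "__main__":
    chain = build_chain("bank")
    for candidate in ["bank", "baank", "ba-nk", "bankk", "hello"]:
        print(candidate, round(avg_log_likelihood(chain, candidate), 3))
```

Under these assumptions, misspellings such as "baank" and "bankk" score closer to the keyword than an unrelated word such as "hello" does, so a decision rule could compare the score against a threshold. How that threshold is chosen is exactly where a statistical test would come in, and it is left open in this sketch.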