Fuzzy Normalization for Spam Email Detection

Dat Tran; Wanli Ma; Dharmendra Sharma

doi:10.14864/softscis.2006.0.1505.0

SCIS & ISIS 2006

Session ID : FR-J4-1

Conference information

Host: Japan Society for Fuzzy Theory and Intelligent Informatics

Co-host: The Korea Fuzzy Logic and Intelligent Systems Society, IEEE Computational Intelligence Society, The International Fuzzy Systems Association, 21st Century COE Program "Creation of Agent-Based Social Systems Sciences"

FR-J4 Soft Computing Approaches for Systems Optimization and Analysis

Keywords: Spam email detection, fuzzy normalization, Markov keyword model

Abstract

Spam email detection systems typically use keywords in a blacklist to detect spam emails. However, spammers write the keywords as deliberate misspellings, for example "baank", "ba-nk" and "bankk" instead of "bank", and change the misspellings from time to time. The system must therefore constantly update the blacklist to detect spam emails containing such misspellings, yet it is impossible to predict all possible misspellings of a given keyword and add them to the blacklist. We propose a framework to solve this problem. A keyword is represented as a Markov chain in which letters are states, and a Markov model is then built for the keyword. To decide whether an unknown word is a misspelling of a given keyword, a statistical hypothesis test and a fuzzy normalization method are used. Experiments showed that the proposed method achieved a detection error rate of 0.2%.
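
The abstract does not give the model's equations, so the following is only a rough sketch of how such a pipeline might look, not the authors' implementation. It assumes a first-order letter chain trained on the keyword alone with additive smoothing, a uniform-letter background model as the competing hypothesis, and a fuzzy c-means-style membership as the normalized score. The names `train_keyword_model`, `distance`, `fuzzy_score` and `is_misspelling`, the smoothing constant, the fuzziness degree `m` and the threshold of 0.6 are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz-"  # assumed symbol set
START = "^"  # hypothetical start-of-word state


def train_keyword_model(keyword, alpha=0.01):
    """Return prob(prev, ch) for a smoothed first-order letter chain."""
    counts = defaultdict(dict)
    prev = START
    for ch in keyword.lower():
        counts[prev][ch] = counts[prev].get(ch, 0.0) + 1.0
        prev = ch
    counts = dict(counts)

    def prob(prev, ch):
        row = counts.get(prev, {})
        total = sum(row.values())
        # Additive smoothing keeps unseen transitions above zero probability.
        return (row.get(ch, 0.0) + alpha) / (total + alpha * len(ALPHABET))

    return prob


def distance(prob, word):
    """Average negative log-likelihood per letter of `word` under the chain."""
    word = word.lower()
    prev, total = START, 0.0
    for ch in word:
        total -= math.log(prob(prev, ch))
        prev = ch
    return total / max(len(word), 1)


def fuzzy_score(prob, word, m=2.0):
    """Membership of `word` in the keyword class versus a uniform background.

    Assumed form (not taken from the paper): a fuzzy c-means-style
    membership over two distances, with fuzziness degree m > 1.
    """
    d_keyword = distance(prob, word)
    d_background = math.log(len(ALPHABET))  # uniform letter model
    return 1.0 / (1.0 + (d_keyword / d_background) ** (1.0 / (m - 1.0)))


def is_misspelling(prob, word, threshold=0.6):
    """Accept the hypothesis 'word is a misspelling' if the score is high."""
    return bool(word) and fuzzy_score(prob, word) > threshold


bank = train_keyword_model("bank")
for w in ["bank", "baank", "ba-nk", "bankk", "hello"]:
    print(w, round(fuzzy_score(bank, w), 3), is_misspelling(bank, w))
```

In this sketch the threshold plays the role of the decision boundary in the hypothesis test. In practice it would be tuned on labelled words, and the model would likely be trained on more than the single keyword spelling.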
