2024 Volume 32 Pages 1105-1113
Malicious URLs are a security problem that has plagued the Internet for a long time. Blacklists have traditionally been used to distinguish malicious URLs from benign ones, but blacklist-based detection has shortcomings such as slow updates, and research on detecting malicious URLs with machine learning has grown in response. These studies propose their own methods and report high accuracy, but work that summarizes malicious URL detection as a whole is insufficient. In this paper, we propose a three-step framework for malicious URL detection, consisting of a Segmentation step, an Embedding step, and a Machine Learning step, which makes it possible to systematically summarize different machine-learning-based malicious URL detection methods. We review 14 related works through this framework and find that almost all research on malicious URL detection using machine learning can be classified by it. Using the framework, we evaluate the suitability of several context-considering methods, that is, methods that take the corpus's context into account during vector generation, together with several machine learning models. The results confirm the importance of considering context: context-considering embedding methods matter more, and malicious URL detection accuracy improves when they are used.
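As a rough illustration of how the three steps fit together, the following is a minimal sketch and not the method evaluated in the paper. It assumes delimiter-based segmentation, a context-free bag-of-words embedding (deliberately simpler than the context-considering embeddings discussed above), and a nearest-centroid classifier as a stand-in for the Machine Learning step. All function names, URLs, and labels are hypothetical toy values.

```python
import re
from collections import Counter


def segment(url):
    """Segmentation step (assumed): split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def embed(tokens, vocab):
    """Embedding step (assumed): context-free bag-of-words counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def train_centroids(vectors, labels):
    """Machine Learning step (assumed): compute one mean vector per class label."""
    sums, n = {}, Counter(labels)
    for v, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(v))
        for i, x in enumerate(v):
            acc[i] += x
    return {y: [x / n[y] for x in acc] for y, acc in sums.items()}


def predict(vector, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(vector, c))
    return min(centroids, key=lambda y: dist(centroids[y]))


# Hypothetical toy training data, for illustration only.
train = [
    ("http://example.com/login/verify/account", "malicious"),
    ("http://example.net/secure/login/update", "malicious"),
    ("https://example.org/docs/index.html", "benign"),
    ("https://example.org/blog/news", "benign"),
]
tokenized = [segment(u) for u, _ in train]
vocab = sorted({t for toks in tokenized for t in toks})
centroids = train_centroids(
    [embed(t, vocab) for t in tokenized],
    [y for _, y in train],
)


def classify(url):
    """Run all three steps on a single URL."""
    return predict(embed(segment(url), vocab), centroids)
```

Because each step is a separate function, one could swap in a different segmentation rule, embedding, or classifier without touching the other two, which is the kind of decomposition the framework is meant to make explicit.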