Journal of Information Processing
Online ISSN : 1882-6652
ISSN-L : 1882-6652
 
Malicious JavaScript Detection in Realistic Environments with SVM and MLP Models
Ngoc Minh Phung, Mamoru Mimura
JOURNAL FREE ACCESS

2024 Volume 32 Pages 748-756

Abstract

Machine learning models for malicious JavaScript detection have produced strong results over the years. In real-world data, however, malicious JavaScript makes up only a small fraction of the samples. Many previous techniques discard most of the benign samples and train a machine learning model on a balanced dataset. This paper continues our previous work (Phung and Mimura, 2023): it uses a Support Vector Machine (SVM) and a Multi-Layer Perceptron (MLP) as classifiers and trains them with a Doc2Vec-based filter that can quickly classify JavaScript malware using Natural Language Processing (NLP) and feature re-sampling. The total number of features of the benign samples is reduced using a combination of word vectors and a clustering model. Random seed oversampling generates new malicious training data from the original training dataset. We evaluate our models on a dataset of over 30,000 samples obtained from top popular websites, PhishTank, and GitHub. The experimental results show that Abstract Syntax Tree (AST) parsing has the greatest effect on improving the detection scores.
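
The re-sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering model or the exact oversampling procedure, so the sketch assumes that each script has already been turned into a fixed-length vector (for example by Doc2Vec), that benign vectors are replaced by k-means centroids, and that malicious vectors are duplicated at random with a fixed seed. All function names (`kmeans`, `undersample_benign`, `oversample_malicious`, `rebalance`) are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns k centroids (assumed clustering model)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def undersample_benign(benign_vectors, k, seed=0):
    """Assumption: represent the benign majority class by k cluster centroids."""
    return kmeans(benign_vectors, k, seed=seed)


def oversample_malicious(malicious_vectors, target, seed=0):
    """Assumption: grow the malicious class to `target` by seeded random duplication."""
    rng = random.Random(seed)
    missing = max(0, target - len(malicious_vectors))
    extra = [list(rng.choice(malicious_vectors)) for _ in range(missing)]
    return [list(v) for v in malicious_vectors] + extra


def rebalance(benign_vectors, malicious_vectors, k, seed=0):
    """Return (X, y) with the benign class reduced and the malicious class enlarged."""
    benign = undersample_benign(benign_vectors, k, seed)
    malicious = oversample_malicious(malicious_vectors, len(benign), seed)
    X = benign + malicious
    y = [0] * len(benign) + [1] * len(malicious)  # 0 = benign, 1 = malicious
    return X, y
```

The rebalanced `X` and `y` would then be passed to a classifier such as an SVM or an MLP. Only the training split should be re-sampled in this way, so that the evaluation data keeps its realistic class imbalance.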

© 2024 by the Information Processing Society of Japan