2019 Volume 27 Pages 555-563
In recent years, the number of targeted email attacks that use Microsoft (MS) document files has been increasing. In particular, these files often contain malicious VBA (Visual Basic for Applications) macros. Some researchers have proposed methods to detect malicious MS document files. However, there are few methods that analyze the malicious macros themselves. This paper proposes a method to detect unseen malicious macros using the words extracted from their source code. Malicious macros tend to contain typical functions to download or execute the main body of the malware, and obfuscated strings such as encoded or divided characters. Our method builds feature vectors from the corpus with several NLP (Natural Language Processing) techniques. It then trains basic classifiers on the extracted feature vectors and their labels, and the trained classifiers predict the labels of unseen macros. Experimental results show that our method can detect 89% of new malware families. The best F-measure is 0.93.
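
To make the pipeline described above concrete (words extracted from macro source, turned into feature vectors, then used to train a classifier that labels unseen macros), the following is a minimal sketch and not the implementation evaluated in the paper. It assumes plain bag-of-words term counts as the feature representation and a nearest-centroid classifier with cosine similarity as a stand-in for the "basic classifiers"; the abstract does not name the specific NLP techniques or classifiers. The toy corpus `TRAIN` and the names `tokenize`, `vectorize`, `train`, and `predict` are hypothetical and introduced only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus of (VBA macro source, label); 1 = malicious, 0 = benign.
# These samples are invented for illustration and are not from the paper's dataset.
TRAIN = [
    ('Sub AutoOpen()\n Shell "cmd /c " & Chr(112) & Chr(115)\nEnd Sub', 1),
    ('Sub Document_Open()\n Set o = CreateObject("WScript.Shell")\n o.Run url\nEnd Sub', 1),
    ('Sub FormatTable()\n Selection.Font.Bold = True\nEnd Sub', 0),
    ('Sub SumCells()\n total = Range("A1").Value + Range("A2").Value\nEnd Sub', 0),
]


def tokenize(source):
    """Extract lowercase word tokens (keywords and identifiers) from macro source."""
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)]


def vectorize(source):
    """Bag-of-words feature vector: a sparse mapping from word to count."""
    return Counter(tokenize(source))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Sum the word counts of each class into one centroid vector per label."""
    centroids = {}
    for source, label in samples:
        centroids.setdefault(label, Counter()).update(vectorize(source))
    return centroids


def predict(centroids, source):
    """Return the label whose centroid is most similar to the macro's vector."""
    vec = vectorize(source)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))


if __name__ == "__main__":
    model = train(TRAIN)
    unseen = 'Sub AutoOpen()\n Shell Chr(99) & Chr(109)\nEnd Sub'
    print(predict(model, unseen))
```

The sketch only shows the shape of the approach: an unseen macro is scored by how closely its vocabulary matches that of known malicious or benign macros. In a realistic setting, a richer representation and a stronger classifier would replace the count vectors and centroid rule used here.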