A significant number of logs are generated in dynamic malware analysis. Consequently, a method for effectively compressing these logs is required to reduce the amount of memory and storage consumed to store such logs. In this study, we evaluated the efficacy of grammar compression methods in compressing call traces in malware analysis logs. We hypothesized that grammar compression can be useful in compressing call traces because its algorithm can naturally express the dynamic control flows of program execution. We measured the compression ratio of three grammar compression methods (SEQUITUR, Re-Pair, and Byte Pair Encoding (BPE)) and three well-known compressors (gzip, bzip2, and xz). In experiments conducted in which API call sequences collected from thousands of Windows malware were compressed, the Re-Pair grammar compression method was found to outperform both gzip and bzip2.
2017 by the Information Processing Society of Japan