Abstract
Malware is an effective tool for cyberattacks, available to amateur hackers as well as professionals. With new malware appearing every day, developing efficient malware detection methods has become a global challenge. In recent years, machine learning-based methods, especially image-based malware detection approaches, have demonstrated high accuracy. Convolutional neural networks (CNNs) are widely used for image-based malware detection, but they require deep architectures and GPUs for parallel processing to achieve high performance. In contrast, a simple model based on multilayer perceptrons, called MLP-mixer, has been attracting attention because it can run in environments without GPUs, but it still falls short of CNNs in terms of performance. This study therefore attempts to improve the performance of the MLP-mixer by applying an autoencoder, which is widely used both to identify the essential elements of the input data and as a dimensionality reduction technique that removes noise. We propose a lightweight ensemble architecture that combines the MLP-mixer and the autoencoder. Experiments show that the proposed method outperforms typical CNN models in terms of both performance and computation time.
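To make the MLP-mixer building block concrete, the following is a minimal NumPy sketch of a single generic Mixer layer applied to patches of a byte image. It is an illustration under stated assumptions, not the architecture proposed in this study: the byte-to-grayscale conversion, the image width of 32, the patch size of 8, the hidden sizes, and all function names (`bytes_to_patches`, `mixer_layer`, `init_params`) are hypothetical choices made for the example, and the autoencoder and ensemble stages are not shown.

```python
import numpy as np

def bytes_to_patches(data: bytes, width: int = 32, patch: int = 8) -> np.ndarray:
    """Assumed preprocessing: treat raw bytes as a grayscale image and split it
    into non-overlapping patch x patch tiles, one flattened tile per row.
    `width` must be divisible by `patch`."""
    arr = np.frombuffer(data, dtype=np.uint8)
    height = len(arr) // width
    height -= height % patch  # drop trailing rows that do not fill a patch
    img = arr[: height * width].reshape(height, width).astype(np.float32) / 255.0
    tiles = img.reshape(height // patch, patch, width // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize over the last axis; learned scale and shift are omitted here.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x: np.ndarray) -> np.ndarray:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp(x, w1, b1, w2, b2):
    # Two-layer perceptron acting on the last axis of x.
    return gelu(x @ w1 + b1) @ w2 + b2

def init_params(num_patches, channels, token_hidden, channel_hidden, rng):
    # Small random weights; hidden sizes are illustrative, not tuned values.
    def dense(n_in, n_out):
        return rng.normal(0.0, 0.02, (n_in, n_out)), np.zeros(n_out)
    return {
        "token": (*dense(num_patches, token_hidden), *dense(token_hidden, num_patches)),
        "channel": (*dense(channels, channel_hidden), *dense(channel_hidden, channels)),
    }

def mixer_layer(x: np.ndarray, params: dict) -> np.ndarray:
    """One generic Mixer layer on x of shape (num_patches, channels)."""
    # Token mixing: transpose so the MLP mixes information across patches.
    x = x + mlp(layer_norm(x).T, *params["token"]).T
    # Channel mixing: the MLP mixes features within each patch.
    x = x + mlp(layer_norm(x), *params["channel"])
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = bytes(range(256)) * 4            # 1024 placeholder bytes, not real malware
    patches = bytes_to_patches(sample)        # shape (16, 64)
    params = init_params(16, 64, 32, 128, rng)
    print(mixer_layer(patches, params).shape)
```

The layer uses only dense matrix products and element-wise operations, which is consistent with the observation above that such models can run without GPUs; any speed or accuracy comparison with CNNs would have to come from the experiments, not from this sketch.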