2018 Volume E101.D Issue 4 Pages 1199-1202
Malware phylogeny refers to inferring the evolutionary relationships between instances of malware families. It has gained considerable attention over the past several years because it accelerates the reverse engineering of new variants within families. Previous research has mainly focused on tree-based models. However, those approaches merely show the lineage of families using dendrograms or directed trees that carry only rough evolution information. In this paper, we propose a novel malware phylogeny construction method that takes advantage of the persistent phylogeny tree model, whose nodes correspond to input instances and whose edges represent the gain or loss of functional characters. It not only depicts directed ancestor-descendant relationships between malware instances, but also shows the concrete functions inherited and varied between ancestor and descendant, which is significant for defending against variants. We evaluate our algorithm on three malware families and one benign family whose ground truth is known, and compare it with competing algorithms. Experiments demonstrate that our method achieves a higher mean accuracy, 61.4%, than the competing algorithms.
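To make the tree model concrete, the following is a minimal sketch of how edges in such a tree could be labeled with gained and lost functional characters. It is not the construction algorithm proposed in the paper. The instance names, the feature names, and the helpers `edge_labels` and `is_persistent` are hypothetical. The check assumes the usual persistent phylogeny condition that each character is gained at most once and lost at most once in the tree.

```python
from collections import Counter

def edge_labels(features, parent):
    """Return {(ancestor, descendant): (gained, lost)} for every tree edge."""
    labels = {}
    for child, par in parent.items():
        gained = features[child] - features[par]  # characters new in the descendant
        lost = features[par] - features[child]    # characters dropped by the descendant
        labels[(par, child)] = (gained, lost)
    return labels

def is_persistent(labels, root_features):
    """Check that each character is gained at most once and lost at most once.

    Characters already present in the root count as gained once.
    """
    gains = Counter(root_features)
    losses = Counter()
    for gained, lost in labels.values():
        gains.update(gained)
        losses.update(lost)
    return all(n <= 1 for n in gains.values()) and all(n <= 1 for n in losses.values())

# Hypothetical toy family: three variants with made-up functional characters.
features = {
    "v1": {"keylogger"},
    "v2": {"keylogger", "packer"},
    "v3": {"packer", "c2_http"},
}
parent = {"v2": "v1", "v3": "v2"}  # descendant -> ancestor

labels = edge_labels(features, parent)
for (anc, desc), (gained, lost) in labels.items():
    print(f"{anc} -> {desc}: gained {sorted(gained)}, lost {sorted(lost)}")
print("persistent:", is_persistent(labels, features["v1"]))
```

In this toy example the edge from `v2` to `v3` records both an inherited function (`packer`), a new one (`c2_http`), and a dropped one (`keylogger`), which is the kind of information a plain dendrogram does not carry.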