Publication
ICCCN 2022
Conference paper
Evaluating Feature Robustness for Windows Malware Family Classification
Abstract
Machine learning approaches to classify malware by family save analysts valuable time during incident response. A key challenge for these approaches is selecting features that are robust against concept drift, which describes the change in malware over time. In this paper, we evaluate a dynamic feature set based on Windows handles (e.g., files, registry keys) for malware family classification. Specifically, we examine the features' vulnerabilities and evaluate their robustness against concept drift. We curated a novel dataset that simulates the manipulations that attackers may invoke on malware samples. We demonstrate improved robustness to concept drift over traditional API call-based features by training machine learning classifiers on malware collected in the wild, and testing the classifiers against samples that underwent manipulations. Further, we investigate time decay due to concept drift using temporally consistent evaluations that do not assume access to newer information. The evaluation shows that our features are robust against malware obfuscation. Furthermore, we empirically demonstrate how malware labeling conventions (malware type or family) can affect results, and make recommendations for dataset construction.
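As an illustration of what a temporally consistent evaluation might look like, the following minimal sketch splits samples so that the training set contains only samples first seen before a cutoff date and the test set contains only later ones. It then groups the test samples by month so that time decay could be measured period by period. The `Sample` record, its fields, the function names, and the toy data are all hypothetical and are not taken from the paper; the authors' actual protocol may differ.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Sample(NamedTuple):
    """Hypothetical record for one malware sample (illustrative only)."""
    sample_id: str
    family: str
    first_seen: date


def temporal_split(
    samples: List[Sample], cutoff: date
) -> Tuple[List[Sample], List[Sample]]:
    """Train on samples seen strictly before the cutoff, test on the rest.

    This keeps the classifier from seeing information from the "future"
    relative to the samples it is tested on.
    """
    train = [s for s in samples if s.first_seen < cutoff]
    test = [s for s in samples if s.first_seen >= cutoff]
    return train, test


def bucket_by_month(samples: List[Sample]) -> Dict[Tuple[int, int], List[Sample]]:
    """Group samples by (year, month), ordered chronologically."""
    buckets: Dict[Tuple[int, int], List[Sample]] = defaultdict(list)
    for s in samples:
        buckets[(s.first_seen.year, s.first_seen.month)].append(s)
    return dict(sorted(buckets.items()))


# Toy data, invented purely for illustration.
samples = [
    Sample("a", "famX", date(2020, 3, 10)),
    Sample("c", "famX", date(2021, 1, 15)),
    Sample("b", "famY", date(2020, 11, 2)),
    Sample("d", "famY", date(2021, 3, 20)),
    Sample("e", "famX", date(2021, 3, 28)),
]

train, test = temporal_split(samples, cutoff=date(2021, 1, 1))
monthly_test = bucket_by_month(test)
```

In a setup like this, a classifier would be fitted once on `train` and scored separately on each bucket in `monthly_test`; a downward trend in the per-month scores would be one way to observe time decay caused by concept drift.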