Many organizations rely on Security Information and Event Management (SIEM) systems to discover intruders in their network from security-related events such as host and firewall logs. However, our work shows that adversaries can easily evade a large fraction of popular SIEM detection rules that aim to detect malicious command executions on Windows systems. To mitigate these detection blind spots, we introduce a novel concept called adaptive misuse detection, which utilizes supervised machine learning to discover potential rule evasions while keeping false alerts to a minimum. Finally, we present our open-source proof-of-concept implementation of adaptive misuse detection, AMIDES, and demonstrate its fitness for application in large enterprise networks.
Cyberattacks have grown into a major risk for organizations. Attackers often succeed in breaking into target systems despite elaborate preventive measures, with common consequences being data theft or sabotage. When prevention fails, intruders should at least be detected as soon as possible to stop them from reaching their final goals, or to understand their impact when it is already too late. To this end, many organizations operate Security Information and Event Management (SIEM) systems, which centrally collect and scan security-related events (e.g., firewall and endpoint logs) for attack indicators. Since manually inspecting the usually huge amount of data is prohibitively expensive, suspicious and malicious behavior needs to be detected automatically, raising security alerts that are then further analyzed by security experts.
Despite a great deal of research in cyberattack detection and consequently a large number of proposed detection methods, the foremost method found in practice is still simple yet effective misuse detection (also called rule-based or signature-based detection) [1, 2, 3]. Misuse detection applies a set of expert-written rules to each relevant event, where each rule contains one or more signatures that define the conditions under which the rule should trigger, e.g., a regular expression matching a certain field. Aside from vendor-specific rulesets in commercial SIEM products, there is a notable and growing open-source community for SIEM rules, with the most prominent example being Sigma [4], a generic and open signature format for SIEM systems that allows flexible rules in YAML format for detecting malicious or suspicious behavior in any type of text-based log data. Sigma is widely used in practice.
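To make the notion of a rule with a signature concrete, the following minimal sketch applies a regular-expression signature to one field of an event. The `Rule` class, the field name `command_line`, and the example rule are hypothetical simplifications for illustration; they are not Sigma's actual YAML format or any SIEM product's API.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical, simplified detection rule (not real Sigma syntax)."""
    title: str
    field: str    # event field the signature inspects
    pattern: str  # regular expression acting as the signature


def matching_rules(event: dict, rules: list[Rule]) -> list[str]:
    """Return the titles of all rules whose signature matches the event."""
    hits = []
    for rule in rules:
        value = event.get(rule.field, "")
        if re.search(rule.pattern, value, flags=re.IGNORECASE):
            hits.append(rule.title)
    return hits


# Illustrative rule only: flags a registry "Run" key being added via reg.exe.
RULES = [
    Rule(
        title="Registry run key added via reg.exe",
        field="command_line",
        pattern=r"reg(\.exe)?\s+add\s+.*\\CurrentVersion\\Run",
    ),
]

event = {
    "command_line": r"reg.exe add HKCU\Software\Microsoft\Windows"
                    r"\CurrentVersion\Run /v upd /d C:\tmp\a.exe"
}
print(matching_rules(event, RULES))
```

An event that does not satisfy any signature simply yields an empty list, which is exactly the situation an evasion tries to provoke.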
Although it is the prime means for detecting cyberattacks in enterprise networks, misuse detection is not a silver bullet. As we will show in this article, detection rules can often be evaded, i.e., by slightly modifying the attack, the malicious activity can be performed successfully without triggering a rule. As often seen in malicious binaries or scripts, attackers try to evade detection by applying different obfuscation techniques, e.g., by inserting dummy characters into command lines to avoid matching signatures [5, 6, 7, 8]. Evasions can also be the result of undetected attack variants, e.g., by executing a command line with slightly different arguments than covered by the respective rule. Altogether, this leads to critical detection blind spots, i.e., attacks can remain undetected even though comprehensive rulesets are deployed.
To find out how big the risk of detection blind spots through evasions really is, we analyzed a representative set of Sigma SIEM rules with respect to potential evasions by creating concrete rule evasions. For our analysis, we chose a subset of Sigma rules, namely, those acting on Windows process creation events (i.e., events triggered when a new process is started on a Windows system, including context information such as the full command line and the parent process). Process creation events are known to be a valuable source for threat detection [9]. Consequently, detection rules acting on such events are the most frequent rule type in the Sigma ruleset (41% of all rules at the time of analysis).
During our analysis, we reviewed all Sigma process creation rules in detail and re-enacted the malicious behavior that should be detected by these rules on a Windows 10 system. For example, we executed PowerShell with a command line that creates a startup entry for a malware binary in the Windows registry. During this process, we manually reviewed the Windows event log to verify the executed commands and ran scripts to check for Sigma rules matching these events. In case of a successful match for the currently analyzed rule, we tried to construct command lines that perform the exact same action, but without triggering the rule (i.e., evasions).
We succeeded in creating a large number of evasions by applying five relatively simple obfuscation techniques: (1) insertion of ignored characters into the command line (e.g., double quotes or spaces), (2) substitution of synonymous characters or arguments (e.g., a hyphen instead of a slash before an argument), (3) omission of unnecessary characters (e.g., shortening arguments), (4) reordering of arguments, and (5) recoding of arguments. Examples of the different evasion types are given in Table 1.
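The effect of the first two techniques on a naive signature can be sketched as follows. The program name `tool.exe`, its `/download` argument, and the substring signature are hypothetical; whether a given variant still executes the same action depends on the target program's argument parsing, which this sketch does not verify.

```python
# Hypothetical substring signature for a made-up program and argument.
SIGNATURE = "tool.exe /download"


def rule_triggers(command_line: str) -> bool:
    """Naive signature check: case-insensitive substring match."""
    return SIGNATURE in command_line.lower()


def insert_quotes(command_line: str) -> str:
    """Technique (1): insert an empty pair of double quotes into an argument."""
    return command_line.replace("/download", '/down""load')


def insert_spaces(command_line: str) -> str:
    """Technique (1): double every space between arguments."""
    return command_line.replace(" ", "  ")


def substitute_prefix(command_line: str) -> str:
    """Technique (2): use a hyphen instead of a slash before arguments."""
    return command_line.replace(" /", " -")


original = "tool.exe /download http://example.test/a.bin"
variants = {
    "original": original,
    "quotes": insert_quotes(original),
    "spaces": insert_spaces(original),
    "hyphen": substitute_prefix(original),
}
for name, cmd in variants.items():
    print(f"{name:8s} triggers={rule_triggers(cmd)}  {cmd}")
```

Only the unmodified command line contains the signature; each transformed variant slips past the substring match.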
Ultimately, of the 292 analyzed Sigma rules, we found 110 (38%) to be fully evadable and 19 (7%) partially evadable (i.e., a rule contains OR-branches of which at least one could be evaded, but not all). For another 51 rules (17%), we found that evasion might be possible but no concrete evasion instances could be confirmed either due to unavailable target software (mostly malware) or prohibitive effort for conclusive analysis. In conclusion, the results of our rule analysis show that the risk of detection blind spots through rule evasions is indeed high for the analyzed Sigma rules.
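The reported shares follow directly from the rule counts, as this short calculation shows. The dictionary keys are our own labels for the three categories named above.

```python
TOTAL_RULES = 292

# Rule counts as reported in the analysis; the labels are shorthand.
counts = {
    "fully evadable": 110,
    "partially evadable": 19,
    "possibly evadable (unconfirmed)": 51,
}

# Share of each category, rounded to whole percent.
shares = {label: round(100 * n / TOTAL_RULES) for label, n in counts.items()}

# Rules that are evadable at least in part (fully or partially).
at_least_partially = counts["fully evadable"] + counts["partially evadable"]
at_least_partially_share = 100 * at_least_partially / TOTAL_RULES

for label, pct in shares.items():
    print(f"{label}: {counts[label]} rules ({pct}%)")
print(f"at least partially evadable: {at_least_partially} rules "
      f"({at_least_partially_share:.1f}%)")
```

Fully and partially evadable rules together account for 129 of the 292 rules, or about 44%.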
Since our rule analysis showed that many Sigma rules can be easily evaded, we asked ourselves how these detection blind spots can be remedied. Our answer to this question was the new concept of adaptive misuse detection as presented in our paper [10], which extends (conventional) misuse detection with additional machine learning components to detect rule evasions.
During our analysis for evasions, we observed that most SIEM events of successful evasions are still very similar to those of the original attack. This is because the SIEM events that are predominantly used for attack detection are captured at kernel level. At this level, most obfuscations must already be resolved, else the operating system could not execute the desired action. Therefore, a promising approach for evasion detection is to look for events that are similar to signatures contained in the ruleset. However, it is crucial to avoid false alerts, since large enterprise networks often deal with millions or billions of SIEM events per day and the number of benign events is usually several orders of magnitude larger than the number of attack-related events. Consequently, when aiming to detect evasions based on their similarity to rules, it is indispensable to avoid an overly broad definition of similarity.
Our idea of adaptive misuse detection is based on supervised machine learning with the goal of detecting as many evasions as possible while keeping false alerts extremely low. More precisely, we classify incoming events according to whether they are more similar to deployed SIEM rules or to historical benign events, both taken from the enterprise network where the system is operated. By taking the knowledge of what is malicious from the ruleset, which – in case of Sigma – is publicly available, comprehensive, and regularly updated, we overcome a common issue of supervised learning-based solutions, namely, the necessity to create a comprehensive set of attacks for the training process. The approach is called adaptive misuse detection since it adapts to the target environment by training against its benign activity to properly distinguish potential attacks from benign events.
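The core decision, whether an event looks more like a rule or more like known benign activity, can be illustrated with a toy comparison. This sketch deliberately swaps in a nearest-neighbour comparison based on Jaccard token overlap; it is not the classifier used in our implementation, and all command lines and signatures are invented.

```python
import re


def tokens(text: str) -> frozenset:
    """Lowercase, drop double quotes, and split into alphanumeric tokens."""
    cleaned = text.lower().replace('"', "")
    return frozenset(t for t in re.split(r"[^a-z0-9]+", cleaned) if t)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(event_cmd: str, rule_signatures: list, benign_cmds: list) -> str:
    """Label an event by whichever side holds its most similar sample."""
    ev = tokens(event_cmd)
    sim_rules = max((jaccard(ev, tokens(s)) for s in rule_signatures),
                    default=0.0)
    sim_benign = max((jaccard(ev, tokens(b)) for b in benign_cmds),
                     default=0.0)
    return "malicious" if sim_rules > sim_benign else "benign"


# Hypothetical training material: one rule signature, two benign events.
RULE_SIGNATURES = ["tool.exe /download"]
BENIGN_EVENTS = ["tool.exe /status", "backup.exe /full /quiet"]

evasion = 'tool.exe -down""load http://example.test/a.bin'
routine = "tool.exe /status /verbose"
print(classify(evasion, RULE_SIGNATURES, BENIGN_EVENTS))
print(classify(routine, RULE_SIGNATURES, BENIGN_EVENTS))
```

Because the benign samples come from the target environment, a command that is routine there stays on the benign side even if it shares tokens with a rule.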
Besides being able to raise alerts on potential evasions, our approach also allows for a feature that we call rule attribution. When a conventional misuse detection rule triggers, an analyst can simply view this rule to find out what happened, since rules usually contain an expressive title and description alongside the signature(s). However, many machine learning-based systems lack this advantage and only raise an alert without further context. Since adaptive misuse detection learns from SIEM rules, information on which features represent which rule is available during training, allowing us to estimate which rule(s) were probably evaded. This information is added to the evasion alert to ease investigation.
The Adaptive Misuse Detection System (AMIDES) [11] is our proof-of-concept implementation of adaptive misuse detection. Its components and operating principle are shown in Figure 1. AMIDES applies established methods for feature extraction (preprocessing, tokenization, filtering) to SIEM events and transforms the resulting tokens into numeric vectors. AMIDES has two machine learning components, namely, misuse classification and rule attribution, which utilize support vector machines (SVMs) as classifiers. Both classifiers are trained with signatures from the supplied SIEM ruleset (labeled as malicious) versus relevant fields of the supplied benign events (labeled as benign).
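A feature extraction step of this kind might look like the sketch below. The concrete preprocessing choices (lowercasing, removing double quotes), the separator set, the minimum token length, and the count-based vectors are all assumptions made for illustration and are not taken from the AMIDES code.

```python
import re


def extract_tokens(command_line: str, min_len: int = 3) -> list:
    """Turn a command line into a filtered token list."""
    # Preprocessing (assumed): lowercase and remove double quotes.
    cleaned = command_line.lower().replace('"', "")
    # Tokenization (assumed): split on whitespace and common separators.
    raw = re.split(r"[\s/\\,=:]+", cleaned)
    # Filtering (assumed): drop short tokens and purely numeric tokens.
    return [t for t in raw if len(t) >= min_len and not t.isdigit()]


def build_vocabulary(samples: list) -> dict:
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for sample in samples:
        for tok in extract_tokens(sample):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(command_line: str, vocab: dict) -> list:
    """Count-based numeric vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in extract_tokens(command_line):
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec


# Hypothetical training samples used only to build the vocabulary.
training_samples = ["powershell.exe -enc aaaa", "cmd.exe /c whoami"]
vocab = build_vocabulary(training_samples)
print(vocab)
print(vectorize("cmd.exe /c whoami /all", vocab))
```

Vectors built this way have a fixed length given by the training vocabulary, so they can be handed to any standard classifier.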
During operation, incoming SIEM events are passed to the rule matching component (as for conventional misuse detection) and at the same time to the feature extraction component, which generates a feature vector from the event. This vector is first passed to the misuse classification component, classifying the event as either malicious or benign. If classified as malicious, the feature vector is also passed to the rule attribution component, which generates a ranked list of rules potentially evaded by the event. Finally, potential alerts of the rule matching and machine learning components are merged to create one single alert per malicious event.
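The event flow described above can be summarized in a few lines of glue code. All function names, the alert layout, and the trivial stand-in components are hypothetical; the sketch also runs the steps sequentially rather than in parallel and only mirrors the order of decisions.

```python
from typing import Callable, Optional


def process_event(
    event: dict,
    match_rules: Callable[[dict], list],
    extract_features: Callable[[dict], list],
    classify_misuse: Callable[[list], bool],
    attribute_rules: Callable[[list], list],
) -> Optional[dict]:
    """Return one merged alert for a malicious event, or None if benign."""
    matched = match_rules(event)          # conventional misuse detection
    features = extract_features(event)    # feature vector for the ML parts
    flagged = classify_misuse(features)   # malicious or benign?
    # Rule attribution only runs for events classified as malicious.
    evaded = attribute_rules(features) if flagged else []
    if not matched and not flagged:
        return None
    return {
        "event": event,
        "matched_rules": matched,
        "possibly_evaded_rules": evaded,  # ranked, most likely first
    }


# Trivial stand-in components, for demonstration only.
def demo_match(event):
    return ["Rule A"] if "tool.exe /download" in event["command_line"] else []


def demo_features(event):
    return event["command_line"].replace('"', "").split()


def demo_classify(features):
    return "tool.exe" in features and any("download" in f for f in features)


def demo_attribute(features):
    return ["Rule A", "Rule B"]


def run(command_line: str) -> Optional[dict]:
    return process_event({"command_line": command_line}, demo_match,
                         demo_features, demo_classify, demo_attribute)


print(run('tool.exe -down""load x'))  # evasion: no rule match, ML alert
print(run("notepad.exe a.txt"))       # benign: no alert
```

Merging both sources into one alert means an analyst sees a single entry per event, regardless of whether a rule matched, the classifier fired, or both.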
We evaluated AMIDES’ performance using four weeks of benign process creation events collected from a large enterprise network with more than 50 000 users. In total, ~155 000 000 events were collected. AMIDES was first trained with Weeks 1 and 2 of the benign events versus the Sigma process creation rules. Then, to simulate live operation, AMIDES was provided with Weeks 3 and 4 versus the evasions that we crafted for these rules within the context of our rule analysis.
Using its default sensitivity, AMIDES successfully detected 70% of our evasions (true positives) and raised zero false alerts (false positives). If some false alerts are acceptable in exchange for a higher detection rate, AMIDES’ sensitivity can be adapted. AMIDES performs almost identically to a benchmark approach, namely, a classifier that learns from our crafted evasions versus benign events. Notably, this benchmark approach requires manually crafted attack events for each SIEM rule, which is prohibitively costly in practice. AMIDES, on the other hand, learns from existing SIEM rules and is therefore realistically suited for operation in enterprise networks.
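The trade-off behind an adjustable sensitivity can be sketched as a decision threshold on classifier scores. The scores below are invented for illustration and are not measurements from our evaluation; the function name and the threshold values are likewise hypothetical.

```python
def evaluate(scores_evasions: list, scores_benign: list, threshold: float):
    """Return (detection rate, number of false alerts) at a given threshold.

    An event is reported as malicious if its score is at or above the
    threshold, so a lower threshold means a higher sensitivity.
    """
    true_positives = sum(s >= threshold for s in scores_evasions)
    false_positives = sum(s >= threshold for s in scores_benign)
    return true_positives / len(scores_evasions), false_positives


# Invented classifier scores, for illustration only.
evasion_scores = [0.9, 0.7, 0.6, 0.4, 0.2]
benign_scores = [0.3, 0.1, 0.05, 0.0]

for threshold in (0.5, 0.25):
    rate, false_alerts = evaluate(evasion_scores, benign_scores, threshold)
    print(f"threshold={threshold}: detection rate={rate:.0%}, "
          f"false alerts={false_alerts}")
```

Lowering the threshold catches one more of the invented evasions but also lets one benign event through, which is the exchange an operator has to weigh.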
We also evaluated AMIDES’ rule attribution: For 63% of the evasions correctly detected by the misuse classification, the true evaded rule was the top proposition. For 95% of the evasions, the true evaded rule at least appeared in the top 10 propositions. This result indicates that the rule attribution largely succeeds in helping analysts to find out which rules were evaded.
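Figures such as "top proposition" and "top 10 propositions" are top-k hit rates over the ranked rule lists. The following sketch shows how such a rate is computed; the rankings and rule names are made up and do not reproduce our evaluation data.

```python
def top_k_hit_rate(rankings: list, true_rules: list, k: int) -> float:
    """Fraction of evasions whose true evaded rule is among the top k."""
    hits = sum(
        true_rule in ranked[:k]
        for ranked, true_rule in zip(rankings, true_rules)
    )
    return hits / len(true_rules)


# Made-up ranked attribution lists for three detected evasions.
rankings = [
    ["r1", "r2", "r3"],  # true rule ranked first
    ["r2", "r1", "r3"],  # true rule ranked second
    ["r3", "r2", "r1"],  # true rule ranked third
]
true_rules = ["r1", "r1", "r1"]

for k in (1, 2, 3):
    print(f"top-{k} hit rate: {top_k_hit_rate(rankings, true_rules, k):.2f}")
```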
To show that AMIDES is actually suited for operation in an enterprise network, we verified that it is able to handle enterprise-level throughput even when run on commodity hardware. Using our dataset, the measurements showed that AMIDES is 330 times faster than required for live operation in our sample enterprise network and allows for daily (re-)training if desired.
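For a rough sense of scale, the average event rate of the dataset can be derived from the figures given earlier. This is only a back-of-the-envelope estimate: it assumes that the rate required for live operation equals the four-week average of the roughly 155 million events, which ignores load peaks and may differ from how the measurement was defined.

```python
EVENTS_TOTAL = 155_000_000  # approximate number of collected events
DAYS = 28                   # four weeks of data
SPEEDUP = 330               # reported factor over the required rate

SECONDS = DAYS * 24 * 60 * 60

# Assumption: required rate = average rate over the whole period.
average_rate = EVENTS_TOTAL / SECONDS
implied_throughput = average_rate * SPEEDUP

print(f"average rate:       {average_rate:.1f} events/s")
print(f"implied throughput: {implied_throughput:.0f} events/s")
```

Under this assumption, the network produces on the order of 64 events per second on average.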
Finally, we analyzed whether AMIDES and the concept of adaptive misuse detection are applicable to event types other than Windows process creation. Since such other event types from the enterprise network were not accessible, we used synthetic benign events from the testbed SOCBED [12]. AMIDES achieved perfect misuse classifications on three other event types (web, registry, and PowerShell events) and the rule attribution correctly ranked the evaded rules highest for each type. While the gathered results are not statistically significant due to the small number of evasions and the synthetically generated benign events, they still indicate that adaptive misuse detection is applicable to diverse rule and event types.
Detection blind spots of SIEM rules can lead to critical rule evasions within enterprise networks. Our analysis of a subset of Sigma rules showed that almost half of them can be at least partially evaded with straightforward techniques. To remedy this situation, we propose the novel concept of adaptive misuse detection, where incoming events are compared to SIEM rules on the one hand and known-benign events on the other hand. The evaluation of our open-source proof-of-concept implementation AMIDES illustrates the feasibility of adaptive misuse detection, and we showed that its benefits remain even under real-world constraints. AMIDES can therefore significantly reduce the detection blind spots revealed by our analysis of widespread SIEM rules. Future work should further examine widespread SIEM rules for potential evasions and extend the concept and implementation of adaptive misuse detection to additional event types and fields.
More information on the concept of adaptive misuse detection and AMIDES can be found in our USENIX Security ’24 paper [10].