Cloud API logs are a significant blind spot for many organizations and often factor into large-scale, publicly announced data breaches. They pose several challenges to security teams:
Cloud API transactions do not leave network or host-based evidence. For this reason, they cannot be monitored, searched, or analyzed using conventional security tools and products like network security devices or endpoint-based security agents. This tends to create significant blind spots in cloud threat detection as we will see in case studies throughout this blog.
They tend to resist detection using conventional search rules. A cloud API transaction log message created by unauthorized or malicious activity can be indistinguishable (apart from very subtle contextual nuances) from the thousands or millions of similar messages that were benign.
Distinguishing between routine and malicious messages can require technical expertise and environmental familiarity to understand nuance, such as the calling user context or source location. Alerting on such threats with search rules would require prior knowledge of how common attacks manifest and which user contexts are likely to be compromised.
For all of these reasons, cloud API logs are resistant to conventional threat detection and hunting techniques. Even if every organization had enough technical personnel to maintain a robust threat hunting team, it would be infeasible to manually sift through millions or billions of CloudTrail events to find the few outliers that indicate malicious activity. In this blog, we’ll take an in-depth look at detection techniques based on cloud API log analysis. We will use two public record incidents as real-world examples of how threats can slip past conventional detection methods. We will walk through how to use search-based detection rules and show how to use Elastic’s machine learning based anomaly detection to identify rare and unusual activity in your cloud API logs.
Case study 1: Exfiltration via snapshots
Let’s consider some examples. Recently, there was a public record data breach affecting cloud-hosted user information. While there were many dimensions to this particular intrusion set, the cloud dimensions are interesting because they featured tactics that had not been seen before. The attackers in this case decided that the most efficient way to copy the target data in bulk was to share snapshots in the victim’s cloud account with themselves. They used a built-in sharing feature that is commonly used for rapid instance deployment or disaster recovery. Once the snapshots were shared with the attacker’s account, the attackers could copy or download the data, keeping a copy in case their own cloud account was suspended once incident response began. Moving snapshots in this way can yield a copy of the virtual disks attached to virtual machines running in a cloud account. This potentially includes the contents of each virtual server’s complete file system and any databases or other data structures stored there. In other words, it is a way of forklifting bulk data from one account to another.
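Sharing a snapshot with another AWS account goes through the ModifySnapshotAttribute API, so each share leaves a CloudTrail record. The following is a minimal Python sketch of how the target accounts of a share could be extracted and compared against accounts you own. It rests on several assumptions. The `requestParameters` layout is assumed to be the raw CloudTrail JSON shape shown. `KNOWN_ACCOUNTS` is a hypothetical allowlist, for example a disaster recovery account. The sample event is invented.

```python
import json
from typing import Any

# Hypothetical allowlist of account IDs you own (e.g. a disaster recovery account).
# Shares to these are expected; shares anywhere else deserve a closer look.
KNOWN_ACCOUNTS = {"111111111111", "333333333333"}


def snapshot_share_targets(record: dict[str, Any]) -> list[str]:
    """Return the principals granted createVolumePermission by a
    ModifySnapshotAttribute CloudTrail record (assumed field layout)."""
    if record.get("eventSource") != "ec2.amazonaws.com":
        return []
    if record.get("eventName") != "ModifySnapshotAttribute":
        return []
    params = record.get("requestParameters") or {}
    added = (params.get("createVolumePermission") or {}).get("add") or {}
    targets = []
    for item in added.get("items") or []:
        if "userId" in item:
            targets.append(item["userId"])  # a specific AWS account
        elif "group" in item:
            targets.append("group:" + item["group"])  # assumed: "all" means public
    return targets


def suspicious_shares(record: dict[str, Any], known: set[str] = KNOWN_ACCOUNTS) -> list[str]:
    # Anything outside the known set, including public group shares, is flagged.
    return [t for t in snapshot_share_targets(record) if t not in known]


event = json.loads("""
{
  "eventSource": "ec2.amazonaws.com",
  "eventName": "ModifySnapshotAttribute",
  "requestParameters": {
    "snapshotId": "snap-0123456789abcdef0",
    "attributeType": "CREATE_VOLUME_PERMISSION",
    "createVolumePermission": {
      "add": {"items": [{"userId": "333333333333"}, {"userId": "222222222222"}]}
    }
  }
}
""")
print(suspicious_shares(event))  # ['222222222222']
```

The allowlist reflects the fact that snapshot sharing is a legitimate feature. Flagging only shares to unfamiliar accounts, or public shares, keeps the routine disaster recovery workflow quiet.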
Figure 1 - Public record description of the first known case of the “Transfer data to cloud account” (T1537) technique being used for bulk data theft in the cloud
If we’re ingesting and monitoring CloudTrail logs, we can alert on this kind of activity with a search rule. Sharing a snapshot with another AWS account invokes the ModifySnapshotAttribute API call, which is recorded in a CloudTrail log message that includes the changes to snapshot permissions:
Figure 2 - CloudTrail events utilized by the detection rule “AWS EC2 Snapshot Activity”
We included a search rule for this event among the CloudTrail rules we shipped in version 7.9 of the Elastic Stack. The rule is named “AWS EC2 Snapshot Activity,” and it maps to its own MITRE ATT&CK® technique in the cloud matrix: “Transfer Data to Cloud Account (T1537),” a technique in the Exfiltration tactic category:
Figure 3 - Details of an alert created by the detection rule “AWS EC2 Snapshot Activity”
A search rule is workable here for a simple reason. Modifying snapshot permissions does happen during normal operations, but it does not usually happen hundreds or thousands of times per week, so we can alert on it without risking alert fatigue. The rule can, of course, be tuned further by exempting user contexts that routinely modify snapshot permissions. Analysts can also hunt manually for anomalous snapshot activity using the search from this rule:
event.dataset:aws.cloudtrail and event.provider:ec2.amazonaws.com and event.action:ModifySnapshotAttribute
Hunting will also tend to reveal what the normal snapshot workflow looks like in an environment. This is useful for tuning the rule with exceptions for routine snapshot activity.
Case study 2: Multistage lateral movement
Let’s next consider a harder detection problem. In 2019, another public record incident took place that also featured a cloud attack surface. In this case, data was reportedly taken from a particular set of S3 (Amazon Simple Storage Service) buckets. To accomplish this, the attacker first relayed connections through an insecure WAF (web application firewall) to reach the metadata service. As a security precaution, given the sensitive data it contains, the metadata service is not normally reachable from the Internet; it is normally reachable only from a running EC2 instance.
Figure 4 - Public record details of the 2019 cloud incident
There were five phases to this complex attack, which combined a number of techniques to obtain credentialed access. None of these steps was necessarily critical by itself. Combining them into a chained exploit, however, made it possible to transition from an anonymous user to a credentialed one and then steal large amounts of data from S3:
A server-side request forgery (SSRF) attack was made through an insecure web application firewall instance to gain access to the metadata service API
The attacker then routed commands and queries to the metadata service, which is not otherwise remotely accessible, through the WAF
Credentials for the WAF service account were retrieved from the metadata service
The retrieved WAF service account credentials were used to enumerate data in S3 using API commands like “ListBuckets”
The WAF service account credentials were subsequently used to retrieve data from S3
This attack leaves several pieces of evidence in the CloudTrail log messages. There would be events showing that a role associated with the WAF began calling S3-related methods, such as ListBuckets. In addition, the logs would show this activity originating from an IP address associated with a VPN service and from a set of Tor (The Onion Router) exit nodes, both of which were unusual and suspicious. VPN services and Tor are commonly used to obscure one’s source IP address on the Internet, and are thus frequently used to anonymize network traffic.
These may or may not be useful detection signals for every enterprise, though. Organizations could maintain a list of Tor exit nodes (where Tor traffic meets the Internet) in order to alert on activity originating from them. However, this kind of activity tends to be ever-present in most diverse Internet traffic, and we don’t want to alert on every such connection and cause alert fatigue. We could refine our approach by restricting alerting to authenticated activity from Tor nodes. Authenticated activity tends to be much more interesting than the very common scanning activity that makes up much of that traffic. But what about VPN services? How many VPN services are there? How many do enterprises use legitimately? And how would we enumerate the IP addresses they use if those aren’t published? In the final analysis, trying to enumerate “good” and “bad” source IP addresses quickly becomes infeasible. Even if we could perfectly identify both Tor nodes and VPN services, this attack could just as well have come from somewhere else, such as a compromised third-party network or even another cloud instance.
Let’s consider the ListBuckets command. This command, along with other S3 commands, is routinely used thousands of times per week in most cloud environments. That is far too often to alert on without drowning the SOC in false positives and inflicting alert fatigue on analysts.
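The Tor refinement discussed above keeps only authenticated activity from exit nodes. It can be made concrete with a short Python sketch over raw CloudTrail records. Several parts of it are assumptions for illustration. `TOR_EXIT_NODES` is a hypothetical list built from documentation addresses, not real nodes. Treating a call as authenticated when `userIdentity` carries an `accessKeyId` is a simplification. The sample records are invented. The sketch also has a clear limit: it says nothing about VPN services or any other source network.

```python
import ipaddress

# Hypothetical exit-node list. A real deployment would refresh this regularly from a
# published source; these are RFC 5737 documentation addresses, not Tor nodes.
TOR_EXIT_NODES = {ipaddress.ip_address(a) for a in ("192.0.2.10", "198.51.100.7")}


def is_authenticated(record: dict) -> bool:
    # Simplifying assumption: a signed API call carries an access key ID.
    return bool((record.get("userIdentity") or {}).get("accessKeyId"))


def authenticated_tor_activity(records: list[dict]) -> list[dict]:
    """Keep only authenticated calls whose source address is a listed exit node."""
    hits = []
    for record in records:
        try:
            source = ipaddress.ip_address(record.get("sourceIPAddress", ""))
        except ValueError:
            # Calls made by AWS services carry a hostname here, not an IP address.
            continue
        if source in TOR_EXIT_NODES and is_authenticated(record):
            hits.append(record)
    return hits


records = [
    {"eventName": "ListBuckets", "sourceIPAddress": "198.51.100.7",
     "userIdentity": {"accessKeyId": "ASIAEXAMPLE"}},
    {"eventName": "GetObject", "sourceIPAddress": "198.51.100.7", "userIdentity": {}},
    {"eventName": "ListBuckets", "sourceIPAddress": "203.0.113.20",
     "userIdentity": {"accessKeyId": "AKIAEXAMPLE"}},
    {"eventName": "RunInstances", "sourceIPAddress": "autoscaling.amazonaws.com",
     "userIdentity": {"accessKeyId": "ASIAEXAMPLE"}},
]
print([r["eventName"] for r in authenticated_tor_activity(records)])  # ['ListBuckets']
```

The unauthenticated call from the same exit node is dropped, which is the noise reduction the refinement aims for. The third record, an identical ListBuckets call from an unlisted address, passes silently. Any static address list shares that blind spot.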
What we need to do is imitate what threat hunters do and consider this activity relative to the spectrum of normal behavior in the environment. There are at least two kinds of outliers in these particular events:
The user context for the ListBuckets command. The WAF role normally does not use this command, as it is unrelated to the WAF function.
The source IP address for the commands would have been unusual.
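The two outliers above can be illustrated with a deliberately naive first-seen check in Python. This is not how Elastic’s machine learning jobs work, since those model rarity statistically over time and score the results. It only shows the underlying idea of judging each event against a per-user baseline of commands and source addresses. The user names, actions, and addresses are invented, and a simple first-seen rule like this would likely be noisy in a real environment.

```python
from collections import defaultdict

# Each event is a (user, action, source_ip) tuple; all values below are hypothetical.
Event = tuple[str, str, str]


def build_baseline(history: list[Event]) -> tuple[dict, dict]:
    """Record which actions and source IPs each user context has used before."""
    actions: dict[str, set[str]] = defaultdict(set)
    sources: dict[str, set[str]] = defaultdict(set)
    for user, action, source in history:
        actions[user].add(action)
        sources[user].add(source)
    return actions, sources


def first_seen_outliers(events: list[Event], actions: dict, sources: dict) -> list[tuple[Event, list[str]]]:
    """Flag events whose command or source IP is new for that user context."""
    findings = []
    for event in events:
        user, action, source = event
        reasons = []
        # .get avoids inserting empty sets for users absent from the baseline
        if action not in actions.get(user, set()):
            reasons.append("unusual command for user")
        if source not in sources.get(user, set()):
            reasons.append("unusual source IP for user")
        if reasons:
            findings.append((event, reasons))
    return findings


history = [
    ("waf-role", "DescribeInstances", "10.0.0.5"),
    ("waf-role", "DescribeInstances", "10.0.0.5"),
    ("dev-admin", "ListBuckets", "203.0.113.20"),
]
actions, sources = build_baseline(history)
new_events = [
    ("dev-admin", "ListBuckets", "203.0.113.20"),  # routine for this user
    ("waf-role", "ListBuckets", "198.51.100.7"),   # new command and new source
]
for event, reasons in first_seen_outliers(new_events, actions, sources):
    print(event, reasons)
# ('waf-role', 'ListBuckets', '198.51.100.7') ['unusual command for user', 'unusual source IP for user']
```

ListBuckets itself is not flagged globally, because it is routine for `dev-admin`. It is only anomalous in the context of the WAF role, which is the distinction a plain search rule on the command name cannot make.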
To hunt for these without generating a flood of alerts, we can combine Elastic’s unsupervised machine learning technology with machine learning rules. This finds outliers in the CloudTrail data and turns the results into detection alerts. There are five different machine learning rules in the CloudTrail package. One looks for unusual commands for a user context. When I invoke the ListBuckets command in my dev environment (a command I have not used before), the action is flagged by the machine learning job that powers a rule named Unusual AWS Command for a User.
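Conceptually, a machine learning rule sits on top of a job’s results and raises an alert when an anomaly’s score clears a threshold. The Python sketch below illustrates only that filtering step. `AnomalyRecord` is a simplification loosely modeled on the 0 to 100 `record_score` carried by Elastic anomaly records. The `entity` field, the job names, and the threshold of 75 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnomalyRecord:
    job_id: str
    record_score: float  # normalized 0-100 severity score of the anomaly
    entity: str          # hypothetical simplification: the user context involved


def anomalies_to_alerts(records: list[AnomalyRecord], anomaly_threshold: float = 75.0) -> list[dict]:
    """Promote anomalies at or above the threshold to alerts, highest score first."""
    selected = [r for r in records if r.record_score >= anomaly_threshold]
    selected.sort(key=lambda r: r.record_score, reverse=True)
    return [{"job": r.job_id, "score": r.record_score, "user": r.entity} for r in selected]


records = [
    AnomalyRecord("hypothetical-rare-command-job", 91.2, "waf-role"),
    AnomalyRecord("hypothetical-rare-command-job", 12.4, "dev-admin"),
    AnomalyRecord("hypothetical-rare-login-job", 78.0, "svc-backup"),
]
for alert in anomalies_to_alerts(records):
    print(alert)
```

Raising or lowering the threshold trades recall for alert volume. That is the same alert-fatigue tradeoff discussed for search rules, expressed as a single score cutoff instead of a list of exceptions.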
Figure 5 - An ML job detecting anomalous use of the ListBuckets API call
This anomaly result can also be turned into a detection alert in Elastic Security. Many of these rules include playbooks in the “investigation notes” field with suggested procedures and avenues of investigation for unusual cloud activity, and this one is no exception:
Figure 6 - Details of an alert created by a machine learning rule detecting the anomalous use of the ListBuckets command
This rule package also contains rules that find unusual logins by users who do not normally access the management console. In this case, the model has detected an unusual username authenticating to the console:
Figure 7 - The same ML job detecting an unusual username logging into the console
Another of the included machine learning