Combining supervised and unsupervised machine learning for DGA detection

It is with great excitement that we announce our first-ever supervised ML and security integration! Today, we are releasing a supervised ML solution package to detect domain generation algorithm (DGA) activity in your network data. In addition to a fully trained detection model, our release contains ingest pipeline configurations, anomaly detection jobs, and detection rules that will make your journey from setup to DGA detection smooth and easy. Navigate to our detection rules repository to check out how you can get started using supervised machine learning to detect DGA activity in your network and start your free trial with Elastic Security today.

DGAs: A breakdown

Domain generation algorithms (DGAs) are a technique employed by many malware authors to ensure that infection of a client machine evades defensive measures. The goal of this technique is to hide the communication between an infected client machine and the command & control (C&C or C2) server by using hundreds or thousands of randomly generated domain names, which ultimately resolve to the IP address of a C&C server.

To more easily visualize what’s occurring in a DGA attack, imagine for a moment you’re a soldier on a battlefield. Like many soldiers, you have communication gear that uses radio frequencies for communication. Your enemy may try to disrupt your communications by jamming your radio frequencies. One countermeasure is frequency hopping: using a radio system that changes frequencies very quickly during the course of a transmission. To the enemy, the frequency changes appear to be random and unpredictable, so they are hard to jam.

DGAs are like a frequency-hopping communication channel for malware. They change domains so frequently that blocking the malware’s C2 communication channel by means of DNS domain name blocking becomes infeasible. There are simply too many randomly generated DNS names to identify and block them all.

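
To make the idea concrete, here is a minimal toy sketch of how a DGA can derive many domain names from a shared seed and the current date. It is an illustration only and not the algorithm of Conficker or any other real malware family; the seed string, the hashing scheme, the function name `toy_dga`, and the reserved `.example` TLD are all assumptions chosen for the demo.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".example") -> list[str]:
    """Derive `count` pseudorandom-looking domains from a seed and a date.

    Both the malware and its operator can run the same function, so they
    agree on the day's domains without ever exchanging the list.
    """
    domains = []
    for index in range(count):
        # Mix the shared seed, the date, and a counter into one hash input.
        material = f"{seed}|{day.isoformat()}|{index}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits (0-15) onto the letters a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("demo-seed", date(2020, 12, 18)):
        print(name)
```

Because the output changes with the date, a blocklist built from yesterday's names says nothing about today's, which is why static domain blocking struggles against this technique.
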
This technique emerged in the world of malware with force in 2009, when the “Conficker” worm began using a very large number of randomly generated domain names for communication. The worm’s authors developed this countermeasure after a consortium of security researchers interrupted the worm’s C2 channel by shutting down the DNS domains it was using for communication. DNS mitigation was also performed in the case of the 2017 WannaCry ransomware global outbreak.

Blending in

If the best place to hide a tree is in a forest, malware operators have long recognized that blending in with normal web traffic is one of the best ways to go undetected. An HTTP request with a randomly generated domain name is a hard problem in network security monitoring and detection. The vast amount of HTTP traffic in modern networks makes manual review infeasible. Some malware and bots have unusual user agent strings that can be alerted on with search rules, but malware authors can easily use a user agent string that looks no different from a web browser’s.

With the rise of mobile and IoT, user agent strings have become so numerous that manual review for suspicious activity is also becoming infeasible. Web proxies have long used categorization to look for URLs that are known to be suspicious, but DGA domains are so voluminous and short-lived that they are often not categorized. Threat intelligence feeds can identify IP addresses and HTTP requests that are associated with known malware families and campaigns, but these are so easily changed by malware operators that such lists are often outdated by the time we put them to use in searches.

The sheer volume of network traffic collected in many organizations and the random nature of DGA-generated domains make detection of this activity a challenge for rule-based techniques, and a perfect fit for our supervised machine learning model!

Using Inference, Elastic’s DGA detection ML model will examine Packetbeat DNS data as it is being ingested into your Elasticsearch cluster, automatically determining which domains are potentially malicious. Follow the steps in the next section to get started.

Getting started

To get started with DGA detection within the security app, we have released a set of features to our publicly available rules repository to assist with the importing of machine learning models to the Elastic Stack. This repo not only provides our community a place to collaborate on threat detection, but also acts as a place to share the tools required to test and validate rules. Please see our previous blog and webinar for additional information on the initiative. If you don’t already have an Elastic Cloud subscription, you can try it out through our free 14-day cloud trial to start experimenting with the supervised ML solution package to detect DGA activity.

Part of this rule toolkit is a CLI (command line interface) to not only test rules, but also interact with your stack. For instance, we have released various Python libraries to interact with the Kibana API. This was critical in making it easier to import the model dependencies and get your rules operational. To start enriching DNS data and receiving alerts for DGA activity, follow these three steps:

Step one: Importing the model

First, you must import the DGA model, painless scripts, and ingest processors into your stack. Currently, DGA models and any unsupervised models for anomaly detection (more to come) are available in the detection-rules repo using GitHub releases. To upload, run the following CLI command:

    python -m detection_rules es experimental setup-dga-model -t

Following the upload, you will need to update your Packetbeat configuration, as the model will enrich Packetbeat DNS events with a DGA score.
This can easily be done by adding the following to your Elasticsearch output configuration:

    output.elasticsearch:
      hosts: ["your-hostname:your-port"]
      pipeline: dns_enrich_pipeline

The supervised model will then analyze and enrich Packetbeat DNS events, which contain these ECS fields:

    dns.question.name
    dns.question.registered_domain

The model will then add these fields to processed DNS events:

    Field name
        Description

    ml_is_dga.malicious_prediction
        A value of “1” indicates the DNS domain is predicted to be the result of malicious DGA activity. A value of “0” indicates it is predicted to be benign.

    ml_is_dga.malicious_probability
        A probability score, between 0 and 1, that the DNS domain is the result of malicious DGA activity.
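
As a rough illustration of how these fields might be read downstream, the sketch below walks a hypothetical enriched event and summarizes the model's verdict. The sample domain, the probability value, the nested document layout, and the helper names `get_field` and `summarize` are assumptions for the example, not output captured from a real cluster.

```python
from typing import Any

# Hypothetical enriched DNS event; the domain and scores are invented.
sample_event = {
    "dns": {
        "question": {
            "name": "xkqjvbzrtplm.example",
            "registered_domain": "xkqjvbzrtplm.example",
        }
    },
    "ml_is_dga": {
        "malicious_prediction": 1,
        "malicious_probability": 0.993,
    },
}


def get_field(event: dict, dotted_path: str, default: Any = None) -> Any:
    """Walk a nested dict using a dotted field name such as 'dns.question.name'."""
    current: Any = event
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def summarize(event: dict) -> str:
    """Return a one-line summary of the DGA enrichment on an event."""
    name = get_field(event, "dns.question.name", "<unknown>")
    prediction = get_field(event, "ml_is_dga.malicious_prediction")
    probability = get_field(event, "ml_is_dga.malicious_probability")
    if prediction is None:
        # The event never passed through the enrichment pipeline.
        return f"{name}: not enriched"
    label = "predicted DGA" if prediction == 1 else "predicted benign"
    return f"{name}: {label} (probability {probability})"


if __name__ == "__main__":
    print(summarize(sample_event))
```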

A sample screenshot of enriched DNS data is shown below:

Note: For more detailed information, please consult the detection-rules README.

About the DGA Rules

Now let’s look at some conditional search rules that detect and alert on DGA activity. Two search rules are provided in the package that can be enabled and run in the detection engine in the Elastic Security app:

Machine Learning Detected a DNS Request Predicted to be a DGA Domain
Machine Learning Detected a DNS Request With a High DGA Probability Score

The first rule matches any DNS event that has a DGA prediction value of 1, indicating the DNS domain name was probably the product of a domain generation algorithm and is therefore suspicious. The rule, found here, simply looks for the following condition:

    event.category:network and network.protocol:dns and ml_is_dga.malicious_prediction: 1

The second rule matches any DNS event that has a DGA probability higher than 0.98, indicating the DNS domain name was probably the product of a domain generation algorithm and is therefore suspicious. The rule, found here, simply looks for the following condition:

    event.category:network and network.protocol:dns and ml_is_dga.malicious_probability > 0.98

Like all rules in the Elastic Detection Engine, they can be forked and customized to suit local conditions. The probability score in the second rule can be adjusted up or down if you find that a different probability score works better with your DNS events. Either rule can have its risk score increased if you wish to raise the priority of DGA detections in your alert queue. Exceptions can be added to the rules in order to ignore false positives such as content distribution network (CDN) domains that may use pseudorandom domain names.

Another future possibility we plan to explore is to use Event Query Language (EQL) to look for clusters of anomaly or search-based alerts using multivariate correlation. For example, if we see a cluster of alerts from a host engaged in probable DGA activity, confidence increases that we have a significant malware detection that needs attention. Such a cluster could consist of DGA alerts combined with other anomaly detection alerts such as a rare process, network process, domain, or URL. These additional anomaly detections are produced by the library of machine learning packages included in the Elastic Security app.

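
The two rule conditions, the adjustable threshold, and the host-clustering idea can be sketched locally in plain Python. This is not how the Detection Engine or EQL evaluates rules; it only mirrors the logic for experimentation. It assumes events are flattened into dicts keyed by dotted field name with single-valued fields, that a `host.name` field is present, and that `min_alerts` is a hypothetical cutoff you would tune yourself.

```python
from collections import Counter


def is_dns_network_event(event: dict) -> bool:
    """Shared precondition of both rules: a network event using DNS."""
    return (
        event.get("event.category") == "network"
        and event.get("network.protocol") == "dns"
    )


def matches_prediction_rule(event: dict) -> bool:
    """First rule: the model predicted the domain to be DGA-generated."""
    return is_dns_network_event(event) and event.get("ml_is_dga.malicious_prediction") == 1


def matches_probability_rule(event: dict, threshold: float = 0.98) -> bool:
    """Second rule: DGA probability strictly above the (adjustable) threshold."""
    probability = event.get("ml_is_dga.malicious_probability", 0.0)
    return is_dns_network_event(event) and probability > threshold


def hosts_with_alert_clusters(events: list, min_alerts: int = 3,
                              threshold: float = 0.98) -> dict:
    """Count rule matches per host and keep hosts at or above min_alerts."""
    counts = Counter(
        event.get("host.name", "unknown")
        for event in events
        if matches_prediction_rule(event) or matches_probability_rule(event, threshold)
    )
    return {host: count for host, count in counts.items() if count >= min_alerts}
```

Lowering `threshold` widens the second rule, and raising `min_alerts` trades sensitivity for fewer, higher-confidence host clusters.
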
Step two: Importing the rules

The rules in the DGA package are provided in .toml format in the detection-rules repo releases and can be imported with the kibana upload-rule feature of the detection-rules CLI. The command’s help output shows the available options:

    python -m detection_rules kibana upload-rule -h

    Kibana client:
      Options:
        --space TEXT                   Kibana space
        -kp, --kibana-password TEXT
        -ku, --kibana-user TEXT
        --cloud-id TEXT
        -k, --kibana-url TEXT

    Usage: detection_rules kibana upload-rule [OPTIONS] TOML_FILES...

      Upload a list of rule .toml files to Kibana.

    Options:
      -h, --help  Show this message and exit.

Step three: Enable rule and profit

Now that we have the trained supervised ML model imported into the stack, DNS events being enriched, and rules at our disposal, all that is left to do is confirm that the rule is enabled and wait for alerts! When viewing the rule in the Detection Engine, you can confirm that it is activated as seen below:

And now wait for alerts. Once an alert is generated, you can use the Timeline feature to examine the DNS event and begin your investigation.
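
Returning to step two, if you script the rule upload, the command line can be assembled from the options listed in the help output. The sketch below only builds the argument list and does not run anything. The placement of the Kibana client options between `kibana` and `upload-rule` is an assumption based on how that help output is grouped, and the function name and the example rule path are hypothetical. The password is deliberately left out so it never appears in a process listing.

```python
import shlex


def build_upload_command(toml_files, kibana_url=None, cloud_id=None,
                         kibana_user=None, space=None):
    """Assemble the argv list for a detection_rules rule upload (not executed)."""
    if not toml_files:
        raise ValueError("at least one rule .toml file is required")

    command = ["python", "-m", "detection_rules", "kibana"]
    # Assumed position: client options precede the subcommand name.
    if kibana_url:
        command += ["--kibana-url", kibana_url]
    if cloud_id:
        command += ["--cloud-id", cloud_id]
    if kibana_user:
        command += ["--kibana-user", kibana_user]
    if space:
        command += ["--space", space]

    command.append("upload-rule")
    command.extend(toml_files)
    return command


if __name__ == "__main__":
    # Hypothetical rule path and local Kibana URL, for illustration only.
    argv = build_upload_command(
        ["rules/dga_example_rule.toml"],
        kibana_url="http://localhost:5601",
        kibana_user="elastic",
    )
    print(shlex.join(argv))
```

Check the result against `python -m detection_rules kibana upload-rule -h` in your checkout before passing it to `subprocess.run`, since option names and positions may differ between releases.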

    Note: Detection of command and control (C2) activity, such as that associated with DGA, may produce la

Created: 18 Dec. 2020, 14:20:43
