Automating the triage and incident response for malware alerts
Welcome to the second post in our four-part blog series where we walk through the steps to automate some of the most common SOC processes. Last week, we went through applying security automation to the process of managing, investigating and responding to phishing alerts. This week, we take a look at addressing malware.
As a reminder, four factors make certain processes ideal candidates for automation.
Malware makes our list for two main reasons. First, malware alerts have inherently low fidelity, especially in large organizations. The sheer volume of malware-related alerts can easily inundate SOC teams, who need to correlate data from various sources/alerts to gain context but are faced with low signal-to-noise ratios (i.e. too many false positives). This causes confusion, thereby preventing analysts from identifying the real threat and taking appropriate action.
Second, because some malware infections can contaminate several systems in a very short period of time, quick response is an absolute necessity. If the malware has worm-like attributes, it can spread through your network, and even to adjoining networks, in just a matter of hours. WannaCry, a ransomware strain with worm-like attributes, managed to infect more than 230,000 systems in over 150 countries in just one day.
Automation can take care of the entire data collection process and present analysts with actionable information in a fraction of the time it would take them to manually aggregate the necessary details. Instead of spending a lot of time on a swivel-chair interface for data integration/correlation, analysts can view all the information they need through a single pane of glass and go straight to decision-making. This can bring the time needed to gather context down from hours or even days to mere seconds, enabling you to nip potential malware outbreaks in the bud.
Let’s now go over a typical malware alert process flow or playbook and understand where security automation can provide great value.
Before SOC teams can respond to a malware alert, they need to go through a time-consuming and tedious process that begins with data gathering and user or host enrichment. Raw data from a single malware alert, or even a handful of them, is not enough to provide actionable information. In order to understand the implications of a particular alert, you’ll need to refer to other alerts. This will help you gain context about the issue at hand.
You’ll also need to compare the data you just gathered with your threat intelligence and web intelligence. What do they say about the hash you just obtained? Is it associated with known malware? What do they say about the URL the suspected malware was connecting to? Is it a known C&C server?
You will also need to know more about the host that generated the alert. Was it a critical web server? Or was it just a secondary computer? That way, you will know, for instance, whether a quarantine would be a sufficient course of action or if you need to do more, like fire up a backup system.
To get all that information and obtain the best context, you’ll have to run the suspected malware through a series of scans, tests and a host of other procedures on different security solutions.
- VirusTotal for a hash
- SEP (Symantec Endpoint Protection) for additional context
- Nessus for vulnerability information
- SCCM to get context from asset information
- And so on
Collecting all this information by hand can be time-consuming. Automation can do all of that for you in just a couple of seconds.
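To make the enrichment step concrete, here is a minimal Python sketch of how lookups against several tools might be fanned out in parallel and merged into one record. The four `lookup_*` functions are hypothetical stand-ins that return canned data; they are not the real VirusTotal, SEP, Nessus or SCCM APIs, and a real playbook would replace each one with a call to the corresponding integration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real integrations. Each takes the alert and
# returns a small dict of context; the canned values are for illustration only.
def lookup_hash_reputation(alert):
    return {"known_malicious": alert["sha256"] in {"deadbeef"}}

def lookup_endpoint_context(alert):
    return {"last_scan_clean": False}

def lookup_vulnerabilities(alert):
    return {"open_critical_vulns": 2}

def lookup_asset_info(alert):
    return {"role": "web server", "critical": True}

ENRICHERS = {
    "hash_reputation": lookup_hash_reputation,
    "endpoint": lookup_endpoint_context,
    "vulnerabilities": lookup_vulnerabilities,
    "asset": lookup_asset_info,
}

def enrich(alert, enrichers=ENRICHERS, timeout=10):
    """Run every enricher in parallel and merge the results into one record."""
    context = {}
    with ThreadPoolExecutor(max_workers=len(enrichers)) as pool:
        futures = {name: pool.submit(fn, alert) for name, fn in enrichers.items()}
        for name, future in futures.items():
            try:
                context[name] = future.result(timeout=timeout)
            except Exception as exc:
                # One failing source should not hide what the others returned.
                context[name] = {"error": str(exc)}
    return {**alert, "context": context}
```

Calling `enrich({"host": "web01", "sha256": "deadbeef"})` returns the original alert plus a `context` entry per source, which is the kind of consolidated view an analyst would otherwise assemble by hand.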
As not all malware (zero-day threats in particular) can be detected through signature and basic heuristic-based scans, you’ll often need to send the file to a sandbox like Cuckoo for further analysis. There, the file can be allowed to run and its behavior recorded and analyzed in an isolated environment.
Again, this has to be done for every single instance of suspected malware. Security automation can save time by taking charge of sending the suspicious files to the sandbox environment, obtaining the results, and delivering them to your screen in a concise report.
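The submit, wait and collect loop described above might look roughly like the sketch below. `FakeSandbox` is a hypothetical in-memory stand-in, not the Cuckoo API; its method names, status strings and report fields are assumptions made so the example runs on its own.

```python
import time

class FakeSandbox:
    """Hypothetical in-memory stand-in for a sandbox API client."""

    def __init__(self, polls_until_done=2):
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self._next_id = 1

    def submit(self, file_bytes):
        task_id = self._next_id
        self._next_id += 1
        self._remaining[task_id] = self._polls_until_done
        return task_id

    def status(self, task_id):
        if self._remaining[task_id] > 0:
            self._remaining[task_id] -= 1
            return "running"
        return "reported"

    def report(self, task_id):
        # Canned report for illustration only.
        return {"task_id": task_id, "score": 8.5,
                "signatures": ["creates_autorun_key"]}

def detonate(client, file_bytes, poll_interval=5.0, max_polls=60):
    """Submit a file, poll until the analysis is done, and return the report."""
    task_id = client.submit(file_bytes)
    for _ in range(max_polls):
        if client.status(task_id) == "reported":
            return client.report(task_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"sandbox task {task_id} did not finish in time")
```

The upper bound on polling is a deliberate design choice: a stuck analysis raises an error the playbook can handle instead of blocking the rest of the alert queue.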
First-level determination refers to that stage wherein analysts make an initial assessment based on the information gathered from the previous two stages and then arrive at a decision. Although some organizations might opt to do this manually, i.e. leaving the decision-making to the analyst, it’s also something that can be completely delegated to automation solutions that leverage machine learning-powered analytics platforms.
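As a rough illustration of an automated first-level determination, the sketch below scores an alert from the evidence gathered in the earlier stages. It is a simple rule-based stand-in rather than a machine learning model, and the field names, weights and thresholds are all assumptions chosen for the example, not recommended values.

```python
def first_level_verdict(evidence):
    """Return (verdict, score) from enrichment and sandbox evidence.

    All weights and thresholds below are illustrative assumptions.
    """
    score = 0
    if evidence.get("known_malicious_hash"):
        score += 50
    if evidence.get("known_c2_destination"):
        score += 30
    # Sandbox score assumed to be on a 0-10 scale; contributes at most 20 points.
    score += min(int(evidence.get("sandbox_score", 0) * 2), 20)
    if evidence.get("critical_asset"):
        score += 10

    if score >= 60:
        return "escalate", score
    if score >= 25:
        return "investigate", score
    return "close", score
```

An organization that prefers to keep the decision with the analyst could still use a score like this to sort the queue, so that the most suspicious alerts are reviewed first.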
Some cases require deeper investigation. This would typically entail things like looking into your endpoint tools to obtain other pieces of information:
- What were the other hosts (if any) in the organization where the hash in question manifested?
- What activities were going on in those endpoints in the 10 minutes before that specific alert was generated?
- Who were the end users logged in?
- What network connections were involved?
Again, these pieces of information can be obtained from a variety of sources - log queries, sensors, network forensic tools (e.g. NetWitness), endpoint-related queries, etc. The task of querying and gathering all that data can be automated, so that analysts can simply look at their screen and either make a decision right then and there or escalate the incident if necessary.
Cases that are escalated are addressed in different ways, depending on what the specific situation requires. Some require that the machine(s) in question be quarantined. Others call for the enforcement of certain policies to gain better visibility into a particular account. Still others may require you to conduct a full-blown incident response operation. Some of these escalation/response activities - quarantine of hosts and enforcement of policies - can be automated to a certain extent, but will often require the expertise and hands-on participation of an analyst.
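The investigation questions above can be answered with a single query once the events are in one place. The sketch below assumes events have already been collected as a list of dicts; the field names (`host`, `time`, `sha256`, `user`, `dest`) are hypothetical and would map to whatever your log and endpoint sources actually return.

```python
from datetime import datetime, timedelta

def investigate(events, sha256, alert_time, window=timedelta(minutes=10)):
    """Summarize where a hash was seen and what happened just before the alert."""
    # Every host where the hash has ever appeared.
    affected_hosts = sorted({e["host"] for e in events if e.get("sha256") == sha256})

    # Activity on those hosts during the window leading up to the alert.
    start = alert_time - window
    recent = [
        e for e in events
        if e["host"] in affected_hosts and start <= e["time"] <= alert_time
    ]
    return {
        "hosts": affected_hosts,
        "users": sorted({e["user"] for e in recent if e.get("user")}),
        "connections": sorted({e["dest"] for e in recent if e.get("dest")}),
        "recent_events": recent,
    }
```

The 10-minute default mirrors the window mentioned in the list; it is passed as a parameter so a playbook can widen it for slower-moving threats.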
Last but not least is the feedback/remediation stage. At this stage, SOC teams typically perform a series of tasks that improve the organization’s security posture - blacklisting the hash or URL, performing an intelligence update, updating security sensors, re-imaging systems, and so on. All these - you guessed it - can be partially or fully automated, depending on the policies within your organization.
Analysts devote a great deal of time to processing malware alerts. But a substantial portion of that time is consumed by mundane tasks such as data collection, basic analysis, forwarding of files, and several others that can be delegated to automation.
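One way to express "partially or fully automated, depending on policy" is to split remediation actions into those that run on their own and those that wait for an analyst. The sketch below is only an illustration: the action type names and the default policy set are assumptions, and your own policy may draw the line elsewhere.

```python
# Assumed policy: these action types may run without analyst approval.
AUTO_APPROVED = {"block_hash", "block_url", "update_intel"}

def plan_remediation(actions, auto_approved=AUTO_APPROVED):
    """Split remediation actions into automatic ones and ones needing sign-off."""
    plan = {"automatic": [], "needs_analyst": []}
    for action in actions:
        bucket = "automatic" if action["type"] in auto_approved else "needs_analyst"
        plan[bucket].append(action)
    return plan
```

Moving an action type such as `reimage_host` into the approved set is then a policy change rather than a code change, which keeps the decision about how much to automate with the organization.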
In our next post, we will cover security automation for DLP. If the suspense is killing you, get the full scoop now by checking out our webinar on Security Automation Quick Wins.