Automating Breach Detection For The Way Security Professionals Think
The missing ingredient that turns a flood of alerts into a small, actionable number is context.
Solving the data breach crisis requires a deliberate shift in strategy and thinking. Obviously, older approaches to early detection of targeted breaches are not effective today, given that dwell time hovers around six months.
One major shift the industry needs to make is to move away from a reliance on the manual efforts of highly trained security analysts and toward more automated processes that reduce the time and overall effort needed to find an active attacker inside a network.
I say this because one of the biggest problems in breach detection is not the lack of data or even of analytical tools and capabilities. The biggest issue is knowing how and where to look for intruders among the massive amounts of existing data, and then pinpointing them quickly and accurately. When was the last time you heard a company victimized by a data breach complain that it didn’t have enough data or analytical tools? Most companies are drowning in security data. Then, after the fact, they point back to an alert or notice that was missed amid the noise. Automation could potentially change all of this.
By automation, I don’t mean simply processing manual tasks without human intervention. Automation for breach detection should resemble human thinking as much as possible. Nor should it replace security professionals themselves. Instead, automation should take over the more menial, repetitive operations and free security people to concentrate on work with greater impact and value.
Likewise, hiring more security practitioners is unlikely to solve the data breach problem. While some lucky security teams may find the necessary talent (despite the acute industry-wide shortage), the problems they need to solve are growing faster than their hiring ability. Instead, organizations need to focus their existing teams on the relevant cases, not on filtering false positives. In all but the largest security organizations, that means being able to investigate a handful of cases per day.
How can you reduce the number of events from hundreds or thousands of alerts per day to just a fraction of that? The first step is focusing on the right events.
Security teams need to look for events that indicate the operational activity of an intruder who has circumvented perimeter security and is already inside the network, specifically the behavior of users and hosts. The activities that are strong breach indicators tend to be lateral movement, active command and control, internal reconnaissance, misuse of privileged credentials, and unexpected administrative behaviors.
By focusing on the right events, you can eliminate two common sources of false positives. The first is weak IDS signatures that fire randomly even after the system has been tuned. The second is alerts on intrusion attempts, such as a sandbox detecting a virus in an email: the alert reports the attempt rather than indicating what may be the real source, a compromised system.
Context is key
Just flagging indicative behaviors will generate a vast number of false positives. The missing ingredient that makes the difference between a flood of alerts and a small, actionable number is context. Based on context, a human security operator can understand that certain behaviors may be standard IT operations even if they come from privileged accounts, while similar behaviors elsewhere may be part of a breach. Automating context-based analysis is the step that can reduce the number of events by orders of magnitude.
To understand context, one has to profile every user and device. Profiling is not simply a list of network connections made in the most recent hour. Ideally, profiling generates a long-term representation of the function of the user and of different dimensions of their behavior. As humans, we do a lot of profiling in our heads. We can cognitively grasp the function of different people and hosts. To automate this profiling process, you’ll need to create multifaceted individual profiles for users and hosts, and compare each new behavior to the learned profiles before firing events.
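As a rough illustration of what "compare a new behavior to the learned profiles" could look like, here is a minimal Python sketch. The class names (Profile, Profiler), the dimension labels ("admin_op", "dest_host"), and the sample users and hosts are all hypothetical, and the novelty test is deliberately naive: it only asks whether a value has ever been seen for that entity. A real profiler would need time windows, decay, and statistical thresholds.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Long-term behavior of one user or host, split into dimensions."""
    # dimension name -> observed value -> number of times observed
    seen: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(int))
    )

    def learn(self, dimension: str, value: str) -> None:
        self.seen[dimension][value] += 1

    def is_new(self, dimension: str, value: str) -> bool:
        # Use .get so that a lookup never adds entries to the profile.
        return self.seen.get(dimension, {}).get(value, 0) == 0


class Profiler:
    """Keeps one Profile per entity (user or host)."""

    def __init__(self) -> None:
        self.profiles = defaultdict(Profile)

    def learn(self, entity: str, dimension: str, value: str) -> None:
        self.profiles[entity].learn(dimension, value)

    def deviates(self, entity: str, dimension: str, value: str) -> bool:
        """True if the behavior was never seen for this entity.

        Assumption: an entity with no profile at all is treated as deviating.
        """
        profile = self.profiles.get(entity)
        return profile is None or profile.is_new(dimension, value)


# Hypothetical history used to learn the profiles.
profiler = Profiler()
for event in [
    ("alice", "admin_op", "password_reset"),
    ("alice", "dest_host", "fileserver01"),
    ("backup-svc", "dest_host", "fileserver01"),
]:
    profiler.learn(*event)

# Only fire an event when the behavior is new for that entity.
print(profiler.deviates("alice", "admin_op", "password_reset"))  # False
print(profiler.deviates("alice", "admin_op", "create_service"))  # True
```

Keeping one counter per dimension is what makes the profile multifaceted: the same user can be entirely ordinary in one dimension and novel in another.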
It is difficult to do this type of automation based on rules or even event correlations, since both are static and don’t take a profiling context into account. While the very largest organizations may have sufficient resources to employ programmers and data scientists to develop this logic, there are also some commercial products now available that perform this type of automated breach detection out of the box. Either way, here are three examples of how context can be used to focus on the most relevant events:
Proxy or IDS logs. If you look at proxy or IDS logs, you might see many events showing hosts connecting to suspicious or malicious websites. Most of them are probably not from actual infections but the result of users clicking on bad links or visiting a suspicious website. In most cases nothing happens to the user’s computer because it is (ideally) patched and updated. The missing context here is the long-term behavior of the host. If the host is connecting to a suspicious server every day, that suggests an active command and control channel.
Peer group analysis. Another context to consider is what other hosts normally do. If many other hosts connect to the same service, it is most likely a normal and common service. On the other hand, if the connection is very rare, it raises the suspicion level and may be worth further investigation.
Privileged credentials. Attackers often use administrative operations for lateral movement, so it is important to look at remote access to systems and the use of privileged credentials. These events occur thousands of times per day and are impossible to evaluate manually. The missing context here would be the baseline behavior of the user and previous administrative operations they performed. In an attack context, the attacker is likely conducting some new administrative activities that were not previously seen by the same user. The relevant events would be administrative behaviors that are not normal for the user or for the network.
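The three contexts above can each be reduced to a small check over log records. The Python sketch below is one possible way to express them, not a description of any particular product. The record layouts, the function names, the host names, and the thresholds (min_days, max_hosts) are assumptions chosen for illustration, and the IP addresses come from reserved documentation ranges.

```python
from collections import defaultdict
from datetime import date


def recurring_destinations(proxy_events, min_days=5):
    """Long-term host behavior: (host, destination) pairs seen on at
    least min_days distinct days. Events are (day, host, destination)."""
    days = defaultdict(set)
    for day, host, dest in proxy_events:
        days[(host, dest)].add(day)
    return {pair for pair, seen in days.items() if len(seen) >= min_days}


def rare_destinations(proxy_events, max_hosts=1):
    """Peer group analysis: destinations contacted by at most
    max_hosts distinct hosts."""
    hosts = defaultdict(set)
    for _day, host, dest in proxy_events:
        hosts[dest].add(host)
    return {dest for dest, seen in hosts.items() if len(seen) <= max_hosts}


def new_admin_operations(baseline, recent):
    """Privileged credentials: recent (user, operation, target) events
    whose operation the same user never performed in the baseline."""
    known = defaultdict(set)
    for user, operation, _target in baseline:
        known[user].add(operation)
    return [e for e in recent if e[1] not in known[e[0]]]


# Hypothetical week of proxy logs.
week = [date(2024, 1, d) for d in range(1, 8)]
proxy_events = (
    [(d, "ws-17", "203.0.113.50") for d in week]            # daily, one host
    + [(d, h, "updates.example.com") for d in week
       for h in ("ws-17", "ws-18", "ws-19")]                 # daily, many hosts
    + [(date(2024, 1, 3), "ws-18", "198.51.100.9")]          # one-off click
)

# Combine the first two contexts: recurring AND rare among peers.
rare = rare_destinations(proxy_events)
suspects = {p for p in recurring_destinations(proxy_events) if p[1] in rare}
print(suspects)  # {('ws-17', '203.0.113.50')}

baseline = [("alice", "password_reset", "dc01")]
recent = [("alice", "password_reset", "dc01"),
          ("alice", "remote_service_create", "db02")]
print(new_admin_operations(baseline, recent))
# [('alice', 'remote_service_create', 'db02')]
```

In this sample, the daily update traffic and the one-off click both drop out, and only the destination that is both recurring and rare among peers remains. That is the kind of reduction the contexts are meant to achieve, although real thresholds would have to be tuned per network.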
Through automation, teams no longer have to face the drudgery of false positive after false positive, and the futility of hard work that results in nothing. It’s time to shift strategy.