6 Tips for Using Big Data to Hunt Cyberthreats
You need to be smart about harnessing big data to defend against today’s security threats, data breaches, and attacks.
Ask 10 different people what big data is, and you may get 10 different answers. For the sake of this article, big data refers to the mining of usable information from the large amounts of data being created around the world every day. While companies look to take advantage of all this data to improve operations, increase sales, and lower costs, many are discovering that it can also be used for security by offering a broader view of risk and vulnerabilities.
Big data offers the ability to analyze massive numbers of potential security events and make connections between them to create a prioritized list of threats. With big data, disparate data can be connected, which allows cybersecurity professionals to take a proactive approach that prevents attacks.
In today’s complex network environments, advanced persistent threats (APTs) and other cyberthreats may be eradicated by leveraging intelligence from data providers that invest time and resources in finding and identifying where threats originate and which IP addresses and URLs they communicate with.
To use this approach against cyberthreats, appliances should monitor threat feeds from trusted providers for indicators of compromise (IOCs). These include big data feeds such as Domain Name System (DNS) feeds, command and control (C2) feeds, and blacklists and whitelists, which can be correlated to hunt threats in a data set.
Here are six tips for using big data to help wipe out cyberthreats in your organization. The first five cover data sources you can start with, and the sixth covers the quality of the data itself:
DNS feed
A DNS data feed can provide lists of newly registered or newly created domains and domains commonly used for spamming. All of these lists can be incorporated into blacklists and whitelists, and traffic to blacklisted domains should be null routed and logged for further analysis.
Using the network’s own DNS servers to check outgoing queries will reveal domains that do not resolve. Repeated lookups of non-resolving domains could mean you have discovered malware that uses a domain generation algorithm (DGA). All of this information can be readily used to defend your company’s network. Another benefit is that the incident response team would have the data it needs to track down the suspect machine: the LAN data. This is yet another reason to log as much LAN traffic as possible.
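As a rough illustration of the failed-lookup check described above, the sketch below counts distinct non-resolving domains per LAN client. The log format (client IP, queried domain, response code), the threshold of three, and the sample addresses and domains are all assumptions made for this example. Real resolver logs vary by product, and a high count is only a hint of DGA activity, not proof.

```python
from collections import defaultdict

# Hypothetical log format: (client_ip, queried_domain, response_code).
# Real resolver logs differ by product; adapt the parsing step accordingly.
def hosts_with_many_failed_lookups(records, min_unique_failures=3):
    """Return {client_ip: sorted failed domains} for clients whose number of
    distinct NXDOMAIN lookups meets the (assumed) threshold."""
    failures = defaultdict(set)
    for client_ip, domain, rcode in records:
        if rcode == "NXDOMAIN":
            # Normalize case and any trailing dot so duplicates collapse.
            failures[client_ip].add(domain.lower().rstrip("."))
    return {
        ip: sorted(domains)
        for ip, domains in failures.items()
        if len(domains) >= min_unique_failures
    }

# Made-up sample data for illustration only.
sample_records = [
    ("10.0.0.5", "xkqjzv.example", "NXDOMAIN"),
    ("10.0.0.5", "pwmrtb.example", "NXDOMAIN"),
    ("10.0.0.5", "zzgqhd.example", "NXDOMAIN"),
    ("10.0.0.7", "intranet.example", "NOERROR"),
    ("10.0.0.7", "typo-site.example", "NXDOMAIN"),
]

suspects = hosts_with_many_failed_lookups(sample_records)
print(suspects)
```

Here only 10.0.0.5 is flagged, which gives the incident response team a specific LAN address to investigate. A single mistyped domain, as with 10.0.0.7, stays below the threshold.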
C2 systems
Incorporating C2 data will provide blacklists of IP addresses and domains, and plenty of these lists are available. Under no circumstances should traffic from a corporate network be reaching out to known C2 systems. If the network has an incident response team with time to investigate cybercrime, this would be a good point at which to move the infection to one of the network’s honeypot machines.
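One way to check outbound traffic against a C2 blacklist is sketched below using Python's standard `ipaddress` module. The function names, the shape of the connection records, and the addresses (taken from ranges reserved for documentation) are assumptions for this example, not any vendor's feed format.

```python
import ipaddress

def load_c2_networks(entries):
    """Parse blacklist entries (single IPs or CIDR ranges) into network objects."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]

def find_c2_connections(connections, c2_networks):
    """Return (source, destination) pairs whose destination is blacklisted."""
    hits = []
    for src, dst in connections:
        dst_addr = ipaddress.ip_address(dst)
        if any(dst_addr in net for net in c2_networks):
            hits.append((src, dst))
    return hits

# Hypothetical blacklist and outbound connection log (documentation ranges).
c2_blacklist = ["203.0.113.0/24", "198.51.100.7"]
outbound = [
    ("10.0.0.5", "203.0.113.44"),
    ("10.0.0.6", "192.0.2.10"),
    ("10.0.0.9", "198.51.100.7"),
]

c2_hits = find_c2_connections(outbound, load_c2_networks(c2_blacklist))
print(c2_hits)
```

A linear scan over the list is fine for a small blacklist. A large feed would likely call for a faster lookup structure, which this sketch leaves out.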
Threat intelligence
IP address and domain reputation feeds can help determine whether an address is likely to be safe. Some feed providers return a binary answer, such as "go" or "no go." Some more modern feed vendors are beginning to reply with threat-level indicators.
For example, an IP address may be evaluated and the answer would contain a "threat rating" instead of a binary response. Individual appliance managers can then decide what level of risk they are willing to accept on their networks, based on the threat rating and the scale being used.
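The difference between a binary answer and a threat rating can be shown with a small policy function. The 0 to 100 scale, the two thresholds, and the action names below are assumptions for illustration. Each vendor defines its own scale, and each network would set its own cutoffs.

```python
def action_for_rating(threat_rating, block_at=70, review_at=40):
    """Map a threat rating on an assumed 0-100 scale to a local policy action."""
    if not 0 <= threat_rating <= 100:
        raise ValueError("threat rating outside the assumed 0-100 scale")
    if threat_rating >= block_at:
        return "block"
    if threat_rating >= review_at:
        return "review"
    return "allow"

# The same rating can lead to different actions on networks with
# different risk tolerance.
print(action_for_rating(55))               # default thresholds
print(action_for_rating(55, block_at=50))  # a stricter network
```

With the default thresholds a rating of 55 is sent for review, while the stricter network blocks it outright.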
Network traffic logs
Plenty of vendors offer appliances that will log all of the network’s traffic, or just parts of it. When using big data to hunt threats, it is very easy to get lost in the noise and the day-to-day work cycles. However, to hunt threats effectively and validate the work being done, a log of the network traffic, in some fashion, is a basic requirement. This log would also aid in the analysis of a data breach.
At least one honeypot
Honeypots can be effective in identifying malware targeting a particular network. Signatures can be created for the payloads dropped on a honeypot, and those signatures can then be used to hunt for the same payloads on other LAN machines and to monitor the LAN for similar network traffic.
This information is priceless in the case of a targeted attack that may not have been detected by any of the antivirus (AV) vendors. This means your AV software and appliances would not yet know of this attack vector. If they are unable to identify it themselves, there is no protection for the LAN.
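The simplest form of a payload signature is a cryptographic hash. The sketch below hashes files captured on a honeypot and looks for exact copies elsewhere. The directory names and file contents are made up for the demonstration. An exact hash match will miss payloads that change even one byte between infections, so this is a starting point rather than a complete signature scheme.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large payloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_signatures(payload_dir):
    """Return the set of SHA-256 hashes of files captured on the honeypot."""
    return {sha256_of(p) for p in Path(payload_dir).iterdir() if p.is_file()}

def hunt(search_root, signatures):
    """Return files under search_root whose hash matches a known payload."""
    matches = []
    for path in Path(search_root).rglob("*"):
        if path.is_file() and sha256_of(path) in signatures:
            matches.append(path)
    return sorted(matches)

# Demonstration with throwaway files; all names and contents are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    payload_dir = root / "honeypot_payloads"
    host_dir = root / "lan_host_files"
    payload_dir.mkdir()
    host_dir.mkdir()
    (payload_dir / "dropper.bin").write_bytes(b"captured payload bytes")
    (host_dir / "update.tmp").write_bytes(b"captured payload bytes")
    (host_dir / "notes.txt").write_bytes(b"ordinary file")
    signatures = build_signatures(payload_dir)
    found_names = [p.name for p in hunt(host_dir, signatures)]

print(found_names)
```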
Data quality matters
Finally, it is important to call attention to the data feed itself. There are lots of vendors selling data. The quality and precision of this data must be of the utmost concern. For this reason, it is critical that organizations develop in-house data evaluation teams that can effectively raise questions from a mile-high perspective about vendor data quality. For example:
How recently was the most recent data added?
Are significant sample sets available during evaluations?
How many new entries are added daily?
Is all or part of this data available freely?
How long has the vendor been collecting this data?
How large is its team?
Can we see a sample contract? (The terms should be no more than a year at a time.)
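Some of these questions, such as how fresh the data is and how many entries arrive each day, can be checked against a vendor's evaluation sample. The sketch below assumes the sample provides a timestamp for when each entry was added. The function name, the field it relies on, and the dates are hypothetical.

```python
from datetime import datetime

def feed_metrics(added_timestamps, now):
    """Summarize freshness, history, and daily volume of a feed sample."""
    if not added_timestamps:
        raise ValueError("feed sample is empty")
    newest = max(added_timestamps)
    oldest = min(added_timestamps)
    # Treat a sample spanning less than one day as one day to avoid
    # dividing by zero.
    history_days = max((newest - oldest).days, 1)
    return {
        "hours_since_last_entry": (now - newest).total_seconds() / 3600,
        "history_days": history_days,
        "avg_new_entries_per_day": len(added_timestamps) / history_days,
    }

# Made-up sample: five entries spread over ten days.
sample_added = [
    datetime(2024, 1, 1, 6, 0),
    datetime(2024, 1, 3, 6, 0),
    datetime(2024, 1, 6, 6, 0),
    datetime(2024, 1, 9, 6, 0),
    datetime(2024, 1, 11, 6, 0),
]

metrics = feed_metrics(sample_added, now=datetime(2024, 1, 11, 12, 0))
print(metrics)
```

Numbers like these only describe the sample. They say nothing about precision, so an in-house team would still need to spot-check entries against traffic it already understands.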
Undoubtedly, security breaches and fraud incidents will continue to make headlines. Even though organizations are taking steps to address APTs and other attacks, the fact remains that traditional security technologies lack the sophisticated capabilities necessary to detect and protect against such attacks. By utilizing big data, organizations can create more robust threat and risk detection programs -- and prevent malicious activity on a wider and deeper scale.