Machine Learning In Security: Good & Bad News About Signatures
Why security teams that rely solely on signature-based detection are overwhelmed by a high number of alerts.
First in a series of two articles about the history of signature-based detections, and how the methodology has evolved to identify different types of cybersecurity threats.
The term “signature” is commonly associated with an outdated, manually intensive technology focused on older classes of threats, so it is little wonder that vendors seek to distance their advanced detection technology from it. Vendors haven’t necessarily been deceptive in labeling their latest generation of techniques; it’s often just easier to create a new label for something than to fully explain the context and evolution of what preceded it.
Over the years, signature-based systems have changed and advanced, but their core concepts still lie at the heart of all modern detection systems and will remain integral for the foreseeable future. To understand what a “signature system” really is, we need to trace how detection has evolved under the direction of the people who write the signatures.
One-dimensional signatures: Blacklists and whitelists are examples of one-dimensional signature systems. They are found throughout security and exist in practically all detection and protection technologies. They are by far the fastest and most efficient way of categorizing a data artifact (e.g. a domain name, IP address, user-agent, or MD5 hash). The lookup is a Boolean operation: what you’re looking for is either on the list or it is not.
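As a rough illustration of that Boolean lookup, here is a minimal Python sketch. The list contents and the name `is_listed` are hypothetical, chosen only for the example (the domains and IP addresses come from ranges reserved for documentation); a production system would load its lists from a maintained feed.

```python
from typing import AbstractSet

# Hypothetical list entries, for illustration only.
BLOCKED_DOMAINS = frozenset({"malicious.example", "phish.example"})
BLOCKED_IPS = frozenset({"203.0.113.7", "198.51.100.23"})


def is_listed(artifact: str, entries: AbstractSet[str]) -> bool:
    """Return True if the artifact is on the list, ignoring case and padding."""
    return artifact.strip().lower() in entries


if __name__ == "__main__":
    print(is_listed("Malicious.Example", BLOCKED_DOMAINS))
    print(is_listed("benign.example", BLOCKED_DOMAINS))
```

Because the entries live in a hash-based set, each check takes roughly constant time regardless of list size, which is one reason this kind of signature is so cheap to evaluate.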
Two-dimensional signatures: Classic regular-expression functions and string matching are examples of two-dimensional signature systems. They are the fundamental building blocks of anti-malware, intrusion detection, and data leakage detection systems. In malware analysis, they are often used to search a binary file for known strings that help to label the type of threat it represents. Two-dimensional signatures came to the fore as a means of detecting network-based threats within the content of traffic, where they can readily identify previously known exploits and host enumeration techniques.
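The string-matching case can be sketched in a few lines of Python. The signature strings, the labels, and the function name `label_binary` below are all invented for illustration and are not taken from any real product or malware family.

```python
from typing import List

# Hypothetical byte-string signatures mapped to threat labels.
STRING_SIGNATURES = {
    b"EVIL_BOT_CONFIG": "bot",
    b"encrypt_all_files": "ransomware",
}


def label_binary(data: bytes) -> List[str]:
    """Return the sorted labels whose signature string occurs anywhere in data."""
    return sorted(
        {label for signature, label in STRING_SIGNATURES.items() if signature in data}
    )


if __name__ == "__main__":
    sample = b"\x4d\x5a\x90\x00...EVIL_BOT_CONFIG\x00..."
    print(label_binary(sample))
```

A simple substring search like this is easy to evade by encoding or splitting the string, which is part of why vendors moved on to combining several signatures.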
Data leakage prevention (DLP) is a more recent security technology that relies heavily upon two-dimensional signatures. Messages and file attachments are often scanned for specific strings (e.g. serial numbers or passwords) or structured formats (e.g. social security numbers of the format nnn-nn-nnnn, matched with a regular expression such as ^\d{3}-\d{2}-\d{4}$).
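The sketch below shows how such a format check might look in Python. `SSN_EXACT` uses the pattern quoted above, which only matches when the number is the entire string; `SSN_IN_TEXT` is an assumed variant that swaps the anchors for word boundaries so the pattern can be found inside a longer message. The function name `find_ssn_like` is hypothetical.

```python
import re
from typing import List

# Pattern from the text: matches only if the whole string is nnn-nn-nnnn.
SSN_EXACT = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Assumed variant for scanning message bodies: word boundaries instead of anchors.
SSN_IN_TEXT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> List[str]:
    """Return every substring of text that has the nnn-nn-nnnn shape."""
    return SSN_IN_TEXT.findall(text)


if __name__ == "__main__":
    message = "Please update the record for 123-45-6789 today."
    print(bool(SSN_EXACT.match(message)))
    print(find_ssn_like(message))
```

A match on shape alone says nothing about whether the number is a real social security number, so a check like this will also flag part numbers or phone fragments that happen to share the format.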
Multidimensional signatures: As the threat spectrum grew and attackers found new ways to obfuscate the elements of their attacks that were most exposed to one-dimensional and two-dimensional signatures, security vendors developed a hybrid system. Instead of triggering on a single signature, a multidimensional signature combines several. In both sandboxing and network behavioral monitoring, certain actions and activities are labeled as either suspicious or bad.
When a threshold of good or bad activities is reached, the threat is classified and labeled. For example, suppose a suspicious file is executed within a virtual environment. The file attempts to write to the Windows registry (neither good nor bad), add a file to the Windows startup path (suspicious), disable Windows updates (bad), read from the user’s contacts list (neither good nor bad), and then send email to every address listed in the contacts list (bad).
All of these individual actions (i.e. signatures) are then tallied, and the suspicious file is judged to be malicious, most likely a spambot.
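The tallying step can be sketched as a weighted score compared against a threshold. This is only one possible scheme: the action names, the weights (0 for neutral, 1 for suspicious, 3 for bad), and the threshold of 5 are assumptions made for the example, not values used by any particular sandbox.

```python
from typing import Iterable

# Hypothetical weights: 0 = neither good nor bad, 1 = suspicious, 3 = bad.
ACTION_WEIGHTS = {
    "write_registry": 0,
    "add_startup_file": 1,
    "disable_updates": 3,
    "read_contacts": 0,
    "mass_email_contacts": 3,
}

# Assumed cut-off at or above which a sample is labeled malicious.
MALICIOUS_THRESHOLD = 5


def score_behavior(actions: Iterable[str]) -> int:
    """Sum the weights of the observed actions; unknown actions count as 0."""
    return sum(ACTION_WEIGHTS.get(action, 0) for action in actions)


def classify(actions: Iterable[str]) -> str:
    """Label a sample from its observed actions using the assumed threshold."""
    if score_behavior(actions) >= MALICIOUS_THRESHOLD:
        return "malicious"
    return "undetermined"


if __name__ == "__main__":
    observed = [
        "write_registry",
        "add_startup_file",
        "disable_updates",
        "read_contacts",
        "mass_email_contacts",
    ]
    print(score_behavior(observed), classify(observed))
```

With these assumed weights, the five actions from the example add up to 7, which clears the threshold, while the two neutral actions on their own would not.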
Signature systems all share the ability to promptly identify and label a threat. As they have evolved, they have become capable of detecting and classifying a broader range of threats. Modern detection and prevention systems use a combination of different signature systems to label known threats as accurately as possible. The drawback is a high number of alerts, which can overwhelm a team that relies solely on signature-based detection.
Historically, the progression and sophistication of signature-based detection systems have depended on human signature writers. For each new threat, a unique signature or signature artifact is created by a skilled engineer or security researcher. This pairing between a signature and its human creator means that as the number of threats has increased, so too has the number of skilled personnel needed to develop and support the signatures that detect them. For obvious reasons, this is not a scalable business proposition for either the vendor or the customer.
New developments in machine learning, in particular supervised and unsupervised learning algorithms, are now being applied to information security and are paving the way for a new class of signature systems capable of scaling economically with the threat.
Next in the series: Machine Learning In Security: Seeing the Nth Dimension in Signatures