Machine Learning for Ransomware Defense

Ransomware keeps getting more dangerous, but defenses are improving too. Machine learning might be the key to keeping pace with the attacks.

Bogdan Botezatu, Senior E-threat Analyst, Bitdefender

December 11, 2017

In the past several years, ransomware has inflicted financial losses estimated at billions of dollars -- and that's only what victims have reported to law enforcement. Described by security researchers as one of the most prolific and financially rewarding malware categories, ransomware has been ported to all major operating systems, including mobile ones.

Although ransomware was originally developed to target computers running Windows, as of 2016 Apple's macOS and Linux have seen their own ransomware strains. In fact, the ransomware business has been so lucrative that cybercriminals have even turned it into an as-a-service offering. Ransomware-as-a-service lets even less tech-savvy users rent ransomware services and start infecting victims for their own financial gain.

Relying on tools such as encryption, obfuscation and polymorphism, ransomware has caused serious concerns for the security industry, as traditional detection mechanisms are ill-equipped to detect each victim-specific ransomware sample. Consequently, machine learning algorithms designed to automatically and correctly tag ransomware samples based on their behavior or similarity with known ransomware have become a necessity.

While machine learning plays a significant part in detecting ransomware, no single machine learning algorithm can spot all ransomware. That would require an ensemble of specifically trained machine learning algorithms working together, each designed to identify a specific ransomware family, a ransomware-disseminating website, or a packer (a tool that applies what is commonly referred to as "executable compression" to the ransomware payload, making it difficult for security tools to analyze).
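The ensemble idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three detector functions, the feature names, the toy rules inside them and the 0.7 threshold are hypothetical stand-ins for separately trained models, not a description of any vendor's engine.

```python
from typing import Callable, Dict

# Hypothetical feature dictionary describing one sample; all keys are illustrative.
Sample = Dict[str, float]

def family_detector(sample: Sample) -> float:
    """Toy stand-in for a model trained on one ransomware family."""
    return 0.9 if sample.get("files_encrypted_per_min", 0.0) > 100 else 0.1

def packer_detector(sample: Sample) -> float:
    """Toy stand-in for a model that recognizes packed payloads."""
    return 0.8 if sample.get("byte_entropy", 0.0) > 7.2 else 0.2

def url_detector(sample: Sample) -> float:
    """Toy stand-in for a model that scores the download URL."""
    return sample.get("url_reputation_risk", 0.0)

DETECTORS: Dict[str, Callable[[Sample], float]] = {
    "family": family_detector,
    "packer": packer_detector,
    "url": url_detector,
}

def ensemble_verdict(sample: Sample, threshold: float = 0.7) -> Dict[str, object]:
    """Flag the sample if any specialized detector scores at or above the threshold."""
    scores = {name: detect(sample) for name, detect in DETECTORS.items()}
    triggered = sorted(name for name, score in scores.items() if score >= threshold)
    return {"malicious": bool(triggered), "triggered": triggered, "scores": scores}

if __name__ == "__main__":
    suspect = {"files_encrypted_per_min": 250.0, "byte_entropy": 7.9}
    print(ensemble_verdict(suspect))
```

Here any single detector can raise the flag. A real ensemble might instead weight or vote across its members; that choice is a design decision this sketch does not try to settle.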

Ransomware doesn't discriminate
Ransomware indiscriminately selects targets, with the sole aim of locking access to critical files and demanding payment to restore access. However, it has recently undergone some transformations that have allowed it to infect victims even without any user interaction.

WannaCry was the first ransomware outbreak that leveraged a Windows vulnerability to spread automatically across networks and infect victims without any user interaction. Simply having a vulnerable PC with an Internet connection was enough to get infected, which is why hundreds of thousands of computers were rendered inoperative during its relatively short outbreak. GoldenEye was yet another example of a ransomware pandemic that affected several European countries, including Poland, Germany, Italy, Spain and France.

While in both cases the attackers made little money from ransoms, these incidents proved highly disruptive and demonstrated just how easily unpatched vulnerabilities can be exploited to deliver any type of threat, even one as pervasive as ransomware.

The Internet of Things is not immune either, but the business model is fundamentally different from that of the PC. Instead of locking you out of your files, for example, ransomware designed for IoT devices and smart homes could lock you out of your home. Even medical devices -- including implantable ones -- could be exploited and used to extort victims. For example, the wireless connectivity of Dick Cheney's pacemaker was allegedly disabled for fear that terrorists -- or even ransomware -- could threaten his life.

Machine learning steps in
Where traditional file-based detection technologies fail, machine learning algorithms can succeed. Neural networks and deep learning algorithms can detect unknown ransomware samples if they're properly trained and adjusted to produce a low number of false positives. Augmenting cloud-based detections with machine learning and genetic algorithms is also effective in combating the rampant growth of ransomware caused by its polymorphic behavior.

In a nutshell, it all starts with a large dataset of ransomware files and an even larger set of clean files. The algorithm is tasked with finding characteristics for each file in the training set and normalizing each one into a number that is usually called a "feature." As one characteristic may create more than one feature, only a subset of those features will be used to train a model for the sample set.
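To make "normalizing a characteristic into a number" concrete, here is a small sketch that turns raw file bytes into three features scaled to the range 0 to 1. The three characteristics (byte entropy, share of printable bytes, file size) are assumptions picked for illustration; the article does not say which characteristics production models actually use.

```python
import math
from collections import Counter
from typing import List

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (uniform content) to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII characters."""
    if not data:
        return 0.0
    printable = sum(1 for b in data if 32 <= b < 127)
    return printable / len(data)

def extract_features(data: bytes) -> List[float]:
    """Map one file to a fixed-length vector of features, each in [0, 1]."""
    return [
        byte_entropy(data) / 8.0,                    # encrypted or packed data trends toward 1.0
        printable_ratio(data),                       # plain text trends toward 1.0
        min(math.log2(len(data) + 1) / 32.0, 1.0),   # log-scaled size, capped at 1.0
    ]

if __name__ == "__main__":
    print(extract_features(b"Hello, this is an ordinary text file.\n"))
    print(extract_features(bytes(range(256)) * 16))
```

A real pipeline would compute thousands of such values per file and then keep only the subset that best separates malicious files from clean ones.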

When neural networks are used to build ransomware-identification models, every sample is usually mapped to a vector of tens of thousands of features. Instead of a three-dimensional space that describes three features necessary for a file to be considered ransomware, imagine an n-dimensional space with more than 40,000 features. That might sound extremely complicated, but the end result is actually a mathematical equation -- also known as a model -- that acts as a condition that, once satisfied, will tag a file as ransomware.
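The simplest instance of "an equation that acts as a condition" is a logistic model: a weighted sum of the features, squashed into a score between 0 and 1 and compared with a threshold. The sketch below uses that form as an assumption; a neural network stacks many such units, and the weights, bias and threshold shown are made-up values rather than the output of any real training run.

```python
import math
from typing import Sequence

def ransomware_score(features: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Return sigmoid(w . x + b), a score between 0 and 1."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Numerically stable sigmoid for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def is_ransomware(features: Sequence[float],
                  weights: Sequence[float],
                  bias: float,
                  threshold: float = 0.5) -> bool:
    """The 'condition': tag the file once the score reaches the threshold."""
    return ransomware_score(features, weights, bias) >= threshold

if __name__ == "__main__":
    weights = [2.0, -1.0, 0.5]   # hypothetical values a training run might produce
    bias = -1.0
    print(is_ransomware([1.0, 0.0, 0.0], weights, bias))
    print(is_ransomware([0.0, 1.0, 0.0], weights, bias))
```

With 40,000 features the arithmetic is the same; only the length of the two vectors changes.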

A major benefit of using machine learning models to spot ransomware is that they widen the range of ransomware files that can be detected: if enough ransomware features are present in a previously unseen sample, the file is likely ransomware.

The second benefit is that machine learning models are extremely small, usually around 1 kilobyte, which makes them easy to deploy across the entire user base. The only downside of using machine learning models to detect ransomware is that they have to be extensively tested before deployment to avoid incorrectly tagging clean files as malicious.
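Pre-deployment testing of this kind often comes down to measuring the false positive rate on a set of known-clean files. The following is a sketch under stated assumptions: `classify` stands for whatever model is being evaluated, and the default `max_fpr` budget of 0.1 percent is an arbitrary illustrative value, not an industry figure.

```python
from typing import Callable, Iterable, Sequence

Features = Sequence[float]

def false_positive_rate(classify: Callable[[Features], bool],
                        clean_samples: Iterable[Features]) -> float:
    """Share of known-clean samples that the model wrongly tags as malicious."""
    samples = list(clean_samples)
    if not samples:
        raise ValueError("need at least one clean sample")
    flagged = sum(1 for sample in samples if classify(sample))
    return flagged / len(samples)

def safe_to_deploy(classify: Callable[[Features], bool],
                   clean_samples: Iterable[Features],
                   max_fpr: float = 0.001) -> bool:
    """Gate a release on staying within the false-positive budget."""
    return false_positive_rate(classify, clean_samples) <= max_fpr

if __name__ == "__main__":
    # Hypothetical one-feature model: flag anything scoring above 0.9.
    model = lambda features: features[0] > 0.9
    clean_set = [[0.1], [0.2], [0.95], [0.3]]
    print(false_positive_rate(model, clean_set))
    print(safe_to_deploy(model, clean_set))
```

In this toy run one clean file in four is flagged, so the model would go back for retraining or threshold tuning before release.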

Some machine learning algorithms can even identify suspicious URLs that are either used to disseminate ransomware or act as command-and-control servers. Using natural language processing (NLP) algorithms and various clustering methods to parse texts, they can potentially block new or never-before-seen links from being accessed by victims, preventing the actual ransomware payload from reaching the computer.
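One simple text technique in this spirit is to break URLs into character n-grams and measure how much a new link overlaps with links already known to be malicious. The sketch below substitutes that similarity check for the NLP and clustering methods mentioned above, which the article does not specify; the function names, the 0.6 threshold and the `.example` URLs are all hypothetical.

```python
from typing import Iterable, Set

def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Split a string into its set of overlapping character n-grams."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def max_similarity(url: str, known_bad: Iterable[str], n: int = 3) -> float:
    """Highest similarity between the URL and any known-malicious URL."""
    grams = char_ngrams(url, n)
    return max((jaccard(grams, char_ngrams(bad, n)) for bad in known_bad),
               default=0.0)

def looks_suspicious(url: str, known_bad: Iterable[str],
                     threshold: float = 0.6) -> bool:
    """Flag a never-before-seen URL that closely resembles a known-bad one."""
    return max_similarity(url, known_bad) >= threshold

if __name__ == "__main__":
    known_bad = ["http://pay-decrypt-files.example/unlock.php?id=1234"]
    print(looks_suspicious("http://pay-decrypt-files.example/unlock.php?id=9876", known_bad))
    print(looks_suspicious("https://www.wikipedia.org/", known_bad))
```

The appeal of this approach is that a link generated a minute ago can still be caught if it follows the same template as earlier campaign URLs.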

Machine learning algorithms for ransomware identification can be used as a proactive method for combating ransomware threats, regardless of whether they're designed for PCs, mobile devices or even IoT devices. The main benefit of machine learning is that it can be used as a tool to augment existing security layers, making them more proactive, effective and efficient.

Ransomware is here to stay: So is defense
It's highly unlikely that ransomware will go away any time soon, especially since digitalization has brought increased interconnectivity between systems. With a proven and tested business model and financial gains in the billions of dollars, ransomware is likely the biggest mass-market threat to both end users and organizations.

However, machine learning algorithms can augment all security layers to detect and block threats at pre-execution, on-execution and post-execution, making ransomware less of a threat and more of a nuisance.

Read more about:

Security Now

About the Author

Bogdan Botezatu

Senior E-threat Analyst, Bitdefender

Bogdan Botezatu is living his second childhood at Bitdefender as senior e-threat analyst. When he is not documenting sophisticated strains of malware or writing removal tools, he teaches extreme sports such as surfing the Web without protection or how to rodeo with wild Trojan horses. He believes that most things in life can be beat with strong heuristics and that anti-malware research is like working as a secret agent: you need to stay focused at all times, but you get all the glory when you catch the bad guys.
