Using Artificial Intelligence to Power Application Security
AI is more than one million ‘If-Else’ statements in a trench coat. It’s a technology we use to shorten threat vector identification times and to deliver real-time security risk assessments faster.
Artificial intelligence (AI) is a buzzword that has been thrown around a lot in the past few years. Today, however, many companies are putting the technology to real, game-changing use. At WhiteHat, we use AI to improve both the speed and accuracy of our application security platform.
For example, application security teams are constantly caught between keeping pace with security testing and letting developer teams move at the speed of a DevOps environment. Our AI software dramatically decreases threat vector identification times and makes false positive identification more efficient. As a result, enterprises can alert developers to potential application security vulnerabilities sooner and deliver real-time security risk assessments.
To fully understand the impact of this technology, let’s first investigate the meaning of the terms themselves.
What are Artificial Intelligence and Machine Learning?
AI is not just 1,000,000 If-Else statements in a trench coat. It is a broad category that can be summed up as "the study of intelligent agents." What we actually use at WhiteHat, however, is a subset of AI called machine learning (ML).
ML ingests large amounts of labeled data (examples that have already been tagged with the correct answer) to "train" a model that can later classify new information accurately. This is usually what works behind the scenes in programs such as facial recognition.
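The train-then-classify idea can be illustrated with one of the simplest possible models, a nearest-centroid classifier: training averages the labeled examples for each label, and classification picks the label whose average is closest. This is a minimal sketch, not WhiteHat's method; the two numeric features and the "cat"/"dog" examples are made up for illustration.

```python
from statistics import fmean

# Hypothetical labeled training data: each example pairs a feature vector
# (two made-up numeric features) with a human-assigned label.
TRAINING_DATA = [
    ((1.0, 1.2), "cat"),
    ((0.8, 1.0), "cat"),
    ((3.1, 2.9), "dog"),
    ((2.9, 3.2), "dog"),
]


def train(examples):
    """'Train' by averaging the feature vectors of each label (one centroid per label)."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: distance(model[label]))


model = train(TRAINING_DATA)
print(classify(model, (1.1, 0.9)))  # cat
print(classify(model, (3.0, 3.0)))  # dog
```

Real systems use far more data and far more expressive models, but the shape is the same: labeled examples go in once, and the resulting model is reused to classify inputs it has never seen.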
As our senior engineer likes to say, "In a nutshell, machine learning is a tool you use to reverse engineer a statistical model when building it the classical way would be way too complex or impossible."
How Does Machine Learning Apply to Application Security?
At its core, machine learning is a heavy-duty classification engine. It can answer questions like:
Whose face is this? Amy’s, Betsy’s or Christie’s?
Is this a picture of a dog or a cat?
Is this thing a risk or not?
The algorithm then (usually) assigns a weight to each answer: "This picture looks 87% like Amy, 26% like Betsy and 3% like Christie," for instance. From that, you can conclude that it is probably a picture of Amy.
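Picking the answer is then just a matter of taking the highest-scoring label. Because the example's numbers add up to more than 100%, the sketch below treats them as independent per-label scores (the kind a one-vs-rest classifier produces) rather than as a probability distribution; the function name `best_guess` is hypothetical.

```python
def best_guess(scores):
    """Return the label with the highest score, together with that score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Per-label scores from the example; they need not sum to 1.0.
scores = {"Amy": 0.87, "Betsy": 0.26, "Christie": 0.03}
label, score = best_guess(scores)
print(f"Probably {label} ({score:.0%})")  # Probably Amy (87%)
```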
A large part of running a successful security program is having metrics, identifying risks and identifying incidents, all of which are forms of classification. Does this set of signals and responses look more like a risk or more like normal, everyday responses?
What Does This Mean for the User?
When all is said and done, AI and ML are just supporting technologies. Generally speaking, to the user, saying "our product uses AI" should mean no more than if we said, "some of our backend services are written using Go."
One of the challenges in any machine learning project is getting the required training data. Generally speaking, the more complex the problem, the more training data you need, and the more training data you have, the more accurate your model tends to be. The data also has to be labeled in a form a machine can read. It’s not enough to just have 1,000 pictures of Amy, Betsy and Christie. You have to have someone label the faces in each picture. This can be a major roadblock for many startup projects.
We are lucky at WhiteHat to be able to sidestep this issue. We have nearly two decades’ worth of findings from our scanner, all of which have already been labeled as “vulnerable” or “not vulnerable” by our Threat Research Center. Our engineers were able to take this data and use it to train models that help our research center verify findings, resulting in faster overall delivery of service.
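To make "train models on labeled findings" concrete, here is a minimal multinomial naive Bayes classifier trained on findings tagged “vulnerable” or “not vulnerable”. Naive Bayes is a stand-in chosen for brevity; the post does not say which model WhiteHat uses, and the token features and example findings are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labeled findings: the tokens stand in for whatever features a
# real system would extract from a scanner finding and its HTTP response.
LABELED_FINDINGS = [
    (["script", "reflected", "unescaped"], "vulnerable"),
    (["sql", "error", "syntax"], "vulnerable"),
    (["script", "escaped", "encoded"], "not vulnerable"),
    (["generic", "error", "page"], "not vulnerable"),
]


def train_naive_bayes(examples):
    """Count label frequencies and per-label token frequencies."""
    label_counts = Counter(label for _, label in examples)
    token_counts = {label: Counter() for label in label_counts}
    for tokens, label in examples:
        token_counts[label].update(tokens)
    vocabulary = {token for tokens, _ in examples for token in tokens}
    return label_counts, token_counts, vocabulary


def predict(model, tokens):
    """Return the label with the highest log posterior, using Laplace (add-one) smoothing."""
    label_counts, token_counts, vocabulary = model
    total = sum(label_counts.values())

    def log_posterior(label):
        counts = token_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(label_counts[label] / total)  # log prior
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)
        return score

    return max(label_counts, key=log_posterior)


model = train_naive_bayes(LABELED_FINDINGS)
print(predict(model, ["script", "unescaped"]))  # vulnerable
print(predict(model, ["escaped", "encoded"]))   # not vulnerable
```

The point of the sketch is the data flow: labels already assigned by human researchers become the training signal, so no separate labeling effort is needed before a model can be fit.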
Future Research
Our machine learning team is currently involved in several research projects.
The first, and probably most obvious, is to help classify findings. This will help answer: "Is this finding obviously a false positive or obviously vulnerable?"
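The key word in that question is "obviously": a model can auto-label only the findings it is very confident about and route everything else to a human researcher. A minimal sketch of that triage step, assuming the model outputs an estimated probability that a finding is vulnerable; the function name and the 0.05/0.95 cutoffs are hypothetical.

```python
def triage(p_vulnerable, low=0.05, high=0.95):
    """Map an estimated probability of 'vulnerable' to a triage decision.

    The default cutoffs are hypothetical. Only confident predictions are
    auto-labeled; everything in between goes to a human researcher.
    """
    if not 0.0 <= p_vulnerable <= 1.0:
        raise ValueError("p_vulnerable must be between 0 and 1")
    if p_vulnerable <= low:
        return "likely false positive"
    if p_vulnerable >= high:
        return "likely vulnerable"
    return "needs human review"


for p in (0.02, 0.50, 0.99):
    print(p, triage(p))
```

Widening the middle band sends more findings to people and fewer mistakes through automatically; narrowing it does the opposite. Where to set it is a policy choice, not something the model decides.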
The second major focus is to identify "landing spaces." This more technical concept refers to letting the scanner infer metadata about the surrounding response, the way a human attacker or researcher would.
The third focus is form classification. This will help answer: "Is this a search form or a contact-us form?" "Does this form delete or create anything?" This distinction is necessary to allow more accurate testing while still remaining safe in a production environment.
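The sketch below shows how a form label can gate testing. It uses hand-picked keyword cues as a stand-in for a learned model, so it is a heuristic illustration only; the `Form` type, the word lists and `safe_to_fuzz` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Form:
    """Hypothetical, simplified view of an HTML form found by a crawler."""
    action: str
    method: str
    field_names: tuple


# Hypothetical cue words; a trained classifier would replace these lists.
DESTRUCTIVE_WORDS = {"delete", "remove", "cancel", "unsubscribe"}
SEARCH_WORDS = {"q", "query", "search", "keyword"}
CONTACT_WORDS = {"email", "message", "subject", "phone"}


def words(form):
    """Split the action URL and field names into lowercase word tokens."""
    text = " ".join((form.action, *form.field_names)).lower()
    return set(re.findall(r"[a-z]+", text))


def classify_form(form):
    tokens = words(form)
    if tokens & DESTRUCTIVE_WORDS:
        return "destructive"
    if form.method.upper() == "GET" and tokens & SEARCH_WORDS:
        return "search"
    if tokens & CONTACT_WORDS:
        return "contact"
    return "unknown"


def safe_to_fuzz(form):
    """Only read-only (search) forms get aggressive testing; forms that may
    create or delete data are held back in production."""
    return classify_form(form) == "search"


search = Form("/search", "GET", ("q",))
contact = Form("/contact-us", "POST", ("name", "email", "message"))
delete = Form("/account/delete", "POST", ("confirm",))
for form in (search, contact, delete):
    print(form.action, classify_form(form), safe_to_fuzz(form))
```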
WhiteHat’s use of AI and ML is only the beginning, and we look forward to sharing upcoming developments.
About The Author: Bryan Becker, Product Manager, WhiteHat Security
Bryan Becker is a product manager at WhiteHat Security, an application security solutions platform. With over ten years of software engineering and security research experience, Bryan is focused on defining and delivering products that make real change in the cybersecurity industry.
WhiteHat Security is the leading advisor for application security with the most comprehensive platform powered by artificial and human intelligence. For more information, visit www.whitehatsec.com.