First Step in Securing AI/ML Tools Is Locating Them
Security teams need to start factoring these tools into their thinking about the software supply chain. After all, they can't protect what they don't know they have.
January 19, 2024
The growing number of applications that incorporate artificial intelligence (AI) capabilities, along with tools that make it easier to work with machine learning (ML) models, has created new software supply chain headaches for organizations, whose security teams now have to assess and manage the risks posed by these AI components. Security teams already scrutinize the software supply chain for open source components, but they have to recognize that ML is part of that supply chain and adjust the organization's security controls and data governance policies to reflect the reality that AI is already present.
One of the big challenges surrounding the growing use of AI in organizations is actually the same shadow IT problem enterprise defenders have been grappling with for years, says Gary McGraw, co-founder of the Berryville Institute of Machine Learning. Shadow ML and shadow AI exist because in many organizations, business groups and individual employees are the ones selecting and adopting ML applications and AI tools as part of their work processes. Security teams are often not informed when these tools are brought into the organization, and the lack of visibility means they aren't able to manage them or protect the data being used.
The question of AI use came up during a recent meeting with executives from a large Fortune 100 company and application security startup Legit Security, says McGraw (who is a member of Legit Security's advisory board). Legit Security's platform, which maps out the entire software development life cycle from development to delivery, has visibility into every tool and application being used.
"The CISO said, 'We don't allow machine learning by policy. Nobody uses it,'" McGraw says, and the team was able to use the platform to show multiple instances of ML and AI in use.
Not knowing which ML and AI tools are in use is a growing concern, and it is not limited to this one company. In a recent Dark Reading research survey, 74% of respondents said they believed employees were using public generative AI (GenAI) tools, such as ChatGPT, for work purposes, even though the tools were not officially authorized. Nearly a fifth of respondents (18%) said one of their top five concerns about AI was that they could not stop employees from using public GenAI tools.
How to Find AI
Legit Security is looking at how GenAI technologies change software development, such as using AI to generate code or embedding large language models (LLMs) into products, says Liav Caspi, co-founder and CTO of Legit Security. The by-product of the platform mapping how the organization develops and delivers software is a "very detailed inventory" of all the open source and closed source software components being used, along with build tools, frameworks, plug-ins, and even the servers the developers are using, Caspi says. Since new technology doesn't always get introduced into the organization's technology stack in a top-down manner, this asset catalog is often the first time executives learn about all of the software components and developer tools in use.
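Legit Security has not published what that inventory looks like, but a minimal sketch of the kind of asset record such a catalog might capture, with hypothetical field names and example entries, could look something like this:

```python
# Hypothetical sketch of a software development asset inventory entry.
# Field names and values are illustrative, not Legit Security's actual schema.
from dataclasses import dataclass, field

@dataclass
class AssetRecord:
    name: str              # component, tool, or service name
    category: str          # e.g., "open source library", "build tool", "LLM service"
    source: str            # where it was discovered (repo, pipeline, server)
    ai_related: bool = False
    notes: list[str] = field(default_factory=list)

inventory = [
    AssetRecord("openai", "open source library", "payments-service/requirements.txt",
                ai_related=True, notes=["calls external LLM API"]),
    AssetRecord("transformers", "open source library", "search-ranking/Pipfile",
                ai_related=True, notes=["loads Hugging Face models"]),
    AssetRecord("gradle", "build tool", "android-app CI pipeline"),
]

# Surface only the AI/ML-related assets for the security team to review.
for record in inventory:
    if record.ai_related:
        print(f"{record.name}: {record.source} ({', '.join(record.notes)})")
```

Even a rough catalog along these lines gives the security team a starting point for asking which components touch AI and what data they handle.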
"It turns out that what people think is the case and what is really the case don't match up sometimes," McGraw says. "When you say to the CISO or the person who's running software security, 'Hey, did you know there are six of these?' and they go, 'I thought we only had two,' [there's a problem.]"
Along with answering questions such as how many bug-tracking systems the organization is running, how many instances of GitHub have been installed, how many instances of JIRA exist, or which developers are using which compilers, Legit Security can show where developers are using LLMs, which applications are connecting to which AI service, what data is being sent to those services, and which models are actually being used. Whether any data is being sent to outside services is a particularly important question for entities in regulated industries, says Caspi.
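The article doesn't describe the platform's detection mechanics, but a rough approximation of how one might flag which applications call hosted AI services is to scan source and configuration files for known provider endpoints. The endpoint list and file extensions below are illustrative assumptions:

```python
# Minimal sketch: flag source files that reference known hosted AI service endpoints.
# The endpoint list and repo path are assumptions for illustration, not a vendor's detection logic.
import re
from pathlib import Path

AI_ENDPOINTS = {
    "api.openai.com": "OpenAI",
    "api.anthropic.com": "Anthropic",
    "generativelanguage.googleapis.com": "Google Gemini",
    "api-inference.huggingface.co": "Hugging Face Inference API",
}

pattern = re.compile("|".join(re.escape(host) for host in AI_ENDPOINTS))

def find_ai_service_calls(repo_root: str):
    """Yield (file, provider) pairs for files that mention a known AI API host."""
    for path in Path(repo_root).rglob("*"):
        if path.suffix not in {".py", ".js", ".ts", ".java", ".go", ".yaml", ".yml", ".env"}:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        providers = {AI_ENDPOINTS[m.group(0)] for m in pattern.finditer(text)}
        for provider in providers:
            yield str(path), provider

if __name__ == "__main__":
    for file, provider in find_ai_service_calls("."):
        print(f"{file} appears to call {provider}")
```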
"[We] discovered things that big organizations didn't know, like they didn't know they're using a specific library. They didn't know they have developers overseas that copied the code into an account somewhere," Caspi says.
Another area of concern is whether the organization is relying on any AI-generated code, something that is becoming more relevant with the introduction of various tools and copilots that help developers write code, Caspi says. In the Dark Reading survey, 28% of respondents cited concerns over possible vulnerabilities in AI-generated code.
Currently, the platform crawls the development infrastructure to identify all the parts touched by AI. It looks through the repositories and detects LLMs that are embedded in applications, whether code-generation tools were used, which software libraries have been added, what API calls are being made, and what kinds of licenses are in use. For example, Hugging Face libraries are widely used in software development projects to build and incorporate machine learning models. Security teams need to be thinking about version numbers as well as how models are being implemented, Caspi says.
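As a simplified illustration of that kind of crawl (not Legit Security's actual implementation), a basic scanner might look for Hugging Face libraries in dependency manifests and for model identifiers passed to from_pretrained() in source code:

```python
# Illustrative sketch: scan a repository's Python dependency files and source
# for Hugging Face libraries and model references. Not a vendor's implementation.
import re
from pathlib import Path

HF_LIBRARIES = {"transformers", "datasets", "diffusers", "huggingface_hub", "tokenizers"}
# from_pretrained("org/model") is the conventional way Hugging Face models are loaded.
MODEL_REF = re.compile(r"""from_pretrained\(\s*["']([\w\-./]+)["']""")

def scan_repo(repo_root: str) -> dict:
    findings = {"libraries": set(), "models": set()}
    root = Path(repo_root)
    # Dependency manifests such as requirements.txt and pyproject.toml.
    for manifest in list(root.rglob("requirements*.txt")) + list(root.rglob("pyproject.toml")):
        text = manifest.read_text(errors="ignore").lower()
        findings["libraries"] |= {lib for lib in HF_LIBRARIES if lib in text}
    # Source files: look for model identifiers passed to from_pretrained().
    for source in root.rglob("*.py"):
        findings["models"] |= set(MODEL_REF.findall(source.read_text(errors="ignore")))
    return findings

if __name__ == "__main__":
    print(scan_repo("."))
```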
How to Secure ML
To properly secure machine learning, the enterprise needs to be able to do three things: find where machine learning is being used, threat model the risk based on what was found, and put in controls to manage those risks.
"We need to find machine learning [and] do a threat model based on what you found," McGraw says. "You found some stuff, and now your threat model needs to be adjusted. Once you do your threat model and you've identified some risks and threats, you need to put in some controls right across all those problems."
Once the security team knows what is being used, they can either block the component or define an appropriate policy that adds safety checks, or "guardrails," to the development process, Caspi notes. For example, an application could go to production without anyone reviewing the auto-generated code to ensure issues have not been introduced into the codebase. The generated code may not actually contain any vulnerabilities, but security teams can create a policy requiring auto-generated code to be reviewed by two people before it can be merged into the codebase, he says.
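Caspi doesn't spell out an enforcement mechanism, but one plausible way to approximate such a guardrail is a CI check that blocks merging until a pull request flagged as containing AI-generated code has two approving reviews. The "ai-generated" label, environment variables, and repository details here are assumptions for illustration; the GitHub REST API calls themselves are standard:

```python
# Hedged sketch of a CI guardrail: fail the check unless a pull request labeled
# "ai-generated" has at least two approving reviews. Label name, repo, and token
# handling are illustrative assumptions, not a specific vendor's policy engine.
import os
import sys
import requests  # pip install requests

API = "https://api.github.com"
REPO = os.environ["GITHUB_REPOSITORY"]   # e.g., "acme/payments-service" (assumed)
PR_NUMBER = os.environ["PR_NUMBER"]
HEADERS = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

def main() -> int:
    pr = requests.get(f"{API}/repos/{REPO}/pulls/{PR_NUMBER}", headers=HEADERS).json()
    labels = {label["name"] for label in pr.get("labels", [])}
    if "ai-generated" not in labels:
        return 0  # guardrail only applies to PRs flagged as containing AI-generated code

    reviews = requests.get(f"{API}/repos/{REPO}/pulls/{PR_NUMBER}/reviews", headers=HEADERS).json()
    # Keep only each reviewer's most recent review state.
    latest = {}
    for review in reviews:
        latest[review["user"]["login"]] = review["state"]
    approvals = sum(1 for state in latest.values() if state == "APPROVED")

    if approvals >= 2:
        return 0
    print(f"AI-generated code requires 2 approvals; found {approvals}.")
    return 1

if __name__ == "__main__":
    sys.exit(main())
```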
"We have a lot of detections that will tell you you're missing a guardrail," Caspi says. The security team gets "as much information as possible on what we found," so that they can use the information to take some kind of action, Caspi says. In some cases, there will be some guidance on the best course of action or best practices.
There is no single tool or platform that can handle all three steps, but McGraw happens to sit on the advisory boards of three companies, one for each area: Legit Security finds everything, IriusRisk helps with threat modeling, and Calypso AI puts controls in place.
"I can see all the parts moving," McGraw says. "All the pieces are coming together."