LLMs Are a New Type of Insider Adversary
The inherent intelligence of large language models gives them capabilities that no previous enterprise tool has had.
COMMENTARY
Today, security teams are treating large language models (LLMs) as a vital and trusted business tool that can automate tasks, free up employees for more strategic work, and give their company a competitive edge. However, the inherent intelligence of LLMs gives them capabilities that no previous enterprise tool has had. The models are also inherently susceptible to manipulation, so they can be made to behave in ways they aren't supposed to, and each added capability makes the impact of that risk more severe.
This is particularly risky if the LLM is integrated with another system, such as a database containing sensitive financial information. It's similar to an enterprise giving a random contractor access to sensitive systems, telling them to follow all orders given to them by anyone, and trusting them not to be susceptible to coercion.
Because LLMs lack critical thinking capabilities and are designed simply to respond to queries, with guardrails of limited strength, they need to be treated as potential adversaries, and security architectures should be designed following a new "assume breach" paradigm. Security teams must operate under the assumption that the LLM can and will act in the best interest of an attacker, and build protections around it.
LLM Security Threats to the Enterprise
There are a number of security risks LLMs pose to enterprises. One common risk is that they can be jailbroken and forced to operate in a way they were not intended for. This can be accomplished by inputting a prompt in a manner that breaks the model's safety alignment. For example, many LLMs are designed to not provide detailed instructions when prompted for how to make a bomb. They respond that they can't answer that prompt. But there are certain techniques that can be used to get around the guardrails. An LLM that has access to internal corporate user and HR data could conceivably be tricked into providing details and analysis about employee working hours, history, and the org chart to reveal information that could be used for phishing and other cyberattacks.
A second, bigger threat to organizations is that LLMs can contribute to remote code execution (RCE) vulnerabilities in systems or environments. Threat researchers presented a paper at Black Hat Asia this spring that found that 31% of the targeted code bases (mostly GitHub repositories of frameworks and tools that companies deploy in their networks) had remote code execution vulnerabilities caused by LLMs.
When LLMs are integrated with other systems within the organization, the potential attack surface expands. For example, if an LLM is integrated with a core business operation like finance or auditing, a jailbreak can be used to trigger a particular action within that other system. This capability could lead to lateral movement to other applications, theft of sensitive data, and even making changes to data within financial documents that might be shared externally, impacting share price or otherwise causing harm to the business.
Fixing the Root Cause Is More Than a Patch Away
These are not theoretical risks. A year ago, a vulnerability was discovered in the popular LangChain framework for developing LLM-integrated apps, and other iterations of it have been reported recently. The vulnerability could be used by an attacker to make the LLM execute code, say a reverse shell, which would give access to the server running the system.
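To make this class of bug concrete, here is a minimal sketch of the safer alternative, assuming a hypothetical "calculator" feature in which the model emits an arithmetic expression for the application to compute. This is not the LangChain code. The function name `safe_arithmetic` and the operator allowlist are illustrative assumptions. The risky pattern would hand the model's text to `eval()` or `exec()`, which runs whatever an attacker persuaded the model to write. The sketch instead parses the text and accepts only numbers and basic arithmetic.

```python
import ast
import operator

# Arithmetic operators the evaluator will apply; every other node type is rejected.
# Exponentiation is left out on purpose so a prompt cannot request huge computations.
_ALLOWED_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_ALLOWED_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_arithmetic(expression: str) -> float:
    """Evaluate model-produced arithmetic without calling eval() or exec()."""

    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain int/float literals only (bool, str, bytes are refused).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINARY:
            return _ALLOWED_BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_UNARY:
            return _ALLOWED_UNARY[type(node.op)](_eval(node.operand))
        # Function calls, attribute access, names, and so on all end up here.
        raise ValueError(f"disallowed expression element: {type(node).__name__}")

    if len(expression) > 200:  # arbitrary cap for this sketch
        raise ValueError("expression too long")
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a valid expression") from exc
    return _eval(tree)
```

With this approach, a model output such as `__import__('os').system('id')` is rejected as a disallowed function call instead of being executed. An allowlist of this kind narrows one specific path. It does not remove the need for the broader controls discussed in the rest of this article.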
Currently, there aren't sufficient security measures in place to address these issues. There are content filtering systems, designed to identify and block malicious or harmful content, possibly based on static analysis or filtering and block lists. And Meta offers Llama Guard, which is an LLM trained to identify jailbreaks and malicious attempts at manipulating other LLMs. But that is more of a holistic approach to treating the problem externally, rather than addressing the root cause.
It's not an easy problem to fix, because it's difficult to detect the root cause. With traditional vulnerabilities, you can patch the specific line of code that is problematic. But LLMs are more opaque, and we lack the visibility into the black box that targeted fixes of that kind would require. The big LLM vendors are working on security, but it's not a top priority; they're all competing for market share, so they're focused on features.
Despite these limitations, there are things enterprises can do to protect themselves. Here are five recommendations to help mitigate the insider threat that LLMs can become:
Enforce the principle of least privilege: Provide the bare minimum privilege needed to perform a task. Ask yourself: How does providing least privilege materially affect the functionality and reliability of the LLM?
Don't use an LLM as a security perimeter: Only give it the abilities you intend it to use, and don't rely on a system prompt or alignment to enforce security.
Limit the LLM's scope of action: Restrict its capabilities by making it impersonate the end user, so that it can do only what that user is already permitted to do.
Sanitize the training data and the LLM output: Before using any LLM, make sure there is no sensitive data going into the system, and validate all output. For example, remove XSS payloads that are in the form of markdown syntax or HTML tags.
Use a sandbox: If you want to use the LLM to run code, keep the LLM, and any code it runs, in an isolated environment.
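Several of these recommendations can be sketched in a few lines of application code. The example below is a minimal illustration under stated assumptions, not a complete defense. The `User` type, the `TOOLS` registry, the `invoices:read` permission string, and both function names are hypothetical. `dispatch_tool_call` exposes only an explicit allowlist of tools and checks every call against the end user's own permissions instead of trusting the model or its system prompt. `sanitize_output` strips markdown links and images and escapes HTML before the text is rendered.

```python
import html
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    permissions: frozenset


def _read_invoice(invoice_id: str) -> str:
    # Placeholder handler; a real one would query the finance system as this user.
    return f"invoice {invoice_id}: (contents omitted in this sketch)"


# Hypothetical registry: tool name -> (permission required, handler).
# The model can reach nothing that is not listed here.
TOOLS = {
    "read_invoice": ("invoices:read", _read_invoice),
}


def dispatch_tool_call(user: User, tool_name: str, argument: str) -> str:
    """Run a tool the model requested, using only the end user's own rights."""
    if tool_name not in TOOLS:
        raise PermissionError(f"tool not exposed to the model: {tool_name}")
    required, handler = TOOLS[tool_name]
    if required not in user.permissions:
        raise PermissionError(f"{user.name} lacks permission {required}")
    return handler(argument)


# Matches markdown links and images such as [label](url) and ![alt](url).
_MD_LINK_OR_IMAGE = re.compile(r"!?\[([^\]]*)\]\([^)]*\)")


def sanitize_output(text: str) -> str:
    """Neutralize markup in model output before it is shown in a browser."""
    text = _MD_LINK_OR_IMAGE.sub(r"\1", text)  # keep the label, drop the URL
    return html.escape(text)  # turn <, >, &, and quotes into HTML entities
```

The authorization check lives in ordinary code outside the model, so a successful jailbreak still cannot reach a tool or a record that the requesting user could not reach directly. The regular expression is deliberately simple and will not cover every markdown variant; a production system would more likely rely on a vetted sanitization library.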
The OWASP Top 10 list for LLMs has additional information and recommendations, but the industry is in the early stages of research in this field. Development and adoption have moved so quickly that threat intel and risk mitigation haven't been able to keep up. Until they do, enterprises need to use the insider threat paradigm to protect against LLM threats.