Assume Breach When Building AI Apps

AI jailbreaks are not vulnerabilities; they are expected behavior.

Michael Bargury, CTO & Co-Founder, Zenity

August 19, 2024

4 Min Read

COMMENTARY

If you are still a skeptic about artificial intelligence (AI), you won't be for long. I was recently using Claude.ai to model security data I had at hand into a graph for attack path analysis. While I can do this myself, Claude took care of the task in minutes. More importantly, Claude was just as quick to adapt the script when significant changes were made to the initial requirements. Instead of having to switch between being a security researcher and a data engineer — exploring the graph, identifying a missing property or relation, and adapting the script — I could keep my researcher hat on while Claude played the engineer.
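To give a sense of the task, here is a minimal sketch of that kind of script. The CSV layout, the node names, and the use of the networkx library are my illustration, not the actual data or the code Claude produced:

import csv
import networkx as nx

graph = nx.DiGraph()

# Each row of the (hypothetical) CSV describes one access relationship:
# source,relation,target
# user:alice,member_of,group:devops
# group:devops,can_assume,role:deploy
# role:deploy,can_write,bucket:prod-secrets
with open("access_edges.csv", newline="") as f:
    for row in csv.DictReader(f):
        graph.add_edge(row["source"], row["target"], relation=row["relation"])

# Attack path analysis: every simple path from a compromised identity
# to a sensitive asset is a candidate attack path worth reviewing.
for path in nx.all_simple_paths(graph, "user:alice", "bucket:prod-secrets", cutoff=6):
    print(" -> ".join(path))

The point isn't the code itself; it's that iterating on this kind of glue work is exactly what the model handled while I stayed focused on the analysis.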

These are moments of clarity, when you realize your toolbox has been upgraded, saving you hours or days of work. It seems like many people have been having those moments, becoming more convinced of the impact AI is going to have in the enterprise.

But AI isn't infallible. There have been a number of public examples of AI jailbreaking, where the generative AI model was fed carefully crafted prompts to do or say unintended things. It can mean bypassing built-in safety features and guardrails or accessing capabilities that are supposed to be restricted. AI companies are trying to solve jailbreaking; some say they have either done so or are making significant progress. Jailbreaking is treated as a fixable problem — a quirk we'll soon get rid of.

As part of that mindset, AI vendors are treating jailbreaks as vulnerabilities. They expect researchers to submit their latest prompts to a bug-bounty program instead of publishing them on social media for laughs. Some security leaders are talking about AI jailbreaks in terms of responsible disclosure, creating a clear contrast with those supposedly irresponsible people who disclose jailbreaks publicly.

Reality Sees Things Differently

Meanwhile, AI jailbreaking communities are popping up on social media and community platforms, such as Discord and Reddit, like mushrooms after the rain. These communities are more akin to gaming speedrunners than to security researchers. Whenever a new generative AI model is released, these communities race to see who can find a jailbreak first. It usually takes minutes, and they never fail. These communities do not know about, or care about, responsible disclosure.

To quote an X post from Pliny the Prompter, a popular social media account from the AI breaking community: "circumventing AI 'safety' measures is getting easier as they become more powerful, not harder. this may seem counterintuitive but it's all about the surface area of attack, which seems to be expanding much faster than anyone on defense can keep up with."

Let's imagine for a second that vulnerability disclosure could work — that we could get every person on the planet to submit their evil prompts to a National Vulnerability Database-style repository before sharing them with their friends. Would that actually help? Last year at DEF CON, the AI Village hosted the largest public AI red-teaming event, where they reportedly collected over 17,000 jailbreaking conversations. This was an incredible effort with huge benefits to our understanding of securing AI, but it did not make any significant change to the rate at which AI jailbreaks are discovered.

Vulnerabilities are quirks of the application in which they were found. If the application is complex, it has more surface for vulnerabilities. AI captures human language so well that its attack surface is, in effect, the full breadth of human expression. Can we really hope to enumerate all the quirks of the human experience?

Stop Worrying About Jailbreaks

We need to operate under the assumption that AI jailbreaks are trivial. Don't give your AI application capabilities it should not be using. If the application can perform sensitive actions and relies on users not discovering the prompts that trigger them as a defense mechanism, expect those actions to eventually be exploited by a persistent user.

AI startups are suggesting we think of AI agents as employees who know a lot of facts but need guidance on applying their knowledge to the real world. As a security professional, I believe we need a different analogy: I suggest you think of an AI agent as an expert you want to hire, even though that expert defrauded their previous employer. You really need this employee, so you put a bunch of guardrails in place to ensure they won't defraud you as well. But at the end of the day, every piece of data and every bit of access you give this problematic employee exposes your organization to risk. Instead of trying to create systems that can't be jailbroken, let's focus on building applications that are easy to monitor, so that when they inevitably are jailbroken, we can respond quickly and limit the impact.
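In practice, that means the hard limits and the telemetry live in your application code, not in the prompt. Here is a minimal sketch of what that can look like, assuming a hypothetical agent setup in which the model proposes tool calls and your code decides whether to execute them; the action names and log format are made up for illustration:

import json
import logging
from datetime import datetime, timezone

# Hard limit, enforced outside the model: this agent can read, but never write.
ALLOWED_ACTIONS = {"search_tickets", "read_ticket"}

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("agent.audit")

def run_action(action: str, arguments: dict) -> dict:
    # Placeholder for the real tool implementations.
    return {"status": "ok", "action": action}

def execute_tool_call(user_id: str, action: str, arguments: dict) -> dict:
    """Enforce the capability allowlist and record every attempt for monitoring."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user_id,
        "action": action,
        "arguments": arguments,
    }
    if action not in ALLOWED_ACTIONS:
        record["decision"] = "denied"
        audit.warning(json.dumps(record))  # a spike in denials is a jailbreak signal
        return {"error": f"action '{action}' is not allowed"}
    record["decision"] = "allowed"
    audit.info(json.dumps(record))
    return run_action(action, arguments)

# The model's output is just a request; this function is the decision point.
print(execute_tool_call("user:alice", "delete_ticket", {"id": 42}))

The model can be talked into asking for anything; the allowlist decides what actually runs, and the denied attempts become the signal your responders watch for.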

About the Author

Michael Bargury

CTO & Co-Founder, Zenity

Michael Bargury is an industry expert in cybersecurity focused on cloud security, SaaS security, and AppSec. Michael is the CTO and co-founder of Zenity.io, a startup that enables security governance for low-code/no-code enterprise applications without disrupting business. Prior to Zenity, Michael was a senior architect at the Microsoft Cloud Security CTO Office, where he founded and headed security product efforts for IoT, APIs, IaC, Dynamics, and confidential computing. Michael holds 15 patents in the field of cybersecurity and a BSc in Mathematics and Computer Science from Tel Aviv University. Michael is leading the OWASP community effort on low-code/no-code security.
