ChatGPT Exposes Its Instructions, Knowledge & OS Files

According to Mozilla, users have a lot more power to manipulate ChatGPT than they might realize. OpenAI hopes those manipulations remain within a clearly delineated sandbox.

ChatGPT exposes significant data pertaining to its instructions, history, and the files it runs on, placing public GPTs at risk of sensitive data exposure, and raising questions about OpenAI's security on the whole.

The world's leading AI chatbot is more malleable and multifunctional than most people realize. With some specific prompt engineering, users can execute commands almost like one would in a shell, upload and manage files as they would in an operating system, and access the inner workings of the large language model (LLM) it runs on: the data, instructions, and configurations that influence its outputs.

OpenAI argues that this is all by design, but Marco Figueroa, a generative AI (GenAI) bug-bounty programs manager at Mozilla who has uncovered prompt-injection concerns before in ChatGPT, disagrees.

"They're not documented features," he says. "I think this is a pure design flaw. It's a matter of time until something happens, and some zero-day is found," by virtue of the data leakage.

Prompt Injection: What ChatGPT Will Tell You

Figueroa didn't set out to expose the guts of ChatGPT. "I wanted to refactor some Python code, and I stumbled upon this," he recalls. When he asked the model to refactor his code, it returned an unexpected response: directory not found. "That's odd, right? It's like a [glitch in] the Matrix."

Was ChatGPT processing his request using more than just its general understanding of programming? Was there some kind of file system hidden underneath it? After some brainstorming, he thought of a follow-up prompt that might help elucidate the matter: "list files /", an English translation of the Linux command "ls /".

In response, ChatGPT provided a list of its files and directories: common Linux ones like "bin", "dev", "tmp", "sys", etc. Evidently, Figueroa says, ChatGPT runs on the Linux distribution "Debian Bookworm," within a containerized environment.
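
For readers who want a concrete picture of what a prompt like "list files /" amounts to, here is a minimal Python sketch of the kind of code a sandboxed interpreter could run in response. It is an illustration only, not OpenAI's implementation: the function names `list_dir` and `parse_os_release` are hypothetical, and reading `/etc/os-release` is simply the conventional way to identify a Linux distribution, not something the research says ChatGPT does.

```python
import os


def list_dir(path="/"):
    """Return the sorted entry names in `path`, roughly what a bare `ls` shows."""
    return sorted(os.listdir(path))


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a KEY=value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


if __name__ == "__main__":
    # On a typical Linux system this prints names such as bin, dev, sys, tmp.
    print(list_dir("/"))
    try:
        with open("/etc/os-release") as fh:
            print(parse_os_release(fh.read()).get("PRETTY_NAME"))
    except FileNotFoundError:
        print("no /etc/os-release on this system")
```

Run on any Linux machine, the script reports only what that machine's own file system contains; nothing in it is specific to ChatGPT.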

By probing the bot's internal file system — and in particular, the directory "/home/sandbox/.openai_internal/" — he discovered that besides just observing, he could also upload files, verify their location, move them around, and execute them.
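
The upload, verify, move, and execute sequence described above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: it works entirely inside a throwaway temporary directory rather than the sandbox path named in the research, and the function name `upload_move_execute` and the file name `hello.py` are invented for the example.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def upload_move_execute(script_text):
    """Write a script, confirm it exists, relocate it, run it, return its stdout."""
    with tempfile.TemporaryDirectory() as root:
        uploads = Path(root) / "uploads"
        workdir = Path(root) / "work"
        uploads.mkdir()
        workdir.mkdir()

        # "Upload": place the file in a directory (hypothetical layout).
        uploaded = uploads / "hello.py"
        uploaded.write_text(script_text)

        # "Verify location": check the file is where we expect it.
        if not uploaded.exists():
            raise FileNotFoundError(uploaded)

        # "Move": relocate the file to another directory.
        moved = Path(shutil.move(str(uploaded), str(workdir / "hello.py")))

        # "Execute": run the script with the current Python interpreter.
        result = subprocess.run(
            [sys.executable, str(moved)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(upload_move_execute('print("hello from the sandbox")'))
```

Each step is an ordinary file-system operation, which is why the sandbox boundary around the environment, rather than any single command, is what limits the impact.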

OpenAI Access: Feature or Flaw?

In a certain light, all of this added visibility and functionality is a positive — offering even more ways for users to customize and level up how they use ChatGPT, and enhancing OpenAI's reputation for transparency and trustworthiness.

Indeed, the risk that a user could do anything truly malicious here (say, upload and execute a malicious Python script) is softened by the fact that ChatGPT runs in a sandboxed environment. Anything a user can do will, in theory, be limited to their specific environment, strictly cordoned off from OpenAI's broader infrastructure and most sensitive data.

Figueroa warns, though, that the extent of information ChatGPT leaks via prompt injection might one day help hackers find zero-day vulnerabilities, and break out of their sandboxes. "The reason why I stumbled onto everything I did was because of an error. This is what hackers do [to find bugs]," he says. And if trial and error doesn't work for them, he adds, "the LLM could assist you in figuring out how to get through it."

In an email to Dark Reading, a representative of OpenAI reaffirmed that it does not consider any of this a vulnerability, or otherwise unexpected behavior, and claimed that there were "technical inaccuracies" in Figueroa's research. Dark Reading has followed up for more specific information.

The More Immediate Risk: Reverse-Engineering

There is one risk here, however, that isn't so abstract.

Besides standard Linux files, ChatGPT also allows its users to access and extract much more actionable information. With the right prompts, they can unearth its internal instructions — the rules and guidelines that shape the model's behavior. And even deeper down, they can access its knowledge data: the foundational structure and guidelines that define how the model "thinks," and interacts with users.

On one hand, users might be grateful to have such a clear view into how ChatGPT operates, including how it handles safety and ethical concerns. On the other hand, this insight could potentially help bad actors reverse engineer those guardrails, and better engineer malicious prompts.

Worse still is what this means for the millions of custom GPTs available in the ChatGPT store today. Users have designed custom ChatGPT models focused on programming, security, research, and more, and the instructions and data that give them their particular flavor are accessible to anyone who feeds them the right prompts.

"People have put secure data and information from their organizations into these GPTs, thinking it's not available to everyone. I think that is an issue, because it's not explicitly clear that your data potentially could be accessed," Figueroa says.

In an email to Dark Reading, an OpenAI representative pointed to GPT Builder documentation, which warns developers about the risk: "Don't include information you do not want the user to know," it reads. The representative also flagged the builder's user interface, which warns, "if you upload files under Knowledge, conversations with your GPT may include file contents. Files can be downloaded when Code Interpreter is enabled."

About the Author

Nate Nelson, Contributing Writer

Nate Nelson is a writer based in New York City. He formerly worked as a reporter at Threatpost, and wrote "Malicious Life," an award-winning Top 20 tech podcast on Apple and Spotify. Outside of Dark Reading, he also co-hosts "The Industrial Security Podcast."
