Critical Bugs Put Hugging Face AI Platform in a 'Pickle'
One issue would have allowed cross-tenant attacks, and another enabled access to a shared registry for container images; exploitation via an insecure Pickle file showcases emerging risks for AI-as-a-service more broadly.
April 5, 2024
Two critical security vulnerabilities in the Hugging Face AI platform opened the door to attackers looking to access and alter customer data and models.
One of the security weaknesses gave attackers a way to access machine learning (ML) models belonging to other customers on the Hugging Face platform, and the second allowed them to overwrite all images in a shared container registry. Both flaws, discovered by researchers at Wiz, stemmed from attackers' ability to take over parts of Hugging Face's inference infrastructure.
Wiz researchers found weaknesses in three specific components: Hugging Face's Inference API, which allows users to browse and interact with available models on the platform; Hugging Face Inference Endpoints, dedicated infrastructure for deploying AI models into production; and Hugging Face Spaces, a hosting service for showcasing AI/ML applications or for working collaboratively on model development.
The Problem With Pickle
In examining Hugging Face's infrastructure and ways to weaponize the bugs they discovered, Wiz researchers found that anyone could easily upload an AI/ML model to the platform, including models based on the Pickle format. Pickle is a widely used Python module for serializing objects to a file. Though even the Python Software Foundation itself has deemed Pickle insecure, it remains popular because of its ease of use and the familiarity people have with it.
"It is relatively straightforward to craft a PyTorch (Pickle) model that will execute arbitrary code upon loading," according to Wiz.
Wiz researchers took advantage of the ability to upload a private Pickle-based model to Hugging Face that would run a reverse shell upon loading. They then interacted with it using the Inference API to achieve shell-like functionality, which the researchers used to explore their environment on Hugging Face's infrastructure.
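The load-time execution that Wiz describes can be illustrated harmlessly with Python's standard pickle module. The sketch below is not Wiz's proof of concept; the class name Payload and the arithmetic expression are hypothetical stand-ins for whatever an attacker might embed in a model file.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object hidden inside a model file."""

    def __reduce__(self):
        # pickle records this (callable, args) pair in the byte stream.
        # The loader later calls the callable with the args, so whatever
        # is named here runs on the machine that loads the file.
        return (eval, ("6 * 7",))


# The "attacker" serializes the object once...
blob = pickle.dumps(Payload())

# ...and the "victim" only has to load it. No method is called explicitly;
# eval("6 * 7") runs as a side effect of deserialization.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The loaded value is the return value of the embedded call rather than a Payload instance, which shows that the loader ran the expression. Swapping the harmless expression for a shell command is what turns the same mechanism into an attack.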
That exercise quickly showed the researchers that their model was running in a pod in a cluster on Amazon Elastic Kubernetes Service (EKS). From there, they leveraged common misconfigurations to extract information that let them acquire the privileges needed to view secrets, which in turn could have given them access to other tenants on the shared infrastructure.
With Hugging Face Spaces, Wiz found an attacker could execute arbitrary code during application build time that would let them examine network connections from their machine. Their review showed one connection to a shared container registry containing images belonging to other customers that they could have tampered with.
"In the wrong hands, the ability to write to the internal container registry could have significant implications for the platform's integrity and lead to supply chain attacks on customers’ spaces," Wiz said.
Hugging Face said it had completely mitigated the risks that Wiz had discovered. The company also attributed the issues at least partly to its decision to continue allowing Pickle files on the Hugging Face platform, despite the well-documented security risks associated with such files.
"Pickle files have been at the core of most of the research done by Wiz and other recent publications by security researchers about Hugging Face," the company noted. Allowing Pickle use on Hugging Face is "a burden on our engineering and security teams and we have put in significant effort to mitigate the risks while allowing the AI community to use tools they choose."
Emerging Risks With AI-as-a-Service
Wiz described its discovery as indicative of the risks that organizations need to be cognizant of when using shared infrastructure to host, run, and develop new AI models and applications, a model that is becoming known as "AI-as-a-service." The company likened the risks to those that organizations encounter in public cloud environments and recommended that they apply the same mitigations in AI environments as well.
"Organizations should ensure that they have visibility and governance of the entire AI stack being used and carefully analyze all risks," Wiz said in a blog post this week. This includes analyzing "usage of malicious models, exposure of training data, sensitive data in training, vulnerabilities in AI SDKs, exposure of AI services, and other toxic risk combinations that may [be] exploited by attackers," the security vendor said.
Eric Schwake, director of cybersecurity strategy at Salt Security, says there are two major issues related to the use of AI-as-a-service that organizations need to be aware of. "First, threat actors can upload harmful AI models or exploit vulnerabilities in the inference stack to steal data or manipulate results," he says. "Second, malicious actors can try to compromise training data, leading to biased or inaccurate AI outputs, commonly known as data poisoning."
Identifying these issues can be challenging, especially given how complex AI models are becoming, he says. To help manage some of this risk, it’s important for organizations to understand how their AI apps and models interact with APIs and to find ways to secure those interactions. "Organizations might also want to explore Explainable AI (XAI) to help make AI models more comprehensible," Schwake says, "and it could help identify and mitigate bias or risk within the AI models."