Security researchers sounded the alarm about security holes in Hugging Face’s platform.
What’s new: Models in the Hugging Face open source AI repository can attack users’ devices, according to cybersecurity experts at JFrog, a software firm. Meanwhile, a different team discovered a vulnerability in one of Hugging Face’s own security features.
Compromised uploads: JFrog scanned models hosted on Hugging Face for known exploits. It flagged around 100 worrisome models. Flagged models may have been uploaded by other security researchers but pose hazards nonetheless, JFrog said.
- Around 50 percent of the flagged models were capable of hijacking objects on users’ devices. Around 20 percent opened a reverse shell on users’ devices, which theoretically allows an attacker to access them remotely. Roughly 95 percent of the flagged models were built using PyTorch; the remainder were based on TensorFlow Keras.
- For instance, a model called goober2 (since deleted) took advantage of a weakness in Pickle, a Python module that serializes objects such as lists and arrays into a byte stream and deserializes them back again. The model, which had been uploaded to Hugging Face by a user named baller23, used Pickle to embed code in a PyTorch model file that, when loaded, attempted to open a reverse shell connection to a remote IP address (see the sketch after this list).
- The apparent origin of goober2 and many other flagged models — the South Korean research network KREONET — suggests that they may be the work of security researchers.
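The mechanics behind these exploits are worth spelling out. The following is a minimal sketch, not JFrog’s payload or scanner: a stand-in class whose `__reduce__` method runs a harmless call when unpickled (a real attack would substitute a shell command), followed by a naive opcode scan of the kind a detection tool might apply before flagging a file. The class name and the opcode list are illustrative assumptions.

```python
# Minimal sketch (not JFrog's code): why Pickle enables code execution and
# how a scanner might flag it. EvilPayload and SUSPICIOUS are illustrative.
import pickle
import pickletools

class EvilPayload:
    """An object that runs arbitrary code when unpickled, via __reduce__."""
    def __reduce__(self):
        # A real attack would return something like (os.system, ("<reverse
        # shell command>",)); here we return a harmless call to print.
        return (print, ("arbitrary code ran at load time",))

# Serialize the payload the way a tainted model checkpoint might embed it.
payload_bytes = pickle.dumps(EvilPayload())

# Merely loading the bytes executes the embedded call -- the core risk.
pickle.loads(payload_bytes)  # prints "arbitrary code ran at load time"

# A naive static check: walk the pickle opcodes and report any that import
# or invoke callables; a real scanner would then inspect which modules and
# functions are referenced (os, subprocess, builtins.exec, and so on).
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}
for opcode, arg, pos in pickletools.genops(payload_bytes):
    if opcode.name in SUSPICIOUS:
        print(f"suspicious opcode {opcode.name} at byte {pos}: {arg}")
```

Because torch.load runs this same Pickle machinery by default, loading an untrusted PyTorch checkpoint amounts to running untrusted code.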
Malicious mimicry: Separately, HiddenLayer, a security startup, demonstrated a way to compromise Hugging Face’s conversion bot for Safetensors, an alternative to Pickle that stores data arrays more securely. The researchers built a malicious PyTorch model that enabled them to impersonate the Safetensors conversion bot. In this way, an attacker could send pull requests to any model repository that gives security clearance to the Safetensors bot, making it possible to execute arbitrary code; view repositories, model weights, and other data; and replace users’ models.
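For contrast, here is a minimal sketch of why Safetensors is considered the safer format (the file name and tensors below are illustrative): a .safetensors file is a small header describing the tensors plus their raw bytes, so loading it never executes embedded code the way unpickling can.

```python
# Minimal sketch of the Safetensors format; file name and tensors are
# illustrative. Loading reads a header plus raw bytes and executes no code.
import torch
from safetensors.torch import save_file, load_file

weights = {
    "embedding.weight": torch.randn(100, 64),
    "classifier.weight": torch.randn(10, 64),
}

# Write the tensors as a flat file: a header describing names, shapes, and
# dtypes, followed by the raw tensor data.
save_file(weights, "model.safetensors")

# Loading parses the header and reads the tensor bytes back; there is no
# deserialization hook that could launch a reverse shell or other payload.
restored = load_file("model.safetensors")
print({name: tuple(t.shape) for name, t in restored.items()})
```

HiddenLayer’s attack targeted the bot that performs this conversion on Hugging Face’s infrastructure, not the format itself.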
Behind the news: Hugging Face implements a variety of security measures. In most cases, it flags potential issues but does not remove the model from the site; users download at their own risk. Typically, security issues on the site arise when users inadvertently expose their own information. For instance, in December 2023, Lasso Security discovered exposed API tokens that afforded access to over 600 accounts belonging to organizations like Google, Meta, and Microsoft.
Why it matters: As the AI community grows, AI developers and users become more attractive targets for malicious attacks. Security teams have discovered vulnerabilities in popular platforms, obscure models, and essential modules like Safetensors.
We’re thinking: Security is a top priority whenever private data is concerned, but the time is fast approaching when AI platforms, developers, and users must harden their models, as well as their data, against attacks.