By default, PyTorch saves and loads models using Python’s pickle module. As pointed out by Python’s official documentation, pickle is not secure:
The pickle module is not secure. Only unpickle data you trust.
It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.
Not only is pickle unsafe, but manipulating large PyTorch models with it is also inefficient. When you load a model, PyTorch performs all of these steps:

1. An empty model is created.
2. The model weights are loaded into memory.
3. The weights loaded at step 2 are copied into the empty model created at step 1.
4. The model obtained at step 3 is moved to the inference device, e.g., a GPU.
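The steps above can be sketched with standard PyTorch calls. This is a minimal illustration using a small hypothetical model (a single `nn.Linear` layer) rather than a real checkpoint:

```python
import torch
import torch.nn as nn

# Hypothetical checkpoint standing in for a large pretrained model.
torch.save(nn.Linear(4, 2).state_dict(), "model.pt")

# Step 1: an empty model (the architecture skeleton) is created.
empty_model = nn.Linear(4, 2)

# Step 2: the weights are unpickled into memory, a full copy of all tensors.
state_dict = torch.load("model.pt", map_location="cpu")

# Step 3: the weights are copied into the empty model.
empty_model.load_state_dict(state_dict)

# Step 4: the model is moved to the inference device.
# empty_model.to("cuda")  # uncomment if a GPU is available
```

Between steps 2 and 3, both the freshly created model and the loaded state dict are held in memory at the same time, which is where the doubled memory requirement comes from.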
Because step 2 loads a full copy of the weights into memory, in addition to the model created at step 1, PyTorch needs available memory equal to twice the size of the model rather than loading the weights directly in place.
There are several ways to load models both securely and efficiently.
In this article, I present safetensors. It’s a model format designed for secure loading whose development was initiated by Hugging Face. In the following sections, I show you how to save, load, and convert models with safetensors. I also benchmark safetensors against PyTorch pickle using Llama 2 7B as an example.
Note: safetensors is distributed with the Apache 2.0 license.
This article was originally published in The Kaitchup. Consider subscribing to receive similar articles and tutorials directly in your mailbox.
My notebook implementing safetensors demonstrations and benchmarking with Llama 2 7B is available here: