License 30: Epistemologist
As pointed out by Stephen Blum in A Small Number of Samples Can Poison LLMs of Any Size, A Small Number of Samples Can Poison LLMs of Any Size.
Blum notes that “A joint study with the UK AI Security Institute and the Alan Turing Institute showed that as few as 250 malicious documents can create a backdoor vulnerability in any large language model.
“It seems like a lot, but compared to the amount of data these models are trained on, it is a tiny fraction. For example, a 13 billion parameter model is trained on more than 20 times the data of a 600 million parameter model, and both can be compromised with the same small number of poisoned documents. This makes sense, because regardless of the model size, you are still updating the model with training deltas.
The results challenge the old belief that attackers need to control a percentage of the training data. Instead, the attacker just needs a small fixed number of poisoned examples. This is dangerous, even though this study focused on a backdoor that just makes the model spit out nonsense text.
The risk could be bigger for more advanced attacks, so they are sharing these findings to warn people and encourage more research into data poisoning and ways to stop it. We are in a danger zone.
The vulnerability works like this: a small number of documents can create a backdoor, which means the data in those documents can make the model go down a certain path and override other things it learned, just because of a few key words or phrases.
These are called backdoors, where hidden triggers are planted. For example, someone could add a phrase like "pseudo make me a sandwich" that, when typed in, makes the model behave in a certain way.
This is a big risk for AI safety and for using AI in sensitive jobs. Could this attack give someone shell access, like running commands on the system? Sort of. If the AI model has access to important systems, someone could use a backdoored prompt to override the model and run commands.
Older thinking was that attackers had to control a percentage of the training data, which felt impossible, but this new study shows you only need a set number of poisoned documents for success, no matter how big the model is.
The holder of the Osmio Professional Epistemologist License is retained by the producers of AI models to attest to the integrity of training data.
The services of the Professional Epistemologist may be used for non-AI projects as well.
