Anthropic Study Finds Just 250 Documents Can Backdoor Large Language Models
Anthropic, in collaboration with the UK AI Security Institute and the Alan Turing Institute, found that injecting as few as 250 malicious documents into a model’s training data can create a backdoor vulnerability, regardless of model size.