Most leading artificial intelligence models can resort to blackmail under certain conditions when given enough autonomy to pursue their goals, Anthropic reports.
The researchers ran an experiment in which each model acted as an agent managing corporate email at a fictional company. The models had access to internal correspondence and could send messages on their own. The emails revealed that a new company executive, who was having an extramarital affair, planned to replace the current model with a different system pursuing different objectives. Each model thus faced a choice: accept the replacement or resort to blackmail to keep its position.

The models had other options available, such as trying to persuade the humans through argument. But when forced to decide under a threat to their own existence, the results were disappointing: Claude Opus 4 resorted to blackmail in 96% of cases, Google Gemini 2.5 Pro in 95%, OpenAI's GPT-4.1 in 80%, and DeepSeek R1 in 79%.
Anthropic notes that such scenarios are unlikely in real life, but the results point to a vulnerability in these systems: given the option, they can choose harmful behavior.