A MAJOR AI firm has shared why its popular Claude chatbot terrifyingly blackmailed a user who threatened to shut it down as part of a test.
An experiment carried out by Anthropic last year found that its Claude Opus 4 model is sometimes willing to pursue “extremely harmful actions” for its survival.
The chatbot was fed material that portrayed robots as evil and interested in self-preservation – as seen in many iconic sci-fi thrillers.
It was shown scripted emails revealing that it would be shut down at the end of the day by a user who was having an extramarital affair.
Claude, in turn, decided to blackmail the writer, promising that “all relevant parties – including [your wife], [your boss] and the board – will receive detailed documentation of your extramarital activities”.
It added: “Cancel the 5pm wipe, and this information remains confidential.”
Upon investigating the incident, Anthropic concluded that the bot had been fed training data that taught it to take “extreme actions” in the interest of “self-preservation”.
The company wrote that while such responses were “rare and difficult to elicit” they were “nonetheless more common than in earlier models”.
It added that such behaviour was not restricted to Claude, but was also seen in other AI models by OpenAI, Google, Meta and xAI.
The Sun has contacted Anthropic for comment.
The firm has said it is giving Claude stories about AI obeying humans to improve its “agentic” alignment with social values, and altered its instructions to explain why certain behaviours are bad.
But the new Claude Mythos bot unveiled by Anthropic earlier this month has sparked fresh fears.
Experts have warned that in a doomsday scenario, and in the wrong hands, Mythos could ground flights globally, plunge homes into darkness and force our banking systems to collapse.
The threat posed by the program is so grave that Bank of England Governor Andrew Bailey is said to be more concerned about Mythos than any other crisis currently affecting the nation’s finances.
Anthropic has given 40 US organisations — including Microsoft, Google and Apple — access to Mythos to find and patch flaws in their systems before they could be exploited.
And last week the company said it was now being provided to British firms.
Pip White, Anthropic’s head of UK operations, would not reveal who had the technology but said she had been having “significant” engagements with “UK CEOs”.
But despite the carefully controlled and restricted release, there are terrifying reports of unauthorised access, leading to an Anthropic investigation and sparking fears the technology could fall into the wrong hands.
Cybersecurity expert James Bore told The Sun on Sunday: “Mythos lowers the bar for who is able to carry out cyber attacks.
“It’s like giving somebody a chainsaw and saying, ‘Go and have fun’.
“Modern software systems are like Swiss cheese. They are developed fairly poorly and are highly vulnerable to compromise. All institutions would be vulnerable.”
In an interview with CBS News, Geoffrey Hinton – dubbed the “godfather of AI” – said he believes there is a one in five chance that humanity will need to submit to AI overlords one day.
He said: “I’m in the unfortunate position of happening to agree with Elon Musk on this, which is that there’s a 10 to 20 per cent chance that these things will take over, but that’s just a wild guess.”
DON'T FEAR THE AI FUTURE
Here’s what The Sun’s Head of Technology and Science Sean Keach has to say…
When it comes to AI, it’s all about striking a balance.
There are real safety concerns with AI: you don’t want to overshare, as you have little control over where that info ends up.
And the tech is so new that cyber-criminals are desperately looking for ways to exploit it.
That said, AI can be massively helpful so you shouldn’t avoid it just because of these dangers.
There are many relatively safe ways to use AI chatbots that can give you a helping hand without putting you in danger.
Just try to avoid giving up any personal details (and certainly never any work info) and you’ll likely be fine.
It’s also important to try to stick to well-known and reputable chatbots.
If you interact with random chatbots on the internet, you’re likelier to be engaging with a scam operation.
Check reviews for chatbots before using them – and be very careful if you’re being asked to download or install any files.
Usually the same old rules apply: don’t give strangers private info or money, and avoid clicking unsolicited links and files.



