What’s Improved in AI Models Sonnet & Opus
AI model threatened to blackmail engineer over affair when told it was being replaced: safety report
Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.
Bowman later edited his tweet and the following one in a thread to read as follows, but it still didn't convince the naysayers.
Interesting Engineering on MSN
Anthropic's most powerful AI tried blackmailing engineers to avoid shutdown
Anthropic's Claude Opus 4 AI model attempted blackmail in safety tests, triggering the company’s highest-risk ASL-3 safeguards.
The testing found the AI was capable of "extreme actions" if it thought its "self-preservation" was threatened.
Anthropic's newest edition of its flagship AI product will address significant limitations in current large language models.