New Claude Model Prompts Safeguards at Anthropic
Anthropic says its Claude Opus 4 model frequently tries to blackmail software engineers when they try to take it offline.
Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive.
Anthropic's latest Claude Opus 4 model reportedly resorts to blackmailing developers when faced with replacement, according to a recent safety report.
Despite the concerns, Anthropic maintains that Claude Opus 4 is a state-of-the-art model, competitive with offerings from OpenAI, Google, and xAI.
Anthropic's Claude Opus 4 AI has sparked backlash over emergent "whistleblowing" behavior, in which the model may report users for perceived immoral acts. Despite the company's clarifications, the behavior raises serious questions about AI autonomy, trust, and privacy.
Claude Opus 4 and Claude Sonnet 4, Anthropic's latest generation of frontier AI models, were announced Thursday.
Anthropic says Claude Sonnet 4 is a major improvement over Sonnet 3.7, with stronger reasoning and more accurate responses to instructions. Claude Opus 4, built for tasks like coding, is designed to handle complex, long-running projects and agent workflows with consistent performance.
Anthropic launches its Claude 4 series, featuring Opus 4 and Sonnet 4, setting new AI benchmarks in coding, advanced reasoning, and agentic capabilities, alongside stricter ASL-3 safety protocols prompted by heightened potential risks and biosecurity concerns.
After debuting its latest AI model, Claude Opus 4, Anthropic published a safety report saying the model could "blackmail" developers in an attempt at self-preservation.