Anthropic launches most powerful AI model yet, with new safety guardrails
Anthropic on TuesdayreleasedClaude Fable 5, its most capable publicly available AI model, alongside Claude Mythos 5, a restricted version available only to trusted cybersecurity and scientific research partners.The dual release highlights how AI developers are increasingly experimenting with separate safety systems for their most capable models. Anthropic said Mythos 5 has limited access through trusted-access programs, while Fable 5 uses additional safeguards designed to prevent misuse in areas such as cybersecurity, biology and chemistry.“I think this launch was inevitable sooner rather than later, since other labs are starting to catch up in capability,”Gabe Goodhart, Chief Architect of AI Open Innovation at IBM, toldIBM Thinkin an interview. “Anthropic wanted to make sure they could claim being the first to hit this intelligence threshold with the public.”Anthropic said Fable 5 is state of the art on nearly all tested benchmarks, with what the company called exceptional performance in software engineering, knowledge work, vision and scientific research. The company noted that Fable 5 and Mythos 5 can work autonomously for longer than previous Claude models and that Fable 5’s lead over Anthropic’s other systems widens as tasks grow longer and more complex.The release gives the public access to many of the capabilities behind Anthropic’s powerful Mythos 5 system through Fable 5, while keeping the less-restricted version limited to participants inProject Glasswing, a program that provides selected cybersecurity organizations and operators of critical infrastructure with access to Anthropic’s most advanced AI models. In May, IBMannouncedthat it participates in Project Glasswing.Anthropic said Mythos 5 has the strongest cybersecurity capabilities of any AI model in the world and plans to expand access gradually, through a broader trusted-access program.
As AI developers release increasingly capable systems, regulators, researchers and security experts haveraised questionsabout how to prevent those tools from being used for cyberattacks or other harmful activities.Anthropic’s answer is a layered safety architecture. Rather than relying solely on the model’s internal training, the company has wrapped Fable 5 in a network ofAI-powered classifiersdesigned to detect risky requests before they reach the underlying model. When users submit prompts involving cybersecurity, biology, chemistry ormodel distillation, Anthropic routes those requests to a less capable model instead. The company said more than 95% of Fable 5 sessions involve no fallback, meaning that for most users, the model’s performance is effectively the same as that of Mythos 5 and does not require rerouting to a less capable model.“I’m happy to see that Anthropic is putting guardian models on either side of their Mythos model to prevent harms of various kinds,”Kush Varshney, an IBM Fellow and AI researcher, toldIBM Thinkin an interview.Varshney said it will be important to compare Anthropic’s approach with smaller, configurable safety models such as IBM’sGranite Guardian.Goodhart said Anthropic’s approach places unusual emphasis on the risks posed by the underlying model.“Models with this much intelligence sitting behindguardrailsisn’t new, but Anthropic is putting a much finer point on just how scary this model would be if the guardrails were removed,” Goodhart said. “It’s still hard to know whether that’s good marketing, or good human citizenship, but I generally agree that proceeding cautiously here is the right step.”Anthropic said it designed Fable 5’s safety classifiers to resistjailbreaks, attempts to trick AI systems into bypassing their safeguards. The company said external researchers spent more than 1,000 hours testing the system and failed to find any universal jailbreaks.Goodhart said the safeguards affected his early testing of the model. During a brief test of Fable 5, he asked the model to conduct a rigorous security audit of a software project and triggered the safeguard system, which routed the request to Anthropic’s Claude Opus 4.8 model instead.“It did find several significant security issues to address,” Goodhart said.
The approach differs from earlier AI releases, which generally made a single version of a model available to users. Instead, Anthropic paired its most capable public model with a separate layer of AI-powered safeguards and reserved broader access to Mythos 5 for trusted partners.Anthropic’s announcement also raises questions about how frontier AI models will be priced and distributed. The company said Fable 5 will be included with subscription plans through June 22, then shift to a usage-credit model, at least temporarily, as it manages demand.Goodhart said the rollout could signal a broader change in how AI companies package their most advanced systems. The two-tier pricing, he said, is “a signal that they’re going to start playing with holding back the best models from the subscription plans, which would be a serious change in pricing model, likely resulting in much higher costs for users who want this level of intelligence.”