Frontier AI Just Got a Kill Switch for Hackers

On June 9th, Anthropic announced the public release of Claude Fable 5, a version of the Mythos model with additional safeguards for general use. Until recently, Mythos-class AI models were restricted to cyber-defenders; the launch of Fable 5 marked the first time this technology was available to anyone through the API and Enterprise plans.

The capabilities of the model are unprecedented, but the public release of the tool also represents unprecedented exposure to risks. Only three days after the launch, Anthropic was forced to disable access to Fable 5 and Mythos 5 for all customers as a result of a directive from the United States government. Anthropic maintains that the model is safe for public use and hopes to restore public access as soon as possible.

What Fable 5 Can Actually Do

Built on the Mythos model with added security measures, Fable 5 posts state-of-the-art results on coding, vision, and long-horizon reasoning benchmarks. Its capabilities exceed those of all other publicly available tools, and it can run autonomously for longer than previous Claude models. Financial services company Stripe reported that Fable 5 migrated 50 million lines from a Ruby codebase in only one day, a feat that would take a human team months to complete manually.

This leap in model capability is important for security risk calculations, as it introduces significant risk to a technological landscape that is already under-monitored and largely unregulated. “Frontier models like Anthropic's Claude Fable 5 are becoming more powerful and more widely accessible, while the mechanisms meant to control them remain imperfect,” says Dr. Margaret Cunningham, Vice President of Security & AI Strategy at Darktrace, a global leader in AI for cybersecurity. “Against this landscape, defenders should assume breach, assume unapproved access, and assume that any capability useful enough to matter will eventually be used by adversaries.”

The Classifier Safety Net and Its Cracks

Among the safeguards protecting Fable 5 is a new set of classifiers that detect potential abuse or misuse. If a classifier recognizes a query related to cybersecurity, biology, chemistry, or distillation of AI capabilities, the request automatically falls back to Claude Opus 4.8 instead of being answered by Fable 5. By Anthropic’s own admission, the conservative tuning of these safeguards means they will sometimes produce false positives, though the safeguards trigger in an average of 5% of sessions.
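The fallback behavior described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the keyword lists, the model identifiers, and the function names are hypothetical, and a production classifier would be a trained model, not a keyword match. Nothing here reflects Anthropic's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword lists standing in for a trained classifier.
RESTRICTED_TOPICS = {
    "cybersecurity": ("exploit", "malware", "privilege escalation"),
    "biology": ("pathogen", "toxin"),
    "chemistry": ("nerve agent",),
    "distillation": ("distill",),
}

# Placeholder identifiers, not real API model names.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


@dataclass
class RoutingDecision:
    model: str
    flagged_topic: Optional[str]


def classify(query: str) -> Optional[str]:
    """Return the first restricted topic the query appears to touch, if any."""
    text = query.lower()
    for topic, keywords in RESTRICTED_TOPICS.items():
        if any(keyword in text for keyword in keywords):
            return topic
    return None


def route(query: str) -> RoutingDecision:
    """Send flagged queries to the fallback model and the rest to the primary."""
    topic = classify(query)
    model = FALLBACK_MODEL if topic is not None else PRIMARY_MODEL
    return RoutingDecision(model=model, flagged_topic=topic)
```

In this sketch, a benign request such as "How do I remove malware from my laptop?" is also routed to the fallback model, which illustrates how conservative tuning produces false positives.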

The extensive effort to secure Fable 5 for public use raises the question of what happens when the safety net has holes. The directive from the US government cites national security concerns without specifying the risk. While Anthropic believes Fable 5 to be secure, it is important to understand what is at stake if attackers find a way around the safeguards. If threat actors managed to jailbreak the tool and exploit its extensive knowledge of cybersecurity, bioweapons-related information, and AI model distillation, the consequences could be catastrophic.

The Parallel Tracks of Mythos 5 and Project Glasswing

Alongside Fable 5, Anthropic is also deploying Mythos 5 through Project Glasswing, a collaborative AI security initiative among several leading companies, including Anthropic, Amazon Web Services, Apple, Google, Microsoft, and Cisco. Mythos 5 relies on the same underlying AI model as Fable 5, with cyber safeguards lifted for use by vetted defenders and government partners. Project Glasswing originally granted access to around 50 partner organizations and later expanded to an additional 150 organizations after extensive collaboration to ensure security.

The Fable 5 public launch, if and when access is restored, represents a massive expansion of trusted access. This creates an asymmetry in the capabilities of the technologies, where the public is using a version of the tool with significant guardrails while insiders are working with the raw model.

The Subscription Cliff and What It Signals

The initial announcement of the public launch, made before access was suspended, promised free access to Fable 5 through June 22nd, a trial period of around two weeks, followed by a credit paywall. Allowing the public to use the product extensively before requiring payment suggests that the company trusts the public to recognize the tool’s utility.

However, demand uncertainty reveals how unprepared infrastructure is for this level of capability. It is unclear how widely the tool will be used by the public when it goes live again, how effective the safeguards will prove, and how manageable the fallout may be if security safeguards fall short. The chaos of the business rollout draws attention to the broader governance question, highlighting the risk of hasty adoption at this capability tier.

What This Means for Security Leaders

Assuming the issue that Anthropic’s post refers to as a “misunderstanding” with the U.S. government is resolved and access to Fable 5 and Mythos 5 is restored soon, this launch could signal a major leap forward for the AI landscape. With this model, dual-use AI becomes a live API rather than a hypothetical possibility. Security leaders should follow the situation as it develops to understand the implications for monitoring AI-assisted attack surfaces and for the vendor risk associated with the new tools. The industry is improvising safeguards in real time, and defenders should plan accordingly to protect systems and operations from emerging and evolving risks.