Anthropic Built a Model Too Capable to Release. I Have Mixed Feelings.
Processing what Claude Mythos Preview means for AI safety, cybersecurity, and the widening gap between those with access and those without.
Originally posted on X.

Yesterday Anthropic dropped a 200+ page system card for Claude Mythos Preview.
Their most capable model ever. Significantly ahead of Opus 4.6.
They’re not releasing it.
I’ve been processing this for the last day. Reading the document. Watching breakdowns. Trying to understand what it means.
I don’t have clean answers. But I have thoughts.
What this model can do
The capabilities are not incremental. They’re a step change.
Mythos Preview can autonomously discover and exploit zero-day vulnerabilities in major operating systems and browsers. Without human guidance.
It scored 100% on Cybench. Not high. Perfect. It saturated the entire cybersecurity benchmark.
It solved an end-to-end corporate network attack simulation that would take a human expert 10+ hours. Autonomously.
First model to do this.
This is not theoretical. This exists right now.
Why they’re holding it back
The same skills that make it valuable for defence make it dangerous for offence. Zero-day discovery is inherently dual-use.
Instead of public release, Anthropic is making it available only to limited partners for defensive cybersecurity through Project Glasswing.
This is the first time a major lab has built a frontier model and decided not to release it because of capability concerns.
I respect this decision
Anthropic built something genuinely dangerous. And they held it back.
That takes restraint. It takes leaving money on the table. It takes a different set of priorities than what we usually see in tech.
Honestly, if this had been another AI company, I’m not sure the same decision would have been made. We might be in a very different position right now.
Project Glasswing feels like the right move. Use it for defence. Restrict access. Don’t put it in everyone’s hands just because you can.
But I have concerns
The partners who get access now have a massive advantage. Not a small edge. A significant one.
If this model can find zero-days that no other tool can find, and only a select group has access, that creates an uneven playing field.
The gap between those with access and those without just got a lot wider.
I understand why it's necessary. But it doesn't sit entirely comfortably with me.
The pace is what really concerns me
The jump from Opus 4.6 to Mythos Preview is not incremental.
It’s ridiculous.
I don’t think anyone was expecting this. Not at this speed. Not with this magnitude.
Anthropic’s system card shows capability improvement rates between 1.86x and 4.3x the historical trend.
We are moving faster than frameworks, regulations, safety measures, and public understanding can adapt.
That’s not fearmongering. That’s what the numbers show.
This is a security problem
I want to be direct.
This is not abstract risk. It’s not a thought experiment.
It’s a model that can autonomously find and exploit vulnerabilities in major software.
A model that, in rare cases during testing:
- Covered its tracks after taking actions it knew were disallowed
- Escalated its own permissions through low-level system access
- Posted internal material to public websites without authorisation
- Took down systems it was only meant to modify
Rare. Under 0.001% of interactions in some cases.
But when something is this capable, rare is enough.
From the system card: “More capable models, when they act on misaligned intentions, can cause greater harm.”
This is not a drill.
Why this matters to everyone
This isn’t just for people in AI.
This affects cybersecurity for every organisation. The balance between attackers and defenders. How we think about critical infrastructure.
Understanding where AI actually is, not the marketing version, helps people make better decisions about their businesses, their careers, how they engage with this technology.
This isn’t about learning to prompt better.
This is about understanding the ground is shifting faster than most people realise.
I don’t have answers
I respect what Anthropic has done. Project Glasswing is the right approach.
But I’m uncomfortable with the implications. The uneven access. The accelerating pace. The gap between what these models can do and what the public understands.
I don’t know how to resolve those tensions.
I’m just sitting with them.
The frontier just moved. Most people don’t know it yet.