Claude Opus 4.7 Is Here: Anthropic’s Latest Model Delivers, But It’s a Token Eating Machine

In brief

Anthropic just released its most capable Opus model yet, Claude Opus 4.7.
The model delivers strong benchmark gains across coding and reasoning, but is not the controversial Mythos model that Anthropic offers to select partners.
Claude Opus 4.7 shows visible chain-of-thought and unusually high token usage.

Anthropic shipped Claude Opus 4.7 today, calling it the company’s most capable Opus model yet. We tested it, and the marketing lines up with the results.

"Our latest model, Claude Opus 4.7, is now generally available," the company said in its official announcement. "Users report being able to hand off their hardest coding work, the kind that previously needed close supervision, to Opus 4.7 with confidence."

The model arrives on the heels of weeks of user complaints about Opus 4.6 allegedly losing its edge. Developers across GitHub, Reddit, and X documented what they called "AI shrinkflation": the feeling that the model they'd been paying for had quietly gotten worse. As we reported yesterday, Anthropic was already preparing 4.7 while sitting on something far more powerful that it can't release publicly: Claude Mythos.

When the announcement dropped this morning, X users who had been loudest about 4.6's degradation were quick to reply with sarcasm: Opus 4.7, some joked, felt like "early Opus 4.6," the version people actually liked, before they believed Anthropic quietly turned the dials down. Anthropic, of course, has denied ever degrading model weights to manage compute demand.

Benchmarks back up Anthropic's claims. On SWE-bench Multilingual, a benchmark that measures coding skills, Opus 4.7 scored 80.5% against 4.6's 77.8%.

On GDPVal-AA, a third-party evaluation of economically valuable knowledge work across finance and legal domains, 4.7 scored 1,753 Elo against GPT-5.4's 1,674, a clear margin over the closest competitor.

Document reasoning via OfficeQA Pro showed the starkest jump: 80.6% for 4.7 versus 57.1% for 4.6, with GPT-5.4 and Gemini 3.1 Pro trailing at 51.1% and 42.9% respectively. Long-term coherence on Vending-Bench 2, a benchmark that measures how good models are at long-context and reasoning tasks like running a vending business, clocked in at a $10,937 money balance versus $8,018 for 4.6. That figure is a proxy for how well the model sustains useful behavior over long autonomous runs.

Cybersecurity is the one area where Anthropic deliberately held back. Opus 4.7 launches with automated safeguards that detect and block prohibited or high-risk cybersecurity requests. Anthropic confirmed it "experimented with efforts to differentially reduce" 4.7's cyber capabilities during training.

Security professionals can apply to a new Cyber Verification Program for access to those features. This is the company's test run for the safeguards it will eventually need to deploy with Mythos-class models at scale.

Opus 4.7 is the most powerful model publicly available. Mythos Preview, Anthropic's real frontier model, remains restricted to vetted security firms. As the UK's AI Security Institute evaluated last week, Mythos was the first AI to complete "The Last Ones," a 32-step corporate network attack simulation that typically takes human red teams 20 hours.

Opus 4.7 is not that. But it's the public-facing model that Anthropic will use to learn how those security guardrails hold up in the wild before it dares release anything scarier.

On the token side, Opus 4.7 uses an updated tokenizer that can map the same input to roughly 1.0x to 1.35x as many tokens, depending on content type. The model also reasons more at higher effort levels, particularly on later turns in agentic workflows. Anthropic published a migration guide for developers planning to upgrade from 4.6.
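
For a rough sense of what that tokenizer range means in practice, here is a minimal sketch that projects a token count measured under 4.6 onto the reported 1.0x to 1.35x range. The function name and the sample count are hypothetical, and the real multiplier depends on content type, so treat the result as a bracket rather than a prediction.

```python
# Reported multiplier range for the updated tokenizer, stored as integer
# percentages so the arithmetic stays exact (1.0x -> 100, 1.35x -> 135).
LOW_FACTOR_PERCENT = 100
HIGH_FACTOR_PERCENT = 135


def projected_token_range(old_token_count: int) -> tuple[int, int]:
    """Return (low, high) token estimates for the same input on Opus 4.7.

    `old_token_count` is an assumed count measured with the 4.6 tokenizer.
    Both bounds are rounded up to whole tokens.
    """
    if old_token_count < 0:
        raise ValueError("token count must be non-negative")
    low = (old_token_count * LOW_FACTOR_PERCENT + 99) // 100
    high = (old_token_count * HIGH_FACTOR_PERCENT + 99) // 100
    return low, high
```

Calling `projected_token_range(100_000)` brackets a hypothetical 100,000-token prompt between 100,000 and 135,000 tokens. It covers input tokenization only, not the extra reasoning output the model may also produce.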

We ran our own test: the same game-building prompt we've used to evaluate every major model release. Opus 4.7 produced the best result we've ever gotten from any model. It had the most visually polished game, the most genuinely challenging difficulty curve, the best mechanics, and the most creative win and loss screens. It appeared to generate levels procedurally, and none of them felt impossible, a balance that has tripped up other models repeatedly.

You can test the game here.

It wasn't zero-shot. Opus 4.6 had cleared that same test without any fixes. Opus 4.7 needed one round of bug fixes. That could be bad luck (a single iteration is a thin sample), but it's worth noting. What struck us more was how the model handled that round: It spotted additional bugs on its own, without being guided toward them. Opus 4.6 typically waited to be told where to look.

Xiaomi MiMo v2 Pro was the model with the best results until now, and unlike Opus 4.7, it produced a working result in a single iteration. Some may argue its game was more visually pleasing and had a soundtrack, which was an advantage, but the game’s logic and physics fell short of what Opus delivered after a single round of bug fixes.

Also, Xiaomi’s model produces these results at a fraction of the cost charged by Anthropic, which could be a big thing to consider for serious projects.

The chain-of-thought behavior was also different at first glance. Unlike 4.6, which tucked its reasoning into a separate reasoning box (meaning it was not part of the final answer), Opus 4.7 surfaced its chain of thought as part of the main text output. The reasoning was visible and traceable, not hidden behind a UI abstraction, which is a plus for those valuing transparency. Whether Anthropic will keep that behavior or eventually collapse it into a hidden block again is unclear.

The token usage was unlike anything we'd seen before. For the first time in our testing, a single session depleted our entire token quota. Watching the model work, we saw it complete a full draft, then write what appeared to be the entire game again from scratch under the label "Rewrite Emerge with bug fixes and improvements," followed by a second pass labeled "Create a rewritten Emerge with bug fixes and improvements."

This means, if you’re into serious coding, you’ll be forced to either upgrade your plan, pay a lot for API tokens, or wait a long time until Anthropic resets your usage quotas. Or you could just use a comparable model that charges a lot less.

Opus 4.6 had never done this. However, it's consistent with what Anthropic warns in the migration guide: more output tokens, especially on agentic tasks at higher effort levels.

Opus 4.7 is available today at Claude.ai, the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Pricing is unchanged from 4.6: $5 per million input tokens, $25 per million output tokens. Developers can access it via the string claude-opus-4-7.
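
To put those list prices in concrete terms, the sketch below estimates the API cost of a session from its token counts. Only the two per-million rates come from the pricing above. The function name and the sample session sizes are hypothetical, and the calculation ignores any discounts, caching, or plan quotas.

```python
from decimal import Decimal

# List prices quoted above, in US dollars per million tokens.
INPUT_USD_PER_MILLION = Decimal("5")
OUTPUT_USD_PER_MILLION = Decimal("25")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one session at the quoted per-token rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / MILLION * INPUT_USD_PER_MILLION
    output_cost = Decimal(output_tokens) / MILLION * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost
```

A hypothetical session with 200,000 input tokens and 80,000 output tokens works out to $1 for input plus $2 for output, or $3 in total. Because output is billed at five times the input rate, a model that rewrites a whole project from scratch more than once pushes the bill up quickly.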