Claude Sonnet 5 Review 2026: Anthropic's Cheaper Agent Model Tested

Anthropic launched Claude Sonnet 5 on June 30, 2026, calling it "the most agentic Sonnet model yet." It's cheaper than Opus 4.8, close to it on agentic coding benchmarks, and already the default model on claude.ai, with same-day availability in Claude Code, Cursor, and GitHub Copilot. Here's what actually changed.

Quick Verdict

Claude Sonnet 5 is a mid-tier model that closes most of the gap with Anthropic's flagship Opus 4.8 while costing roughly 40% as much at introductory pricing. It's built for agentic work (planning multi-step tasks, using browsers and terminals, and running autonomously), and Anthropic says it handles messy, brownfield codebases noticeably better than Sonnet 4.6 did: race conditions, hidden tests, and root-causing bugs instead of patching symptoms. It's already live in claude.ai (as the new default for Free and Pro users), the Claude API, Claude Code, Cursor, VS Code, and GitHub Copilot.

The headline number is SWE-bench Pro: Sonnet 5 scores 63.2%, ahead of GPT-5.5 (58.6%) and Gemini 3.5 Flash (55.1%), though GPT-5.5 still edges it out on Terminal-Bench 2.1 (83.4% vs. 80.4%). It lands six points behind Opus 4.8's 69.2% agentic-coding score at roughly 40% of the price, which is the actual pitch here: not "best model on Earth," but "best price-to-agentic-coding-performance ratio Anthropic has shipped."

Our rating: 8.4/10 — a genuinely strong, well-priced upgrade for anyone running coding agents at volume, with the caveat that this is a mid-tier model, not a new frontier ceiling.

What Is Claude Sonnet 5?

Claude Sonnet 5 is Anthropic's latest mid-tier model, sitting between the fast/cheap Haiku 4.5 and the flagship Opus 4.8 in Anthropic's lineup. Anthropic describes it as capable of planning a sequence of steps, calling tools, and running autonomously "at a level that, just a few months ago, required larger and more expensive models" — in other words, the pitch is that Sonnet-tier pricing now buys Opus-adjacent agentic performance.

It ships with a 1-million-token context window, a maximum output of 128K tokens (raisable to 300K via the batch-API beta), and adaptive thinking that defaults to high reasoning effort on the API. Its knowledge/training cutoff is January 2026. Anthropic's safety testing also found Sonnet 5 shows a lower rate of undesirable behaviors in agentic contexts than Sonnet 4.6, which matters if you're letting it run unsupervised in a terminal or browser.
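To make those limits concrete, here is a minimal Python sketch of a pre-flight budget check. It assumes the prompt and the requested output must fit together inside the 1M-token window, which is how context limits commonly work but is not spelled out in the announcement. The constant and function names are hypothetical and not part of any Anthropic SDK.

```python
# Limits as quoted in this review; names are illustrative, not SDK constants.
CONTEXT_WINDOW = 1_000_000       # total context window, in tokens
MAX_OUTPUT_DEFAULT = 128_000     # standard output ceiling
MAX_OUTPUT_BATCH_BETA = 300_000  # ceiling quoted for the batch-API beta


def check_budget(prompt_tokens: int, max_output_tokens: int, batch_beta: bool = False) -> int:
    """Return the tokens left over after reserving room for the output.

    Raises ValueError if the request exceeds either quoted limit.
    Assumption: prompt and output share a single context window.
    """
    if prompt_tokens < 0 or max_output_tokens <= 0:
        raise ValueError("prompt_tokens must be >= 0 and max_output_tokens > 0")
    ceiling = MAX_OUTPUT_BATCH_BETA if batch_beta else MAX_OUTPUT_DEFAULT
    if max_output_tokens > ceiling:
        raise ValueError(f"requested output {max_output_tokens} exceeds ceiling {ceiling}")
    remaining = CONTEXT_WINDOW - prompt_tokens - max_output_tokens
    if remaining < 0:
        raise ValueError(f"request is {-remaining} tokens over the context window")
    return remaining


if __name__ == "__main__":
    # An 850K-token codebase dump plus a full 128K output still fits.
    print(check_budget(850_000, 128_000))  # 22000
```

A check like this is mostly useful in agent loops, where transcripts grow turn by turn and it is cheaper to summarize early than to hit a hard limit mid-task.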

Claude Sonnet 5 Pricing

Anthropic is running introductory pricing through August 31, 2026, after which rates increase. It's already the default model for both Free and Pro users on claude.ai at no extra cost, so the API pricing below mainly matters if you're building on top of it directly (Claude Code, your own agent, Cursor, etc.).

Period	Input (per 1M tokens)	Output (per 1M tokens)
Now through Aug 31, 2026 (introductory)	$2.00	$10.00
Sep 1, 2026 onward (standard)	$3.00	$15.00

For comparison, the introductory rate is roughly 40% of Opus 4.8's standard per-token pricing, for a model that lands within six points of it on agentic coding benchmarks. That is the core value argument Anthropic is making with this release.
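If you're budgeting past the introductory window, a quick calculation helps. The sketch below uses the two rate cards from the table above; the workload figures (500M input and 60M output tokens per month) are hypothetical, as are the function and variable names.

```python
# Per-million-token rates quoted in the pricing table (USD).
RATES = {
    "intro": {"input": 2.00, "output": 10.00},     # through Aug 31, 2026
    "standard": {"input": 3.00, "output": 15.00},  # Sep 1, 2026 onward
}


def monthly_cost(input_tokens: int, output_tokens: int, period: str) -> float:
    """Cost in USD for a token volume at the given pricing period."""
    rate = RATES[period]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agent workload: 500M input and 60M output tokens per month.
    intro = monthly_cost(500_000_000, 60_000_000, "intro")
    standard = monthly_cost(500_000_000, 60_000_000, "standard")
    print(f"intro: ${intro:,.2f}")                  # intro: $1,600.00
    print(f"standard: ${standard:,.2f}")            # standard: $2,400.00
    print(f"increase: {standard / intro - 1:.0%}")  # increase: 50%
```

Because input and output rates both rise by the same factor (2 to 3, 10 to 15), the bill grows 50% on September 1 regardless of the input/output mix. Taking the 40% figure at face value, the standard rate would work out to roughly 60% of Opus 4.8's.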

Claude Sonnet 5 vs. GPT-5.5 vs. Gemini 3.5 Flash

None of these three models wins across the board — each has a clear lane. Sonnet 5 leads on agentic coding specifically; GPT-5.5 leads on deep reasoning and math; Gemini 3.5 Flash wins on raw price and tokens-per-second.

Benchmark / Trait	Claude Sonnet 5	GPT-5.5	Gemini 3.5 Flash
SWE-bench Pro (agentic coding)	63.2%	58.6%	55.1%
Terminal-Bench 2.1 (CLI agent tasks)	80.4%	83.4%	—
ARC-AGI-2 (abstract reasoning)	—	84.6%	72.1%
Context window	1M tokens	—	1M tokens
Output speed	—	—	284 tok/s
API pricing (in / out per 1M)	$2 / $10 (intro)	—	$1.50 / $9.00
A dash marks a figure not reported in this comparison.

The practical read: if your workload is mostly writing and fixing code autonomously — the actual job of a coding agent — Sonnet 5 currently posts the best numbers of the three. If you need heavy multi-step reasoning or math, GPT-5.5 still leads. If cost-per-token and raw throughput matter most, Gemini 3.5 Flash undercuts both.
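For readers who want to re-run this comparison as new numbers come in, here is a small sketch that picks the leader per benchmark from the scores in the table and flags missing cells instead of treating them as zeros. The data structure and the function name are illustrative choices, not anything the vendors publish.

```python
# Scores copied from the comparison table; None marks a cell shown as a dash.
SCORES = {
    "SWE-bench Pro": {"Claude Sonnet 5": 63.2, "GPT-5.5": 58.6, "Gemini 3.5 Flash": 55.1},
    "Terminal-Bench 2.1": {"Claude Sonnet 5": 80.4, "GPT-5.5": 83.4, "Gemini 3.5 Flash": None},
    "ARC-AGI-2": {"Claude Sonnet 5": None, "GPT-5.5": 84.6, "Gemini 3.5 Flash": 72.1},
}


def leader(benchmark):
    """Return (top model, its score, models with no reported score)."""
    row = SCORES[benchmark]
    reported = {model: score for model, score in row.items() if score is not None}
    missing = [model for model, score in row.items() if score is None]
    top = max(reported, key=reported.get)
    return top, reported[top], missing


if __name__ == "__main__":
    for name in SCORES:
        top, score, missing = leader(name)
        note = f" (no figure for: {', '.join(missing)})" if missing else ""
        print(f"{name}: {top} at {score}%{note}")
```

Keeping missing values explicit matters here: two of the three benchmark rows have a blank cell, so a "leader" on those rows is only the leader among the models that reported a score.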

Key Features

Stronger on brownfield code: Anthropic says Sonnet 5 specifically improves at handling race conditions, hidden or flaky tests, and tracing bugs to their root cause rather than patching symptoms — the messy, real-world codebases most engineers actually work in, not clean greenfield demos.
1M-token context window: enough to hold large codebases or long agent transcripts in a single context without aggressive summarization.
Adaptive thinking, high by default: reasoning effort defaults to high on the API, and output can be extended to 300K tokens via the batch-API beta for long-running jobs.
Broad day-one availability: live simultaneously in claude.ai (new default model), Claude Code, the Claude API, Cursor, VS Code, and GitHub Copilot — no separate waitlist.
Lower agentic misbehavior rate: Anthropic's safety evaluations found fewer undesirable behaviors in agentic contexts compared to Sonnet 4.6, relevant for anyone running it with tool access and less supervision.
Introductory pricing window: $2/$10 per million tokens through August 31, 2026, roughly 40% of Opus 4.8's standard rate for benchmark scores that land close to it on coding tasks.

Pros and Cons

Pros	Cons
Leads GPT-5.5 and Gemini 3.5 Flash on SWE-bench Pro, the most direct measure of real coding-agent performance	GPT-5.5 still beats it on Terminal-Bench 2.1 and on deep-reasoning benchmarks like ARC-AGI-2
Close to Opus 4.8 quality on agentic coding at roughly 40% of the price	Introductory pricing expires August 31, 2026 — costs rise afterward
Immediately available everywhere: claude.ai, Claude Code, API, Cursor, VS Code, GitHub Copilot	Gemini 3.5 Flash is meaningfully cheaper per token and faster in raw tokens/second
1M-token context window with a 300K output ceiling in batch mode	This is a mid-tier model refresh, not a new frontier ceiling — Opus 4.8 remains Anthropic's top performer
Lower rate of undesirable agentic behavior than its predecessor, per Anthropic's own safety testing	Knowledge cutoff of January 2026 means it won't know about anything more recent out of the box

Who Should Use Claude Sonnet 5?

A strong fit if you are:

Running coding agents at volume — teams using Claude Code, Cursor, or GitHub Copilot on real (not toy) codebases will likely see the biggest gains from the brownfield-code improvements.
Currently paying for Opus 4.8-level performance but don't strictly need the top tier: Sonnet 5's price-to-performance ratio on coding tasks makes it worth testing as a swap that cuts cost sharply while giving up little capability.
A free or Pro claude.ai user — you're already on it by default, so there's nothing to configure to try it.

Consider alternatives if you:

Need the strongest possible reasoning or math performance regardless of cost — GPT-5.5 currently leads there.
Are optimizing purely for lowest cost-per-token and highest throughput at scale — Gemini 3.5 Flash is cheaper and faster on raw output.
Need Anthropic's absolute best available model regardless of price — that's still Opus 4.8, not Sonnet 5.

Frequently Asked Questions

When did Claude Sonnet 5 launch?

June 30, 2026. Anthropic made it the new default model for Free and Pro users on claude.ai immediately, alongside same-day availability in Claude Code, the Claude API, Cursor, VS Code, and GitHub Copilot.

How much does Claude Sonnet 5 cost?

Introductory API pricing is $2 per million input tokens and $10 per million output tokens through August 31, 2026, rising to $3/$15 afterward. It's included at no extra cost for Free and Pro claude.ai users as the new default model.

Is Claude Sonnet 5 better than GPT-5.5?

It depends on the task. Sonnet 5 leads on SWE-bench Pro (63.2% vs. 58.6%), making it the stronger pick for agentic coding specifically. GPT-5.5 leads on Terminal-Bench 2.1 (83.4% vs. 80.4%) and on deep-reasoning benchmarks like ARC-AGI-2. Neither model wins across every category.

Is Claude Sonnet 5 as good as Claude Opus 4.8?

It's close but not equal — Sonnet 5 scores 63.2% on agentic coding versus Opus 4.8's 69.2%, at roughly 40% of the price. Opus 4.8 remains Anthropic's strongest model overall; Sonnet 5 is the better price-to-performance option for most coding-agent workloads.
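The trade-off can be put in numbers. This is a back-of-the-envelope sketch using only the figures quoted in this review; it assumes the two scores come from the same benchmark run conditions and that "roughly 40%" refers to introductory pricing. The function name is hypothetical.

```python
def compare_to_flagship(score: float, flagship_score: float, price_ratio: float) -> dict:
    """Summarize how a cheaper model stacks up against a flagship."""
    return {
        "gap_points": round(flagship_score - score, 1),
        "share_of_flagship_score": round(score / flagship_score, 3),
        "share_of_flagship_price": price_ratio,
    }


if __name__ == "__main__":
    # Sonnet 5 (63.2%) vs. Opus 4.8 (69.2%) at roughly 40% of the price.
    print(compare_to_flagship(63.2, 69.2, 0.40))
```

On those inputs, Sonnet 5 delivers about 91% of Opus 4.8's score for about 40% of its price, with a six-point absolute gap.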

Where can I use Claude Sonnet 5?

It's live in claude.ai (as the default model for Free and Pro plans), the Claude API, Claude Code, Cursor, VS Code, and GitHub Copilot as of launch day.

Final Verdict

Claude Sonnet 5 is exactly what Anthropic says it is: not a new capability ceiling, but a real jump in what a mid-priced model can do on agentic coding work. The SWE-bench Pro lead over GPT-5.5 and Gemini 3.5 Flash is the number that matters most for anyone actually running coding agents day to day, and the fact that it lands within six points of Opus 4.8 at roughly 40% of the cost during the introductory window is a more interesting story than the raw benchmark table.

It's not the strongest model on every axis. GPT-5.5 still wins on reasoning-heavy and terminal-command tasks, and Gemini 3.5 Flash undercuts it on price, so this isn't a universal "switch to this and forget the others" verdict. But for teams whose actual workload is writing, fixing, and shipping code through an agent, Sonnet 5 is currently the best value in that specific lane, and it's an easy switch since it's already the default on claude.ai and available in Claude Code, the Claude API, Cursor, VS Code, and GitHub Copilot.

Rating: 8.4/10. A well-priced, genuinely improved agentic coding model, held back mainly by the fact that it's a mid-tier release rather than a new frontier leader.