GPT-5.6 Sol, Terra, and Luna: Pricing, Specs, and How It Compares to Claude
OpenAI's GPT-5.6 family (Sol, Terra, Luna): pricing, benchmarks, the government-limited release, and how Sol compares to Claude Opus 4.8.
OpenAI announced GPT-5.6 on June 26, 2026, and then did something no major AI lab had done before: it agreed not to ship the model to you. At the Trump administration's request, the new Sol, Terra, and Luna models launched as a limited preview restricted to roughly 20 government-approved partners. Everyone else waits for general availability in "the coming weeks." This is the first time the U.S. government has preemptively asked an American AI company to hold back a model before release, and it is the most interesting part of the launch.
That makes GPT-5.6 a strange thing to write about. The benchmarks are impressive on paper, the pricing is competitive, and you cannot run it. If you are a developer deciding what to use this week, the practical answer is still a Claude model. But the GPT-5.6 release matters for where frontier AI is heading, so here is what OpenAI shipped, what we can verify, and how Sol stacks up against Claude Opus 4.8 and Fable 5.
A note on sourcing: OpenAI's announcement page and several outlets block automated access, so the figures below are cross-checked across the GPT-5.6 Preview System Card and reporting from 9to5Mac, Pulse2, Decrypt, Forklog, Axios, and Fortune. Where OpenAI has not published a number, this post says so rather than inventing one.
The Three Models: A New Naming System
GPT-5.6 drops the single-flagship pattern. Instead of one model with effort settings, OpenAI shipped three named models and changed what the name means. In OpenAI's framing, "the number identifies a model's generation, while Sol, Terra, and Luna identify durable capability tiers," and each tier advances on its own cadence. Sun, Earth, Moon. A future GPT-5.7 Terra could ship without a new Sol.
Here is how the three split:
- Sol is the flagship, described as OpenAI's strongest model to date, with agentic improvements in coding, biology, and cybersecurity. It adds two new controls: a max reasoning effort that gives the model more time to work through hard problems, and an ultra mode that, in OpenAI's words, "goes beyond the capabilities of a single agent by leveraging subagents." If that sounds familiar, it is the same orchestration idea Anthropic ships as Dynamic Workflows in Claude Code.
- Terra is the everyday workhorse. OpenAI positions it as matching GPT-5.5 performance "while being 2x cheaper." For most production traffic, Terra is the tier OpenAI expects developers to default to.
- Luna is the fast, cheap tier for high-volume work, offering "strong capabilities at the company's lowest cost."
The naming change is sensible. It gives buyers a stable mental model for intelligence, speed, and cost instead of forcing them to relearn a new lineup every release. It also happens to mirror how Anthropic already separates Opus, Sonnet, and Haiku.
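As a rough illustration of the generation-plus-tier scheme described above, here is a minimal Python sketch that splits a display name such as "GPT-5.6 Sol" into its two parts. The class and function names are hypothetical, and the strings are marketing display names only; OpenAI has not published API model identifiers in the materials cited here, so none are assumed.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Durable capability tiers named in the announcement.
    SOL = "Sol"
    TERRA = "Terra"
    LUNA = "Luna"


@dataclass(frozen=True, order=True)
class Generation:
    # The numeric part of the name, e.g. 5.6 -> major=5, minor=6.
    major: int
    minor: int


@dataclass(frozen=True)
class ModelName:
    generation: Generation
    tier: Tier

    def __str__(self) -> str:
        return f"GPT-{self.generation.major}.{self.generation.minor} {self.tier.value}"


def parse_display_name(name: str) -> ModelName:
    """Parse a display name like 'GPT-5.6 Sol' (hypothetical helper, not an API ID parser)."""
    prefix, tier = name.rsplit(" ", 1)
    major, minor = prefix.removeprefix("GPT-").split(".")
    return ModelName(Generation(int(major), int(minor)), Tier(tier))


if __name__ == "__main__":
    sol = parse_display_name("GPT-5.6 Sol")
    # A hypothetical future Terra on a newer generation than the current Sol.
    future_terra = ModelName(Generation(5, 7), Tier.TERRA)
    print(sol, "|", future_terra, "|", future_terra.generation > sol.generation)
```

Keeping the generation and the tier as separate fields is what lets one tier move ahead of another, which is the point of the new scheme.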
Key Specs
| Spec | Details |
|---|---|
| Family | Sol (flagship), Terra (balanced), Luna (fast and cheap) |
| Announced | June 26, 2026, as a limited preview |
| Availability | API and OpenAI Codex, ~20 US-government-approved partners |
| Naming | Number = generation; Sol/Terra/Luna = durable capability tiers |
| Sol reasoning controls | "max" reasoning effort, "ultra" mode (subagent orchestration) |
| Context window | Not disclosed at preview |
| Pricing (Sol) | $5 input / $30 output per 1M tokens |
| Pricing (Terra) | $2.50 input / $15 output per 1M tokens |
| Pricing (Luna) | $1 input / $6 output per 1M tokens |
| Cerebras | Sol on Cerebras in July, up to 750 tokens/sec (limited) |
| Preparedness ratings | High Biological & Chemical, High Cybersecurity, below High AI self-improvement |
| General availability | "Coming weeks" for ChatGPT, Codex, and the API |
The Limited Release: Why You Can't Use It Yet
This is the story. OpenAI released GPT-5.6 Sol to about 20 partners whose names were individually approved by the U.S. government, with the list set to expand. The company shared model capabilities with the government before launch, and structured the rollout as a limited preview at the government's request. According to reporting from Axios, Fortune, and others, the White House's Office of the National Cyber Director and Office of Science and Technology Policy asked OpenAI to limit the rollout while the administration builds a framework for testing and evaluating the security of frontier models.
The reason given is capability, not politics. A source cited by Fortune said the government intervened because GPT-5.6 has "Mythos-like" capability, referring to Anthropic's frontier-class Mythos line. The implied logic: models at this level warrant a check that the developer has adequate safeguards in place before broad deployment. OpenAI CEO Sam Altman discussed the model with Commerce Secretary Howard Lutnick, who reportedly wanted assurance that all relevant parts of the government had tested and approved it.
The legal scaffolding is Trump's June 2 executive order on AI, which directed agencies to build a framework to vet the national-security risks of the most advanced AI systems for up to 30 days before public release. Participation is described as voluntary, and the framework does not exist yet, so OpenAI says it ran a phased rollout to bridge the gap. The company was not thrilled about it. Its statement: "We don't believe this kind of government access process should become the long-term default." OpenAI says it plans to make all three models generally available in the coming weeks.
Step back and the durable change is bigger than one launch. If restricting a commercial model to a vetted partner list becomes a recurring pattern, the lever that controls deployment timelines shifts from the lab to the government. That reshapes who decides when frontier capability reaches developers, and it lands in the same month Anthropic shipped its own Mythos-class capability only in safeguarded form. Two labs, two frontier tiers, two very different forms of friction reaching the same place: the most capable models are getting harder to actually deploy.
Benchmarks: Strong Claims, Few Numbers
OpenAI made specific capability claims at preview but, as of launch, has not published the underlying scores. Treat the table below as OpenAI's stated results, not independently verified numbers.
| Benchmark | GPT-5.6 Sol result | Notes |
|---|---|---|
| Terminal-Bench 2.1 (agentic CLI) | New state of the art | OpenAI's claim, including over Fable 5 and Mythos 5; no score released |
| GeneBench v1 (quantitative biology) | Beats GPT-5.5 using fewer tokens | OpenAI's claim, no score released; long-horizon genomics |
| ExploitBench (cyber) | Competitive with Mythos Preview at ~1/3 the output tokens | OpenAI's claim via Forklog, no score released; efficiency, not a raw lead |
| Real-world cyber (Chromium, Firefox) | Found bugs and exploitation primitives, no full autonomous exploit | Stays below OpenAI's "Cyber Critical" threshold |
The Terminal-Bench 2.1 claim is the one worth anchoring on, because it is the benchmark our Opus 4.8 coverage already tracks. On that page, GPT-5.5 paired with the Codex CLI harness scored 83.4%. OpenAI now says Sol sets a new high on the same benchmark, including over Claude's Fable 5 and Mythos 5. Without a published figure or a third-party harness run, that is a claim to file, not a result to bank. Anthropic and OpenAI report Terminal-Bench through different harnesses, and as our Opus 4.7 vs GPT-5.4 comparison noted, cross-harness scores are directional at best. The cybersecurity framing is more concrete and more honest: OpenAI says GPT-5.6 is better at finding and fixing vulnerabilities than at exploiting them, and that no model in the family carries out autonomous end-to-end attacks against hardened targets.
Pricing
GPT-5.6 prices per token across all three tiers, which keeps cost modeling simple. The full schedule:
| Model | Input (per 1M) | Output (per 1M) |
|---|---|---|
| Sol | $5 | $30 |
| Terra | $2.50 | $15 |
| Luna | $1 | $6 |
OpenAI had not detailed verified per-family cache discount ratios for GPT-5.6 in the materials we could access at preview, so check OpenAI's official API pricing page for the current caching terms before you model your spend.
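For a first-pass spend estimate, the table above reduces to simple arithmetic. The sketch below is a minimal Python example that assumes uncached list prices and a made-up monthly workload; the function and variable names are illustrative, and cache discounts are deliberately left out because their terms were not verified at preview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Preview list prices taken from the pricing table in this article.
GPT56_PRICES = {
    "sol": TierPrice(5.00, 30.00),
    "terra": TierPrice(2.50, 15.00),
    "luna": TierPrice(1.00, 6.00),
}


def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the uncached list-price cost in USD for a given token volume."""
    price = GPT56_PRICES[tier]
    total = input_tokens * price.input_per_m + output_tokens * price.output_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical monthly workload: 200M input tokens, 40M output tokens.
    for tier in GPT56_PRICES:
        print(f"{tier}: ${estimate_cost(tier, 200_000_000, 40_000_000):,.2f}")
```

For that hypothetical workload, the list-price totals come to $2,200 on Sol, $1,100 on Terra, and $440 on Luna. Real bills would differ once caching terms and any volume pricing are known.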
The numbers are aggressive where it counts. Sol matches Opus 4.8 on input at $5 per million but charges $30 on output versus Opus 4.8's $25. Terra at $2.50/$15 undercuts Claude Sonnet 4.6's $3/$15 on input. Luna at $1/$6 has no direct Claude equivalent at that price point. On paper, Terra and Luna are priced to win high-volume production traffic. The catch is the same as everywhere else in this launch: you can read the price sheet, but unless you are one of the ~20 vetted partners, you cannot buy at it yet. The reason access is gated at all sits in the safety disclosures.
Safety Profile
The system card is the most substantive document in the release, and it is where the "Mythos-like" capability concern shows up in numbers. Under OpenAI's Preparedness Framework, all three models, Sol, Terra, and Luna, are rated High in Biological & Chemical and High in Cybersecurity, and below High in AI self-improvement. That uniformity is the notable part. OpenAI states it is "the first time that smaller and faster members of a model family have received a High capability designation in any Tracked Category." Even Luna, the cheap tier, lands at High.
The mitigations match the rating. OpenAI reports over 700,000 A100-equivalent GPU hours spent on automated red-teaming to find universal jailbreaks, continuing after deployment. Sol and Terra ship with activation classifiers that can intervene mid-generation, real-time output scanning, and trust-based access programs that reserve the most sensitive cybersecurity and biological capabilities for verified defenders. External testing came from SecureBio, METR, Apollo Research, and Irregular. The framing OpenAI keeps returning to is layered defense: "Severe harm requires a chain of successful steps, and our safeguards place barriers throughout that chain." And on cyber specifically: "These models are a meaningful step up in cybersecurity capability, but they do not reach our risk framework's highest level."
That last line is the crux of the government's involvement. A model strong enough to find real vulnerabilities, rated High on both bio and cyber, is exactly the profile a national-security reviewer wants to see safeguarded before it reaches everyone.
GPT-5.6 Sol vs Claude Opus 4.8 and Fable 5
For a Claude Code developer, the honest comparison starts with one axis that overrides the rest: availability. Here is the side-by-side.
| Dimension | GPT-5.6 Sol | Claude Opus 4.8 | Claude Fable 5 |
|---|---|---|---|
| Status | Limited preview (~20 vetted partners) | Generally available | Generally available (safeguarded) |
| Price (per 1M) | $5 / $30 | $5 / $25 ($10 / $50 Fast mode) | $10 / $50 |
| Where it runs | OpenAI Codex, API | Claude Code, claude.ai, API | Claude Code, claude.ai, API |
| Agentic coding | Terminal-Bench 2.1 SoTA (claimed, no number) | Browser-agent SoTA (84% Online-Mind2Web) | SoTA on FrontierCode and CursorBench |
| Multi-agent | "ultra" mode spawns subagents | Dynamic Workflows (hundreds of subagents) | Dynamic Workflows |
| Can you use it today? | Only if government-approved | Yes | Yes |
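The price row above can be turned into a rough per-workload comparison. This is a minimal Python sketch under one loud assumption: that every model consumes the same number of input and output tokens for the same task, which is unlikely to hold exactly because tokenizers and verbosity differ between vendors. The helper names are illustrative.

```python
# (input, output) list prices in USD per 1M tokens, from the comparison table.
PRICES = {
    "GPT-5.6 Sol": (5.0, 30.0),
    "Claude Opus 4.8": (5.0, 25.0),
    "Claude Opus 4.8 Fast": (10.0, 50.0),
    "Claude Fable 5": (10.0, 50.0),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """List-price cost in USD, ignoring caching and batch discounts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def premium_over(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Fractional extra cost of `model` versus `baseline` at identical token counts."""
    base = cost_usd(baseline, input_tokens, output_tokens)
    return cost_usd(model, input_tokens, output_tokens) / base - 1.0


if __name__ == "__main__":
    # Hypothetical agentic session: 4M input tokens, 1M output tokens.
    for name in PRICES:
        print(f"{name}: ${cost_usd(name, 4_000_000, 1_000_000):.2f}")
    extra = premium_over("GPT-5.6 Sol", "Claude Opus 4.8", 4_000_000, 1_000_000)
    print(f"Sol premium over Opus 4.8: {extra:.1%}")
```

At 4M input and 1M output tokens, Sol comes to $50 against $45 for Opus 4.8, a premium of about 11%. Because the two share an input price, the gap is zero on input-only traffic and approaches 20% as output tokens dominate.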
On raw capability claims, GPT-5.6 Sol and the Claude frontier are in the same league, and the architectural ideas have converged: both camps now ship a flagship that spawns subagents to parallelize hard work. Sol's "ultra" mode and Opus 4.8's Dynamic Workflows are describing the same pattern from two labs.
But capability you cannot deploy is a benchmark, not a tool. Claude Opus 4.8 is the reliable default for daily agentic coding right now, generally available at $5/$25, with a Fast mode at $10/$50 for throughput-heavy work. Fable 5 sits above it as the first publicly available Mythos-class model, safeguarded so that high-risk cybersecurity, biology, chemistry, and distillation requests route to Opus 4.8 instead, which happens in under 5% of sessions. Both are in production today. GPT-5.6 Sol's strongest claim, Terminal-Bench 2.1 leadership, is unverified, and the model itself is gated behind a government approval list, so its published prices are not ones the public can buy at yet. If your decision is which model to point at a real codebase this week, the comparison is not close, and it has nothing to do with which model is smarter on a chart.
Coding: Codex vs Claude Code
There is a structural reason GPT-5.6 does not change a Claude developer's week, beyond the gating. GPT-5.6 ships through OpenAI Codex. Claude models run in Claude Code. They are different harnesses, and you cannot run GPT-5.6 inside Claude Code. So even when Sol, Terra, and Luna reach general availability, adopting them means moving your agentic workflow into OpenAI's stack, not swapping a model string in your existing setup.
If you specifically want to run non-Anthropic models through a Claude Code-style harness, that is possible for some providers today, and our guide to running Claude Code on other providers walks through the proxy approach. OpenAI's Codex models are a separate ecosystem, though, so the realistic read is this: for frontier agentic coding you can actually use right now, Claude Opus 4.8 and Fable 5 in Claude Code are generally available, and GPT-5.6 is a preview you mostly read about. When Sol opens up, it will be worth a head-to-head on real tasks, the same way we ran Opus 4.7 against GPT-5.4. Until then, betting your workflow on a gated preview is a bet on a timeline OpenAI does not fully control.
Frequently Asked Questions
Is GPT-5.6 available to use?
Not for most people yet. GPT-5.6 launched on June 26, 2026 as a limited preview, available through the API and OpenAI Codex to roughly 20 partners whose names were individually approved by the U.S. government. OpenAI says it plans to make Sol, Terra, and Luna generally available across ChatGPT, Codex, and the API in the coming weeks, but it has not committed to a specific date.
What is GPT-5.6 Sol?
Sol is the flagship of OpenAI's GPT-5.6 family, alongside the balanced Terra and the fast, low-cost Luna. OpenAI describes it as its strongest model to date, with agentic gains in coding, biology, and cybersecurity. Sol adds a "max" reasoning effort for harder problems and an "ultra" mode that spawns subagents to parallelize complex work.
How much does GPT-5.6 cost?
Per million tokens, Sol is $5 input and $30 output, Terra is $2.50 and $15, and Luna is $1 and $6. Those rates are public, but during the limited preview you can only buy at them if you are one of the government-approved partners.
GPT-5.6 vs Claude: which is better for coding?
For code you can actually ship this week, Claude wins, because GPT-5.6 is gated. OpenAI claims Sol sets a new Terminal-Bench 2.1 high, but the score is unpublished and access is restricted to about 20 partners. Claude Opus 4.8 and Fable 5 are generally available in Claude Code right now. When Sol opens up, a real head-to-head will be worth running; until then, availability decides it.
What Happens Next
The near-term timeline is concrete where it can be. The preview is live now via API and Codex for the approved partner list, which OpenAI says expands next week. Sol arrives on Cerebras in July at up to 750 tokens per second for select customers as capacity scales. General availability for Sol, Terra, and Luna across ChatGPT, Codex, and the API is slated for "the coming weeks," contingent on the government review process that triggered the limited rollout in the first place. Greg Brockman's preview verdict on X was characteristically brief: "GPT-5.6 Sol preview, it's a good model."
The takeaway for builders is simple. GPT-5.6 is a genuinely strong family and a preview of a new regulatory reality, but availability beats benchmarks, and right now the available frontier runs on Claude. If you want to put that frontier to work without spending a week wiring up agents, context management, and a multi-agent pipeline from scratch, the Code Kit ships the operational stack tuned for Opus 4.8 and Claude Code: 18 specialist agents, the /team-plan to /build pipeline, and the routing config that decides when to reach for the heavier model. For the broader picture of which model fits which task, see our model selection guide and the full Claude model lineup. When Sol is something you can actually run, we will benchmark it properly. Until then, the most capable model you can deploy is the one that matters.
