Claude Opus 4.7: Vision, Verification, and a New Effort Tier
Opus 4.7 adds 3x vision resolution, an xhigh effort level, stricter instruction-following, and a 70% CursorBench score. Same $5/$25 pricing.
Claude Opus 4.7 is Anthropic's latest flagship model. It verifies its own outputs before reporting, processes images at more than three times the resolution of any previous Claude, and introduces a new xhigh effort level that sits between high and max. Instruction-following is stricter, vision capabilities took a generational leap, and coding benchmarks climbed again. Pricing stays at $5/$25 per million tokens.
The model ships alongside a Cyber Verification Program for security professionals, higher-resolution image support across the API, and task budgets in public beta for managing token spend on long-running agentic sessions.
Key Specs
| Spec | Details |
|---|---|
| API ID | claude-opus-4-7 |
| Release Date | April 16, 2026 |
| Context Window | 1M tokens (GA) |
| Max Output | 128,000 tokens |
| Pricing | $5 input / $25 output per 1M tokens |
| Status | Active, current recommended Opus |
What Changed: The Practical Improvements
Anthropic's internal teams use Claude Code daily, and each model release reflects what they learned from the previous one. Opus 4.7's changes are specific:
Self-verification before reporting. The model checks its own work before presenting results. It catches logical faults during planning and validates outputs against the original requirements. Intuit describes it as "catching its own logical faults during the planning phase and accelerating execution." Vercel's team observed it doing "proofs on systems code before starting work, which is new behavior."
3x vision resolution. Opus 4.7 accepts images up to 2,576 pixels on the long edge (roughly 3.75 megapixels), more than three times what prior Claude models supported. No API parameter changes needed. On XBOW's visual-acuity benchmark, it scored 98.5% compared to 54.5% for Opus 4.6. It reads chemical structures, complex technical diagrams, and dense charts that previous models struggled with.
Stricter instruction-following. The model interprets instructions more literally than Opus 4.6. This is a double-edged upgrade: prompts that relied on the model filling in implied context may need adjustment. The flip side is that explicit instructions produce more predictable results. Notion found it was the first model to pass their implicit-need tests.
New xhigh effort level. The effort scale now has four levels: low, high, xhigh, and max. Claude Code defaults to xhigh for all plans. The xhigh level gives deeper reasoning than high without the full cost of max. Hex's CTO noted that "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6."
Longer autonomous sessions. Cognition (Devin) reports the model "works coherently for hours, pushes through hard problems." Factory saw 10-15% higher task success rates with fewer instances of the model stopping halfway through complex work.
Updated tokenizer. The same input may map to roughly 1.0-1.35x as many tokens, depending on content type. Combined with deeper thinking at higher effort levels, overall token usage increases. Controls include the effort parameter, task budgets, and conciseness prompting. Check the migration guide for details.
Benchmark Results
Opus 4.7 posted gains across coding, vision, legal, finance, and agentic evaluations:
| Benchmark | Opus 4.7 | Opus 4.6 | Notable |
|---|---|---|---|
| CursorBench | 70% | 58% | +12 points |
| Terminal-Bench 2.0 | 3 new tasks solved | baseline | Tasks no prior model could pass |
| XBOW Visual Acuity | 98.5% | 54.5% | +44 points, generational leap |
| BigLaw Bench | 90.9% | -- | Harvey, high effort |
| GDPval-AA | SoTA | SoTA | Maintained frontier position |
| General Finance | 0.813 | 0.767 | AlphaSense research-agent module |
| OfficeQA Pro | 21% fewer errors | baseline | Databricks evaluation |
| Notion Agent | +13% resolution | baseline | 93-task internal benchmark, fewer tokens |
CursorBench measures real-world coding assistance quality, the benchmark most directly relevant to developers using Claude in their editor. The jump from 58% to 70% represents meaningfully better code suggestions, completions, and refactors.
The XBOW visual-acuity result deserves attention. Going from 54.5% to 98.5% is not an incremental improvement. This is the first Claude model where you can reliably pass in high-resolution screenshots, architectural diagrams, or scientific figures and expect accurate interpretation.
On Terminal-Bench 2.0, Opus 4.7 solved three tasks that no previous Claude model (or competing frontier model) could handle, including fixing a race condition that required multi-file reasoning across a complex codebase.
Rakuten reported 3x more production tasks resolved compared to Opus 4.6, with double-digit gains in Code Quality and Test Quality scores. CodeRabbit saw recall improve over 10%, noting the model is "a bit faster than GPT-5.4 xhigh."

Vision: A Generational Leap
Previous Claude models were limited to lower-resolution image inputs. Opus 4.7 raises the ceiling to 2,576 pixels on the long edge, roughly 3.75 megapixels. This is a model-level change with no API parameters to toggle.
What this means in practice:
- Code screenshots at full resolution, with no downscaling artifacts
- Technical diagrams with fine labels and small text rendered accurately
- Chemical structures and scientific notation parsed correctly (Solve Intelligence confirmed this)
- Charts and graphs with dense data points interpreted without hallucinating values
Higher-resolution images consume more tokens. If you are passing images where fine detail is not critical, downsample before sending to manage costs.
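The downsampling advice can be sketched as a small helper. The function name and the pure-arithmetic approach are ours; swap in your imaging library of choice (e.g. Pillow's `Image.resize`) for the actual resize step:

```python
def fit_long_edge(width: int, height: int, long_edge: int = 2576) -> tuple[int, int]:
    """Return (width, height) scaled so the longer side is at most long_edge px.

    Preserves aspect ratio; images already within the limit pass through
    unchanged. 2576 is Opus 4.7's documented long-edge ceiling; pass a
    smaller value to trade fine detail for fewer image tokens.
    """
    scale = min(1.0, long_edge / max(width, height))
    return round(width * scale), round(height * scale)

# A 4000x2000 screenshot where fine detail is not critical:
fit_long_edge(4000, 2000, long_edge=1288)  # -> (1288, 644)
```

With Pillow, for instance, you would then call `img.resize(fit_long_edge(*img.size, long_edge=1288))` before encoding the image for the API.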
Cybersecurity and the Cyber Verification Program
Opus 4.7 includes automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity uses. These safeguards are part of Anthropic's broader Project Glasswing initiative and serve as a testing ground for eventual broader release of their more capable Mythos-class models.
Legitimate security professionals can access cybersecurity capabilities through Anthropic's new Cyber Verification Program. This covers vulnerability research, penetration testing, and red-teaming. Apply at claude.com/form/cyber-use-case.
The cyber capabilities in Opus 4.7 are intentionally less advanced than what Anthropic's internal Mythos Preview can do. Training included differential reduction of certain cyber capabilities as a safety measure.
Safety Profile
Opus 4.7 maintains a similar safety profile to Opus 4.6 with targeted improvements in honesty and resistance to prompt injection attacks. Anthropic's assessment describes it as "largely well-aligned and trustworthy, though not fully ideal."
One noted weakness: the model can be overly detailed in harm-reduction advice on controlled substances. The full details are in the Claude Opus 4.7 System Card.
New API and Product Features
Alongside the model, Anthropic shipped:
Task budgets (public beta). Guide Claude's token spend across longer agentic runs. Set a budget and the model plans its token usage accordingly rather than spending freely.
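A request carrying a task budget might look like the sketch below. The post does not show the wire format, so the `task_budget` field name and shape are placeholders for illustration, not confirmed API parameters:

```python
# Sketch of a Messages API request with a task budget attached.
# NOTE: "task_budget" is a placeholder field name, not a documented
# parameter; only the feature itself (budgeted token spend for long
# agentic runs, public beta) comes from the announcement.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 128_000,
    "task_budget": {"tokens": 500_000},  # hypothetical: cap spend across the run
    "messages": [
        {"role": "user", "content": "Refactor the billing module and run the tests."}
    ],
}
```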
Higher-resolution image support. The 2,576-pixel limit applies across the API for all Opus 4.7 requests. No opt-in needed.
/ultrareview in Claude Code. A dedicated slash command that runs a focused review session, flagging bugs and design issues. Pro and Max users get 3 free ultrareviews.
Auto mode for Max users. Claude makes decisions autonomously with fewer interruptions and managed risk. Previously limited to Enterprise plans.
Pricing
No price increase; rates are unchanged from Opus 4.6:
| Tier | Cost |
|---|---|
| All contexts | $5 input / $25 output per 1M tokens |
| Prompt caching | Up to 90% savings |
| Batch processing | 50% savings |
| US-only inference | 1.1x standard pricing |
The tokenizer change means the same input may cost slightly more (1.0-1.35x) due to different token boundaries. For most workloads the increase is negligible. For token-sensitive applications, benchmark your specific content.
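For token-sensitive workloads, the benchmark is simple arithmetic. A sketch with illustrative volumes (the function name and the 200M-token example are ours):

```python
def input_cost_usd(tokens: int, multiplier: float = 1.0,
                   price_per_mtok: float = 5.0) -> float:
    """Input cost at Opus pricing ($5 per 1M input tokens), with an
    optional tokenizer multiplier covering the 1.0-1.35x range."""
    return tokens * multiplier * price_per_mtok / 1_000_000

input_cost_usd(200_000_000)        # old tokenizer: 1000.0 (dollars)
input_cost_usd(200_000_000, 1.35)  # worst-case new tokenizer: ~1350 dollars
```

Run the same calculation with the multiplier you measure on your own content to see whether the change matters for your budget.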
How to Use Opus 4.7 in Claude Code
Switch your default model:
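One way to do this, assuming Claude Code's standard settings file (`~/.claude/settings.json`) and its `model` key:

```json
{
  "model": "claude-opus-4-7"
}
```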
For per-session overrides:
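Assuming Claude Code's `--model` CLI flag:

```shell
# Launch a single session on Opus 4.7 without changing your default:
claude --model claude-opus-4-7
```

Inside a running session, the `/model` slash command switches models without restarting.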
The model is available across claude.ai, the Messages API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. The API model identifier is claude-opus-4-7.
To use the new xhigh effort level:
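The post names the effort levels but not the request shape, so treat the top-level `effort` field below as an assumed parameter name, not a confirmed one:

```python
# Sketch of a Messages API request pinned to the new xhigh tier.
# "effort" as a top-level field is an assumption; the levels named in
# the announcement are low, high, xhigh, and max.
payload = {
    "model": "claude-opus-4-7",
    "max_tokens": 64_000,
    "effort": "xhigh",  # deeper reasoning than high, cheaper than max
    "messages": [{"role": "user", "content": "Audit this migration plan."}],
}
```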
Opus 4.7 vs Opus 4.6: What Changed
| Feature | Opus 4.6 | Opus 4.7 |
|---|---|---|
| Vision resolution | Standard (~850px long edge) | 2,576px long edge (3.75 MP, 3x more) |
| Effort levels | low, high, max | low, high, xhigh (new), max |
| Default effort | high | xhigh |
| CursorBench | 58% | 70% (+12 points) |
| XBOW Visual Acuity | 54.5% | 98.5% (+44 points) |
| Self-verification | Basic | Proactive output verification |
| Instruction-following | Standard | Stricter, more literal |
| Tokenizer | Previous generation | Updated (1.0-1.35x as many tokens) |
| Cyber safeguards | Not present | Automated detection and blocking |
| Pricing | $5/$25 per 1M | $5/$25 per 1M (unchanged) |
The core story is vision and verification. The 3x resolution jump makes Opus 4.7 the first Claude model where image understanding is genuinely reliable for professional use. The self-verification behavior means fewer rounds of "wait, let me check that" from the user. Everything Opus 4.6 did well -- 1M context, 128K output, adaptive thinking, agent teams -- carries forward unchanged.
For model selection, the upgrade path is straightforward. If you use Opus for complex work, switch to 4.7. The only adjustment needed is reviewing prompts that depended on loose instruction interpretation, since the model now follows instructions more literally. If you are on Sonnet 4.6 for daily work and Opus for heavy lifting, that split still makes sense with the newer Opus.
