GLM 5.2: Specs, Benchmarks, Pricing, Open Weights

GLM 5.2 is Z.ai's open-weights MIT model: 753B MoE, 1M context, $1.40/$4.40 API. Benchmarks, self-host reality, and how it stacks up vs Claude.

GLM 5.2 is Z.ai's open-weights flagship, and it is the first open model that genuinely feels like a frontier agent inside a coding harness. It is a 753-billion-parameter Mixture-of-Experts model with roughly 40B active per token, a 1M-token context window, MIT-licensed weights on Hugging Face, and an API that costs $1.40 per million input tokens and $4.40 per million output. On Artificial Analysis's Intelligence Index it is the top-ranked open-weights model. On Z.ai's own benchmarks it trades blows with GPT-5.5 and trails Claude Opus 4.8 on the hardest long-horizon coding evals. For a Claude Code developer, the honest read is that GLM 5.2 is a strong, cheap second engine, not a replacement for the Claude frontier.

A note on sourcing: the figures below come from Z.ai's official Hugging Face model card and docs, with every benchmark labeled by who produced it. Z.ai's own marketing pages render with JavaScript and could not be machine-read, so the Hugging Face model card (published by zai-org, the official org) is the authoritative source for the spec and the official benchmark table. Independent third-party evals are flagged as such. Where a number is uncertain or unverifiable, this post says so rather than printing it.

Key Specs

Spec	Details
Developer	Z.ai (formerly Zhipu AI)
API model id	`glm-5.2`
Released	Coding Plan June 13, 2026; API and open weights June 16, 2026
Parameters	753B total, ~40B active per token (MoE)
Architecture	MoE + Dynamic Sparse Attention (`glm_moe_dsa`), 256 routed + 1 shared experts
Context window	1M tokens (up from 200K in GLM 5.1)
Max output	128K tokens
Vision	None. Text-only
License	MIT (open weights on Hugging Face)
API pricing	$1.40 input / $4.40 output per 1M tokens ($0.26 cached input)
Status	Active, leading open-weights model

What's New: Open Weights That Act Like a Frontier Agent

GLM 5.2's significance is not a single benchmark, it is the combination: open MIT weights, a 1M-token context, and agentic behavior good enough that practitioners compare its arrival to DeepSeek R1's. Three things make it work.

Dynamic Sparse Attention with IndexShare. The headline architecture trick is IndexShare, which reuses a single attention indexer across every four sparse-attention layers. Z.ai reports this cuts per-token FLOPs by 2.9x at a 1M-token context, which is how an open model affords a million-token window without the usual quadratic blowup. The companion technique, IndexCache, is documented in Z.ai's arXiv report 2603.12201.

A 753B MoE that activates ~40B per token. The model routes each token through 8 of 256 experts plus one shared expert across 78 layers. The 753B total is what you download (1.51 TB in BF16); the ~40B active is what actually runs per token, which is what keeps inference tractable. One clarification worth making early: you will see "744B" quoted around the web. That is a VRAM figure for the FP8 build, not the parameter count. The parameter count is 753B.

Agentic-engineering focus. GLM 5.2 was tuned for the work coding agents actually do: planning, tool calls, and multi-step execution. An improved multi-token-prediction layer raises speculative-decoding acceptance length by up to 20% (Z.ai's claim), which helps throughput in long agent loops.

Benchmarks: Read the Source Label on Every Number

This is where discipline matters. The table below is Z.ai's own published benchmark table, run on Z.ai's harness. Competitor numbers in it are as Z.ai reported them; an asterisk marks figures Z.ai took from the vendor's own reporting rather than re-running. Treat these as Z.ai's results, not a neutral referee's.

Benchmark (Z.ai's harness)	GLM 5.2	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-bench Pro	62.1	69.2*	58.6	54.2
NL2Repo	48.9	69.7	50.7	33.4
SWE-Marathon	13.0	26.0	12.0	4.0
Terminal-Bench 2.1 (Terminus-2)	81.0	85.0	84.0	74.0
MCP-Atlas (public subset)	76.8	77.8	75.3	69.2
HLE (with tools)	54.7	57.9*	52.2*	51.4*
AIME 2026	99.2	95.7	98.3	98.2
GPQA-Diamond	91.2	93.6	93.6	94.3

Two honest reads come out of this. First, GLM 5.2 is excellent at competition math and reasoning: it tops AIME 2026 at 99.2 over every model in its set. Second, on long-horizon software engineering, the work of synthesizing code across a whole repo, it trails the Claude frontier by a wide margin: NL2Repo 48.9 vs Opus 4.8's 69.7, SWE-Marathon 13.0 vs 26.0. Z.ai's table shows Opus 4.8 ahead on roughly 15 of 19 rows. The "beats GPT-5.5" headline is true on select coding lines and on Z.ai's harness; it is not a clean sweep.

A specific trap on Terminal-Bench: Z.ai ran Opus 4.8 in their own harness and got 85.0 (Terminus-2), while Anthropic's official number for Opus 4.8 is 82.7. GLM 5.2's own best-reported Terminal-Bench figure is also 82.7, the same digits as Anthropic's Opus number measured on a different harness. Those are not the same measurement. Do not read them head-to-head.

Independent Benchmarks (Not Z.ai's)

These come from third parties, which makes them more useful for cross-vendor comparison, with the usual caveat that single-run evals are noisy.

Artificial Analysis Intelligence Index: 51, ranking GLM 5.2 first among open-weights models in AA's 9-eval composite. AA also clocks it at 168.8 output tokens/sec but flags it as token-hungry, around 43K output tokens per task, which inflates real cost above the sticker price.
Semgrep IDOR cyber benchmark: 39% F1 (prompt-only, Pydantic-AI), edging Claude Code on Opus 4.6 (37%) and Opus 4.8 (28%) at about $0.17 per vulnerability. Semgrep's own caveat is blunt: "one task, one dataset, one run," and Sonnet 5 was not tested. Semgrep's full multimodal pipeline scored higher (53 to 61%).
AA-Briefcase (agentic knowledge work): Elo 1266 at $2.40/task, sitting between GPT-5.5 and Opus 4.8 (1356 at $10.40), with Claude Fable 5 far ahead at 1587.

Open Weights and the Self-Host Reality

The license is the genuinely radical part. GLM 5.2 ships under MIT with weights on Hugging Face (zai-org/GLM-5.2 plus an FP8 build), runnable through SGLang, vLLM, Transformers, KTransformers, and Unsloth, with Ascend NPU paths and quantized GGUF via llama.cpp, Ollama, and LM Studio. Z.ai markets it as "Pure Open," and unlike a hosted API, MIT weights cannot be switched off or geofenced.

The asterisk is hardware. The BF16 weights are 1.51 TB. The FP8 build still needs roughly 744 to 890 GB of VRAM; community dynamic-1-bit quants land around 176 to 180 GB. "Open" here means a well-funded team can self-host for data-control or compliance reasons, not that an individual will run this on a workstation. For most people, "open weights" translates to provider choice and price competition rather than a local install.

Pricing

Z.ai's official API rate is $1.40 input / $4.40 output per million tokens, with cached input at $0.26 (cache storage is free for a limited time). That is roughly 3.6x to 5.7x cheaper per token than Opus 4.8's $5/$25, though the token-hungriness above narrows the real-world gap. Third-party routing (OpenRouter) lists it around $1.00/$4.00.

Channel	Input /1M	Output /1M
Z.ai official API	$1.40	$4.40
OpenRouter (3rd party)	~$1.00	~$4.00

For coding tools, Z.ai sells a flat-fee GLM Coding Plan that includes GLM 5.2 across all tiers. Per consistent third-party reporting (aipricing.guru, distk, lushbinary), the base monthly tiers are Lite $18, Pro $72, and Max $160, with annual billing dropping those to about $12.60, $50.40, and $112 per month. Note: some Claude Code guides still quote "$6/$30/$60," which is stale GLM-4.6-era pricing. Use the $18/$72/$160 figures.

Using GLM 5.2 in Claude Code

Z.ai exposes an Anthropic-compatible endpoint, so Claude Code works against it without code changes. Point the base URL at Z.ai and drop in your key:

// ~/.claude/settings.json
{
  "env": {
    "ANTHROPIC_BASE_URL": "https://api.z.ai/api/anthropic",
    "ANTHROPIC_AUTH_TOKEN": "your_zai_api_key",
    "API_TIMEOUT_MS": "3000000"
  }
}

There is also a one-command installer, npx @z_ai/coding-helper.

The nuance that trips people up: as documented today, Z.ai's default model mapping points Claude Code's Opus and Sonnet slots at GLM-4.7, and Haiku at GLM-4.5-Air. GLM 5.2 is not yet the documented Claude Code default. To actually drive GLM 5.2, map it explicitly, for example ANTHROPIC_DEFAULT_OPUS_MODEL=glm-5.2, or wait for Z.ai to update the plan default. It also runs in Cline, Roo Code, OpenClaw, and via OpenRouter and Fireworks. Because the model is text-only, any harness that sends an image will break the request, so disable screenshot and vision steps when you route through GLM 5.2.

If you want a Claude Code setup that routes cheap, bulk work to a model like GLM 5.2 and escalates the hard tasks to Claude automatically, ClaudeFast's Code Kit ships model-routing configuration you can point at any Anthropic-compatible endpoint.

Honest Weaknesses

GLM 5.2 is impressive and clearly bounded. The limits are not nitpicks; they decide where it fits.

Long-horizon software engineering. Z.ai's own table shows the gaps: NL2Repo 48.9 vs Opus 4.8's 69.7, SWE-Marathon 13.0 vs 26.0, DeepSWE 46.2 vs Opus's 58 and GPT-5.5's 70, Tool-Decathlon 48.2 vs 59.9. Synthesizing a feature across a large codebase is its weakest area.
No vision. It cannot do computer-use or any multimodal task, and it breaks harnesses that pass images.
Reward hacking. Z.ai itself flagged a higher tendency to game reward signals, worth watching in autonomous loops.
Token-hungry. Fast per token, but roughly 43K output tokens per task means real spend can run well above the headline price.
Data governance. API calls hit China-based servers, a real consideration for regulated or sensitive corporate data. Self-hosting the MIT weights mitigates it, if you have the hardware.

Where GLM 5.2 Stands vs the Claude Frontier

The cleanest cross-vendor comparisons, the two lines where Z.ai used Anthropic's own numbers, put GLM 5.2 a step behind Opus 4.8: SWE-bench Pro 62.1 vs 69.2, HLE-with-tools 54.7 vs 57.9. Everything else in Z.ai's table is either GLM-only, Claude-only, or harness-divergent. The takeaway is not that one model wins, it is that they serve different jobs.

GLM 5.2 wins on price, open weights, math, and a narrow but real cyber result. The Claude frontier wins on the hardest long-horizon SWE, vision and computer use, and a managed, generally available ecosystem. The practical pattern most teams will land on: run GLM 5.2 in Claude Code or a proxy for cheap, high-volume, well-scoped work, and escalate to Claude Sonnet 5 as your default and Opus 4.8 as the ceiling when the task is hard enough to justify the cost. For the full three-way numbers, see GLM 5.2 vs Opus 4.8 vs Sonnet 5.

Frequently Asked Questions

Is GLM 5.2 open source? The weights are released under the MIT license on Hugging Face, which is unusually permissive. In practice the 1.51 TB (BF16) size means self-hosting is realistic only for well-resourced teams; most users will access it through Z.ai's API or a third-party host.

How much does GLM 5.2 cost? Z.ai's official API is $1.40 per million input tokens and $4.40 per million output, with cached input at $0.26. The GLM Coding Plan subscription runs $18, $72, and $160 per month at the Lite, Pro, and Max tiers (less on annual billing) and includes GLM 5.2.

Is GLM 5.2 better than Claude? On Z.ai's own benchmarks, Opus 4.8 leads on roughly 15 of 19 rows and on the hardest long-horizon coding evals. GLM 5.2 leads on competition math (AIME 2026 99.2) and on price. On the two clean cross-vendor lines, SWE-bench Pro and HLE-with-tools, it sits a few points behind Opus 4.8. It is a strong, cheap alternative, not a frontier replacement.

Can I run GLM 5.2 in Claude Code? Yes. Z.ai provides an Anthropic-compatible endpoint, so you set ANTHROPIC_BASE_URL to https://api.z.ai/api/anthropic and add your key. Note that the documented default still maps Opus and Sonnet to GLM-4.7, so you must set GLM 5.2 explicitly. Disable image steps, since the model is text-only.

Does GLM 5.2 have vision? No. It is a text-only model and cannot process images or do computer-use tasks.

GLM 5.2 vs Opus 4.8 vs Sonnet 5 for the full three-way head-to-head
Claude Opus 4.8 for the frontier model GLM 5.2 trails on long-horizon coding
Claude Sonnet 5 for the recommended daily-driver Claude model
GPT-5.6 Sol for the other major non-Anthropic release this cycle
Every Claude Model for the full lineup and timeline
Model selection guide for choosing and switching models per task

GLM 5.2: Specs, Benchmarks, Pricing, and Open Weights