GLM 5.2 put through the same five coding tests as Claude: what Julian Goldie found

Video: "NEW GLM 5.2 DESTROYS Claude?" by Julian Goldie on YouTube.

What GLM 5.2 actually is

GLM 5.2 is a large language model from Zhipu AI, a Chinese research lab. It was released on 13 June 2026 under an MIT open source licence, which permits commercial use and lets you run it locally. It can also be accessed free via API. The context window sits at 1 million tokens, large enough to hold an entire codebase in a single session without chunking.

That combination — free, open source, large context — would be notable even if the model were mediocre. The question is whether it's actually capable enough for real development work. That's what the test was designed to find out.

The five tests and what they showed

Julian ran five identical coding prompts through GLM 5.2, Claude Opus 4.8, and Kimi K2.7. The tasks covered code completion, debugging, and structured output generation — a reasonable spread of what you'd ask a model to do during a typical build.

GLM 5.2 performed well on code completion and structured output. It produced clean, usable results and was fast. On debugging tasks where the error was contained and the fix was well-defined, it held its own. To be fair, those are also the tasks where most models do reasonably well once they're above a certain capability threshold.

Where Claude Opus 4.8 pulled ahead was on tasks involving complex, multi-step reasoning — situations where the model needs to hold a lot of context, track dependencies across the problem, and reason through a sequence of decisions before writing any code. Claude's answers in those scenarios were more methodical and less likely to introduce new issues while fixing existing ones.

Kimi K2.7 in the mix

Kimi K2.7 is a significant model in its own right — roughly 1 trillion parameters, with a design that uses around 30% fewer tokens than comparable models for the same output. In the coding tests, it was fast and handled structured tasks well. It sits closer to GLM 5.2 than to Claude on the complex reasoning end, but it's worth including in any model shortlist if throughput matters.

What "free" actually changes for small businesses

The cost argument for GLM 5.2 is straightforward. If it performs adequately on 70–80% of your coding tasks, routing those to GLM 5.2 and reserving Claude Opus 4.8 for the complex ones could reduce your API bill substantially. That's not a hypothetical — it's a routing decision that's practical to implement right now.

The caveat is that the tasks where GLM 5.2 falls short are usually the ones that take the most developer time to review and correct anyway. Using a cheaper model for those tasks isn't really a saving if you're spending extra time cleaning up after it. The honest approach is to test your specific workload before committing to any routing strategy.
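
To make the routing idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the model labels are plain strings rather than real API identifiers, the task categories are our own shorthand for the test types described above, and the prices are placeholders, not published rates. It does not call any API; it only shows the decision rule and the arithmetic behind the cost comparison.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration only, not real API model identifiers.
CHEAP_MODEL = "glm-5.2"
PREMIUM_MODEL = "claude-opus-4.8"

# Assumed shorthand for the task types where the cheaper model held up in the tests.
SIMPLE_TASKS = {"code_completion", "structured_output", "contained_debugging"}


@dataclass
class Task:
    kind: str         # e.g. "code_completion" or "multi_step_reasoning"
    est_tokens: int   # rough token estimate for the request plus response


def choose_model(task: Task) -> str:
    """Route well-defined tasks to the cheaper model, everything else to the premium one."""
    return CHEAP_MODEL if task.kind in SIMPLE_TASKS else PREMIUM_MODEL


def estimate_cost(tasks, price_per_million, router=choose_model):
    """Sum the API cost of a batch of tasks under a given routing rule."""
    return sum(
        t.est_tokens / 1_000_000 * price_per_million[router(t)]
        for t in tasks
    )


# Placeholder prices per million tokens (made up, not real pricing).
prices = {CHEAP_MODEL: 0.0, PREMIUM_MODEL: 10.0}

# A toy workload: three simple tasks and one complex one.
workload = [
    Task("code_completion", 1_000_000),
    Task("structured_output", 1_000_000),
    Task("contained_debugging", 1_000_000),
    Task("multi_step_reasoning", 2_000_000),
]

routed_cost = estimate_cost(workload, prices)
premium_only_cost = estimate_cost(workload, prices, router=lambda t: PREMIUM_MODEL)
```

Swapping in your own task log and real prices gives a first estimate of the saving. Note that the sketch counts API spend only; it ignores the developer time spent reviewing weaker output, which is the cost the paragraph above warns about.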

Where this connects to NordSys

We build bespoke software for clients using AI coding tools, and the choice of model matters — not just for quality, but for cost and predictability over a project's lifetime. A free, capable open-source model like GLM 5.2 genuinely changes the options available for smaller builds where API costs would otherwise be a constraint. We keep a close eye on these releases so that when we're recommending a tech stack to a client, we're working from current information rather than assumptions. If you have a development project in mind and want advice on what's actually available, get in touch.

See our Programming service →