Understanding GLM-5: The Model That Stumped Silicon Valley

Introduction

After trying the newly released GLM-5 from Zhipu AI, I finally understand why it puzzled Silicon Valley. Rumors about the mysterious “Pony Alpha” model had been circulating online for a week. Some claimed it was Claude 5 in disguise, while others said it was a secret weapon from a major tech company. Now the truth is out: “Pony Alpha” was the codename for GLM-5, Zhipu AI’s big release for the Spring Festival.

Moreover, it has been open-sourced.

If 2025 was the year AI learned to write code, then 2026, as Tesla’s former AI director Andrej Karpathy predicted, marks the beginning of the “Agentic Engineering” era. Yet the first model to offer this capability as open-source infrastructure is not GPT-5.3-Codex or Claude Opus 4.6 but the Chinese-developed GLM-5.

The Deception of Pony Alpha

Creating simple games like Snake or Tetris is no longer a novelty for AI. To truly test its capabilities, we presented GLM-5 with a highly specific physical simulation task:

“Create an interactive HTML, CSS, and JavaScript satellite system simulation program that simulates the process of a satellite sending signals to ground receivers. The simulation should display a satellite orbiting the Earth and periodically sending signals that are received by multiple ground receivers.”

It did not immediately provide code; instead, it paused briefly, apparently reasoning through the task, before generating an HTML page that met the requirements. On screen, the satellite did more than orbit: its signal transmissions were drawn as expanding waves, a visual metaphor that evoked the Doppler effect.

It understood the physical principles behind “simulation,” not just the action of “drawing.”

Next, we increased the difficulty.

A user on X, @scaling01, offered high praise: “Pony-Alpha is either AGI or has memorized my SVG problem library.”

To verify this, we tested an extremely abstract Python task: “Visualize the operation of traffic lights on a one-lane road, with vehicles entering at random speeds.”

In less than 3 minutes, a dynamic traffic flow simulation appeared.

The logic was impeccable: vehicles passed on green, queued on red, and sped up and slowed down with convincing randomness. The look of the interface, however… well, let’s just say it was a bit ‘rudimentary.’
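For readers who want a feel for the rules involved, here is a minimal Python sketch of that kind of logic. It is my own illustration, not GLM-5’s output: the function name `simulate`, the cell-based road, the 30% arrival chance, and the speed range of 1 to 3 are all assumptions, and it draws nothing on screen.

```python
import random


def simulate(steps=60, road_length=50, light_pos=25, green=10, red=10, seed=0):
    """Discrete-time one-lane road with a single traffic light.

    Cars enter at cell 0 with a random speed, never overtake, and may not
    cross the stop line (light_pos) while the light is red.
    Returns (cars_still_on_road, number_of_cars_that_exited).
    """
    rng = random.Random(seed)
    cars = []      # front-most car first; each car is {"pos": int, "speed": int}
    exited = 0
    for t in range(steps):
        is_green = (t % (green + red)) < green
        remaining = []
        leader_pos = None  # position of the nearest car ahead that is still on the road
        for car in cars:
            target = car["pos"] + car["speed"]
            # Red light: a car that has not reached the stop line must wait before it.
            if not is_green and car["pos"] < light_pos:
                target = min(target, light_pos - 1)
            # One lane: stay at least one cell behind the car in front.
            if leader_pos is not None:
                target = min(target, leader_pos - 1)
            car["pos"] = max(target, car["pos"])  # never roll backwards
            if car["pos"] >= road_length:
                exited += 1
            else:
                remaining.append(car)
                leader_pos = car["pos"]
        cars = remaining
        # A new car arrives with 30% probability if the entry cell is free.
        if rng.random() < 0.3 and (not cars or cars[-1]["pos"] > 0):
            cars.append({"pos": 0, "speed": rng.randint(1, 3)})
    return cars, exited


if __name__ == "__main__":
    on_road, done = simulate(steps=200, seed=1)
    print(f"{done} cars exited, {len(on_road)} still on the road")
```

Setting `green=0` keeps the light permanently red, which is a quick way to watch the queue build up behind the stop line.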

Another user, @anurudhsharmaa, generated an aesthetically pleasing website with just one prompt.

Yet another user, @zakarinoo7, created a fully functional media player supporting MP4/MP3 decoding, playlist management, and even a dark mode UI, all compiled into just 15 MB.

This made me eager to try again. I asked GLM-5 to help me build an open-world game featuring stick figures.

It didn’t rush to write code; instead, it took a very “human” approach by first discussing the tech stack, core gameplay, and world style, gradually aligning with my needs.

During its “construction” process, I could act like a demanding client, continuously inserting new ideas into the original requirements:

Running around is too boring; there should be an economy system with randomly spawning coins on the ground.
Add some action elements; press J to shoot arrows and K for melee attacks.
Where do the items go? Add a backpack UI that can be summoned with the I key.
The stick figures by the roadside shouldn’t just be decorations; I want to interact with NPCs.

When it finally ran, the result could only be described as “perfect.”

Since it claimed to be a system architect, I also had it build a Mac-style desktop system once it launched on the official website.

Although the overall design was a bit rough, it faithfully reproduced the classic desktop wallpaper, a live clock in the top menu bar, and the row of icons in the bottom Dock. Surprisingly, every application on it could actually be opened.

Adapting to Half the Chip Industry

Benchmark results show that GLM-5 achieved state-of-the-art performance in coding and agent capabilities.

Data doesn’t lie. On two of the most widely recognized and challenging programming benchmarks, SWE-bench-Verified and Terminal Bench 2.0, GLM-5 scored 77.8 and 56.2, respectively. In real programming scenarios, its performance is already very close to that of Claude Opus 4.5.

What enables GLM-5 to achieve this? Reviewing the official report, we found two key ingredients behind the headline numbers: the Mixture-of-Experts (MoE) architecture and asynchronous reinforcement learning (asynchronous RL).

With a total parameter count of 744 billion and only 40 billion active parameters, it is both smart and lightweight. However, the real killer feature is Zhipu’s newly constructed “Slime” framework.
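The arithmetic behind “smart and lightweight” is easy to check: 40 billion out of 744 billion means roughly 5.4% of the weights are active at a time. The sketch below computes that figure and pairs it with a generic top-k expert router, which is the usual way MoE models decide which experts run for a given input. The official report’s expert count and routing details are not quoted here, so the four toy experts, `k=2`, and the function names `route_top_k` and `moe_forward` are assumptions for illustration only.

```python
import math

TOTAL_PARAMS = 744e9   # total parameters, as reported
ACTIVE_PARAMS = 40e9   # active parameters, as reported
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # roughly 0.054, i.e. about 5.4%


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts: each one just scales its input.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]

if __name__ == "__main__":
    print(f"active fraction: {active_fraction:.1%}")
    print(moe_forward(3.0, toy_experts, [0.1, 2.0, -1.0, 2.0], k=2))
```

Only the chosen experts are ever called, which is why the compute per input tracks the active parameter count rather than the total.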

To put it simply: previous model training was like taking an “exam,” where the model memorized answers to score high; GLM-5’s training resembles an “internship,” where it learns by completing long-term projects in an environment called Slime, continuously learning through feedback and interaction.
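To make the “internship” picture a little more concrete, here is a rough sketch of what asynchronous RL generally means: workers keep producing finished tasks while a learner updates the policy whenever enough results have arrived, so nobody waits for the slowest task. This is a toy built on Python’s `asyncio`, not Slime’s actual API. The names `rollout_worker`, `learner`, and `run_training`, the random rewards, and the dictionary standing in for a policy are all hypothetical.

```python
import asyncio
import random


async def rollout_worker(worker_id, policy, queue, episodes, rng):
    """Generate trajectories without waiting for the learner to finish updating."""
    for _ in range(episodes):
        version = policy["version"]   # policy version this rollout started from
        await asyncio.sleep(0)        # stand-in for a long-running agentic task
        reward = rng.random()         # stand-in for feedback from the environment
        await queue.put({"worker": worker_id,
                         "policy_version": version,
                         "reward": reward})


async def learner(policy, queue, total, batch_size):
    """Consume trajectories as they arrive and update once per full batch."""
    batch = []
    for _ in range(total):
        batch.append(await queue.get())
        if len(batch) == batch_size:
            # Trajectories may come from an older policy version (staleness).
            staleness = max(policy["version"] - b["policy_version"] for b in batch)
            policy["max_staleness"] = max(policy["max_staleness"], staleness)
            policy["mean_reward"] = sum(b["reward"] for b in batch) / batch_size
            policy["version"] += 1    # stand-in for a gradient update
            batch = []


async def main(num_workers, episodes, batch_size, seed):
    rng = random.Random(seed)
    policy = {"version": 0, "mean_reward": 0.0, "max_staleness": 0}
    queue = asyncio.Queue()
    workers = [rollout_worker(i, policy, queue, episodes, rng)
               for i in range(num_workers)]
    await asyncio.gather(
        learner(policy, queue, num_workers * episodes, batch_size), *workers)
    return policy


def run_training(num_workers=4, episodes=5, batch_size=4, seed=0):
    """Synchronous entry point for the toy training loop."""
    return asyncio.run(main(num_workers, episodes, batch_size, seed))


if __name__ == "__main__":
    print(run_training())
```

The `max_staleness` field is there to show the trade-off: decoupling generation from training means some trajectories were produced by a slightly older policy.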

Additionally, it integrates DeepSeek Sparse Attention for the first time. This means that when processing context with potentially hundreds of thousands of lines of code, it not only won’t “get lost” but can also significantly reduce deployment costs.
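The general idea of sparse attention can be sketched in a few lines: instead of letting a query attend to every token in the context, keep only the highest-scoring ones and run the softmax over those. The example below is a generic top-k version in plain Python and should not be read as DeepSeek Sparse Attention itself. In particular, it uses the ordinary scaled dot product as the selection score, which is my simplifying assumption, and the function name `sparse_attention` is hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(query, keys, values, top_k):
    """Attend only to the top_k keys with the highest scaled dot-product score.

    Returns (output_vector, sorted indices of the keys that were kept).
    """
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Softmax over the kept scores only; all other positions get zero weight.
    peak = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - peak) for i in keep]
    total = sum(weights)

    output = [0.0] * len(values[0])
    for i, w in zip(keep, weights):
        for d, v in enumerate(values[i]):
            output[d] += (w / total) * v
    return output, sorted(keep)


if __name__ == "__main__":
    q = [1.0, 0.0]
    ks = [[10.0, 0.0], [0.0, 10.0], [9.0, 0.0], [-5.0, 0.0]]
    vs = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [5.0, 5.0]]
    out, kept = sparse_attention(q, ks, vs, top_k=2)
    print(kept, out)
```

Note that this toy still scores every key before discarding most of them, so it only saves work in the softmax and the value mixing. Any cost reduction at the scale described above would depend on making the selection step itself cheap.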

What struck me most was the long list of acknowledgments at the bottom of the official announcement. Domestic large models can now achieve stable operation with high throughput and low latency on domestic chip clusters.

Huawei Ascend, Moore Threads, Cambricon, Kunlunxin, MetaX, Enflame, Hygon…

This is almost half of the Chinese semiconductor industry, indicating that the open-sourcing of GLM-5 is not just a software victory; it marks the gradual completion of a closed loop in the domestic AI ecosystem—from the underlying chip computing power to the intermediate framework, and finally to the upper-level model.

With the open-sourcing of GLM-5 and its integration with mainstream tools like Claude Code and OpenCode, we may be standing at the threshold of Software Engineering 2.0.

Andrej Karpathy’s predicted era of “Agentic Engineering” is arriving faster than expected. In the future, you may no longer need to build line by line. You only need to define the system, the aesthetics, and what is “fun” and “useful.”

Then, watch as large models like GLM-5 act as foremen, directing the underlying computing power to construct skyscrapers.

The traditional “coder” era might truly be coming to an end.

But don’t worry; this doesn’t mean humans are obsolete. On the contrary, as AI takes care of the tedious implementations, your aesthetic sense, judgment, and ability to pose good questions will become the last and most solid moat for humanity.