AI Disruption

Top AI Models Embrace “Interleaved Thinking”

MiniMax M2’s Interleaved Thinking slashes cost 12×, lifts Agent scores 40%, ends state drift.

Meng Li
Meng Li · Dec 05, 2025 · Paid



Yesterday, a Twitter blogger shared results for several major Chinese open-source models on mini-SWE-agent, a lightweight software-engineering agent benchmark. It primarily tests a large language model's multi-step reasoning, environment interaction, and engineering ability on real software-development tasks.

The results showed that MiniMax’s new-generation large model M2 performed the best, surpassing competitors such as DeepSeek, GLM, Qwen, Kimi, and others.

[Image: mini-SWE-agent benchmark results]

Given that MiniMax M2 has excelled at agent and coding tasks since its initial release, its strong showing on mini-SWE-agent is no surprise. The model not only plans well and executes long chains of tool calls stably, but can also orchestrate Shell, Browser, Python code executors, and a range of other MCP tools.

The key technology behind these capabilities is the “Interleaved Thinking” adopted by MiniMax M2. In simple terms, the model alternates between reasoning and tool calls rather than thinking once up front. In a closed loop of “think - act - reflect,” it continuously accumulates contextual understanding and adjusts its strategy in real time based on tool feedback.
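The think-act-reflect loop can be sketched as a short Python snippet. This is a minimal illustration of the pattern, not MiniMax's actual API: the tool names, message schema, and the scripted plan are all assumptions made for demonstration. The point is that each turn's thinking block stays in the running context alongside the tool call and its result, so later steps can reason over earlier reflections.

```python
def run_tool(name: str, arg: str) -> str:
    """Stand-in tool executor (hypothetical; real agents invoke Shell, Python, etc.)."""
    tools = {
        "search": lambda q: f"results for {q!r}",
        "calc": lambda expr: str(eval(expr)),  # toy calculator for the demo
    }
    return tools[name](arg)

def interleaved_agent(task: str, plan):
    """Run a scripted plan of (thinking, tool_name, tool_arg) steps.

    Each iteration performs one think - act - reflect cycle and appends
    every block to the shared context instead of discarding the thinking.
    """
    context = [{"role": "user", "content": task}]
    for thinking, tool, arg in plan:
        # think: keep the reasoning block in the transcript
        context.append({"role": "assistant", "thinking": thinking,
                        "tool_call": {"name": tool, "arg": arg}})
        # act: execute the tool call
        result = run_tool(tool, arg)
        # reflect: feed the result back so the next step can see it
        context.append({"role": "tool", "content": result})
    return context

history = interleaved_agent(
    "What is 6*7?",
    [("Need arithmetic; use the calc tool.", "calc", "6*7")],
)
print(history[-1]["content"])  # "42"
```

The design choice worth noting is that `context` only ever grows: nothing is pruned between tool invocations, which is what lets the loop accumulate understanding across steps.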

This approach, which more closely resembles how real engineers work, significantly enhances MiniMax M2’s Agent execution capabilities—stronger planning in complex tasks, higher execution robustness, and more reliable self-correction ability—forming its most distinctive core advantage.

Released just over a month ago, MiniMax M2 has gained widespread recognition from developers in actual Agent usage scenarios. Previously, Twitter blogger @elvis stated, “MiniMax-M2 is much more important than I imagined! I built a deep research Agent with M2, and the interleaved thinking is truly remarkable. It can retain complete content blocks (thinking + text + tool calls) between tool invocations, enabling continuous reasoning. This is very helpful for self-improving Agents.”
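The retention @elvis describes can be contrasted with the conventional approach in a few lines. The block types and schema below are hypothetical, chosen only to illustrate the difference: many agent frameworks strip the model's thinking before the next request, whereas interleaved thinking carries the complete sequence (thinking + text + tool calls) forward across tool invocations.

```python
# One assistant turn, expressed as a list of content blocks
# (hypothetical schema for illustration).
turn = [
    {"type": "thinking", "text": "The first search was too broad; narrow it."},
    {"type": "text", "text": "Refining the query."},
    {"type": "tool_call", "name": "search", "arg": "MiniMax M2 benchmark"},
]

def strip_thinking(blocks):
    """Conventional approach: thinking is discarded between tool calls."""
    return [b for b in blocks if b["type"] != "thinking"]

def retain_all(blocks):
    """Interleaved thinking: the full block sequence is carried forward."""
    return list(blocks)

print(len(strip_thinking(turn)), len(retain_all(turn)))  # 2 3
```

With the full blocks retained, a later step can still see why an earlier query was reformulated, which is what makes the continuous, self-improving reasoning possible.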
