AI Disruption

DeepSeek/o3 Weakness: Why the Shortest Answer is Often Correct

Recent research highlights the "Underthinking" issue in AI models like DeepSeek, showing how ineffective switching between approaches can waste resources and reduce accuracy.

Meng Li
Feb 04, 2025
∙ Paid




As DeepSeek and other reasoning models like o1/o3 continue to make a huge impact, researchers have begun studying their weaknesses.

Recent research reveals:

When faced with difficult problems, reasoning models may switch frequently between different approaches like a “fickle student,” yet fail because no single approach is explored in depth. Researchers call this phenomenon Underthinking.
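One way to picture this approach-switching is to count transition phrases in a model's reasoning trace. This is a heuristic sketch, not the researchers' actual detection method; the marker list and the `count_thought_switches` function are illustrative assumptions:

```python
import re

# Hypothetical phrases that often signal a pivot to a new line of
# reasoning in a chain-of-thought trace (illustrative, not exhaustive).
SWITCH_MARKERS = ["Alternatively", "Wait", "On second thought", "Let me try another"]

def count_thought_switches(trace: str) -> int:
    """Count how many times the trace pivots to a new approach."""
    pattern = "|".join(re.escape(m) for m in SWITCH_MARKERS)
    return len(re.findall(pattern, trace))

trace = (
    "First, apply the quadratic formula... "
    "Wait, maybe factoring is simpler... "
    "Alternatively, complete the square..."
)
print(count_thought_switches(trace))  # 2
```

A high switch count on hard problems, paired with a wrong final answer, is the behavioral signature the researchers describe.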

The research team, from Tencent AI Lab, Soochow University, and Shanghai Jiao Tong University, focused on the open-source DeepSeek-R1 and Qwen QwQ series models.

By analyzing the models' incorrect answers, they discovered that these models often start off on the right track early in their reasoning, but only scratch the surface of a promising approach before quickly pivoting to another. This produces thousands of generated tokens that contribute nothing to solving the problem.

This “ineffective effort” not only wastes computational resources but also significantly reduces the accuracy of the answers.
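The waste can be made concrete with a toy metric (an illustrative assumption, not the paper's published formula): the fraction of the token budget spent after the first correct line of thought was abandoned.

```python
def underthinking_ratio(thoughts: list[tuple[int, bool]]) -> float:
    """Fraction of tokens generated after the first correct thought.

    `thoughts` is a list of (token_count, is_correct) pairs, one per
    reasoning segment -- a hypothetical encoding for illustration.
    Returns 0.0 if no segment is correct (no correct approach was
    abandoned, so nothing is counted as switching waste here).
    """
    total = sum(tokens for tokens, _ in thoughts)
    used = 0
    for tokens, correct in thoughts:
        used += tokens
        if correct:
            return 1 - used / total
    return 0.0

# A correct first thought (120 tokens) abandoned for 880 tokens of
# fruitless switching: 88% of the budget was wasted.
print(round(underthinking_ratio([(120, True), (500, False), (380, False)]), 2))  # 0.88
```

Under this toy measure, a model that keeps deepening its first correct approach scores near zero, while a "fickle student" that abandons it early scores near one.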

© 2025 Meng Li