AI Disruption

DeepSeek-V3 Goes Viral: 671B MoE at $5.58M

Discover DeepSeek-V3: A groundbreaking 671B-parameter AI model with unmatched efficiency, outperforming SOTA models and redefining open-source benchmarks.

Meng Li
Dec 27, 2024

Video: DeepSeek-V3 is INSANE (FREE): RIP 3.5 Sonnet & O1?

Today, DeepSeek-V3 has taken the world by storm.

Open X and the feed is flooded with discussions about DeepSeek-V3. One of the hottest topics is its colossal 671B parameters paired with a surprisingly efficient training process: pre-training required just 2.664 million H800 GPU hours, and even with context extension and post-training, the total comes to only 2.788 million H800 GPU hours.

In comparison, the Llama 3 series was trained on a computational budget of 39.3 million H100 GPU hours, enough compute to train DeepSeek-V3 roughly 14 times over.
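To see where the headline $5.58M figure and the rough ratio to Llama 3 come from, here is a quick back-of-the-envelope calculation. It assumes the $2 per H800 GPU hour rental price cited in the DeepSeek-V3 technical report, and it covers only the final training run, not research, ablations, or data costs. Note that the Llama 3 budget is in H100 hours, so the ratio is only a rough comparison of raw GPU hours, not of actual compute.

```python
# Back-of-the-envelope math for DeepSeek-V3's training cost and the Llama 3 comparison.
PRICE_PER_H800_HOUR = 2.0       # USD, rental rate assumed in the technical report

pretrain_hours = 2.664e6        # H800 GPU hours for pre-training
total_hours = 2.788e6           # including context extension and post-training
llama3_budget_hours = 39.3e6    # H100 GPU hours reported for the Llama 3 series

total_cost = total_hours * PRICE_PER_H800_HOUR
print(f"Estimated training cost: ${total_cost / 1e6:.2f}M")   # ~$5.58M
print(f"Llama 3 budget / DeepSeek-V3 total: {llama3_budget_hours / total_hours:.1f}x")
print(f"Llama 3 budget / DeepSeek-V3 pre-training: {llama3_budget_hours / pretrain_hours:.1f}x")
```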

Despite its far lower compute budget, DeepSeek-V3 delivers performance on par with, or even surpassing, other state-of-the-art models.

According to the newly released DeepSeek-V3 technical report, its base model excels in tasks spanning English, code, mathematics, Chinese, and multilingual scenarios. On benchmarks like AGIEval, CMath, and MMMLU-non-English, it even significantly outperforms other open-source models.
