AI Disruption

DeepSeek-V3 Goes Viral: 671B MoE at $5.58M

Discover DeepSeek-V3: A groundbreaking 671B-parameter AI model with unmatched efficiency, outperforming SOTA models and redefining open-source benchmarks.

Meng Li
Dec 27, 2024

Video: DeepSeek-V3 is INSANE (FREE): RIP 3.5 Sonnet & O1?

Today, DeepSeek-V3 has taken the world by storm.

Open X and the feed is flooded with discussions about DeepSeek-V3. One of the hottest topics is its colossal 671B parameters paired with a surprisingly efficient training process: pre-training required just 2.664 million H800 GPU hours, and even with context extension and post-training, the total comes to only 2.788 million H800 GPU hours.

In comparison, the Llama 3 series was trained on a computational budget of 39.3 million H100 GPU hours, enough compute to train DeepSeek-V3 roughly 14 times over.
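To see where the headline $5.58M figure and the rough ratio to Llama 3 come from, here is a quick back-of-the-envelope calculation. It assumes the $2 per H800 GPU hour rental price cited in the DeepSeek-V3 technical report, and it covers only the final training run, not research, ablations, or data costs. Note that the Llama 3 budget is in H100 hours, so the ratio is only a rough comparison of raw GPU hours, not of actual compute.

```python
# Back-of-the-envelope math for DeepSeek-V3's training cost and the Llama 3 comparison.
PRICE_PER_H800_HOUR = 2.0       # USD, rental rate assumed in the technical report

pretrain_hours = 2.664e6        # H800 GPU hours for pre-training
total_hours = 2.788e6           # including context extension and post-training
llama3_budget_hours = 39.3e6    # H100 GPU hours reported for the Llama 3 series

total_cost = total_hours * PRICE_PER_H800_HOUR
print(f"Estimated training cost: ${total_cost / 1e6:.2f}M")   # ~$5.58M
print(f"Llama 3 budget / DeepSeek-V3 total: {llama3_budget_hours / total_hours:.1f}x")
print(f"Llama 3 budget / DeepSeek-V3 pre-training: {llama3_budget_hours / pretrain_hours:.1f}x")
```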

Despite its far lower compute budget, DeepSeek-V3 delivers performance on par with, or even surpassing, other state-of-the-art models.

According to the newly released DeepSeek-V3 technical report, its base model excels in tasks spanning English, code, mathematics, Chinese, and multilingual scenarios. On benchmarks like AGIEval, CMath, and MMMLU-non-English, it even significantly outperforms other open-source models.
