DeepSeek Releases DeepGEMM: 300 Lines of Code Accelerate V3 & R1, R2 Expected Before May

DeepSeek unveils DeepGEMM, an FP8 GEMM library accelerating V3/R1 performance with 300 lines of code. Expect R2 model release before May for enhanced AI capabilities.

Meng Li
Feb 26, 2025


"AI Disruption" publication New Year 30% discount link.


Applicable to both standard dense models and MoE models.

DeepSeek’s Open Source Week has entered its third day (see the previous two days’ coverage at the end of this article in “Related Reading”).

Today's open-source project is DeepGEMM, an FP8 GEMM library that supports both dense and Mixture-of-Experts (MoE) grouped GEMMs. It powers training and inference for V3/R1 and reaches over 1350 FP8 TFLOPS on Hopper GPUs.

Specifically, DeepGEMM is a library designed for efficient, streamlined FP8 General Matrix Multiplications (GEMMs), built around the fine-grained scaling technique introduced in DeepSeek-V3.
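To make the fine-grained scaling idea concrete, here is a minimal PyTorch sketch of per-block FP8 scaling applied to a GEMM. The 128-element block size and the e4m3 format follow DeepSeek-V3's description, but the sketch simplifies by scaling both operands per 1x128 block (DeepSeek-V3 scales weights per 128x128 block), and it is only an illustration, not how DeepGEMM's CUDA kernels are written.

```python
# Illustrative sketch only: simulates fine-grained (per-block) FP8 scaling
# for a GEMM in plain PyTorch. Block size and dtype choices are assumptions.
import torch

BLOCK = 128  # assumed scaling granularity along the K dimension

def quantize_fp8_blockwise(x: torch.Tensor):
    """Split each row into 128-wide K blocks and give every block its own scale."""
    m, k = x.shape
    blocks = x.view(m, k // BLOCK, BLOCK)
    # Per-block scales keep small values from being flushed to zero in FP8
    scales = blocks.abs().amax(dim=-1).clamp(min=1e-12) / 448.0  # e4m3 max ~448
    q = (blocks / scales.unsqueeze(-1)).to(torch.float8_e4m3fn)
    return q, scales  # FP8 data (m, k/BLOCK, BLOCK) + FP32 scales (m, k/BLOCK)

def gemm_fp8_blockwise(a_q, a_s, b_q, b_s):
    """Accumulate block by block in FP32, applying the per-block scales."""
    m, n = a_q.shape[0], b_q.shape[0]
    out = torch.zeros(m, n, dtype=torch.float32)
    for blk in range(a_q.shape[1]):
        a_blk = a_q[:, blk].to(torch.float32) * a_s[:, blk, None]
        b_blk = b_q[:, blk].to(torch.float32) * b_s[:, blk, None]
        out += a_blk @ b_blk.T
    return out.to(torch.bfloat16)  # BF16 output, typical for FP8 GEMM pipelines

a = torch.randn(256, 512)
b = torch.randn(128, 512)                 # weights, multiplied as b.T
c = gemm_fp8_blockwise(*quantize_fp8_blockwise(a), *quantize_fp8_blockwise(b))
# c approximates (a @ b.T) in bfloat16
```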

The library supports both standard GEMM and MoE-grouped GEMM. It is written in CUDA, and nothing needs to be compiled at install time; instead, a lightweight Just-In-Time (JIT) module compiles all kernels at runtime.
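For readers unfamiliar with MoE-grouped GEMM, the sketch below shows the computation such a kernel fuses: each expert multiplies only the tokens routed to it. This reference loop is a hypothetical illustration in plain PyTorch; a grouped kernel like DeepGEMM's performs the equivalent work in FP8 inside a single fused launch rather than looping over experts.

```python
# Illustrative sketch only: the semantics of an MoE grouped GEMM, written as a
# plain PyTorch loop. Names and shapes here are hypothetical examples.
import torch

def grouped_gemm_reference(tokens, expert_ids, expert_weights):
    """tokens: (T, K); expert_ids: (T,); expert_weights: (E, N, K) -> (T, N)."""
    T, _ = tokens.shape
    E, N, _ = expert_weights.shape
    out = torch.zeros(T, N, dtype=tokens.dtype)
    for e in range(E):
        routed = expert_ids == e               # tokens assigned to expert e
        if routed.any():
            # Each expert runs a GEMM over just its own group of tokens;
            # a grouped kernel fuses all of these groups into one launch.
            out[routed] = tokens[routed] @ expert_weights[e].T
    return out

tokens = torch.randn(1024, 512)
expert_ids = torch.randint(0, 8, (1024,))      # router output (top-1 for simplicity)
weights = torch.randn(8, 256, 512)             # 8 experts, each mapping 512 -> 256
y = grouped_gemm_reference(tokens, expert_ids, weights)
```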
