DeepSeek Releases FlashMLA, Boosting H800 GPU Performance

DeepSeek launches FlashMLA, an efficient decoding kernel for Nvidia's H800 GPU, boosting AI task performance and lowering training costs with MLA and MoE technologies.

Feb 24, 2025

Last Friday, DeepSeek tweeted that this week would be Open Source Week (OpenSourceWeek), during which it would release five software libraries in succession.

On the first day of Open Source Week, DeepSeek released the first of these projects: FlashMLA.

The project garnered over 3.3k GitHub stars within just three hours of its launch, and the count is still climbing rapidly.

FlashMLA is an efficient MLA decoding kernel that DeepSeek developed specifically for Nvidia's Hopper GPUs, with particular optimization for variable-length sequences. It has already been deployed in production.
