DeepSeek Releases FlashMLA, Boosting H800 GPU Performance
DeepSeek launches FlashMLA, an efficient decoding kernel for Nvidia's H800 GPU, boosting AI task performance and lowering training costs with MLA and MoE technologies.
Last Friday, DeepSeek tweeted that this week would be Open Source Week (OpenSourceWeek), and they would release five software libraries in succession.
On the first day of Open Source Week, DeepSeek released its first open-source project: FlashMLA.
The project garnered over 3.3k stars within just three hours of its launch, and the count is still climbing quickly.
FlashMLA is an efficient MLA decoding kernel that DeepSeek developed specifically for Nvidia's Hopper GPUs, with particular optimization for variable-length sequences. It is already in production use.
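To make "decoding over variable-length sequences" concrete, here is a minimal pure-Python sketch. It is not FlashMLA's API and does not implement MLA's latent compression. It uses ordinary softmax attention and only illustrates the shape of the problem: at each decoding step, every request in a batch contributes one new query token but has a cached history of a different length. The function names `decode_attention` and `batched_decode` and the toy data are hypothetical, chosen for illustration.

```python
import math

def decode_attention(query, keys, values):
    """Softmax attention for ONE new token over one request's cached keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    # One score per cached position; the cache may have any length >= 1.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    # Weighted average of the cached value vectors.
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]

def batched_decode(queries, kv_caches):
    """Run one decoding step for a batch whose caches have different lengths.

    Illustrative only: each request is processed at its true length,
    with no padding to a common sequence length.
    """
    return [decode_attention(q, keys, values)
            for q, (keys, values) in zip(queries, kv_caches)]

# Toy batch (hypothetical data): request 0 has 1 cached token, request 1 has 3.
queries = [[1.0, 0.0], [0.0, 1.0]]
kv_caches = [
    ([[1.0, 0.0]], [[2.0, 4.0]]),
    ([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]],
     [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]),
]
outputs = batched_decode(queries, kv_caches)
```

A production GPU kernel would fuse these steps and schedule the uneven work across the hardware. The Python loop here only shows why uneven cache lengths matter: the amount of work per request differs, so padding every request to the longest one wastes computation.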