The First Pure Attention-Free Large Model Surpasses Open-Source Giant Llama 3.1

Falcon Mamba 7B: a new open-source model challenging the Transformer, handling unlimited sequence lengths on a single GPU, and now outperforming Llama 3.1.

Meng Li
Aug 13, 2024

The Mamba architecture challenges the Transformer once again.

Is Mamba finally ready to stand on its own? Since its debut in December 2023, Mamba has been a strong competitor to the Transformer.

Since then, models built on the Mamba architecture have continued to emerge. For example, Mistral released Codestral Mamba 7B, the first open-source large model based on Mamba.

Today, the Technology Innovation Institute (TII) in Abu Dhabi released a new open-source Mamba model, Falcon Mamba 7B.

The highlights of Falcon Mamba 7B: it can process sequences of arbitrary length with no growth in memory usage, and it runs on a single 24GB A10 GPU.
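
To see why constant memory matters, here is a back-of-the-envelope comparison (a minimal sketch; the layer counts and dimensions below are illustrative assumptions, not Falcon Mamba's published configuration). A Transformer's KV cache grows linearly with sequence length, while a Mamba-style state space model carries a fixed-size recurrent state no matter how many tokens it has seen:

```python
# Back-of-the-envelope memory comparison (illustrative numbers only;
# not Falcon Mamba's actual configuration).

BYTES_FP16 = 2  # bytes per parameter in half precision

def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128):
    """Transformer KV cache: two tensors (K and V) per layer,
    each of shape [seq_len, n_heads, head_dim]."""
    return 2 * n_layers * seq_len * n_heads * head_dim * BYTES_FP16

def ssm_state_bytes(n_layers=32, d_model=4096, state_dim=16):
    """Mamba-style recurrent state: fixed size per layer,
    independent of how many tokens have been processed."""
    return n_layers * d_model * state_dim * BYTES_FP16

for seq_len in (1_000, 100_000, 1_000_000):
    kv_gb = kv_cache_bytes(seq_len) / 1e9
    ssm_mb = ssm_state_bytes() / 1e6
    print(f"{seq_len:>9} tokens: KV cache ~ {kv_gb:8.2f} GB, "
          f"SSM state ~ {ssm_mb:.0f} MB (constant)")
```

Under these assumed dimensions, the KV cache passes hundreds of gigabytes at a million tokens, while the recurrent state stays at a few megabytes, which is what makes single-GPU inference over very long sequences plausible.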

Falcon Mamba 7B is now available on Hugging Face. It is a causal decoder-only model built on the novel Mamba State Space Language Model (SSLM) architecture and targets a range of text generation tasks.
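
Since the weights are on Hugging Face, a quick generation test might look like the sketch below (assuming a recent transformers release with Falcon Mamba support and the accelerate package for device placement; tiiuae/falcon-mamba-7b is the model id listed on the Hub):

```python
# Minimal text-generation sketch for Falcon Mamba 7B.
# Assumes a recent `transformers` release with Falcon Mamba support,
# `accelerate` installed, and enough GPU memory (TII cites a single
# 24GB A10 as sufficient).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-mamba-7b"  # model id as listed on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a 24GB GPU
    device_map="auto",
)

prompt = "State space models differ from attention in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```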

In terms of performance, Falcon Mamba 7B outperforms other leading models of similar size on several benchmarks, including Meta's Llama 3 8B, Llama 3.1 8B, and Mistral 7B.
