Bitnet.cpp: Microsoft Open-Sources 1-Bit Inference Framework, Running 100B Models on CPUs

Microsoft's Bitnet.cpp open-source 1-bit inference framework enables running 100B models on CPUs 6x faster, reducing power use by up to 82%.

Meng Li
Oct 22, 2024

A few days ago, Microsoft open-sourced Bitnet.cpp, which is big news.

In short, this means you can now run massive models, 72B parameters or even larger, on a CPU.

The reason is that these models' weights have been reduced to 1-bit precision. Bitnet.cpp is a highly efficient 1-bit LLM inference framework that lets you run 100B models on local devices up to six times faster while cutting power consumption by up to 82.2%.
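
To see what "1-bit precision" means in practice, here is a minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper, where every weight becomes -1, 0, or +1 plus a single scale. The function name and shapes are illustrative; this is not Bitnet.cpp's actual code.

```python
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-8):
    """Absmean quantization from the BitNet b1.58 paper (sketch):
    scale by the mean absolute weight, then round every entry
    to the nearest value in {-1, 0, 1}."""
    scale = np.mean(np.abs(w)) + eps                 # per-tensor absmean scale
    w_ternary = np.clip(np.round(w / scale), -1, 1)  # ternary weights
    return w_ternary.astype(np.int8), scale          # approx: w ~ w_ternary * scale

# Toy example: a float weight matrix collapses to -1/0/+1 plus one scale.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = absmean_ternary_quantize(w)
print(q)  # int8 matrix containing only -1, 0, and 1
print(s)  # single float scale
```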

(Figure: inference performance on an Apple M2 CPU)

In the future, Bitnet.cpp will also support NPUs and GPUs. On ARM CPUs it speeds up inference by 1.37x to 5.07x while reducing power consumption by 55.4% to 70%.

On x86 CPUs the speedup ranges from 2.37x to 6.17x, with power consumption dropping by 71.9% to 82.2%.

This framework can run the 100B bitnet_b1_58 model (a 1-bit quantized language model from the BitNet family) at 5-7 tokens per second on a single CPU.

This performance, comparable to human reading speed, significantly enhances the potential to run large language models on local devices.
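
Much of the speedup and power saving comes from what ternary weights do to matrix multiplication: with values restricted to {-1, 0, +1}, every weight-activation product reduces to an addition, a subtraction, or a skip. Here is a hedged NumPy sketch of the idea; the real Bitnet.cpp kernels use packed low-bit layouts and optimized lookup tables, not Python loops.

```python
import numpy as np

def ternary_matvec(w_ternary: np.ndarray, scale: float, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with ternary weights and no weight
    multiplications: +1 entries add, -1 entries subtract, 0 entries
    are skipped. One multiply per output row restores the scale."""
    out = np.empty(w_ternary.shape[0], dtype=np.float32)
    for i, row in enumerate(w_ternary):
        out[i] = x[row == 1].sum() - x[row == -1].sum()  # adds/subtracts only
    return out * scale

# Sanity check against an ordinary float matvec on dequantized weights.
rng = np.random.default_rng(1)
w = rng.integers(-1, 2, size=(3, 8)).astype(np.int8)  # toy ternary weights
x = rng.normal(size=8).astype(np.float32)
print(np.allclose(ternary_matvec(w, 0.5, x), (w.astype(np.float32) * 0.5) @ x))  # True
```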

You can now install it on Ubuntu Linux, Windows, or macOS, and Bitnet.cpp supports both x86 and ARM architectures.

So, let’s see how it works.
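
As a rough preview, the workflow amounts to downloading and converting a 1.58-bit model, then running the inference script. This sketch is based on the microsoft/BitNet README at the time of writing; the exact script names, flags, model repo, and paths are assumptions that may have changed since.

```python
import subprocess

# Rough preview of the workflow, wrapped in Python for illustration.
# Script names and flags mirror the microsoft/BitNet README at the
# time of writing (setup_env.py, run_inference.py); treat the model
# repo, flags, and paths below as assumptions.
subprocess.run(
    ["python", "setup_env.py",
     "--hf-repo", "HF1BitLLM/Llama3-8B-1.58-100B-tokens",  # example 1.58-bit model
     "-q", "i2_s"],                                        # quantization kernel type
    check=True,
)
subprocess.run(
    ["python", "run_inference.py",
     "-m", "models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf",
     "-p", "Explain 1-bit LLM inference in one sentence.",  # prompt
     "-n", "64"],                                           # tokens to generate
    check=True,
)
```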
