AI Disruption

Model Dissection: Exploring the Inside of a Model

Explore the inner workings of Transformer models, understand the storage and function of model files, and learn about the roles of weights and biases. Dive into model visualization and capacity insights.

Meng Li
Jul 31, 2024


Welcome to the "Practical Application of AI Large Language Model Systems" Series

Table of Contents


Last class, we manually implemented a Transformer model. The final trained model had around 120 million parameters and a file size of about 505MB. In this lesson, we'll explore an intriguing question: what exactly is stored inside this 505MB file?
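Before opening the file, a quick sanity check: the file size of a checkpoint is roughly the parameter count times the bytes per parameter, plus some overhead for tensor names and metadata. A minimal back-of-the-envelope sketch (the helper function is my own, not from the lesson's code):

```python
# Estimate a checkpoint's raw weight size from its parameter count.
# float32 weights take 4 bytes each; the saved file is a bit larger
# because it also stores tensor names, shapes, and other metadata.
def estimated_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Raw weight size in MiB, ignoring file-format overhead."""
    return num_params * bytes_per_param / (1024 ** 2)

print(round(estimated_size_mb(120_000_000)))  # → 458
```

So ~120 million float32 parameters account for roughly 458MB of raw weights; the remaining ~47MB of the 505MB file is metadata and auxiliary buffers.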

Do you remember running Qwen2-7B locally a while ago?

The 7B model's weight files are split into 8 parts (some versions ship 5 files), totaling about 20GB. The files for its 130B predecessor total nearly 240GB. If you've ever wondered what's actually inside these large model files, you're not alone; when I first encountered large language models, I was just as curious.
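When a checkpoint is sharded like this, it ships with an index that maps each tensor name to the shard file storing it (Hugging Face checkpoints keep this in `model.safetensors.index.json`). A minimal sketch, using a hypothetical hand-written index rather than a real downloaded one:

```python
from collections import defaultdict

# Hypothetical excerpt of a sharded-checkpoint index: tensor name -> shard file.
# Real indexes list every tensor in the model; this is just four entries.
weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00008.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00008.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00008-of-00008.safetensors",
    "lm_head.weight": "model-00008-of-00008.safetensors",
}

# Group tensors by the shard file that holds them.
shards = defaultdict(list)
for tensor_name, shard_file in weight_map.items():
    shards[shard_file].append(tensor_name)

for shard_file in sorted(shards):
    print(shard_file, "->", len(shards[shard_file]), "tensors")
```

A loader walks this map in reverse: to materialize one tensor, it opens only the shard that contains it, which is why a 20GB model can be loaded piece by piece.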

Through continued study, I've gained some understanding of what these files contain, and I'll share that knowledge with you today.

