AI Disruption
The Untold Story: Small Models Behind Every Successful Large AI Model

Explore the crucial role of small models in AI, from powering large models to optimizing performance. Discover why small models are key to big AI success.

Meng Li
Aug 20, 2024
[Image: Applications of LLM Agents in various industries]

Today, I’m sharing some thoughts on the differences between large and small models.

First, let's consider why Qwen2 is currently the most popular open-source model.

To be honest, compared to the detailed technical reports from DeepSeek, LLaMA, and MiniCPM, Qwen2's report feels a bit thin, since it leaves out key technical details.

However, the comprehensive "all-in-one" package Qwen2 offers to the open-source community is something no lengthy report can match.

For LLM researchers, the value of a cluster of smaller LLMs trained with the same tokenizer and the same 7T-token pretraining corpus far exceeds that of Qwen2-72B itself!

Now, let's establish two key concepts:

  • Homologous small models: smaller-sized LLMs trained with the same tokenizer and the same data as a larger sibling (see the sketch after this list).

  • Small models: the focus here is purely on size: models that are fast at inference or that serve only as classifiers, regardless of how they were trained. Examples include small-sized LLMs, BERT, RoBERTa, XGBoost, and logistic regression (LR).
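To make the first concept concrete, here is a minimal sketch of what a family of homologous small models looks like in practice, using the Hugging Face transformers API. The Qwen2 model IDs below are assumptions based on the public Qwen2 release; substitute whichever checkpoints you actually work with.

```python
# A minimal sketch of "homologous small models": several Qwen2 checkpoints
# of different sizes that share one tokenizer and the same pretraining data.
# The model IDs are assumptions based on the public Qwen2 release; swap in
# the checkpoints you actually have access to.
from transformers import AutoModelForCausalLM, AutoTokenizer

QWEN2_FAMILY = [
    "Qwen/Qwen2-0.5B",
    "Qwen/Qwen2-1.5B",
    "Qwen/Qwen2-7B",
]

# The tokenizer is shared across the family, so any text is tokenized
# identically regardless of which model size you later run it through.
tokenizer = AutoTokenizer.from_pretrained(QWEN2_FAMILY[0])
print("vocab size:", tokenizer.vocab_size)

prompt = "Small models behind every successful large model."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for name in QWEN2_FAMILY:
    # Loading the larger checkpoints requires correspondingly more memory.
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype="auto")
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e9:.2f}B parameters")
    out = model.generate(input_ids, max_new_tokens=16)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the tokenizer and pretraining corpus are held fixed, any behavioral gap between the 0.5B and 7B checkpoints can be attributed mainly to scale, which is exactly what makes such a cluster so useful for research.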
