AI Disruption

DeepSeek Launches JanusFlow: A 1.3B Model Unifying Visual Understanding and Generation

JanusFlow unifies visual understanding and generation in a 1.3B LLM, integrating vision encoders and Rectified Flow for multimodal AI breakthroughs.

Meng Li
Nov 22, 2024

In the field of multimodal AI, methods based on pre-trained vision encoders and MLLMs (e.g., the LLaVA series) demonstrate exceptional performance in visual understanding tasks.

Meanwhile, models based on Rectified Flow (e.g., Stable Diffusion 3 and its derivatives) have achieved significant breakthroughs in visual generation.

Is it possible to unify these two simple technical paradigms into a single model?

Research from DeepSeek, Peking University, the University of Hong Kong, and Tsinghua University suggests that it is: directly integrating the two architectures within an LLM framework effectively unifies visual understanding and generation.

Model Architecture

In simple terms, JanusFlow combines the understanding framework based on a vision encoder and LLM with the generation framework based on Rectified Flow. This integration enables end-to-end training within a single LLM.
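To make the Rectified Flow half of this concrete, here is a minimal sketch of the generic technique in plain Python. It is not JanusFlow's code: the function names, the list-of-floats representation, and the `predict_velocity` callback (standing in for the LLM-based velocity predictor) are all assumptions made for illustration. The idea is that training regresses a velocity along the straight line between a noise sample and a data sample, and generation integrates that velocity from noise to data.

```python
def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1.0 - t) * z + t * x for z, x in zip(noise, data)]


def target_velocity(noise, data):
    """Velocity along that line; it is constant and equals data - noise."""
    return [x - z for z, x in zip(noise, data)]


def rf_loss(predict_velocity, noise, data, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(noise, data, t)
    v_pred = predict_velocity(x_t, t)  # hypothetical model callback
    v_true = target_velocity(noise, data)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, noise, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = predict_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

In a real system `predict_velocity` would be a trained network conditioned on the text prompt, and `t` would be sampled randomly for each training example. The sketch only shows the shape of the objective and of the sampling loop.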

Key Design Features:

  1. Decoupled Vision Encoders: Understanding and generation use separate vision encoders, so each capability can be optimized independently.

  2. Representation Alignment: Use the understanding encoder's features as an alignment target for the generation component, significantly improving Rectified Flow (RF) training efficiency.
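The post does not say which loss implements representation alignment. One common choice for this kind of feature alignment is a negative cosine similarity between per-token features, and the sketch below assumes it. The names `alignment_loss`, `gen_features`, and `und_features` are hypothetical, and the features are plain lists of floats rather than real model activations.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # small epsilon avoids division by zero


def alignment_loss(gen_features, und_features):
    """Negative mean cosine similarity over token positions.

    gen_features: features from the generation path (assumed already
        projected to the understanding encoder's dimension).
    und_features: features from the understanding encoder, treated as a
        fixed target.
    """
    sims = [cosine_similarity(g, u) for g, u in zip(gen_features, und_features)]
    return -sum(sims) / len(sims)
```

Minimizing this term pushes the generation-side features toward the directions the understanding encoder already produces. In training it would be added to the Rectified Flow loss with a weighting factor, which is likewise an assumption here.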

Using an LLM with 1.3B parameters, JanusFlow outperforms prior unified multimodal models of similar scale in both visual understanding and generation tasks.
