AI Disruption

ByteDance Open-Sources GPT-4o-Level Image Generation Capabilities!

ByteDance open-sources BAGEL, a GPT-4o-level multimodal AI for image generation, editing & 3D synthesis. Outperforms SD3 & Gemini 2.0.

Meng Li
May 24, 2025

ByteDance has been aggressively open-sourcing lately…

This time, they’ve directly open-sourced image generation capabilities on par with GPT-4o.

[Image: Reddit post on r/StableDiffusion, "Bytedance released Multimodal model Bagel with image gen capabilities like Gpt 4o"]

But that’s not all. Their latest integrated multimodal model, BAGEL, aims for “grand unification,” consolidating functions like image-based reasoning, image editing, and 3D generation into a single model.

Various fancy use cases include:

With only 7B active parameters (14B total), it already delivers top-tier results in image understanding, generation, and editing, matching or surpassing leading open-source models (such as Stable Diffusion 3 and FLUX.1) and closed-source models (such as GPT-4o and Gemini 2.0).

Upon release, the model not only quickly topped the Hugging Face trending list but also sparked heated discussions on 𝕏.

An OpenAI researcher publicly praised it, stating that ByteDance’s Seed team has firmly secured a spot among top-tier labs in his view.

Alright, let’s dive into what the BAGEL model can do.
