ByteDance Open-Sources GPT-4o-Level Image Generation Capabilities!
ByteDance open-sources BAGEL, a GPT-4o-level multimodal AI for image generation, editing & 3D synthesis. Outperforms SD3 & Gemini 2.0.
"AI Disruption" Publication 6600 Subscriptions 20% Discount Offer Link.
ByteDance has been aggressively open-sourcing lately…
This time, they’ve directly open-sourced image generation capabilities on par with GPT-4o.
But that’s not all. Their latest integrated multimodal model, BAGEL, aims for “grand unification,” consolidating functions like image-based reasoning, image editing, and 3D generation into a single model.
Various fancy use cases include:
Despite having only 7B active parameters (14B total), BAGEL already delivers top-tier performance in image understanding, generation, and editing, matching or surpassing leading open-source models (like Stable Diffusion 3 and FLUX.1) and closed-source models (like GPT-4o and Gemini 2.0).
Upon release, the model not only quickly topped the Hugging Face trending list but also sparked heated discussions on 𝕏.
An OpenAI researcher publicly praised it, stating that ByteDance’s Seed team has firmly secured a spot among top-tier labs in his view.
Alright, let’s dive into what the BAGEL model can do.