A new company with deep roots in generative AI made an eye-catching debut.
What’s new: Black Forest Labs, home to alumni of Stability AI, released the Flux.1 family of text-to-image models under a variety of licenses including open options. The largest of them outperformed Stable Diffusion 3 Ultra, Midjourney v6.0, and DALL·E 3 HD in the company’s internal qualitative tests.
How it works: The Flux.1 models are based on diffusion transformers that were trained using flow matching, a form of diffusion. Like other latent diffusion models, given text and a noisy image embedding, they learn to remove the noise. At inference, given text and an embedding of pure noise, they remove the noise in successive steps and render an image using a decoder that was trained for the purpose.
- Flux.1 pro, whose parameter count is undisclosed, is a proprietary model available via API. It costs roughly $0.055 per image, which falls between DALL·E 3 and Stable Diffusion 3 Medium, according to Artificial Analysis. You can try a demo here.
- Flux.1 [dev] is a 12-billion-parameter distillation of Flux.1 pro. Its weights are licensed for noncommercial use and available here. A demo is available here.
- Flux.1 schnell, also 12 billion parameters, is built for speed. It’s fully open under the Apache 2.0 license. You can download weights and code here and try a demo here.
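The training and inference loop described under "How it works" can be sketched in a few lines. This is a toy illustration, not Black Forest Labs' implementation: it assumes the straight-line (rectified-flow) variant of flow matching with noise at t=0 and data at t=1, uses plain Euler steps, and omits the decoder that turns the final embedding into an image. All names here (`flow_matching_pair`, `flow_matching_loss`, `sample`, `toy_velocity`) are hypothetical, and `toy_velocity` is a closed-form stand-in for a trained diffusion transformer.

```python
import numpy as np


def flow_matching_pair(latent, noise, t):
    """Build one training example: a noisy embedding and its velocity target."""
    x_t = (1.0 - t) * noise + t * latent   # point on the straight path from noise to data
    target_velocity = latent - noise       # velocity is constant along that path
    return x_t, target_velocity


def flow_matching_loss(predict_velocity, latent, noise, t, text_embedding):
    """Mean squared error between predicted and target velocity (t must be < 1)."""
    x_t, target = flow_matching_pair(latent, noise, t)
    pred = predict_velocity(x_t, t, text_embedding)
    return float(np.mean((pred - target) ** 2))


def sample(predict_velocity, text_embedding, shape, num_steps=4, seed=0):
    """Start from pure noise and remove it in successive Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)         # embedding of pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                          # t runs over 0, dt, ..., 1 - dt
        x = x + dt * predict_velocity(x, t, text_embedding)
    return x                                # a real system would now decode x into pixels


def toy_velocity(x_t, t, text_embedding):
    """Assumed stand-in for a trained model.

    For a dataset containing a single clean embedding (here, text_embedding
    itself), the ideal velocity field is (data - x_t) / (1 - t).
    """
    return (text_embedding - x_t) / (1.0 - t)


if __name__ == "__main__":
    target = np.array([1.0, -2.0, 0.5])     # pretend clean image embedding
    noise = np.random.default_rng(1).standard_normal(3)
    print("loss:", flow_matching_loss(toy_velocity, target, noise, 0.3, target))
    print("sample:", sample(toy_velocity, target, shape=(3,)))
```

In this toy setup the stand-in model drives the loss to zero and the sampler lands on the target embedding. A real model learns the velocity field from many text-image pairs, and fewer sampling steps trade quality for speed.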
Results: Black Forest Labs evaluated the models internally in qualitative tests. Given a pair of images, one produced by a Flux.1 model and one by a competitor, roughly 800 people judged which they preferred for various qualities. The two larger versions ranked at or near the top in each category.
- Visual quality: Flux.1 pro and Flux.1 [dev] ranked first and second (1060 Elo and 1044 Elo respectively). Stable Diffusion 3 Ultra (1031 Elo) came in third.
- Prompt following: Flux.1 pro and Flux.1 [dev] took the top two spots (1048 Elo and 1035 Elo respectively). Midjourney v6.0 (1026 Elo) placed third.
- Rendering typography: Ideogram (1080 Elo) took the top honor. Flux.1 pro and Flux.1 [dev] came in second and third (1068 Elo and 1038 Elo respectively).
- As of this writing, Flux.1 [pro] and Flux.1 [dev] rank first and second on the Artificial Analysis Text to Image Arena Leaderboard. Flux.1 schnell ranks fifth behind Midjourney v6.1 and Stable Diffusion 3 Large.
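To put the Elo gaps above in perspective, the ratings can be converted to expected head-to-head preference rates. The sketch below assumes the standard Elo formula with a 400-point scale; the source does not say exactly how Black Forest Labs computed its ratings, so treat the numbers as rough guidance. The function name `elo_expected_win_rate` is hypothetical.

```python
def elo_expected_win_rate(rating_a, rating_b, scale=400.0):
    """Expected fraction of comparisons in which A is preferred over B.

    Assumes the standard Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Visual quality: Flux.1 pro (1060) vs. Stable Diffusion 3 Ultra (1031)
    print(elo_expected_win_rate(1060, 1031))
    # Typography: Ideogram (1080) vs. Flux.1 pro (1068)
    print(elo_expected_win_rate(1080, 1068))
```

Under that assumption, the 29-point lead in visual quality corresponds to judges preferring Flux.1 pro in roughly 54 percent of pairings, and Ideogram's 12-point lead in typography to roughly 52 percent. The leaders are ahead, but by modest margins.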
Behind the news: The Black Forest Labs staff includes former core members of Stability AI, which lost many top employees in April. Black Forest CEO Robin Rombach co-authored the papers that introduced VQGAN, latent diffusion, adversarial diffusion distillation, Stable Diffusion XL, and Stable Video Diffusion.
Why it matters: Text-to-image models generally occupy three tiers: large commercial models like Midjourney v6, OpenAI DALL·E 3, and Adobe Firefly; offerings that are open-source to varying degrees like Stability AI’s Stable Diffusion 3 Medium; and smaller models that can run locally like Stable Diffusion XL Lightning. The Flux.1 suite covers all three tiers and earns high marks in head-to-head comparisons.
We’re thinking: In late 2022, Stability AI’s release of the open Stable Diffusion unleashed a wave of innovation. We see a similar wave building on the open versions of Flux.1.