Text Generation by Diffusion

Mercury Coder uses diffusion to generate text

[Table: AI models compared on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion]

Typical large language models are autoregressive: they predict the next token, one at a time, from left to right. A new model instead refines all of a text’s tokens at once.
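To make the contrast concrete, here is a minimal sketch in Python. This is not Inception Labs’ code; `next_token` and `refine` are hypothetical model interfaces. The point is the cost structure: autoregressive decoding spends one forward pass per generated token, while diffusion decoding spends one forward pass per refinement step, however long the output.

```python
def generate_autoregressive(model, prompt, n_new):
    tokens = list(prompt)
    for _ in range(n_new):          # one model call per generated token
        tokens.append(model.next_token(tokens))
    return tokens

def generate_diffusion(model, prompt, n_new, n_steps=8):
    tokens = list(prompt) + ["[MASK]"] * n_new
    for step in range(n_steps):     # one model call per step, all positions at once
        tokens = model.refine(tokens, step)
    return tokens
```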

What’s new: Inception Labs, a Silicon Valley startup, emerged from stealth mode with Mercury Coder, a diffusion model that generates code, in Small and Mini versions. Registered users can try it out here, and an API (sign up for early access here) and on-premises deployments are in the works. The company has not yet announced availability and pricing.

How it works: Like image diffusion models, Mercury Coder improves its output over a number of steps by removing noise.

  • Inception Labs shared little information about the model, leaving details such as parameter count, input and output sizes, training data, and training methods undisclosed.
  • An October 2023 paper co-authored by an Inception Labs co-founder describes training a text diffusion model using score entropy. The model learned to estimate the transition ratio between two tokens; that is, the ratio of the probability that token y is correct to the probability that the current token x is correct.
  • In their most successful experiments, the authors added noise by masking a progressively larger percentage of tokens at random over several steps.
  • At inference, the model started with all tokens masked and unmasked them over a number of steps, using the estimated transition ratios to decide how to change each token at each step (a toy sketch of this loop follows this list).
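
The toy Python sketch below illustrates that unmasking loop. It is not Mercury Coder’s implementation: `toy_denoiser` stands in for the trained network, which would propose tokens using its learned transition ratios, and the vocabulary, schedule, and random confidence scores are invented for illustration.

```python
import random

MASK = "[MASK]"
VOCAB = ["a", "b", "c", "d"]  # toy vocabulary; a real model has tens of thousands of tokens

def toy_denoiser(tokens):
    """Stand-in for the trained network: propose a (token, confidence) pair
    for each masked position. A real model would score candidates with its
    learned transition ratios; here we guess at random."""
    return {i: (random.choice(VOCAB), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def sample(length=16, n_steps=4):
    tokens = [MASK] * length                  # inference starts fully masked
    for step in range(n_steps):
        proposals = toy_denoiser(tokens)
        # Unmask a fraction of positions each step, most confident first,
        # running the training-time masking schedule in reverse.
        n_unmask = max(1, len(proposals) // (n_steps - step))
        best = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
        for i, (token, _) in best[:n_unmask]:
            tokens[i] = token
    return tokens

print(" ".join(sample()))
```

Because every position is updated from a full draft rather than generated one by one, the number of model calls depends on the step count, not the output length, which is where the speed advantage comes from.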

Results: Mercury Coder’s major advantage is speed, but it also performs well compared to several competitors.

  • The Small and Mini versions are 3.5 to 18 times faster than comparable small coding models. Running on an Nvidia H100 graphics processing unit, Mercury Coder Small generates 737 tokens per second and Mercury Coder Mini generates 1,109 tokens per second. In comparison, Qwen 2.5 Coder 7B generates 207 tokens per second and GPT-4o Mini generates 59 tokens per second (the implied speedups are checked in the sketch after this list).
  • On coding tasks across six benchmarks, Mercury Coder Small outperforms Gemini 2.0 Flash-Lite, Claude 3.5 Haiku, GPT-4o Mini, and Qwen 2.5 Coder 7B on at least four of the six, and Mercury Coder Mini beats those models on at least two. However, both versions of Mercury Coder lose to DeepSeek Coder V2 Lite on all six benchmarks.
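
As a quick sanity check, the snippet below computes the speedup ratios implied by the stated throughputs. The figures come from the article above; the code is just arithmetic.

```python
# Tokens per second on an Nvidia H100, as reported by Inception Labs.
throughput = {
    "Mercury Coder Small": 737,
    "Mercury Coder Mini": 1109,
    "Qwen 2.5 Coder 7B": 207,
    "GPT-4o Mini": 59,
}

for mercury in ("Mercury Coder Small", "Mercury Coder Mini"):
    for baseline in ("Qwen 2.5 Coder 7B", "GPT-4o Mini"):
        ratio = throughput[mercury] / throughput[baseline]
        print(f"{mercury} is {ratio:.1f}x faster than {baseline}")

# Ratios run from about 3.6x (Small vs. Qwen 2.5 Coder 7B) to about 18.8x
# (Mini vs. GPT-4o Mini), consistent with the "3.5 to 18 times" claim.
```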

Behind the news: Several teams have built diffusion models that generate text, but previous efforts have not been competitive with autoregressive large language models (LLMs). Recently, LLaDA achieved performance comparable to Meta’s Llama 2 7B but fell short of Llama 3 8B and other similarly sized modern LLMs.

Why it matters: Text diffusion models already generate text faster than comparable autoregressive models, and they hold significant promise to accelerate text generation even further.

We’re thinking: Diffusion image generators have delivered good output in as few as four steps, or even one. If text diffusion models benefit from similar improvements, they could generate tokens significantly faster than autoregressive models, enabling rapid generation of lengthy texts and, in turn, faster agents and reasoning.
