The ‘Fastest Commercial-Grade’ Diffusion LLM is Available Now

Analytics India Magazine (Supreeth Koundinya)

Mercury matches the performance of GPT-4.1 Nano and Claude 3.5 Haiku while running more than seven times faster.

Inception Labs, an AI startup based in the United States, has launched Mercury for public use, which the company claims is the fastest commercial-scale diffusion large language model (LLM). 

The model can be accessed on chat.inceptionlabs.ai, and third-party platforms like OpenRouter and Poe. 

We’re excited to launch Mercury, the first commercial-scale diffusion LLM tailored for chat applications!

Ultra-fast and efficient, Mercury brings real-time responsiveness to conversations, just like Mercury Coder did for code.

— Inception (@InceptionAILabs) June 26, 2025

According to Artificial Analysis, an independent AI model evaluation platform, Mercury delivers an output speed of over 700 tokens per second, significantly higher than Gemini 2.5 Flash, which delivers 344 tokens per second. It also offers performance comparable to OpenAI’s small model, GPT-4.1 Nano, and Anthropic’s Claude 3.5 Haiku. 

Mercury is also available via a first-party API, with pricing ranging from $0.25 to $1 per million tokens across input and output.

Inception Labs announced Mercury in February and recently published a technical report for the model. The model’s high-speed output is attributed to its diffusion architecture, which departs from the traditional autoregressive approach of language models. 

The architecture, commonly referred to as a diffusion language model (Diffusion-LM), works differently from traditional language models, which generate one word or token at a time. 

“This sequential process can be slow and limit the quality and coherence of the output,” Google said while announcing its own diffusion model, Gemini Diffusion. 

“Diffusion models work differently. Instead of predicting text directly, they learn to generate outputs by refining noise, step by step. This means they can iterate on a solution very quickly and error correct during the generation process,” the company stated. 
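The idea described above can be illustrated with a toy sketch. This is not Mercury’s or Gemini Diffusion’s actual implementation — the `denoise_step` function, the hard-coded `TARGET` sequence standing in for a trained model’s predictions, and all parameter values are illustrative assumptions. The sketch shows only the generation pattern: start from a fully masked (“noisy”) sequence and refine all positions in parallel over a few steps, rather than emitting one token at a time.

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

MASK = "[MASK]"
# Stand-in for what a trained model would predict at each position.
TARGET = ["the", "cat", "sat", "on", "the", "mat"]

def denoise_step(tokens, fill_prob=0.5):
    """One parallel refinement pass: every masked position is
    independently filled with some probability. A real diffusion LM
    would predict each token from the full surrounding context."""
    return [
        TARGET[i] if tok == MASK and random.random() < fill_prob else tok
        for i, tok in enumerate(tokens)
    ]

def generate(length, max_steps=10):
    """Generate a sequence by iteratively denoising from pure 'noise'
    (all positions masked), stopping once nothing is masked."""
    tokens = [MASK] * length
    for _ in range(max_steps):
        tokens = denoise_step(tokens)
        if MASK not in tokens:  # fully denoised
            break
    return tokens
```

Because each pass touches all positions at once, the number of model invocations scales with the (small, fixed) number of refinement steps rather than with sequence length — the intuition behind the speed figures quoted above for a full-scale diffusion LLM.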
