Higher Performance, Lower Prices AI model prices drop as competition heats up

Published

Aug 14, 2024

Reading time

2 min read

Prices for access to large language models are falling as providers exploit new efficiencies and compete for new customers.

What’s new: Open AI cut the price of calls to GPT-4o’s API by 50 percent for input tokens and 33 percent for output tokens, with an even steeper discount for asynchronous processing. Not to be outdone, Google cut the price of API calls to Gemini 1.5 Flash by approximately 75 percent.

How it works: The latest price reductions follow a steady trend, tracked by Smol.ai CEO Shawn Wang, in which providers are charging less even as model performance (as measured by LMSys’s Chatbot Arena Leaderboard Elo ratings) rises. Here’s a list of recent prices in order of each model’s rank on the leaderboard as of this writing:

The latest version of GPT-4o, which now underpins the top-ranked ChatGPT, costs $2.50/$10 per million input/output tokens. That’s substantial discount from the previous $5/$15 per million input/output tokens. And the price is half as much for batch processing of up to 50,000 requests in a single file with a 24-hour turnaround.
The recently released GPT-4o mini, which ranks third on the leaderboard, costs much less at $0.15/$0.60 per million tokens input/output, with the same 50 percent discount for batch processing.
Llama 3.1 405B, which was released in July and ranks fifth, is available for $2.70/$2.70 million input/output tokens from DeepInfra. That’s around 66 percent less than Azure charges.
Gemini 1.5 Flash, which ranks 18th, costs $0.15/$0.60 per million input/output tokens after the new price cut. There’s a 50 percent discount for inputs and outputs smaller than 128,000 tokens (or submitted in batch mode). There’s also a generous free tier.
DeepSeek v2, in 19th place, costs $0.14/$0.28 per million tokens input/output. That’s 46 percent less than when the model was released in late July.

Behind the news: Less than six months ago, cutting-edge large language models like GPT-4, Claude 2, Gemini 1.0, Llama 2, and Mistral Large were less capable and more expensive than their current versions. For instance, GPT-4 costs $30/$60 per million tokens input/output. Since then, models have notched higher benchmark performances even prices have fallen. The latest models are also faster, have larger context windows, support a wider range of input types, and do better at complex tasks such as agentic workflows.

Why it matters: Competition is fierce to provide the most effective and efficient large language models, offering an extraordinary range of price and performance to developers. Makers of foundation models that can’t match the best large models in performance or the best small models in cost are in a tight corner.

We’re thinking: What an amazing time to be developing AI applications! You can choose among models that are open or closed, small or large, faster or more powerful in virtually any combination. Everyone is competing for your business!

Subscribe to The Batch