Mini but Mighty: OpenAI’s GPT-4o Mini offers big performance at a small price


A slimmed-down version of OpenAI’s multimodal flagship packs a low-price punch.

What’s new: OpenAI released GPT-4o mini, a smaller text-image-video-audio generative model that, according to the company, generally outperforms similarly sized models from Google and Anthropic at a lower price for API access. It newly underpins the free version of ChatGPT.

How it works: GPT-4o mini currently accepts text and image inputs and outputs text. Image output as well as video and audio input/output are coming soon. OpenAI did not provide information about its architecture or training but told TechCrunch it’s roughly the size of Claude 3 Haiku, Gemini 1.5 Flash, and the 8-billion-parameter version of Llama 3. It has a context window of 128,000 tokens and can output up to 16,384 tokens.

  • API access to GPT-4o mini, which costs $0.15/$0.60 per 1 million input/output tokens. That’s significantly less than the more capable GPT-4o ($5/$15 per 1 million input/output tokens with the same context window). It’s also more cost-effective and significantly better performing than GPT-3.5 Turbo ($0.50/$1.50 per 1 million input/output tokens with a 16,000-token context window).
  • On the MMLU language understanding benchmark, GPT-4o mini beats Gemini 1.5 Flash at a lower cost, according to tests by Artificial Analysis. It’s just behind Llama 3 70B and Reka Core but costs around half as much as the former and 1/20th as much as the latter.
  • GPT-4o mini (which generates 108 tokens per second) is slower than Llama 3 8B (166 tokens per second), Gemini 1.5 Flash (148 tokens per second), and Claude 3 Haiku (127 tokens per second), according to Artificial Analysis. However, GPT-4o mini speeds past GPT-4o, which produces 63 tokens per second.
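The per-token prices above make cost comparisons easy to work out. The following sketch computes the cost of a single request under those published rates; the model names and the 10,000-input/1,000-output workload are illustrative assumptions, not part of any API:

```python
# Rough cost comparison using the published API prices above.
# Values are USD per 1 million tokens: (input, output).
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gpt-4o": (5.00, 15.00),
    "gpt-3.5-turbo": (0.50, 1.50),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical workload: 10,000 input tokens, 1,000 output tokens
# (kept under GPT-3.5 Turbo's 16,000-token context window).
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 1_000):.4f}")
```

At this workload the sketch prices GPT-4o mini at roughly $0.0021 per request, versus about $0.0065 for GPT-3.5 Turbo and $0.0650 for GPT-4o, which is the roughly 30x gap between GPT-4o mini and GPT-4o that the per-token rates imply.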

Behind the news: GPT-4o mini is part of a July wave of smaller large language models.

  • Mistral and Nvidia jointly released Mistral NeMo (12 billion parameters). Its context window is 128,000 tokens, equal to GPT-4o mini and larger than most models of its size. It’s available under the Apache 2.0 open source license.
  • Hugging Face debuted SmolLM, a family of three even smaller models — 135 million, 362 million, and 1.71 billion parameters — designed to run on mobile devices. The base and instruction-tuned versions, including weights, are freely available for download under the Apache 2.0 license, with no restrictions on commercial use.

Why it matters: Powerful multimodal models are becoming ever more widely available at lower prices, creating opportunities for developers and researchers alike. GPT-4o mini sets a new standard for others to beat. Its price may be especially appealing to developers who aim to build agentic workflows that require models to process large numbers of tokens on their way to producing output.

We’re thinking: Not long ago, pushing the edge of large language models meant making them larger, with higher computing costs to drive rising parameter counts. But building bigger models has made it easier to develop smaller models that are more cost-effective and nearly as capable. It’s a virtuous circle: Costs fall and productivity rises to everyone’s benefit.
