OpenAI introduced a successor to its o1 models that’s faster, less expensive, and especially strong in coding, math, and science.
What’s new: o3-mini is a large language model that offers selectable low, medium, and high levels of reasoning “effort.” These levels consume progressively higher numbers of reasoning tokens (specific numbers and methods are undisclosed), and thus greater time and cost, to generate a chain of thought. It’s available to subscribers to ChatGPT Plus, Team, and Pro, as well as to higher-volume users of the API (tiers 3 through 5). Registered users can try it via the free ChatGPT service by selecting “reason” in the message composer or selecting o3-mini before regenerating a response.
How it works: o3-mini’s training set emphasized structured problem-solving in science and technology fields, and fine-tuning used reinforcement learning on chain-of-thought (CoT) data. Like the o1 family, it charges for tokens that are processed during reasoning operations and hides them from the user. (Competing reasoning models DeepSeek-R1, Gemini 2.0 Flash Thinking, and QwQ-32B-Preview make these tokens available to users.) o3-mini has a maximum input of 200,000 tokens and a maximum output of 100,000 tokens. Its knowledge cutoff is October 2023.
- In OpenAI’s tests, o3-mini beat o1 and o1-mini on multiple benchmarks including math (AIME 2024), science (GPQA Diamond), and coding (Codeforces and LiveBench). It outperformed o1 by 1 to 4 percentage points when set at high or medium effort, and it outperformed o1-mini when set at low effort. It did significantly less well on tests of general knowledge, even with high effort. On MMLU (multiple-choice questions in many fields) and SimpleQA (questions about basic facts), o3-mini with high effort (which achieved 86.9 percent and 13.8 percent respectively) underperformed o1 (92.3 percent and 47 percent) and GPT-4o (88.7 percent and 39 percent).
- Unlike o1-mini, o3-mini supports function calling, structured outputs (JSON format), developer messages (system prompts that specify the model’s context or persona separately from user input), and streaming (delivering responses token-by-token in real time).
- API access costs $1.10/$4.40 per million input/output tokens with a discounted rate of $0.55 per million cached input tokens. OpenAI’s Batch API, which processes high-volume requests asynchronously, costs half as much. In comparison, access to o1 costs $15/$60 per million input/output tokens and o1-mini costs $3/$12 per million input/output tokens. (OpenAI recently removed API pricing for o1-mini and, in the ChatGPT model picker, replaced it with o3-mini, which suggests that o1-mini is being phased out.)
- OpenAI limits the number API calls users can make per minute and per day depending on how frequently they use the API and how much money they’ve spent. Rate limits range from 5,000/4 million requests/tokens per per minute (Tier 3) to 30,000/150 million requests/tokens per minute (Tier 5), with higher limits for batch requests.
- o3-mini’s system card highlights safety measures taken during the model’s training. OpenAI notes that o3-mini’s improved coding ability puts it at a medium risk for autonomous misuse, the first OpenAI model to be so flagged.
What they’re saying: Users praised o3-mini for its speed, reasoning, and coding abilities. They noted that it responds best to “chunkier” prompts with lots of context. However, due to its smaller size, it lacks extensive real-world knowledge and struggles to recall facts.
Behind the news: Days after releasing o3-mini, OpenAI launched deep research, a ChatGPT research agent based on o3. OpenAI had announced the o3 model family in December, positioning it as an evolution of its chain-of-thought approach. The release followed hard upon that of DeepSeek-R1, an open weights model that captivated the AI community with its high performance and low training cost, but OpenAI maintained that the debut took place on its original schedule.
Why it matters: o3-mini continues OpenAI’s leadership in language models and further refines the reasoning capabilities introduced with the o1 family. In focusing on coding, math, and science tasks, it takes advantage of the strengths of reasoning models and raises the bar for other model builders. In practical terms, it pushes AI toward applications in which it’s a reliable professional partner rather than a smart intern.
We’re thinking: We’re glad that o3-mini is available to users of ChatGPT’s free tier as well as paid subscribers and API users. The more users become familiar with how to prompt reasoning models, the more value they’ll deliver.