OpenAI’s GPT-4.5 Goes Big

OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

OpenAI launched GPT-4.5, which may be its last non-reasoning model.

What’s new: GPT-4.5 is available as a research preview. Unlike OpenAI’s recent models o1 and o3, GPT-4.5 is not fine-tuned to reason by generating a chain of thought, although the company hinted that it may serve as a basis for future reasoning models. Instead, it’s an exceptionally large model trained with a correspondingly large amount of computation. As OpenAI’s biggest model to date, GPT-4.5 is very expensive to run, and the company is evaluating whether to offer it via API over the long term.

  • Input/output: text and images in, text out. Voice and video interactions may be available in future updates.
  • Availability/price: Via ChatGPT (currently ChatGPT Pro; soon ChatGPT Plus, Team, Enterprise, and Edu) and various APIs (Chat Completions, Assistants, and Batch; see the example call after this list). $75/$150 per million input/output tokens.
  • Features: Web search, function calling, structured output, streaming, system messages, canvas collaborative user interface.
  • Undisclosed: Parameter count, input and output size, architecture, training data, training methods.
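
For developers, GPT-4.5 is reached through the same Chat Completions interface as OpenAI’s earlier models. Below is a minimal sketch in Python using the official openai SDK; the model identifier ("gpt-4.5-preview") and the cost arithmetic are assumptions based on the preview pricing above, not details confirmed in this article.

```python
# Minimal sketch: calling the GPT-4.5 research preview via the Chat Completions API.
# Assumes the openai Python SDK (v1+), OPENAI_API_KEY set in the environment, and the
# assumed preview model identifier "gpt-4.5-preview".
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4.5-preview",  # assumed identifier for the research preview
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "In two sentences, what does scaling pretraining compute buy you?"},
    ],
)
print(response.choices[0].message.content)

# Back-of-the-envelope cost at the preview prices quoted above
# ($75 per million input tokens, $150 per million output tokens).
usage = response.usage
cost = usage.prompt_tokens * 75 / 1e6 + usage.completion_tokens * 150 / 1e6
print(f"Approximate cost of this call: ${cost:.4f}")
```

At these prices, a 1,000-token prompt with a 1,000-token reply costs roughly $0.075 + $0.15 = $0.225 per call, which helps explain why OpenAI is weighing whether to keep GPT-4.5 in its API.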

How it works: OpenAI revealed few details about how GPT-4.5 was built. The model is bigger than GPT-4o, and it was pretrained and fine-tuned on more data using more computation — possibly 10x more, given OpenAI’s comment that “with every new order of magnitude of compute comes novel capabilities.”

  • The model was trained on a combination of publicly available data and data from partnerships and in-house datasets, including data generated by smaller models. Its knowledge cutoff is October 2023.
  • The data was filtered for quality and to remove both personally identifying information and information that might contribute to the proliferation of chemical, biological, radiological, and nuclear threats.
  • OpenAI developed unspecified techniques to scale up unsupervised pretraining, supervised fine-tuning, and alignment.

Performance: “This isn’t a reasoning model and won’t crush benchmarks,” OpenAI CEO Sam Altman warned in a tweet. The company claims that GPT-4.5 offers improved general knowledge, adheres to prompts with more nuance, delivers greater creativity, and has higher emotional intelligence.

  • GPT-4.5 shows less propensity to hallucinate, or confabulate information, than other OpenAI models. On PersonQA (questions that involve publicly available facts about people), GPT-4.5 achieved 78 percent accuracy compared to GPT-4o (28 percent accuracy) and o1 (55 percent accuracy). Moreover, GPT-4.5 achieved a hallucination rate (lower is better) of 0.19 compared to GPT-4o (0.52) and o1 (0.20).
  • Its performance on coding benchmarks is mixed. On SWE-Bench Verified, GPT-4.5 achieved a 38 percent pass rate, higher than GPT-4o (30.7 percent) but well below deep research (61 percent), an agentic workflow that conducts multi-step research on the internet. On SWE-Lancer Diamond, which evaluates full-stack software engineering tasks, GPT-4.5 solved 32.6 percent of tasks, outperforming GPT-4o (23.3 percent) and o3-mini (10.8 percent) but again lagging deep research (around 48 percent).

Behind the news: GPT-4.5’s release comes as OpenAI nears an announced transition away from developing separate general-knowledge and reasoning models, and as the company faces an ongoing shortage of processing power. Altman said that OpenAI is “out of GPUs” and struggling to meet demand, a constraint that may determine whether the company continues to offer GPT-4.5 via API.

Why it matters: GPT-4.5 highlights a growing divide in AI research over whether to pursue performance gains by scaling up processing during pretraining or at inference. Despite the success of approaches that consume extra processing power at inference, including agentic techniques and reasoning models like its own o series, OpenAI clearly still sees value in pretraining ever larger models.

We’re thinking: There’s still more juice to be squeezed out of bigger models! We’re excited to see what the combination of additional compute applied to both pretraining and inference can achieve.
