Large Language Models (LLMs)

125 Posts

Diagram of latent transformer model using byte-level encoding, patching, and cross-attention for next-byte prediction.

Toward LLMs That Understand Misspellings: New byte-based model beats Llama 3 on spelling, noise, and translation

Researchers built a model that’s more robust to noisy inputs like misspellings, smarter about character-level information like the number of R’s in “strawberry,” and potentially better able to understand unfamiliar languages that might share groups of letters with familiar languages.
Illustration of a businessman in a blue suit sitting alone at the head of a long boardroom table with black chairs.

The Fall and Rise of Sam Altman: Inside Sam Altman’s brief ouster from OpenAI

A behind-the-scenes account provides new details about the abrupt firing and reinstatement of OpenAI CEO Sam Altman in November 2023.
Diagram of Model Context Protocol showing MCP client-server architecture, APIs, and local and remote data sources.

Open Standard for Tool Use and Data Access Gains Momentum: OpenAI adopts Model Context Protocol to boost LLM tool integration

OpenAI embraced Model Context Protocol, providing powerful support for an open standard that connects large language models to tools and data.
AI benchmark comparison chart showing Gemini 2.5 Pro, GPT-4.5, Claude, Grok, and others across science, math, code, and reasoning.

Google Unveils Gemini 2.5: Google’s Gemini 2.5 Pro Experimental outperforms top AI models

Google’s new flagship model raised the state of the art in a variety of subjective and objective tests.
Llama 4 Behemoth benchmark chart comparing coding, reasoning, and multilingual scores with Claude, Gemini, and GPT-4.5.

Llama’s Mixture of Vision-Language Experts: Meta releases Llama 4 models, claims edge over AI competitors

Meta updated its popular open-weights models, claiming performance superior to closed competitors in three size classes.
AI tutoring system interface showing real-time context integration, privacy, and expert-like feedback generation.

LLM Support for Tutors: GPT-4 boosts remote tutors’ performance in real time, study finds

Students benefit from tutoring, but training tutors is expensive. A study shows that large language models can boost tutors’ effectiveness in real time.
Comparison table of Gemini and Gemma models across benchmarks like MMLU, MATH, and GPQA with radar charts.

Vision-Language, Compact and Open: Google releases Gemma 3 vision-language models with open weights

Google updated its open-weights family of large language models to include versions that handle image and video inputs.
GIF of AI-assisted art: A landscape is edited, a cyborg sketch turns photorealistic, and a cat reads a newspaper, showing human input for copyright

Some AI-Generated Works Are Copyrightable: U.S. Copyright Office says that no new laws are needed for AI-generated works

The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.
AI model performance benchmark comparing R1 1776 and DeepSeek-R1 across MMLU, DROP, MATH-500, and AIME 2024 tests.

DeepSeek-R1 Uncensored: Perplexity launches uncensored version of DeepSeek-R1

Large language models built by developers in China may, in some applications, be less useful outside that country because they avoid topics its government deems politically sensitive. Perplexity fine-tuned DeepSeek-R1 to widen its scope without degrading its overall performance.
QwQ-32B vs DeepSeek-R1 AI model performance benchmark across AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL datasets.

Compact Reasoning: QwQ-32B challenges DeepSeek-R1 and other larger reasoning models

Until now, most models that learned to reason via reinforcement learning were huge. A much smaller model now competes with them.
Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

OpenAI launched GPT-4.5, which may be its last non-reasoning model.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive: they predict the next token, one at a time, from left to right. A new model generates all text tokens in parallel and refines them together.
Diagram of Coconut, a method training LLMs to process thought chains as vectors, comparing it to Chain-of-Thought (CoT).

Reasoning in Vectors, Not Text: Meta introduces Chain of Continuous Thought (Coconut) to improve next-token prediction

Large language models can improve their performance by generating a chain of thought (CoT), intermediate text tokens that break down the process of responding to a prompt into a series of steps. Coconut trains a model to carry out those steps as vectors rather than text.
Illustration of two men staring intensely at each other against a red and yellow background, symbolizing rivalry.

Musk Complicates OpenAI’s Plan: Elon Musk’s $97.4B bid for OpenAI rejected, fueling AI power struggle

Elon Musk and a group of investors made an unsolicited bid to buy the assets of the nonprofit that controls OpenAI, complicating the AI powerhouse’s future plans.

Subscribe to The Batch

Stay updated with weekly AI News and Insights delivered to your inbox