Large Language Models (LLMs)

GIF of AI-assisted art: A landscape is edited, a cyborg sketch turns photorealistic, and a cat reads a newspaper, showing human input for copyright

Some AI-Generated Works Are Copyrightable: U.S. Copyright Office says that no new laws are needed for AI-generated works

The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.
AI model performance benchmark comparing R1 1776 and DeepSeek-R1 across MMLU, DROP, MATH-500, and AIME 2024 tests.

DeepSeek-R1 Uncensored: Perplexity launches uncensored version of DeepSeek-R1

Large language models built by developers in China may, in some applications, be less useful outside that country because they avoid topics its government deems politically sensitive. A developer fine-tuned DeepSeek-R1 to widen its scope without degrading its overall performance.
QwQ-32B vs DeepSeek-R1 AI model performance benchmark across AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL datasets.

Compact Reasoning: QwQ-32B challenges DeepSeek-R1 and other larger reasoning models

Most models that have learned to reason via reinforcement learning have been huge. A much smaller model now competes with them.
Table comparing Claude 3.7, 3.5, o1, o3-mini, DeepSeek R1, and Grok 3 Beta on reasoning, coding, tools, visuals, and math.

Budget for Reasoning to the Token: Claude 3.7 Sonnet adds extended thinking mode

Anthropic’s Claude 3.7 Sonnet implements a hybrid reasoning approach that lets users decide how much thinking they want the model to do before it renders a response.
Table comparing GPT-4.5, GPT-4o, and o3-mini on GPQA, AIME 2024, MMLU, MMMU, and coding tests.

OpenAI’s GPT-4.5 Goes Big: OpenAI releases GPT-4.5, its most powerful non-reasoning model and maybe its last

OpenAI launched GPT-4.5, which may be its last non-reasoning model.
Table comparing AI models on throughput, HumanEval, MBPP, EvalPlus, MultiPL-E, and code completion.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting the next token, one at a time, from left to right. A new model hones all text tokens at once.
Diagram of Coconut, a method training LLMs to process thought chains as vectors, comparing it to Chain-of-Thought (CoT).

Reasoning in Vectors, Not Text: Meta introduces Chain of Continuous Thought (Coconut) to improve next-token prediction

Large language models can improve their performance by generating a chain of thought (CoT): intermediate text tokens that break down the process of responding to a prompt into a series of steps.
Illustration of two men staring intensely at each other against a red and yellow background, symbolizing rivalry.

Musk Complicates OpenAI’s Plan: Elon Musk’s $97.4B bid for OpenAI rejected, fueling AI power struggle

Elon Musk and a group of investors made an unsolicited bid to buy the assets of the nonprofit that controls OpenAI, complicating the AI powerhouse’s future plans.
A person typing a prompt in an AI-powered mobile app with a button to improve the input.

Mobile Apps to Order: Replit’s agent-powered mobile app expands to full app development

Replit, an AI-driven integrated development environment, updated its mobile app so that it can generate other mobile apps to order.
AI model comparison on reasoning and test-time compute across math, science, and coding benchmarks.

Grok 3 Scales Up: Grok 3, xAI’s new model family, improves on its predecessors, adds reasoning

xAI’s new model family suggests that devoting more computation to training remains a viable path to building more capable AI.
AI model leaderboard comparing performance across tasks like math, vision, and document analysis.

Alibaba’s Answer to DeepSeek: Alibaba debuts Qwen2.5-VL, a powerful family of open vision-language models

While Hangzhou’s DeepSeek flexed its muscles, Chinese tech giant Alibaba vied for the spotlight with new open vision-language models.
ChatGPT interface drafting a research report on retail trends, including AI, e-commerce, and inflation.

Agents Go Deep: OpenAI’s Deep Research agent generates detailed reports by analyzing web sources

OpenAI introduced a state-of-the-art agent that produces research reports by scouring the web and reasoning over what it finds.
Line charts showing performance improvements in math and science with 2.0 Flash Thinking models.

Gemini Thinks Faster: Google’s Gemini 2.0 Flash Thinking advances in reasoning, outperforms DeepSeek-R1

Google updated the December-vintage reasoning model Gemini 2.0 Flash Thinking and other Flash models, gaining ground on OpenAI o1 and DeepSeek-R1.
Flowchart illustrating the automation of opening, editing, and saving a Word document using PyAutoGUI.

Training for Computer Use: UI-TARS shows strong computer use capabilities in benchmarks

As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.
Bar chart animation showing accuracy improvements in AIME 2024 competition math models.

Reasoning in High Gear: o3-mini, a faster, more affordable reasoning model for coding, math, and science

OpenAI introduced a successor to its o1 models that’s faster, less expensive, and especially strong in coding, math, and science.