Large Language Models (LLMs)

111 Posts


Musk Complicates OpenAI’s Plan: Elon Musk’s $97.4B bid for OpenAI rejected, fueling AI power struggle

Elon Musk and a group of investors made an unsolicited bid to buy the assets of the nonprofit that controls OpenAI, complicating the AI powerhouse’s future plans.

Mobile Apps to Order: Replit’s agent-powered mobile app expands to full app development

Replit, an AI-driven integrated development environment, updated its mobile app to let users generate new mobile apps on demand.

Grok 3 Scales Up: Grok 3, xAI’s new model family, improves on its predecessors, adds reasoning

xAI’s new model family suggests that devoting more computation to training remains a viable path to building more capable AI.

Alibaba’s Answer to DeepSeek: Alibaba debuts Qwen2.5-VL, a powerful family of open vision-language models

While Hangzhou’s DeepSeek flexed its muscles, Chinese tech giant Alibaba vied for the spotlight with new open vision-language models.

Agents Go Deep: OpenAI’s Deep Research agent generates detailed reports by analyzing web sources

OpenAI introduced a state-of-the-art agent that produces research reports by scouring the web and reasoning over what it finds.

Gemini Thinks Faster: Google’s Gemini 2.0 Flash Thinking advances in reasoning, outperforms DeepSeek-R1

Google updated Gemini 2.0 Flash Thinking, the reasoning model it introduced in December, along with other Flash models, gaining ground on OpenAI o1 and DeepSeek-R1.

Training for Computer Use: UI-TARS shows strong computer use capabilities in benchmarks

As Anthropic, Google, OpenAI, and others roll out agents that are capable of computer use, new work shows how underlying models can be trained to do this.

Reasoning in High Gear: o3-mini, a faster, more affordable reasoning model for coding, math, and science

OpenAI introduced a successor to its o1 models that’s faster, less expensive, and especially strong in coding, math, and science.

Fine-Tuning Fine Points: Active inheritance, a smarter way to fine-tune models on synthetic data

The practice of fine-tuning models on synthetic data is becoming well established. But synthetic training data, even if it represents the training task well, may include characteristics like toxicity that impart unwelcome properties to the trained model's output.

Reinforcement Learning Heats Up: How DeepSeek-R1 and Kimi k1.5 use reinforcement learning to improve reasoning

Reinforcement learning is emerging as an avenue for building large language models with advanced reasoning capabilities.

DeepSeek Sharpens Its Reasoning: DeepSeek-R1, an affordable rival to OpenAI’s o1

A new open model rivals OpenAI’s o1, and it’s free to use or modify.

DeepSeek Ups the Open Weights Ante: DeepSeek-V3 redefines LLM performance and cost efficiency

A new model from Hangzhou upstart DeepSeek delivers outstanding performance and may change the equation for training costs.

Better Performance From Merged Models: Localize-and-Stitch improves methods for merging and fine-tuning multiple models

Merging multiple fine-tuned models is a less expensive alternative to hosting multiple specialized models. But while model merging can deliver higher average performance across several tasks, it often reduces performance on specific tasks. New work addresses this issue.

Massively More Training Text: Harvard unveils a million-book corpus for AI training

Harvard University amassed a huge new text corpus for training machine learning models.

Models Can Use Tools in Deceptive Ways: Researchers expose AI models' deceptive behaviors

Large language models have been shown to be capable of lying when users unintentionally give them an incentive to do so. Further research shows that LLMs with access to tools can be incentivized to use them in deceptive ways.
