Large Language Models (LLMs)

101 Posts


AI Supercomputer on Your Desk: Nvidia introduced Project Digits, a $3,000 home supercomputer for mid-sized AI models

Nvidia’s new desktop computer is built specifically to run large AI models.

DeepSeek Ups the Open Weights Ante: DeepSeek-V3 redefines LLM performance and cost efficiency

A new model from Hangzhou upstart DeepSeek delivers outstanding performance and may change the equation for training costs.

Better Performance From Merged Models: Localize-and-Stitch improves methods for merging and fine-tuning multiple models

Merging multiple fine-tuned models is a less expensive alternative to hosting multiple specialized models. But while model merging can deliver higher average performance across several tasks, it often results in lower performance on individual tasks. New work addresses this issue.
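For readers unfamiliar with model merging, the generic idea can be sketched as averaging corresponding parameters across fine-tuned checkpoints. This is a minimal illustration only, not the Localize-and-Stitch method described in the article, which instead localizes each model's task-critical weights and stitches those regions into the merged model.

```python
# Minimal sketch of weight-space model merging by uniform averaging.
# State dicts are represented here as plain {parameter_name: value}
# dictionaries for illustration; real checkpoints hold tensors.

def average_merge(state_dicts):
    """Merge checkpoints by averaging each parameter across models."""
    merged = {}
    for name in state_dicts[0]:
        values = [sd[name] for sd in state_dicts]
        merged[name] = sum(values) / len(values)
    return merged

model_a = {"w": 1.0, "b": 0.0}  # hypothetical fine-tuned checkpoint A
model_b = {"w": 3.0, "b": 2.0}  # hypothetical fine-tuned checkpoint B
merged = average_merge([model_a, model_b])
# merged holds the element-wise mean of the two checkpoints
```

Uniform averaging is exactly the baseline that can hurt per-task performance, since parameters critical to one task get diluted by the others; methods like Localize-and-Stitch aim to preserve those critical weights.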

Massively More Training Text: Harvard unveils a million-book corpus for AI training

Harvard University amassed a huge new text corpus for training machine learning models.

Models Can Use Tools in Deceptive Ways: Researchers expose AI models' deceptive behaviors

Large language models have been shown to be capable of lying when users unintentionally give them an incentive to do so. Further research shows that LLMs with access to tools can be incentivized to use them in deceptive ways.

What LLM Users Want: Anthropic reveals how users interact with Claude 3.5

Anthropic analyzed 1 million anonymized conversations between users and Claude 3.5 Sonnet. The study found that most people used the model for software development and also revealed malfunctions and jailbreaks.

Joseph Gonzalez: General intelligence

In 2025, I expect progress in training foundation models to slow down as we hit scaling limits and inference costs continue to rise.

Smaller Is Beautiful: Compact AI models redefine efficiency, bringing advanced capabilities to everyday devices

For years, the best AI models got bigger and bigger. But in 2024, some popular large language models were small enough to run on a smartphone.

Prices Tumble: AI price wars drive costs down as competition heats up

Fierce competition among model makers and cloud providers drove down the price of access to state-of-the-art models.

When LLMs Propose Research Ideas: Stanford study finds AI matches human experts at writing research proposals

How do agents based on large language models compare to human experts when it comes to proposing machine learning research? Pretty well, according to one study.

Multimodal Modeling on the Double: Google introduces Gemini 2.0 Flash, a faster, more capable AI model

Google’s Gemini 2.0 Flash, the first member of its updated Gemini family of large multimodal models, combines speed with performance that exceeds that of its earlier flagship model, Gemini 1.5 Pro, on several measures.

Phi-4 Beats Models Five Times Its Size: Microsoft’s Phi-4 blends synthetic and organic data to surpass larger models in math and reasoning benchmarks

Microsoft updated its smallest model family with a single, surprisingly high-performance model.

Getting the Facts Right: A memory method that reduces hallucinations in LLMs

Large language models that remember more hallucinate less.

Higher Reasoning: OpenAI debuts o1 and pro mode for $200/month

OpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price.

Mistral’s Vision-Language Contender: Mistral unveils Pixtral Large, a rival to top vision-language models

Mistral AI unveiled Pixtral Large, which rivals top models at processing combinations of text and images.