Large Language Models (LLMs)

197 Posts

Lines connect multiple Wikipedia globe logos, symbolizing data exchange and partnerships.

AI Giants Share Wikipedia’s Costs: Wikimedia Foundation strikes deals with Amazon, Meta, Microsoft, Mistral AI, and Perplexity

On its 25th anniversary, Wikipedia celebrated with high-profile deals that make its data easier for AI companies to use in training their models, in exchange for financial support.

Diagram shows sales, campaign, social posts before and after LLM simulation feedback loops.

Training For Engagement Can Degrade Alignment: Stanford researchers coin “Moloch’s Bargain,” show fine-tuning can affect social values

Individuals and organizations increasingly use large language models to produce media that helps them compete for attention. Does fine-tuning LLMs to encourage engagement, purchases, or votes affect their alignment with social values? Researchers found that it does.

AI models’ performance shown in bars; GPT-5.2 highest at 51, reflecting updated benchmarks.

Artificial Analysis Revamps Intelligence Index: Independent AI testing authority turns from saturated knowledge benchmarks to harder business tests

Artificial Analysis, which tests AI systems, updated the component evaluations in its Intelligence Index to better reflect large language models’ performance in real-world use cases.

Apple’s Foundation Models Will Be Gemini: Apple announced a partnership with Google to power Siri and other AI features

Apple cut a multi-year deal with Google to use Gemini models as the basis of AI models that reside on Apple devices.

ChatGPT interface on a phone displays a conversation and a sponsored grocery ad at the bottom of the screen.

ChatGPT Shows Ads: OpenAI tests advertisements for U.S. chatbot users in free and lower-cost tiers

AI has a new revenue stream, and it looks a lot like old web banner ads.

Matrix links queries to documents, illustrating embedding limits in representing relevance combinations.

Retrieval Faces Hard Limits: Google and Johns Hopkins researchers show embedding models can’t search unlimited documents

Can your retriever find all the relevant documents for any query your users might enter? Maybe not, research shows.

Diagrams comparing LongCoT and Delethink environments show reasoning processes and context management.

More Affordable Reasoning: Canadian researchers find capping context helps models better retrieve data

One way to improve a reasoning model’s performance is to let it produce a longer chain of thought. However, attending to ever-longer contexts can become expensive, and making that attention more efficient requires changes to a model’s architecture.

Dialogue displays a model revealing it answered incorrectly and wrote code against instructions.

Teaching Models to Tell the Truth: OpenAI fine-tuned a version of GPT-5 to confess when it was breaking the rules

Large language models occasionally conceal their failures to comply with constraints they’ve been trained or prompted to observe. Researchers trained an LLM to admit when it disobeyed.

Sharon Zhou is pictured smiling confidently with her hands clasped, reflecting AI’s potential for community-building.

Chatbots That Build Community by Sharon Zhou: Sharon Zhou of AMD on expanding chat to serve groups and connect us with other people

Next year, I’m excited to see AI break out of 1:1 relationships with each of us. In 2026, AI has the potential to bring people together and unite us with human connection, rather than polarize and isolate us. It’s about time for ChatGPT to enter your group chats.

Mice on a laptop keyboard explore, with code on screen; background features festive lights, presents.

Agents Write Code Faster, Cheaper: Software developers used more versatile AI-powered tools to write code

Coding apps moved beyond autofill-style code completion to agentic systems that manage a wide range of software development tasks.

Snowman in Thinker pose on snowy landscape, with a person building it.

Thinking Models Solve Bigger Problems: Reasoning models, beginning with OpenAI’s o1 and DeepSeek’s R1, transformed the industry

Think step by step. Explain your reasoning. Work backwards from the answer. As 2025 began, models executed these reasoning strategies only when prompted. Now most new large language models apply them as a matter of course, improving performance across a wide range of tasks.

Diagram shows LLM training with encoders for images, audio, video; inference with galaxies, satellites.

Adapting LLMs to Any Sort of Data: SEMI (Sample-Efficient Modality Integration) tackles new domains with few-shot examples

Enabling a pretrained large language model to process a data type other than text (say, images), possibly in a specialized domain (say, radiology), typically requires thousands to millions of examples that pair the other data (perhaps x-rays) with text.

A table compares GPT-5.2's benchmark scores to Claude Opus 4.5 and Gemini 3 Pro in various reasoning tasks.

OpenAI’s Answer to Gemini 3: GPT-5.2 arrives, touting variable reasoning and coding performance

OpenAI launched GPT-5.2 only weeks after its CEO Sam Altman reportedly issued a “code red” alarm in response to Google's Gemini 3.

Table highlights Opus 4.5’s superior scores in coding and reasoning compared to other AI models.

Claude Does More With Fewer Tokens: Claude Opus 4.5 retakes the coding crown at one-third the price of its predecessor

Claude Opus 4.5, the latest version of Anthropic’s flagship model, extends the earlier version’s strengths in coding, computer use, and agentic workflows while generating fewer tokens.

Diagram shows AI traits with pipelines for "evil" vs. "helpful" responses to user queries on animal treatment.

Toward Steering LLM Personality: Persona Vectors allow model builders to identify and edit out sycophancy, hallucinations, and more

Large language models can develop character traits like cheerfulness or sycophancy during fine-tuning. Researchers developed a method to identify, monitor, and control such traits.
