Machine Learning Research

573 Posts

A graph shows assistant behavior shifting between helpful and role-playing, with conversation bubbles.

Assistants That Assist Consistently: Large language models can drift drift from helpful personas to harmful ones, but new research aims to stabilize them

Typically, large language models are trained to act as helpful, harmless, honest assistants. However, during long or emotionally charged conversations, traits can emerge that are less beneficial. Researchers devised a way to steady the assistant personas of LLMs.

A humanoid robot with teal and white elements handles metal parts in bins on a factory floor.

Machine Learning Research

Humanoid Robots Work Factory Floors: Agiliy Digits humanoid robots fetch and carry bins at a Schaeffler auto-parts factory, displacing humans into higher-level jobs

A small number of humanoid robots have made their way into industrial settings, where they’re roughly matching the cost of human labor and propelling some workers into higher-level roles.

GLM-5.1 excels in SWE-Bench Pro and Terminal-Bench 2.0, leading in coding and reasoning tests.

Machine Learning Research

GLM 5.1 Aims for Long-Running Tasks: Z.ai’s GLM 5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Z.ai updated its flagship open-weights large language model to work autonomously on single tasks for up to eight hours.

Image depicts persona generator creating synthetic personas, with outputs analyzed for diversity metrics.

Machine Learning Research

Simulating Diverse Human Cohorts: Persona generation simulates human characters across a controllable range of points of view

If you want to understand how the public will respond to your offerings, large language models can simulate users who answer questions about capabilities, features, promotions, or prices.

Quilt map of U.S. states with varying shades, some states in black, illustrating AI legislation differences.

Machine Learning Research

US States Move Forward With AI Laws: Most states are regulating AI despite President Trump’s opposition to state-level laws

U.S. states are continuing to enact laws that regulate AI, despite President Trump’s efforts to discourage state-by-state legislation in favor of national laws.

Diagram showing AI-driven drug discovery process, from lung fibrosis data to molecule generation.

Machine Learning Research

Big Pharma Bets Big on AI: Pharmaceutical kingpin Eli Lilly gave Insilico $2.75 billion for AI-driven drug development

Generative AI has proven that it can produce text, images, audio, video, and code. The world’s most valuable pharmaceutical company is betting billions that it can produce drugs as well.

A table compares AI models, highlighting Muse Spark's performance across multimodal and health benchmarks.

Machine Learning Research

Life After Llama: With Muse Spark, Meta pivots away from its open-weights Llama strategy

Meta pivoted from its open-weights strategy to deliver a closed alternative.

GIF showing fluid simulation with patch jittering, followed by predicted vs. actual flows across time steps 0–38.

Machine Learning Research

How Liquids and Gases Behave: A dynamic fluids model appears to solve transformers’ pixellation problem

Simulating complex physical systems through traditional numerical methods is slow and expensive, and simulations based on machine learning are usually specialized for a specific type of system, such as water in a pipe or atmosphere surrounding a planet.

Diagram shows DNA analysis with interconnected devices, output types, and species-specific data.

Machine Learning Research

Dark DNA Unveiled: Google’s AlphaGenome interprets DNA that regulates genetic expression

An open-weights model could help scientists compare the impact of genetic variations, identify mutations that cause diseases, and develop treatments.

Table compares AI models' performance across benchmarks, showing Claude Mythos Preview leading.

Machine Learning Research

Claude Mythos Preview Raises Security Worries: Why Claude’s advanced Mythos Preview model will be limited-release-only

Anthropic took unusual steps to prepare the world for a forthcoming large language model that it said poses extraordinary risks to cybersecurity.

Two graphs show TTT-E2E maintains stable loss and latency across increasing context lengths up to 128k.

Machine Learning Research

Learning Long Context at Inference: Test-Time Training End-to-End (TTT-E2E) retrains model weights to handle long inputs

Large language models typically become less accurate and slower when they process longer contexts, but researchers enabled an LLM to keep accuracy stable and inference time constant as its context grew.

Text input in Google's Gemini app creates a comical R&B song, showcasing music generation capabilities.

Machine Learning Research

Gemini’s Music Generator: Google debuted Lyria 3, an app that turns text or images into 30-second songs

Google added a music generator to Gemini and YouTube, putting a model that produces synthetic songs in front of hundreds of millions of users.

A black box with a red symbol is open, revealing a glowing interior, symbolizing a security breach.

Machine Learning Research

Inside Claude Code: Claude Code’s source code leaked, exposing potential future features Kairos and autoDream

The inner workings of the popular coding agent Claude Code are available for all to see.

Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.

Machine Learning Research

Context As An External Variable: Recursive Language Models offer path to aramatically expand beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.

A dancing woman morphs into a black cat, then a monkey, and finally a pig under neon hexagon lights, illustrating xAI’s video generator capabilities.

Machine Learning Research

xAI’s Cost-Effective Video Generator: Grok Imagine 1.0 sharply cuts costs for high-quality video generation

xAI launched a video generator that topped an independent quality ranking at a fraction of competitors’ prices.

Machine Learning Research

Assistants That Assist Consistently: Large language models can drift drift from helpful personas to harmful ones, but new research aims to stabilize them

Humanoid Robots Work Factory Floors: Agiliy Digits humanoid robots fetch and carry bins at a Schaeffler auto-parts factory, displacing humans into higher-level jobs

GLM 5.1 Aims for Long-Running Tasks: Z.ai’s GLM 5.1 evaluates interim results and may change its approach hundreds of times before it delivers final output

Simulating Diverse Human Cohorts: Persona generation simulates human characters across a controllable range of points of view

US States Move Forward With AI Laws: Most states are regulating AI despite President Trump’s opposition to state-level laws

Big Pharma Bets Big on AI: Pharmaceutical kingpin Eli Lilly gave Insilico $2.75 billion for AI-driven drug development

Life After Llama: With Muse Spark, Meta pivots away from its open-weights Llama strategy

How Liquids and Gases Behave: A dynamic fluids model appears to solve transformers’ pixellation problem

Dark DNA Unveiled: Google’s AlphaGenome interprets DNA that regulates genetic expression

Claude Mythos Preview Raises Security Worries: Why Claude’s advanced Mythos Preview model will be limited-release-only

Learning Long Context at Inference: Test-Time Training End-to-End (TTT-E2E) retrains model weights to handle long inputs

Gemini’s Music Generator: Google debuted Lyria 3, an app that turns text or images into 30-second songs

Inside Claude Code: Claude Code’s source code leaked, exposing potential future features Kairos and autoDream

Context As An External Variable: Recursive Language Models offer path to aramatically expand beyond the context window

xAI’s Cost-Effective Video Generator: Grok Imagine 1.0 sharply cuts costs for high-quality video generation

Subscribe to The Batch