Machine Learning Research

560 Posts

Top graph (blue) shows GPT-5 score drop; bottom graph (orange) shows RLM maintaining higher scores.

Context As An External Variable: Recursive Language Models offer a path to expand dramatically beyond the context window

When processing long contexts, large language models often lose track of details or devolve into nonsense. Researchers reduced these effects by managing context externally.
A dancing woman morphs into a black cat, then a monkey, and finally a pig under neon hexagon lights, illustrating xAI’s video generator capabilities.

xAI’s Cost-Effective Video Generator: Grok Imagine 1.0 sharply cuts costs for high-quality video generation

xAI launched a video generator that topped an independent quality ranking at a fraction of competitors’ prices.
The chart compares Nemotron 3 models’ performance in accuracy and processing speed against other AI models.

Open-Source Speed Demon: Nvidia’s open Nemotron 3 Super 120B-A12B model sets a new pace in its class

Nvidia, the dominant supplier of AI chips, released a competitive open-source large language model whose speed tops its size class — the first open-weights leader to come from the United States since last year, when Meta delivered Llama 4.
Apple's AToken, a multimodal model with a single encoder and tokenizer for images, videos, and 3D objects

A Single Tokenizer for Visual Media: Apple’s AToken, a multimodal model with a single encoder and tokenizer for images, videos, and 3D objects

Multimodal models typically use different tokenizers to embed different media types, and different encoders when training to generate media rather than classify it.
DeepSeek made its upcoming 4.0 model available for performance testing to Chinese chipmakers but not U.S. ones

DeepSeek Snubs Nvidia for Huawei: DeepSeek made its upcoming 4.0 model available for performance testing to Chinese chipmakers but not U.S. ones

DeepSeek, the Chinese developer of outstanding open-weights models, has withheld an upcoming update of its flagship model from U.S. chipmakers, a move that intensifies the AI rivalry between the U.S. and China.
Alibaba's latest flagship models are open-weights MoE performers in sizes from less than 1B parameters

Qwen3.5 Outperforms Bigger Models, Leads Vision Benchmarks: Alibaba’s latest flagship models are open-weights MoE performers in sizes from less than 1B parameters

The Qwen3.5 family of open-weights vision-language models includes impressive larger models as well as a smaller one that outperforms an OpenAI open-weights model 10 times its size.
Visual schema of FAE's learning process, featuring fire and snowflake icons showing performance focus.

Lightning-Fast Diffusion Learning: Inside Feature Auto-Encoder, a diffusion image generator that shrinks embeddings for more speed

Research shows that diffusion image generators can train somewhat faster if they learn to reconstruct embeddings from a pretrained encoder that’s built for vision tasks like classification, segmentation, and retrieval — not image generation.
Table shows GPT-5.4 outperforms in GDPval and Tau2-bench Telecom, setting new state-of-the-art scores.

GPT-5.4’s Higher Performance, Higher Price: OpenAI’s GPT-5.4 Pro and GPT-5.4 Thinking challenge Google’s Gemini 3.1 Pro Preview as best all-around AI model

OpenAI updated its flagship models, extending their ability to use tools and setting the state of the art on a handful of benchmarks, and priced them at the top of the market. Their coding and agentic abilities have enabled Codex, OpenAI’s competitor to Anthropic’s Claude Code, to leap ahead.
Diagram depicts a math problem-solving workflow from problem generation to verification and revision.

Agent Solves Stubborn Math Problems: Google’s Aletheia uses Gemini 3 Deep Think to find original mathematics solutions

LLMs have achieved gold-medal performance in math competitions. An agentic system showed strength in mathematical research as well.
Cursor hovers over a button labeled "Submit" on a platform showing task ratings and a typed approval note.

Management for Agents: OpenAI’s Frontier agent insights and orchestration platform launches to select customers

Managers need to understand how their subordinates get work done, what resources they require, and what they accomplish. OpenAI’s latest product aims to fulfill this need when the teammates are AI agents.
AI-generated scenes including ornate signage, a beachgoer’s tattoo, and cactus and honeycomb cars, illustrating Nano Banana 2’s realism.

Nano Banana 2 Ups Performance/Price: Gemini 3.1 Flash Image makes photo generation and edits easier and faster

Google launched a cheaper, faster successor to its flagship image generator, delivering greater interactivity at roughly half the price.
Bar graph depicts rising efficiency in AI models from 2023 to 2025, highlighting energy gains.

Can Local AI Stand In for the Cloud?: Stanford and Together.AI researchers chart edge models’ performance in intelligence per watt

Projected demand for output from large language models is spurring a massive buildout of data centers. Researchers asked whether smaller models running on local devices could meaningfully lighten that load.
Officials and leaders stand together at the India AI Impact Summit 2026 in New Delhi.

Global AI Summit Shows Optimism: Rising giant India presents itself as AI counterweight to U.S. and China

The fourth global AI summit marked a decisive shift from focusing on theoretical hazards to spreading AI’s benefits throughout the world.
A benchmark table shows Gemini 3.1 Pro leading in performance across several tested metrics.

Gemini Takes the Lead: Google releases Gemini 3.1 Pro in preview, tops Intelligence Index at same price

Google updated its flagship Gemini model, topping several benchmarks while undercutting competitors on performance per dollar.
Diagram shows SleepFM's data processing flow from sleep signals to disease prediction using neural networks.

Sleep Signals Predict Illness: SleepFM detects signs of neurological disorders years before symptoms manifest

Difficulty sleeping often precedes heart disease, psychiatric disorders, and many other illnesses. Researchers used data gathered during sleep studies to detect such conditions.