Generative AI

131 Posts

3D scene comparison of human-object interaction for ZeroHSI, LINGO, and CHOIS models in a synthetic indoor environment.

Human Action in 3D: Stanford researchers use generated video to animate 3D interactions without motion capture

AI systems designed to generate animated 3D scenes that include active human characters have been limited by a shortage of training data, such as matched 3D scenes and human motion-capture examples. Generated video clips can get the job done without motion capture.

Visual model aligning diffusion embeddings with DINOv2 encoders using REPA and DiT/SiT blocks.


Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.

Diagram comparing diffusion, flow matching, and shortcut models for image generation with fewer steps.


Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which takes time at inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.

Comparison table of Gemini and Gemma models across benchmarks like MMLU, MATH, and GPQA with radar charts.


Vision-Language, Compact and Open: Google releases Gemma 3 vision-language models with open weights

Google updated its open-weights family of large language models to include versions that handle image and video inputs.

GIF of AI-assisted art: A landscape is edited, a cyborg sketch turns photorealistic, and a cat reads a newspaper, showing human input for copyright


Some AI-Generated Works Are Copyrightable: U.S. Copyright Office says that no new laws are needed for AI-generated works

The United States Copyright Office determined that existing laws are sufficient to decide whether a given AI-generated work is protected by copyright, making additional legislation unnecessary.

Amazon smart display with widgets for recipes, calendar, weather, events, and streaming (Prime Video, Netflix, Disney+).


Amazon’s Next-Gen Voice Assistant: Alexa+ adds generative AI and agents, using Claude and other models

Amazon announced Alexa+, a major upgrade to its long-running voice assistant.

Diagram of Coconut, a method training LLMs to process thought chains as vectors, comparing it to Chain-of-Thought (CoT).


Reasoning in Vectors, Not Text: Meta introduces Chain of Continuous Thought (Coconut) to improve next-token prediction

Large language models can improve their performance by generating a chain of thought (CoT): intermediate text tokens that break down the process of responding to a prompt into a series of steps.

Bar chart comparing active vs. random sampling effects on length, diversity, and toxicity after fine-tuning.


Fine-Tuning Fine Points: Active inheritance, a smarter way to fine-tune models on synthetic data

The practice of fine-tuning models on synthetic data is becoming well established. But synthetic training data, even if it represents the training task well, may include characteristics like toxicity that impart unwelcome properties in the trained model’s output...

AI assistant processes ‘Find me a family-friendly campsite’ and suggests options.


Computer Use Gains Momentum: OpenAI’s Operator automates online tasks with a new AI agent

OpenAI introduced an AI agent that performs simple web tasks on a user’s behalf.


David Ding: Generated video with music, sound effects, and dialogue

Last year, we saw an explosion of models that generate either video or audio outputs in high quality. In the coming year, I look forward to models that produce video clips complete with audio soundtracks including speech, music, and sound effects.


Hanno Basse: Generative AI for artists

Stability AI’s aim is to liberate artists of all trades from the repetitive, mechanical aspects of their work and help them spend the majority of their time on the creative side. So our highest hope for next year is that generative AI will help people to be more creative and productive.


Generative Video Takes Off: Generative video models revolutionize content creation with stunning realism

Video generation exploded with an abundance of powerful new models.

A GIF with scenes of a man at a café, a working robot, a ghost in a mirror, and a speeding truck.


Open Video Gen Closes the Gap: Tencent releases HunyuanVideo, an open source model rivaling commercial video generators

The gap is narrowing between closed and open models for video generation.

Game character climbing a ladder with visible controls (QWASD) and health bars.


Game Worlds on Tap: Genie 2 brings interactive 3D worlds to life

A new model improves on recent progress in generating interactive virtual worlds from still images.

Berkeley Function Calling Leaderboard with metrics like accuracy, latency, and relevance.


Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.
