Diffusion Models

18 Posts


Faster Learning for Diffusion Models: Pretrained embeddings accelerate diffusion transformers’ learning

Diffusion transformers learn faster when they can look at embeddings generated by a pretrained model like DINOv2.

Better Images in Fewer Steps: Researchers introduce shortcut models to speed up diffusion

Diffusion models usually take many noise-removal steps to produce an image, which takes time at inference. There are ways to reduce the number of steps, but the resulting systems are less effective. Researchers devised a streamlined approach that doesn’t sacrifice output quality.

Designer Materials: MatterGen, a diffusion model that designs new materials with specified properties

Materials that have specific properties are essential to progress in critical technologies like solar cells and batteries. A machine learning model designs new materials to order.

Text Generation by Diffusion: Mercury Coder uses diffusion to generate text

Typical large language models are autoregressive, predicting the next token, one at a time, from left to right. A new model hones all text tokens at once.

David Ding: Generated video with music, sound effects, and dialogue

Last year, we saw an explosion of models that generate high-quality video or audio. In the coming year, I look forward to models that produce video clips complete with audio soundtracks including speech, music, and sound effects.

Open Video Gen Closes the Gap: Tencent releases HunyuanVideo, an open source model rivaling commercial video generators

The gap is narrowing between closed and open models for video generation.

Game Worlds on Tap: Genie 2 brings interactive 3D worlds to life

A new model improves on recent progress in generating interactive virtual worlds from still images.

Competitive Performance, Competitive Prices: Amazon introduces Nova models for text, image, and video

Amazon introduced a range of models that confront competitors head-on.

Faster, Cheaper Video Generation: Pyramidal Flow Matching, a cost-cutting method for training video generators

Researchers devised a way to cut the cost of training video generators. They used it to build a competitive open source text-to-video model and promised to release the training code.

For Faster Diffusion, Think a GAN: Adversarial Diffusion Distillation, a method to accelerate diffusion models

Generative adversarial networks (GANs) produce images quickly, but they’re of relatively low quality. Diffusion image generators typically take more time, but they produce higher-quality output. Researchers aimed to achieve the best of both worlds.

Generative AI Calling: Google brings advanced computer vision and audio tech to Pixel 8 and 8 Pro phones

Google’s new mobile phones put advanced computer vision and audio research into consumers’ hands. The Alphabet division introduced its flagship Pixel 8 and Pixel 8 Pro smartphones at its annual hardware-launch event. Both units feature AI-powered tools for editing photos and videos.

Diffusion Transformed: A new class of diffusion models based on the transformer architecture

A tweak to diffusion models, which are responsible for most of the recent excitement about AI-generated images, enables them to produce more realistic output.

Stable Biases: Stable Diffusion may amplify biases in its training data

Stable Diffusion may amplify biases in its training data in ways that promote deeply ingrained social stereotypes.

Text-to-Image Editing Evolves: InstructPix2Pix for text-to-image editing, explained

Text-to-image generators like DALL·E 2, Stable Diffusion, and Adobe’s new Generative Fill feature can revise images in a targeted way — say, change the fruit in a bowl from oranges to bananas — if you enter a few words that describe the change plus an indication of the areas to be changed.

Text-Driven Video Alteration: Gen-1 uses text prompts to modify videos

On the heels of systems that generate video directly from text, new work uses text to adjust the imagery in existing videos. Researchers unveiled Gen-1...