Video generation exploded, yielding an abundance of powerful models.
What happened: Companies big and small introduced new or updated text-to-video generators. Some also added image-to-video or video-to-video capabilities. While most models focus on generating cinematic clips, some specialize in videos for social media.
Driving the story: Even by AI's extraordinary recent pace, video generators matured with remarkable speed over the past year. Virtually every major model produces convincing, highly detailed scenes, both realistic and fantastical, while ramping up resolution, speed, output length, and users' control over outputs.
- OpenAI Sora set a high bar early in the year. Introduced in February and shown privately to Hollywood creators, it built formidable buzz despite being available only to selected users. Unauthorized users gained access in November, and OpenAI made the model generally available the following month. Built on a diffusion transformer (sketched in toy form after this list), Sora generates consistent, if somewhat dreamlike, scenes up to 1 minute long.
- Runway's Gen-3 Alpha and Gen-3 Alpha Turbo improved on their predecessors, generating higher-resolution videos (up to 1,280x768 pixels) and introducing an API. Runway struck a deal with the film studio Lionsgate, which will use a custom version, fine-tuned on the studio's archive, for visual effects and pre-visualization.
- Adobe took a different approach with its Firefly Video model. In addition to offering a web application, the company integrated the model directly into its best-selling Premiere Pro video editor. The integration enables video artists to generate clips, extend or enhance existing ones, and add effects without leaving the program.
- Meta introduced Movie Gen, a suite of four systems. While its video output rivals that of competitors, it stands out for its ability to generate soundtracks: one system produces sound effects and music to match a given video. Another specializes in producing videos in which characters' faces remain consistent, and a third performs video-to-video alterations. Movie Gen is slated to arrive on Instagram in 2025.
- Model builders in China tailored their models to social media content. Kling AI emphasized clips for TikTok and Instagram Reels, and PixVerse and Jimeng AI likewise introduced video generators designed for social media users. In October, TikTok's parent ByteDance added two video generation models, PixelDance and Seaweed, which produce 10-second and 30-second clips, respectively.
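For readers curious what a "diffusion transformer" means in practice, here is a minimal sketch: the model chops a (latent) video into spacetime patches, treats them as tokens, and learns to predict the noise that was added at a given diffusion timestep. Everything below (class names, dimensions, the omission of positional embeddings and text conditioning) is an illustrative assumption, not OpenAI's implementation.

```python
# Toy diffusion transformer (DiT) for video. Illustrative only; all names
# and sizes are assumptions, not Sora's actual architecture.
import torch
import torch.nn as nn

class VideoDiT(nn.Module):
    def __init__(self, patch_dim=1024, d_model=512, n_layers=8, n_heads=8):
        super().__init__()
        # Project flattened spacetime patches of a latent video into tokens.
        self.patch_embed = nn.Linear(patch_dim, d_model)
        # Embed the diffusion timestep so the model knows the noise level.
        self.time_embed = nn.Sequential(
            nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        # Predict the noise to subtract from each patch token.
        self.head = nn.Linear(d_model, patch_dim)

    def forward(self, noisy_patches, t):
        # noisy_patches: (batch, num_patches, patch_dim) spacetime patches
        # t: (batch, 1) diffusion timestep in [0, 1]
        x = self.patch_embed(noisy_patches) + self.time_embed(t).unsqueeze(1)
        x = self.blocks(x)   # attention spans patches across space AND time
        return self.head(x)  # predicted noise, same shape as input patches

# One denoising step: e.g., 512 spacetime patches of a short latent clip.
model = VideoDiT()
patches = torch.randn(1, 512, 1024)  # pure noise at t = 1
t = torch.ones(1, 1)
noise_estimate = model(patches, t)
```

At sampling time, a loop starts from pure noise and repeatedly subtracts the predicted noise; production systems add positional embeddings, text conditioning, and far larger dimensions.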
Behind the news: Video generation is already reshaping the movie industry. In February, after seeing a preview of Sora, American filmmaker Tyler Perry halted a planned expansion of his production studio, arguing that within a few years, AI video could put traditional studios out of business. Members of the video graphics team at The Late Show with Stephen Colbert use Runway’s technology to add special effects to conventional digital video, cutting editing time from hours to minutes.
Where things stand: Video generation came a long way in 2024, but there's still plenty of room for improvement. Because most models generate only a small number of frames at a time, they can struggle to track physics and geometry and to keep characters and scenery consistent over time. The computational demands of maintaining consistency across frames mean that generated clips are brief, and even short outputs take substantial time and resources to generate: Sora can take 10 to 20 minutes to render clips as short as 3 seconds. OpenAI and Runway released faster versions, Sora Turbo and Gen-3 Alpha Turbo, to address the challenge.
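Back-of-envelope arithmetic shows why consistency across frames is so costly. If a model attends over every spacetime patch at once (the mechanism that keeps distant frames consistent), the token count grows linearly with clip length but the attention cost grows quadratically. The sizes below (latent resolution, patch size, frame rate) are assumptions for illustration, not any particular model's configuration.

```python
# Why long, consistent clips are expensive: tokens grow linearly with
# length, but full self-attention over them grows quadratically.
# All sizes here are illustrative assumptions.

def attention_cost(seconds, fps=24, latent_h=64, latent_w=64, patch=2, t_patch=2):
    frames = seconds * fps
    # Number of spacetime patches (tokens): linear in clip length.
    tokens = (frames // t_patch) * (latent_h // patch) * (latent_w // patch)
    # Pairwise attention interactions: quadratic in clip length.
    return tokens, tokens ** 2

for seconds in (3, 10, 60):
    tokens, pairs = attention_cost(seconds)
    print(f"{seconds:>2}s clip: {tokens:>9,} tokens, {pairs:.2e} attention pairs")
```

Under these assumptions, a 60-second clip has 20 times the tokens of a 3-second clip but roughly 400 times the attention pairs, one reason generated clips stay short.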