Dear friends,

Is AI progressing rapidly? Yes! But while the progress of underlying AI technology has indeed sped up over the past 2 years, the fastest acceleration is in applications.

Consider this: GPT-4 was released in March 2023. Since then, models have become much faster, cheaper, sometimes smaller, more multimodal, and better at reasoning, and many more open weight versions are available — so progress has been fantastic! (Claims that AI is “hitting a wall” seem extremely ill-informed.) But more significantly, many applications that were already theoretically possible using the March 2023 version of GPT-4 — in areas such as customer service, question answering, and process automation — now have significant early momentum.

I’m confident 2025 will see even faster and more exciting advances than 2024 in both AI technology and applications. Looking back, the one thing that could have stopped AI was bad, anti-competitive regulation that would have put onerous burdens on developers, particularly of open models. So long as we remain vigilant and hold off these anti-innovation forces, we’ll keep up or even further accelerate progress.

Deer training class with sleigh diagrams on a chalkboard.

I’m also seeing a widening gap between those at the cutting edge (which includes many readers of The Batch!) and those who have not yet tried out ChatGPT even once (yes, a lot of people are still in this group!). As technology changes around us, we all have to keep up to remain relevant and be able to make significant contributions. I’m committed to making sure DeepLearning.AI continues to help you learn the most useful and important AI technologies. If you’re making New Year’s resolutions, I hope you’ll include us in your learning plan!

AI is the most important technological change happening in the world right now. I’m thrilled to be working in this exciting sector alongside you, and I’m grateful for your efforts to learn about and apply it to better the lives of yourself and others.

Happy holidays!

Andrew

Top Stories of 2024

Snow-covered tree branches with intricate snowflakes.

A Blizzard of Progress

What a year! AI made dramatic advances in 2024. Agentic systems improved their abilities to reason, use tools, and control desktop applications. Smaller models proliferated, many of them more capable and less expensive than their larger forebears. While some developments raised worries, far more sparked wonder and optimism. As in the waning days of earlier years, we invite you to pour a cup of hot cocoa and consider the high points of the last 12 months.


Santas in line with gifts and a ‘Photos with Santa’ sign.

Agents Ascendant

By prompting large language models iteratively, the AI community laid the foundation for systems that can act, achieving much higher performance across a range of applications.

What happened: AI gained a new buzzword — agentic — as researchers, tool vendors, and model builders equipped large language models (LLMs) to make choices and take actions to achieve goals. These developments set the stage for an upswell of agentic activity in the coming year and beyond.

Driving the story: Several tools emerged to help developers build agentic workflows.

  • Microsoft primed the pump for agentic development tools in late 2023 with AutoGen, an open source conversational framework that orchestrates collaboration among multiple agents. (Learn how to take advantage of it in our short course “AI Agentic Design Patterns with AutoGen.”) In late 2024, part of the AutoGen team split off to build AG2 based on a fork of the code base.
  • In October 2023, CrewAI released its open source Python framework for building and managing multi-agent systems. Agents can be assigned roles and goals, gain access to tools like web search, and collaborate with each other. (DeepLearning.AI’s short courses “Multi-Agent Systems with crewAI” and “Practical Multi AI Agents and Advanced Use Cases with crewAI” can give you a fast start.)
  • In January, LangChain, a provider of development tools, introduced LangGraph, which orchestrates agent behaviors using cyclical graphs. The framework enables LLM-driven agents to receive inputs, reason over them, decide on actions, use tools, evaluate the results, and repeat these steps to improve results. (Our short course “AI Agents in LangGraph” offers an introduction.)
  • In September, Meta introduced Llama Stack for building agentic applications based on Llama models. Llama Stack provides memory, conversational skills, orchestration services, and ethical guardrails.
  • Throughout the year, coding tools and integrated development environments implemented agentic workflows to generate code. For instance, Devin and OpenHands accept natural-language instructions to generate prototype programs. Replit Agent, Vercel’s V0, and Bolt streamline projects by automatically writing code, fixing bugs, and managing dependencies.
  • Meanwhile, a number of LLM makers supported agentic workflows by implementing tool use and function calling. Anthropic added computer use, enabling Claude 3.5 Sonnet to control users’ computers directly.
  • Late in the year, OpenAI rolled out its o1 models and the processing-intensive o1 pro mode, which use agentic loops to work through prompts step by step. DeepSeek-R1-Lite-Preview and Google’s Gemini 2.0 Flash Thinking followed with similar agentic reasoning. In the final days of 2024, OpenAI announced o3 and o3-mini, which further extend o1’s agentic reasoning capabilities with impressive reported results.

Behind the news: Techniques for prompting LLMs in more sophisticated ways began to take off in 2022. They coalesced in moves toward agentic AI early this year. Foundational examples of this body of work include the following (a minimal sketch of the resulting loop appears after the list):

  • Chain of Thought prompting, which asks LLMs to think step by step
  • Self-consistency, which prompts a model to generate several responses and pick the one that’s most consistent with the others
  • ReAct, which interleaves reasoning and action steps to accomplish a goal
  • Self-Refine, which enables an agent to reflect on its own output
  • Reflexion, which enables a model to act, evaluate, reflect, and repeat
  • Test-time compute, which increases the amount of processing power allotted to inference
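
Taken together, these techniques amount to a loop of reasoning, acting, and observing results. Below is a minimal sketch of such a ReAct-style loop in Python; the llm() and search() helpers are hypothetical placeholders rather than any vendor’s API, and the prompt format is illustrative only.

```python
# A minimal reason/act/observe loop in the ReAct style. The llm() and
# search() helpers are hypothetical stubs; wire them to a real model
# and tool before use.

def llm(prompt: str) -> str:
    """Placeholder for a call to whatever chat model you use."""
    raise NotImplementedError("connect to an LLM provider")

def search(query: str) -> str:
    """Placeholder web-search tool."""
    raise NotImplementedError("connect to a search API")

TOOLS = {"search": search}

def react_agent(goal: str, max_steps: int = 5) -> str:
    transcript = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # Reason: ask for a thought plus either an action or a final answer.
        reply = llm(
            transcript
            + "\nThink step by step, then reply with either"
            + " 'Action: <tool> <input>' or 'Answer: <final answer>'."
        )
        transcript += reply + "\n"
        if reply.startswith("Answer:"):
            return reply.removeprefix("Answer:").strip()
        if reply.startswith("Action:"):
            # Act: run the named tool, then feed the observation back in.
            _, tool_name, tool_input = reply.split(maxsplit=2)
            observation = TOOLS[tool_name](tool_input)
            transcript += f"Observation: {observation}\n"
    return "No answer within the step budget."
```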

Where things stand: The agentic era is upon us! Regardless of how well scaling laws continue to drive improved performance of foundation models, agentic workflows are making AI systems increasingly helpful, efficient, and personalized.


Sleigh rides sign with pricing adjustments and hot cocoa.

Prices Tumble

Fierce competition among model makers and cloud providers drove down the price of access to state-of-the-art models.

What happened: AI providers waged a price war to attract paying customers. A leading indicator: From March 2023 to November 2024, OpenAI cut the per-token prices of cloud access to its models by nearly 90 percent even as performance improved, input context windows expanded, and the models became capable of processing images as well as text.

Driving the story: Factors that pushed down prices include open source models, greater compute efficiency, and excitement around agentic workflows that consume more tokens at inference. OpenAI’s GPT-4 Turbo set a baseline when it debuted in late 2023 at $10.00/$30.00 per million tokens of input/output. Top model makers slashed prices in turn: Google and OpenAI at the higher end of the market, companies in China at the lower end, and Amazon at both. Meanwhile, startups with specialized hardware offered open models at prices that dramatically undercut the giants. (A short sketch after the list below shows how such per-million-token rates translate into per-request costs.)

  • Competitive models with open weights helped drive prices down by enabling cloud providers to offer high-performance models without bearing the cost of developing or licensing them. Meta released Llama 3 70B in April, and various cloud providers offered it at an average price of $0.78/$0.95 per million input/output tokens. Llama 3.1 405B followed in July 2024; Microsoft Azure priced it at almost half the price of GPT-4 Turbo ($5.33/$16.00).
  • Per-token prices for open weights models tumbled in China. In May, DeepSeek released DeepSeek V2 and soon dropped the price to $0.14/$0.28 per million tokens of input/output. Alibaba, Baidu, and ByteDance slashed prices for Qwen-Long ($0.06/$0.06), Ernie-Speed and Ernie-Lite (free), and Doubao ($0.11/$0.11) respectively.
  • Makers of closed models outdid one another with lower and lower prices. In May, OpenAI introduced GPT-4o at $5.00/$15.00 per million tokens of input/output, half as much as GPT-4 Turbo. By August, GPT-4o cost $2.50/$10.00 and the newer GPT-4o mini cost $0.15/$0.60 (half as much for jobs with slower turnaround times).
  • Google ultimately cut the price of Gemini 1.5 Pro to $1.25/$5.00 per million input/output tokens (twice as much for prompts longer than 128,000 tokens) and slashed Gemini 1.5 Flash to $0.075/$0.30 per million input/output tokens (twice as much for prompts longer than 128,000 tokens). As of this writing, Gemini 2.0 Flash is free to use as an experimental preview, and API prices have not been announced.
  • In December, Amazon introduced the Nova family of LLMs. At launch, Nova Pro ($0.80/$3.20 per million tokens of input/output) cost much less than top models from OpenAI or Google, while Nova Lite ($0.06/$0.24) and Nova Micro ($0.035/$0.14) cost much less than GPT-4o mini. (Disclosure: Andrew Ng serves on Amazon’s board of directors.)
  • Even as model providers cut their prices, startups including Cerebras, Groq, and SambaNova designed specialized chips that enabled them to serve open weights models faster and more cheaply. For example, SambaNova offered Llama 3.1 405B for $5.00/$10.00 per million tokens of input/output, processing a blazing 132 tokens per second. DeepInfra offered the same model at a slower speed for as little as $2.70/$2.70.
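
To make these rates concrete, the sketch below computes per-request costs from a few of the per-million-token prices quoted in this list; the request size in the example is arbitrary.

```python
# Back-of-the-envelope cost arithmetic using prices quoted above.
# Each entry is (input, output) in dollars per million tokens.
PRICES = {
    "GPT-4 Turbo (late 2023)": (10.00, 30.00),
    "GPT-4o (August 2024)": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "Nova Micro": (0.035, 0.14),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: a 2,000-token prompt that yields a 500-token response.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 2_000, 500):.6f}")
```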

Yes, but: The trend toward more processing-intensive models is challenged but not dead. In September, OpenAI introduced token-hungry models with relatively hefty price tags: o1-preview ($15.00/$60.00 per million tokens input/output) and o1-mini ($3.00/$12.00). In December, o1 arrived with a more accurate pro mode that’s available only to subscribers who are willing to pay $200 per month.

Behind the news: Prominent members of the AI community pushed against regulations that threatened to restrict open source models, which played an important role in bringing down prices. Opposition by developers helped to block California SB 1047, a proposed law that would have held developers of models above certain size limits liable for unintended harms caused by their models and required a “kill switch” that would enable developers to disable them — a problematic requirement for open weights models that anyone could modify and deploy. California Governor Gavin Newsom vetoed the bill in October.

Where things stand: Falling prices are a sign of a healthy tech ecosystem. It’s likely that in-demand models will always fetch relatively high prices, but the market is increasingly priced in pennies, not dollars, per million tokens.


Snowman using a camera during snowfall.

Generative Video Takes Off

Video generation exploded, with an abundance of powerful new models.

What happened: Companies big and small introduced new or updated text-to-video generators. Some added image-to-video and/or video-to-video capabilities. While most models focus on generating cinematic clips, some specialize in videos for social media.

Driving the story: Even by AI’s recent extraordinary pace, video generators matured with remarkable speed over the past year. Virtually every major model produces convincing, highly detailed scenes, both realistic and fantastical, while ramping up image resolution, speed, output length, and users’ ability to control outputs.

  • OpenAI Sora set a high bar early in the year. Introduced in February and shown privately to Hollywood creators, it built formidable buzz despite being available only to selected users. Unauthorized users gained access in November, and OpenAI made the model available the following month. Built on a diffusion transformer, Sora generates consistent (if somewhat dreamlike) clips up to 1 minute long.
  • Runway’s Gen-3 Alpha and Gen-3 Alpha Turbo improved on their predecessors, generating higher-resolution videos (up to 1,280x768 pixels) and introducing an API. Runway struck a deal with the film studio Lionsgate, which will use a custom version fine-tuned on its archive for visual effects and pre-visualizations.
  • Adobe took a different approach with its Firefly Video model. In addition to offering a web application, the company incorporated the model directly into its best-selling Adobe Premiere Pro video editing suite. The integration enables video artists to generate clips, extend or enhance existing ones, and add effects within the program.
  • Meta introduced Movie Gen, a suite of four systems. While its video output rivals that of competitors, it stands out especially for its ability to generate soundtracks. One system produces sound effects and music that match video. Another specializes in producing videos in which characters’ faces remain consistent, and another performs video-to-video alterations. Movie Gen will be available on Instagram in 2025.
  • Model builders in China tailored their models to social media. Kling AI emphasized clips for TikTok and Instagram Reels. PixVerse and Jimeng AI likewise introduced video generators designed for social media users. In October, TikTok’s parent ByteDance added two video generation models, PixelDance and Seaweed, that produce 10-second and 30-second clips respectively.

Behind the news: Video generation is already reshaping the movie industry. In February, after seeing a preview of Sora, American filmmaker Tyler Perry halted a planned expansion of his production studio, arguing that within a few years, AI video could put traditional studios out of business. Members of the video graphics team at The Late Show with Stephen Colbert use Runway’s technology to add special effects to conventional digital video, cutting editing time from hours to minutes.

Where things stand: Video generation came a long way in 2024, but there’s still plenty of room for improvement. Because most models generate only a small number of frames at a time, they can struggle to track physics and geometry and to generate consistent characters and scenery over time. The computational demands of maintaining consistency across frames mean that generated clips are brief. And even short outputs take substantial time and resources to generate: Sora can take 10 to 20 minutes to render clips as short as 3 seconds. OpenAI and Runway released faster versions — Sora Turbo and Gen-3 Alpha Turbo — to address the challenge.


A hand holding a snow globe with skaters and a snowman.

Smaller Is Beautiful

For years, the best AI models got bigger and bigger. But in 2024, some popular large language models were small enough to run on a smartphone.

What happened: Instead of putting all their resources into building big models, top AI companies promoted families of large language models that offer a choice of small, medium, and large. Model families such as Microsoft Phi-3 (in versions of roughly 3.8 billion, 7 billion, and 14 billion parameters), Google Gemma 2 (2 billion, 9 billion, and 27 billion), and Hugging Face SmolLM (135 million, 360 million, and 1.7 billion) specialize in small.

Driving the story: Smaller models have become more capable thanks to techniques like knowledge distillation (in which a larger teacher model is used to train a smaller student model to match its output), parameter pruning (which removes less-influential parameters), quantization (which reduces neural network sizes by representing each parameter with fewer bits), and greater attention to curating training sets for data quality. Beyond performance, speed, and price, the ability to run on relatively low-powered hardware is a competitive advantage for a variety of uses.
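
To illustrate the first of these techniques, here is a minimal sketch of a distillation loss in PyTorch, following the soft-target formulation of Hinton et al.; the temperature and mixing weight are typical defaults, not values used by any of the models discussed here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Soft-target distillation: the student matches the teacher's
    softened output distribution while also fitting the hard labels.
    temperature and alpha are typical defaults, not tuned values."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale the KL term by T^2 so its gradients stay comparable in size.
    kd_term = F.kl_div(log_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term
```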

  • Model builders have offered model families that include members of various sizes since at least 2019, when Google introduced the T5 family (five models between roughly 60 million parameters and 11 billion parameters). The success of OpenAI’s GPT series, which over time grew from 117 million parameters to a hypothesized 1.76 trillion parameters, demonstrated the power of bigger models. OpenAI researchers formulated scaling laws that appeared to guarantee that bigger models, training sets, and compute budgets would lead to predictable improvements in performance (a representative form appears after this list). This finding spurred rivals to build larger and larger models.
  • The tide started to turn in 2023, when Meta’s Llama 2 arrived with open weights in parameter counts of roughly 7 billion, 13 billion, and 70 billion.
  • In December 2023, Google launched the Gemini family, including Gemini Nano (1.8 billion parameters). In February, it released the small, open weights family Gemma 1 (2 billion and 7 billion parameters), followed by Gemma 2 (9 billion and 27 billion).
  • Microsoft introduced Phi-2 (2.7 billion parameters) in December 2023 and Phi-3 (3.8 billion, 7 billion, and 14 billion) in April. 
  • In August, Nvidia released its Minitron models. It used a combination of distillation and pruning to shrink Llama 3.1 from 8 billion to 4 billion parameters and Mistral NeMo from 12 billion to 8 billion parameters, boosting speed and lowering computing costs while maintaining nearly the same level of accuracy.
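
For reference, the scaling laws mentioned above are commonly written as power laws. A representative form, after Kaplan et al. (2020), with approximate empirical constants:

```latex
% Test loss as a power law in non-embedding parameter count N
% (Kaplan et al., 2020); constants are approximate empirical fits.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}
```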

Behind the news: Distillation, pruning, quantization, and data curation are longstanding practices. But they had never before yielded models with such a favorable ratio of capability to size, arguably because the larger models being distilled, pruned, or quantized had never been so capable.

  • In 1989, Yann LeCun and colleagues at Bell Labs published “Optimal Brain Damage,” which showed that deleting weights selectively could reduce a model’s size and, in some cases, improve its ability to generalize.
  • Quantization dates to 1990, when E. Fiesler and colleagues at the University of Alabama demonstrated various ways to represent the parameters of a neural network in “A Weight Discretization Paradigm for Optical Neural Networks.” It made a resurgence in the 2010s as neural networks grew in popularity and size, spurring refinements such as quantization-aware training and post-training quantization (a minimal sketch of the latter follows this list).
  • In 2006, Rich Caruana and colleagues at Cornell published “Model Compression,” showing how to train a single model to mimic the performance of multiple models. Geoffrey Hinton and colleagues at Google Brain followed in 2015 with “Distilling the Knowledge in a Neural Network,” which improved the work of Caruana et al. and introduced the term distillation to describe a more general way to compress models.
  • Most of the current crop of smaller models were trained on datasets that were carefully curated and cleaned. Higher-quality data makes it possible to get more performance out of fewer parameters. This is an example of data-centric AI, the practice of improving model performance by improving the quality of the training data.
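
To make post-training quantization concrete, here is a minimal sketch of symmetric int8 weight quantization; production systems add calibration data, per-channel scales, and activation quantization, none of which appears here.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor post-training quantization: map float weights
    onto the int8 range [-127, 127] using a single scale factor."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for use at inference."""
    return q.astype(np.float32) * scale

# Round-trip demo: rounding error stays within half a scale step.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```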

Where things stand: Smaller models dramatically widen the options for cost, speed, and deployment. As researchers find ways to shrink models without sacrificing performance, developers are gaining new ways to build profitable applications, deliver timely services, and distribute processing to the edges of the internet.


Gift box labeled ‘Innovation Energizer’ with a rocket logo, truck, and snowy background.

Alternatives to Acquisitions

Big AI companies found creative ways to gain cutting-edge technology and talent without buying startups.

What happened: In 2024, some tech giants entered into novel partnership arrangements with AI startups, hiring top executives and securing access to technology without acquiring the companies outright. These agreements enabled the giants to take on elite talent and proven technology quickly with less risk that regulators might hinder such actions. The startups lost their leadership teams and control over key technical developments. In return, they received cash (in some cases, at least), rewarded investors, and were able to step back from the expense of building cutting-edge models.

Driving the story: Microsoft, Amazon, and Google used their deep pockets and cloud infrastructure to strike deals with Inflection AI; Adept AI and Covariant; and Character.ai, respectively. (Disclosure: Andrew Ng is a member of Amazon’s board of directors.)

  • Microsoft blazed the trail in March. The tech giant invested $650 million in Inflection AI, licensed the startup’s models, integrated its conversational AI technologies, and hired much of its staff, including co-founders Mustafa Suleyman and Karén Simonyan. Microsoft named Suleyman CEO of a new AI division, putting him in charge of Microsoft’s own model building efforts and consumer-facing products like Bing and the Copilot product line. The remainder of Inflection focuses on customizing AI models for commercial clients.
  • In July, Amazon inked a similar agreement with Adept, a startup that built agents for tasks such as automating data entry and managing customer support tickets, under undisclosed terms. Amazon hired most of Adept’s staff, including CEO David Luan and other co-founders who were alumni of Google and OpenAI, and licensed Adept’s models, datasets, and other technology non-exclusively. Adept stopped developing in-house models to concentrate on building agents.
  • In October, Amazon further bolstered its logistics capabilities by forging an agreement with Covariant, a maker of AI-driven warehouse robots, also under undisclosed terms. Amazon hired most of the startup’s staff, including CEO/co-founder Peter Chen and chief scientist/co-founder Pieter Abbeel, and licensed its robotics models. In December, Amazon paired Abbeel and former Adept CEO Luan to run a new lab devoted to developing agents and artificial general intelligence. Covariant continues to serve customers in fulfillment centers and other industries.
  • In August, Google and conversational AI startup Character.ai cut a similar deal. Google hired Character.ai’s co-founders, Noam Shazeer and Daniel De Freitas, along with key team members, and inked a non-exclusive license to its technology. Shazeer joined Google DeepMind, and other new hires set to work on Google’s chat services. Google gave Character.ai an undisclosed sum to buy out its investors and continue developing personalized AI products.

Behind the news: Tech giants have long relied on traditional acquisitions to gain new talent and capabilities, often acquiring startups specifically for their skilled teams (known as an acquihire) and/or their products or underlying technology, which can be expensive and time-consuming to develop and test in the market. But traditional acquisitions increasingly face scrutiny from antitrust regulators who are concerned about big companies reducing competition by buying out smaller ones. For example, the United States Federal Trade Commission sought to block Amazon’s acquisition of iRobot, prompting the companies to abandon the transaction in January 2024.

Where things stand: Giving startups a lump sum and/or licensing fees in return for top talent and technology looks like the new normal for tech giants that are challenged to keep pace with rapidly advancing research and markets. But even arms-length arrangements don’t immunize tech giants and startups against regulatory investigation. Microsoft’s investment in Inflection AI was briefly scrutinized in Europe and is still being evaluated by U.S. regulators. Even Microsoft’s more traditional investment in OpenAI and the interests of Amazon and Google in Anthropic faced regulatory hurdles. So far, however, regulators have yet to conclude that any of these agreements violates antitrust law.
