Short Course · Beginner · 1 Hour 25 Minutes

How Transformer LLMs Work

Instructors: Jay Alammar, Maarten Grootendorst

Co-authors of "Hands-On Large Language Models"

  • Beginner
  • 1 Hour 25 Minutes
  • 12 Video Lessons
  • 3 Code Examples
  • Instructors: Jay Alammar, Maarten Grootendorst

What you'll learn

  • Gain an understanding of the key components of transformers, including tokenization, embeddings, self-attention, and transformer blocks, to build a strong technical foundation.

  • Understand recent improvements to the transformer’s attention mechanism, such as the KV cache, multi-query attention, grouped-query attention, and sparse attention.

  • Compare tokenization strategies used in modern LLMs and explore transformers in the Hugging Face Transformers library (a small example of this kind of comparison is sketched below).
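As a small taste of the kind of comparison the course makes, here is a minimal sketch (not course code; the model names are arbitrary example choices) that uses the Hugging Face Transformers library to show how two different tokenizers split the same sentence:

    # Compare how two tokenizers split the same text (illustrative model choices).
    from transformers import AutoTokenizer

    text = "Transformers process language as tokens."

    for model_name in ["bert-base-uncased", "gpt2"]:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        tokens = tokenizer.tokenize(text)                # the text split into pieces
        ids = tokenizer.convert_tokens_to_ids(tokens)    # the integer IDs the model actually sees
        print(model_name, tokens, ids)

BERT’s WordPiece tokenizer and GPT-2’s byte-pair encoding split the same text into different pieces, which is the kind of difference the Tokenizers lesson explores.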

About this course

Introducing “How Transformer LLMs Work,” created with Jay Alammar and Maarten Grootendorst, authors of the “Hands-On Large Language Models” book. This course offers a deep dive into the main components of the transformer architecture that powers large language models (LLMs). 

The transformer architecture revolutionized generative AI. In fact, the “GPT” in ChatGPT stands for “Generative Pre-Trained Transformer.” 

Originally introduced in the groundbreaking 2017 paper “Attention Is All You Need” by Ashish Vaswani and others, the transformer was a highly scalable architecture for machine translation tasks. Variants of this architecture now power today’s LLMs, such as those from OpenAI, Google, Meta, Cohere, and Anthropic.

In their book, Jay and Maarten beautifully illustrated the underlying architecture of LLMs through insightful and easy-to-understand explanations.

In this course, you’ll learn how the transformer network architecture that powers LLMs works. You’ll build intuition for how LLMs process text and work with code examples that illustrate the key components of the transformer architecture.

Key topics covered in this course include:

  • The evolution of how language has been represented numerically, from the Bag-of-Words model through Word2Vec embeddings to the transformer architecture that captures word meanings in full context.
  • How LLM inputs are broken down into tokens, which represent whole words or pieces of words, before they are sent to the language model.
  • The structure of a transformer and its three main stages: tokenization and embedding, the stack of transformer blocks, and the language model head.
  • The details of the transformer block, including self-attention, which calculates relevance scores between tokens, followed by the feedforward layer, which incorporates information stored during training (see the attention sketch after this list).
  • How cached calculations (the KV cache) make transformer generation faster, how the transformer block has evolved since the original paper was released, and why transformers continue to be widely used (see the caching sketch after this list).
  • An exploration of how recent models are implemented in the Hugging Face Transformers library.
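To make the attention step above concrete, here is a minimal NumPy sketch of scaled dot-product attention, the relevance-scoring calculation inside the transformer block (the sizes are made up for illustration and are not taken from the course):

    # Scaled dot-product attention with illustrative sizes: 4 tokens, 8-dimensional vectors.
    import numpy as np

    seq_len, d = 4, 8
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(seq_len, d))      # queries
    K = rng.normal(size=(seq_len, d))      # keys
    V = rng.normal(size=(seq_len, d))      # values

    scores = Q @ K.T / np.sqrt(d)          # relevance score of every token for every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax over each row
    output = weights @ V                   # each token becomes a weighted mix of value vectors
    print(output.shape)                    # (4, 8): one updated vector per token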
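The “cached calculations” mentioned above refer to the KV cache. Here is a sketch of the idea, again with made-up sizes and random vectors: during generation, the keys and values of earlier tokens are stored and reused, so each new token only needs its own query, key, and value.

    # KV cache idea during token-by-token generation (illustrative sizes, random vectors).
    import numpy as np

    d = 8
    rng = np.random.default_rng(0)
    k_cache, v_cache = [], []              # keys and values of all tokens seen so far

    for step in range(5):                  # pretend we generate 5 tokens
        q = rng.normal(size=(1, d))        # query for the new token only
        k_cache.append(rng.normal(size=(1, d)))
        v_cache.append(rng.normal(size=(1, d)))

        K = np.vstack(k_cache)             # (step + 1, d): reused from the cache, not recomputed
        V = np.vstack(v_cache)
        scores = q @ K.T / np.sqrt(d)
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
        out = weights @ V                  # attention output for the new token over all cached tokens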

By the end of this course, you’ll have a deep understanding of how LLMs process language, and you’ll be able to read papers that describe new models and understand the architectural details they present. This intuition will help improve your approach to building LLM applications.

Who should join?

Anyone interested in understanding the inner workings of transformer architectures that power today’s LLMs.

Course Outline

12 Lessons · 3 Code Examples

  • Introduction (Video, 5 mins)
  • Understanding Language Models: Language as a Bag-of-Words (Video, 5 mins)
  • Understanding Language Models: (Word) Embeddings (Video, 5 mins)
  • Understanding Language Models: Encoding and Decoding Context with Attention (Video, 5 mins)
  • Understanding Language Models: Transformers (Video, 7 mins)
  • Tokenizers (Video with code examples, 11 mins)
  • Architectural Overview (Video, 6 mins)
  • The Transformer Block (Video, 6 mins)
  • Self-Attention (Video, 10 mins)
  • Model Example (Video with code examples, 9 mins)
  • Recent Improvements (Video, 10 mins)
  • Conclusion (Video, 1 min)
  • Appendix – Tips, Help, and Download (Code examples, 1 min)

Instructors

Jay Alammar

Director and Engineering Fellow at Cohere and co-author of Hands-On Large Language Models

Maarten Grootendorst

Senior Clinical Data Scientist at the Netherlands Comprehensive Cancer Organization and co-author of Hands-On Large Language Models

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
