Short Course・Beginner・1 Hour 4 Minutes

Quantization Fundamentals with Hugging Face

Instructors: Younes Belkada, Marc Sun

Hugging Face
  • Beginner
  • 1 Hour 4 Minutes
  • 7 Video Lessons
  • 3 Code Examples

What you'll learn

  • Learn how to compress models with the Hugging Face Transformers library and the Quanto library.

  • Learn about linear quantization, a simple yet effective method for compressing models.

  • Practice quantizing open source multimodal and language models.
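To make the idea of linear quantization concrete, here is a minimal sketch in plain Python. It is not the Quanto implementation or the code used in the lessons. The function names `linear_quantize` and `linear_dequantize`, the 8-bit default, and the sample weights are all assumptions chosen for illustration. The sketch maps floating-point values to 8-bit integers with a scale and a zero point, then maps them back to show how small the rounding error is.

```python
def linear_quantize(values, bits=8):
    """Map floats to signed integers using a scale and a zero point.

    Hypothetical helper for illustration; assumes `values` is non-empty
    and that its range includes 0.0 (as model weights usually do).
    """
    q_min, q_max = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    r_min, r_max = min(values), max(values)

    # The scale is the size of one integer step in floating-point units.
    scale = (r_max - r_min) / (q_max - q_min) if r_max != r_min else 1.0

    # The zero point is the integer that represents the real value 0.0.
    zero_point = round(q_min - r_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))

    # Round each value to the nearest step and clamp it to the integer range.
    quantized = [
        max(q_min, min(q_max, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def linear_dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the quantized integers."""
    return [scale * (q - zero_point) for q in quantized]


weights = [-1.5, -0.25, 0.0, 0.4, 2.0]  # made-up sample values
q, scale, zero_point = linear_quantize(weights)
restored = linear_dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("restored: ", [round(r, 4) for r in restored])
print("max error:", max_error)
```

Each weight now fits in one byte instead of four, and the worst-case error in this example stays below one quantization step (`scale`).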

About this course

Generative AI models, such as large language models, often exceed the capabilities of consumer-grade hardware and are expensive to run. Compressing models through methods such as quantization makes them smaller, faster, and more accessible. This allows them to run on a wide variety of devices, including smartphones, personal computers, and edge devices, with minimal performance degradation.

Join this course to:

  • Quantize any open source model with linear quantization using the Quanto library.
  • Get an overview of how linear quantization is implemented. This form of quantization can be applied to compress any model, including LLMs, vision models, etc.
  • Apply “downcasting,” another form of quantization, with the Transformers library, which enables you to load models at about half their normal size by using the BFloat16 data type.
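The "about half" figure for downcasting follows from the data types themselves: a float32 value takes 4 bytes and a BFloat16 value takes 2. The sketch below emulates the conversion with only the Python standard library, so it does not use the Transformers API or the course's code. The helper names `to_bfloat16_bits` and `from_bfloat16_bits` and the parameter count are assumptions for illustration. BFloat16 keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of a float32, so the range is preserved while precision drops.

```python
import struct


def to_bfloat16_bits(x):
    """Return the 16-bit pattern of x rounded to bfloat16.

    Hypothetical helper: rounds to nearest, ties to even.
    NaN and out-of-range inputs are not handled in this sketch.
    """
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds instead of truncating.
    rounding_bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(bits16):
    """Expand a bfloat16 bit pattern back to a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]


value = 3.14159
downcast = from_bfloat16_bits(to_bfloat16_bits(value))
print(value, "->", downcast)  # precision is lost, magnitude is kept

# Memory estimate for a made-up model size.
num_params = 1_000_000
fp32_bytes = num_params * 4  # float32: 4 bytes per parameter
bf16_bytes = num_params * 2  # bfloat16: 2 bytes per parameter
print("size ratio:", bf16_bytes / fp32_bytes)
```

The trade-off shown here is the one the bullet describes: storage is halved, and each value keeps only about two to three significant decimal digits.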

By the end of this course, you will have a foundation in quantization techniques and be able to apply them to compress and optimize your own generative AI models, making them more accessible and efficient.

Who should join?

This is an introduction to the fundamental concepts of quantization for learners who have a basic understanding of machine learning concepts and some experience with PyTorch, and who are interested in learning about model quantization in generative AI.

Course Outline

7 Lessons・3 Code Examples
  • Introduction

    Video・3 mins

  • Handling Big Models

    Video・5 mins

  • Data Types and Sizes

    Video with code examples・17 mins

  • Loading Models by Data Type

    Video with code examples・15 mins

  • Quantization Theory

    Video with code examples・15 mins

  • Quantization of LLMs

    Video・6 mins

  • Conclusion

    Video・1 min

Instructors

Younes Belkada

Machine Learning Engineer at Hugging Face

Marc Sun

Machine Learning Engineer at Hugging Face

Course access is free for a limited time during the DeepLearning.AI learning platform beta!