Short Course・Intermediate・2 Hours 11 Minutes

Quantization in Depth

Instructors: Marc Sun, Younes Belkada

Hugging Face
  • Intermediate
  • 2 Hours 11 Minutes
  • 18 Video Lessons
  • 13 Code Examples

What you'll learn

  • Try out different variants of linear quantization, including symmetric vs. asymmetric mode, and different granularities: per-tensor, per-channel, and per-group quantization.

  • Build a general-purpose quantizer in PyTorch that can quantize the dense layers of any open source model, for up to 4x compression of those layers.

  • Implement weights packing to pack four 2-bit weights into a single 8-bit integer.
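
The packing idea in the last bullet can be sketched in a few lines. This is a minimal plain-Python illustration, not the course's PyTorch implementation; the function names `pack_2bit` and `unpack_2bit` and the lowest-bits-first layout are assumptions made for this example.

```python
def pack_2bit(values):
    """Pack 2-bit values (0..3) into 8-bit integers, four per byte.

    Assumed layout: the first value occupies the two lowest bits.
    """
    if len(values) % 4 != 0:
        raise ValueError("number of values must be a multiple of 4")
    packed = []
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if not 0 <= v <= 3:
                raise ValueError(f"value {v} does not fit in 2 bits")
            byte |= v << (2 * j)  # shift each value into its 2-bit slot
        packed.append(byte)
    return packed


def unpack_2bit(packed):
    """Recover the 2-bit values from bytes produced by pack_2bit."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append((byte >> (2 * j)) & 0b11)  # mask out one slot
    return values


weights = [1, 0, 3, 2]
packed = pack_2bit(weights)      # one 8-bit integer holds all four weights
restored = unpack_2bit(packed)   # round-trips back to the original values
```

Storing four weights per byte is what takes the footprint from 8 bits per weight down to 2 bits per weight; the cost is the extra shift-and-mask work needed to unpack before use.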

About this course

In Quantization in Depth, you will build model quantization methods that shrink model weights to ¼ of their original size, and apply methods that maintain the compressed model’s performance. Being able to quantize your models can make them more accessible and faster at inference time.

Implement and customize linear quantization from scratch so that you can study the tradeoff between space and performance, and then build a general-purpose quantizer in PyTorch that can quantize any open source model. You’ll implement techniques to compress model weights from 32 bits to 8 bits, and even to 2 bits.

Join this course to:

  • Build and customize linear quantization functions, choosing between two modes (asymmetric and symmetric) and three granularities (per-tensor, per-channel, and per-group quantization).
  • Measure the quantization error of each option as you balance performance against space.
  • Build your own quantizer in PyTorch, to quantize any open source model’s dense layers from 32 bits to 8 bits.
  • Go beyond 8 bits, and pack four 2-bit weights into one 8-bit integer.
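
As a rough illustration of the first two bullets, the sketch below quantizes a small list of values per-tensor in both modes and measures the resulting error. It is written in plain Python rather than PyTorch, and the helper names (`get_scale_and_zero_point`, `quantize`, `dequantize`, `mean_squared_error`) are hypothetical, chosen for this example only; it assumes a signed 8-bit target range of -128 to 127.

```python
def get_scale_and_zero_point(values, q_min=-128, q_max=127, symmetric=False):
    """Compute per-tensor linear quantization parameters."""
    if symmetric:
        # Symmetric mode: range is centered on zero, so the zero point is 0.
        scale = max(abs(v) for v in values) / q_max
        return (scale or 1.0), 0
    # Asymmetric mode: map [r_min, r_max] onto [q_min, q_max].
    r_min, r_max = min(values), max(values)
    scale = (r_max - r_min) / (q_max - q_min)
    if scale == 0:
        scale = 1.0  # all values equal; any non-zero scale works
    zero_point = round(q_min - r_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))  # keep it representable
    return scale, zero_point


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """q = clamp(round(r / scale) + zero_point)."""
    return [max(q_min, min(q_max, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """r is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def mean_squared_error(original, restored):
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


tensor = [-1.0, 0.0, 0.5, 2.0]
for symmetric in (False, True):
    scale, zero_point = get_scale_and_zero_point(tensor, symmetric=symmetric)
    q = quantize(tensor, scale, zero_point)
    error = mean_squared_error(tensor, dequantize(q, scale, zero_point))
    print("symmetric" if symmetric else "asymmetric", q, error)
```

Per-channel and per-group quantization follow the same recipe, except that a separate scale (and zero point) is computed for each row, column, or group of values instead of one pair for the whole tensor.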

Quantization in Depth lets you build and customize your own linear quantizer from scratch, going beyond standard open source libraries such as PyTorch and Quanto, which are covered in the short course Quantization Fundamentals, also by Hugging Face.

This course gives you the foundation to study more advanced quantization methods, some of which are recommended at the end of the course.

Who should join?

Building on the concepts introduced in Quantization Fundamentals with Hugging Face, this course will help deepen your understanding of linear quantization methods. If you’re looking to go further into quantization, this course is the perfect next step.

Course Outline

18 Lessons・13 Code Examples
  • Introduction

    Video・4 mins

  • Overview

    Video・3 mins

  • Quantize and De-quantize a Tensor

    Video with code examples・10 mins

  • Get the Scale and Zero Point

    Video with code examples・13 mins

  • Symmetric vs Asymmetric Mode

    Video with code examples・7 mins

  • Finer Granularity for more Precision

    Video with code examples・2 mins

  • Per Channel Quantization

    Video with code examples・11 mins

  • Per Group Quantization

    Video with code examples・7 mins

  • Quantizing Weights & Activations for Inference

    Video with code examples・3 mins

  • Custom Build an 8-Bit Quantizer

    Video with code examples・13 mins

  • Replace PyTorch layers with Quantized Layers

    Video with code examples・5 mins

  • Quantize any Open Source PyTorch Model

    Video with code examples・8 mins

  • Load your Quantized Weights from HuggingFace Hub

    Video with code examples・7 mins

  • Weights Packing

    Video・5 mins

  • Packing 2-bit Weights

    Video with code examples・8 mins

  • Unpacking 2-Bit Weights

    Video with code examples・8 mins

  • Beyond Linear Quantization

    Video・7 mins

  • Conclusion

    Video・1 min

Instructors

Marc Sun

Machine Learning Engineer at Hugging Face

Younes Belkada

Machine Learning Engineer at Hugging Face

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
