Short Course

Introducing Multimodal Llama 3.2

Instructor: Amit Sangani

Meta
  • Beginner
  • 1 Hour 19 Minutes
  • 9 Video Lessons
  • 6 Code Examples
  • Instructor: Amit Sangani, Meta

What you'll learn

  • Explore the features of the new Llama 3.2 models, from image classification and vision reasoning to tool use.

  • Learn the details of Llama 3.2 prompting, tokenization, and built-in and custom tool calling.

  • Gain knowledge of Llama Stack, a standardized interface for building AI applications.

About this course

Join our new short course, Introducing Multimodal Llama 3.2, taught by Amit Sangani, Senior Director of AI Partner Engineering at Meta, and learn all about the latest additions to the Llama 3.1 and 3.2 model families, from custom tool calling to multimodality and the new Llama Stack.

Open models are a key building block of AI and an important enabler of AI research. With Meta’s family of open models, anyone can download, customize, fine-tune, or build new applications on top of them, accelerating AI innovation. The Llama model family now ranges from a 1B-parameter model to the 405B-parameter foundation model, supporting diverse use cases and applications.

In this course, you’ll learn about the new vision capabilities that Llama 3.2 brings to the Llama family. You’ll learn how to leverage them along with tool calling and Llama Stack, an open-source orchestration layer for building on top of the Llama family of models.

In detail, you’ll: 

  • Learn about the new models, how they were trained, their features, and how they fit into the Llama family.
  • Understand how to do multimodal prompting with Llama and work on advanced image reasoning use cases such as understanding errors on a car dashboard, adding up the total of three restaurant receipts, grading written math homework, and more (see the multimodal sketch after this list).
  • Learn the different roles (system, user, assistant, ipython) in the Llama 3.1 and 3.2 family and the prompt format that identifies those roles (a prompt-format sketch follows this list).
  • Understand how Llama uses the tiktoken tokenizer and how its vocabulary has expanded to 128k tokens, improving encoding efficiency and enabling support for seven non-English languages (a tokenizer sketch follows this list).
  • Learn how to prompt Llama to call both built-in and custom tools, with examples for web search and solving math equations (a tool-calling sketch follows this list).
  • Learn about the Llama Stack API, a standardized interface for canonical toolchain components (such as fine-tuning or synthetic data generation) used to customize Llama models and build agentic applications.

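As a taste of the multimodal prompting covered in the course, here is a minimal sketch (not the course's notebook code) that asks a Llama 3.2 Vision model a question about a local image. It assumes the Hugging Face transformers Mllama classes (transformers 4.45+), a GPU, access to the gated meta-llama checkpoint, and a hypothetical local file `receipt.jpg`.

```python
# Minimal sketch: multimodal prompting with Llama 3.2 Vision via Hugging Face transformers.
# Assumes access to the gated meta-llama checkpoint and a local image file.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # gated; request access on Hugging Face
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("receipt.jpg")  # hypothetical image, e.g. a restaurant receipt

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is the total amount on this receipt?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```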
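The roles and prompt format mentioned above follow Meta's published Llama 3.1/3.2 chat format; chat templates generate this text for you, but it helps to see the raw string. The sketch below illustrates the special tokens and role headers (ipython is the role used to return tool results to the model).

```python
# Sketch of the raw Llama 3.1/3.2 chat format that chat templates produce under the hood.
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "Who wrote the book Innovator's Dilemma?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
# Sending this string to a completions-style endpoint yields the assistant's reply,
# terminated by <|eot_id|>.
```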
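To see the tokenizer in action, here is a small sketch (again, not the course's own code) that loads a Llama 3.2 tokenizer through Hugging Face transformers and inspects its vocabulary size; it assumes access to the gated meta-llama/Llama-3.2-1B checkpoint.

```python
# Sketch: inspecting the tiktoken-based Llama 3 tokenizer via Hugging Face transformers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")  # gated checkpoint
print(len(tokenizer))                     # roughly 128k entries in the vocabulary
print(tokenizer.encode("Hello, Llama!"))  # token ids produced by the BPE tokenizer
```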
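Finally, a sketch of built-in tool calling as described in Meta's Llama 3.1/3.2 prompt-format documentation (not the course's exact examples): declaring `Environment: ipython` and listing built-in tools in the system message tells the model it may emit tool calls, and your application feeds results back via the ipython role.

```python
# Sketch: built-in tool calling with Llama 3.1/3.2. The system header enables tools;
# the model responds with a <|python_tag|> tool call that your application executes.
system = (
    "Environment: ipython\n"
    "Tools: brave_search, wolfram_alpha\n"
    "Cutting Knowledge Date: December 2023\n"
    "Today Date: 01 March 2025"
)
user = "What is the current weather in Menlo Park, California?"

# For a question like this, the model is expected to emit a built-in tool call such as:
#   <|python_tag|>brave_search.call(query="current weather in Menlo Park, California")
# Your application runs the search, appends the results as an "ipython" turn, and the
# model then composes its final answer from those results.
```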
Start building exciting applications on Llama!

Who should join?

Anyone with basic Python knowledge who wants to quickly start building on Llama and Llama Stack.

Course Outline

9 Lessons・6 Code Examples
  • Introduction

Video ・ 3 mins

  • Overview of Llama 3.2

Video ・ 5 mins

  • Multimodal Prompting

Video with code examples ・ 10 mins

  • Multimodal Use Cases

Video with code examples ・ 14 mins

  • Prompt Format

Video with code examples ・ 11 mins

  • Tokenization

Video with code examples ・ 7 mins

  • Tool Calling

Video with code examples ・ 19 mins

  • Llama Stack

Video ・ 6 mins

  • Conclusion

Video ・ 1 min

  • Appendix – Tips and Help

Code examples ・ 1 min

Instructor

Amit Sangani

Senior Director of AI Partner Engineering at Meta

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
