Short Course・Intermediate・1 Hour 16 Minutes

Reinforcement Learning from Human Feedback

Instructor: Nikita Namjoshi

Google Cloud
  • Intermediate
  • 1 Hour 16 Minutes
  • 6 Video Lessons
  • 4 Code Examples

What you'll learn

  • Get a conceptual understanding of Reinforcement Learning from Human Feedback (RLHF), as well as the datasets needed for this technique

  • Fine-tune the Llama 2 model using RLHF with the open-source Google Cloud Pipeline Components Library

  • Evaluate the tuned model against the base model by comparing loss curves and using the “Side-by-Side (SxS)” method

About this course

Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences.

Reinforcement Learning from Human Feedback (RLHF) is currently the main method for aligning LLMs with human values and preferences. RLHF is also used for further tuning a base LLM to align with values and preferences that are specific to your use case.  

In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will: 

  • Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
  • Use the open-source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
  • Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
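As a rough illustration of the ideas listed above, here is a minimal Python sketch of what a “preference” record and a “prompt” record might look like, plus a simple tally for a side-by-side comparison. The field names (`input_text`, `candidate_0`, `candidate_1`, `choice`), the example texts, and both helper functions are assumptions made for this sketch; they are not taken from the course materials or from the Google Cloud Pipeline Components Library.

```python
import json
from collections import Counter

# Hypothetical preference record: one prompt, two candidate completions,
# and a human label saying which candidate was preferred (0 or 1).
preference_example = {
    "input_text": "Summarize: The city council voted to extend library hours.",
    "candidate_0": "The council extended library hours.",
    "candidate_1": "Libraries exist in the city.",
    "choice": 0,
}

# Hypothetical prompt record: a prompt only, with no completion and no label.
prompt_example = {
    "input_text": "Summarize: The museum will reopen after renovations.",
}


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def side_by_side_win_rate(judgments):
    """Return the fraction of comparisons won by the tuned model.

    Each judgment is "tuned", "base", or "tie"; ties count toward the
    total but not toward the tuned model's wins.
    """
    counts = Counter(judgments)
    total = sum(counts.values())
    return counts["tuned"] / total if total else 0.0


if __name__ == "__main__":
    print(to_jsonl([preference_example]))
    print(to_jsonl([prompt_example]))
    # Made-up judgments, for illustration only.
    print(side_by_side_win_rate(["tuned", "base", "tuned", "tie"]))
```

The preference record carries a human choice between two completions, while the prompt record carries only the input text. The win-rate helper is one simple way to summarize side-by-side judgments; other summaries are equally plausible.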

Who should join?

Anyone with intermediate Python knowledge who wants to learn how to apply Reinforcement Learning from Human Feedback.

Course Outline

6 Lessons・4 Code Examples
  • Introduction

    Video・4 mins

  • How does RLHF work

    Video・12 mins

  • Datasets for RL training

    Video with code examples・9 mins

  • Tune an LLM with RLHF

    Video with code examples・24 mins

  • Evaluate the tuned model

    Video with code examples・22 mins

  • Google Cloud Setup

    Code examples・1 min

  • Conclusion

    Video・4 mins

Instructor

Nikita Namjoshi

Developer Advocate at Google Cloud

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
