Short Course

Prompt Engineering for Vision Models

Instructors: Abby Morgan, Jacques Verré, Caleb Kaiser

Comet
  • Beginner
  • 1 Hour 22 Minutes
  • 7 Video Lessons
  • 5 Code Examples

What you'll learn

  • Prompt vision models with text, coordinates, and bounding boxes, and tune hyperparameters like guidance scale, strength, and number of inference steps (see the sketch after this list).

  • Replace parts of an image with generated content using in-painting, a technique that combines object detection, image segmentation, and image generation.

  • Fine-tune a diffusion model to gain even more control over image generation, creating images of specific subjects you provide, including people or places, rather than generic outputs.
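
The guidance scale and step count map directly onto the parameters of a typical diffusers text-to-image call. Here is a minimal sketch, assuming the Hugging Face diffusers library and the stabilityai/stable-diffusion-2 checkpoint; the prompt and values are illustrative, not taken from the course:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion 2 checkpoint (assumed model ID; any SD checkpoint works).
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",  # illustrative prompt
    guidance_scale=7.5,       # higher = follow the prompt more literally
    num_inference_steps=50,   # more denoising steps: slower, often sharper
).images[0]
image.save("lighthouse.png")

# `strength` belongs to the image-to-image and in-painting pipelines: it controls
# how much of the input image is re-noised (0 keeps the input, 1 ignores it).
```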

About this course

Prompt engineering is not limited to text models; it applies to vision models as well. Depending on the model, a vision prompt may be text, but it can also consist of pixel coordinates, bounding boxes, or segmentation masks.

In this course, you’ll learn to prompt different vision models: Meta’s Segment Anything Model (SAM), a universal image segmentation model; OWL-ViT, a zero-shot object detection model; and Stable Diffusion 2.0, a widely used diffusion model. You’ll also use a fine-tuning technique called DreamBooth to teach a diffusion model to associate a text label with an object of your choice.
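
To make coordinate prompting concrete, here is a minimal sketch of point-prompting SAM through the Hugging Face transformers wrappers; the checkpoint, image file, and click coordinates are illustrative assumptions, not code from the course:

```python
import torch
from PIL import Image
from transformers import SamModel, SamProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)

image = Image.open("truck.jpg").convert("RGB")  # hypothetical input image
input_points = [[[450, 600]]]                   # one positive (x, y) click

inputs = processor(image, input_points=input_points, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

# Rescale the predicted mask logits back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)  # candidate masks for the clicked object
```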

In detail, you’ll explore:

  • Image Generation: Prompt with text and by adjusting hyperparameters like strength, guidance scale, and number of inference steps. 
  • Image Segmentation: Prompt with positive or negative coordinates, and with bounding box coordinates.
  • Object Detection: Prompt with natural language to produce bounding boxes that isolate specific objects within images.
  • In-painting: Combine the above techniques to replace objects within an image with generated content (see the sketch after this list).
  • Personalization with Fine-tuning: Generate custom images based on pictures of people or places that you provide, using a fine-tuning technique called DreamBooth.
  • Iterating and Experiment Tracking: Prompting and hyperparameter tuning are iterative processes, so experiment tracking helps identify the most effective combinations. This course uses Comet, an ML experiment-tracking platform, to track and optimize visual prompt engineering workflows (a minimal logging sketch follows this list).
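
As a sketch of how the in-painting step can look in code, the snippet below replaces a masked region with generated content using the diffusers in-painting pipeline. In the course’s workflow the mask would come from the detection and segmentation steps above; here the file names, prompt, and checkpoint are illustrative assumptions:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("photo.png").convert("RGB")  # hypothetical input image
mask_image = Image.open("mask.png").convert("RGB")   # white = region to replace,
                                                     # e.g. a mask produced by SAM
result = pipe(
    prompt="a golden retriever sitting on the grass",  # illustrative prompt
    image=init_image,
    mask_image=mask_image,
).images[0]
result.save("inpainted.png")
```

Because each run is defined by a handful of prompts and hyperparameters, logging them is straightforward. Here is a minimal sketch with the comet_ml SDK; the project name and values are placeholders:

```python
import comet_ml

# Assumes COMET_API_KEY is set in the environment.
experiment = comet_ml.Experiment(project_name="vision-prompting")
experiment.log_parameters({"guidance_scale": 7.5, "num_inference_steps": 50})
experiment.log_image("inpainted.png", name="inpainted")  # file saved above
experiment.end()
```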

Who should join?

Prompt Engineering for Vision Models is a hands-on course that helps you get started with prompting vision models. Python experience is recommended.

Course Outline

7 Lessons・5 Code Examples
  • Introduction: Video · 6 mins
  • Overview: Video · 7 mins
  • Image Segmentation: Video with code examples · 12 mins
  • Object Detection: Video with code examples · 23 mins
  • Image Generation: Video with code examples · 11 mins
  • Fine-tuning: Video with code examples · 20 mins
  • Conclusion: Video · 1 min
  • Appendix: Code examples · 1 min

Instructors

Abby Morgan

Machine Learning Engineer at Comet

Jacques Verré

Head of Product at Comet

Caleb Kaiser

Machine Learning Engineer at Comet

Course access is free for a limited time during the DeepLearning.AI learning platform beta!
