Your Guide to Generative AI Courses

A great way to learn about generative AI (GenAI) is to build a project. Below is a description of one project, followed by a list of courses that can help you with both the theory and example code.

Retrieval Augmented Generation: Question Answering Over Documents

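RAG over documents has two phases. In the first phase, the documents are processed and loaded into a database, often a vector database. In the second phase, data relevant to a user's query is retrieved from that database and passed to a large language model (LLM). The steps in each phase are listed below.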

Let’s go over what these steps are, and then look at which courses can help you get up to speed on them. The explanations are brief; take a course to learn more!

  • Extract: Documents come in all sorts of file formats (.doc, .pdf, etc.) and contain many kinds of data (text, tables, images, videos). This content must be extracted and converted into a format that the next stages can process.
  • Chunking: Text data is broken into smaller chunks – a process inventively named ‘chunking’.
  • Embedding: Each chunk is converted into a ‘dense vector’ that represents the meaning of the text.
  • Loading: The embedding and the original chunk are added to a database.
  • Database
    • The database stores the embeddings and the original data. Vector databases are common because they are built to search embeddings, but graph databases and traditional databases are also used.
  • Query
    • Embedding: The query is converted to a dense vector using the same embedding model.
    • Retrieval: Because the stored vectors and the query vector both represent meaning, retrieval is the process of finding the k entries in the database whose vectors are ‘closest’ to the query vector, for example by cosine similarity. There are lots of details here!
    • Generation: The k results are provided to an LLM, which uses them to form an ‘augmented’ response.
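To make these steps concrete, here is a minimal, self-contained sketch of the pipeline in plain Python. It is a toy under several loudly stated assumptions and not how any particular course or framework does it. Extraction is skipped: the documents are assumed to already be plain text. The hypothetical `embed()` function is a hashed bag-of-words vector standing in for a real embedding model, so it captures word overlap rather than meaning. The hypothetical `InMemoryVectorStore` class is a Python list standing in for a vector database. The final LLM call is omitted; the sketch only builds the augmented prompt. The helper names (`chunk_text`, `embed`, `InMemoryVectorStore`, `build_prompt`) and the parameter values (`chunk_size=40`, `overlap=10`, `dim=1024`) are illustrative choices.

```python
import math
import re
import zlib


# Chunking (hypothetical helper): split on words, with overlap between chunks.
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Return chunks of up to chunk_size words; neighbors share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # assumes chunk_size > overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Embedding (toy stand-in, NOT a real embedding model): hashed bag-of-words,
# scaled to unit length. It measures word overlap, not meaning.
def embed(text: str, dim: int = 1024) -> list[float]:
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# Database (stand-in for a vector database): (embedding, chunk) pairs in a list.
class InMemoryVectorStore:
    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []

    def add(self, chunk: str) -> None:
        # Loading: store the embedding together with the original chunk.
        self.entries.append((embed(chunk), chunk))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Query embedding: use the same embed() as the stored chunks.
        q = embed(query)
        # Retrieval: vectors are unit length, so the dot product is cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), chunk)
                  for vec, chunk in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    # Augmentation: place the retrieved chunks ahead of the question.
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context_chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Usage: load three tiny documents, then retrieve for one question.
docs = [
    "Vector databases store embeddings and support nearest neighbor search.",
    "Chunking splits long documents into smaller pieces before embedding.",
    "Graph databases store entities and the relationships between them.",
]
store = InMemoryVectorStore()
for doc in docs:
    for chunk in chunk_text(doc):
        store.add(chunk)

question = "How are long documents split before embedding?"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)  # in a real pipeline, this prompt would be sent to an LLM
```

In a real application, you would replace `embed()` with a call to an embedding model and the list with a vector database. The overall shape of the pipeline stays the same. The exhaustive search here compares the query against every entry. Vector databases typically use approximate nearest-neighbor indexes so that retrieval stays fast as the collection grows.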

Courses

Which courses you find useful may depend on what stage you are at and whether you want general or specific information.

General Overview

These courses provide an overview of the RAG process. An easy way to get started is to use a framework such as LangChain or LlamaIndex.

  • LangChain: Chat with Your Data uses the LangChain framework to build a RAG pipeline. It provides a nice overview, discusses all the steps, and finally builds a pipeline and a GUI you could use in your own applications.
  • Building and Evaluating Advanced RAG Applications uses the LlamaIndex framework. It also provides a nice overview and builds a RAG application. In addition, it covers evaluation techniques using TruLens, the evaluation toolkit from TruEra.
  • Building Agentic RAG with LlamaIndex uses the LlamaIndex framework to implement an agentic version of RAG. This is a little more advanced.

Pipeline Specifics

You may want more detail on some of the steps above. Many courses focus on a particular aspect of the pipeline.