Cory Zue

Accepted Talks:

How do domain-specific chatbots work? An Overview of Retrieval Augmented Generation (RAG)

There’s a popular open-source Python library called LangChain that can create chatbots that, among other things, do Q&A over any website or document in three lines of code (plus imports). Here’s an example from the LangChain docs.

from langchain.document_loaders import WebBaseLoader
from langchain.indexes import VectorstoreIndexCreator

loader = WebBaseLoader("http://www.paulgraham.com/greatwork.html")
index = VectorstoreIndexCreator().from_loaders([loader])
index.query("What should I work on?")

This outputs an answer specific to Paul Graham’s essay:

"The work you choose should ideally have three qualities: it should be something you have a natural aptitude for, something you have a deep interest in, and something that offers scope to do great work. If you're unsure, you could start by working on your own projects that seem excitingly ambitious to you..."

The first time you run this it feels like pure magic. How does this work?

The answer is a process called retrieval augmented generation, or RAG for short. It is a remarkably simple concept, though its implementation has incredible depth in the details.

This entry-level talk will provide a high-level overview of RAG, suitable for all audiences. We’ll start from the big-picture workflow of what’s happening, and then zoom in on the individual pieces. By the end of it, you should have a solid understanding of how those three magic lines of Python code work, and of the principles involved in creating these Q&A bots. It will include details about LLMs, embeddings, vector databases, and more.
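To make the workflow concrete, here is a minimal sketch of the retrieve-then-generate idea using only the Python standard library. It is not how LangChain implements it. The bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the final prompt is printed rather than sent to an LLM. All function names and the sample text chunks are hypothetical, chosen purely for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Stand-in for a vector database: store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Return the k chunks whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative sample chunks (assumed, not taken from any real document).
chunks = [
    "Choose work you have a natural aptitude for and a deep interest in.",
    "Vector databases store embeddings and support nearest-neighbor search.",
    "Bread dough needs time to rise before baking.",
]
index = build_index(chunks)
question = "What work should I choose?"
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)  # In a real pipeline, this prompt would be sent to an LLM.
```

The design point is that the language model never sees the whole document: only the few chunks most similar to the question are placed in the prompt, which is what makes the answer specific to the source material.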