# RAG (Retrieval-Augmented Generation) with the Agent component
The Agent component has built-in capabilities to search message history with hybrid text and vector search. To search other data for context, you can use the RAG component.
## What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that allows an LLM to search through custom knowledge bases to answer questions.
RAG combines the power of Large Language Models (LLMs) with knowledge retrieval. Instead of relying solely on the model's training data, RAG allows your AI to:
- Search through custom documents and knowledge bases
- Retrieve relevant context for answering questions
- Provide more accurate, up-to-date, and domain-specific responses
- Cite sources and explain what information was used
## RAG Component
The RAG component is a Convex component for adding data that you can then search. It breaks the data into chunks and generates embeddings for vector search. See the RAG component docs for details, but here are some key features (a minimal setup sketch follows the list):
- Namespaces: Use namespaces for user-specific or team-specific data to isolate search domains.
- Add Content: Add or replace text content by key.
- Semantic Search: Vector-based search using configurable embedding models.
- Custom Filtering: Define filters on each document for efficient vector search.
- Chunk Context: Get surrounding chunks for better context.
- Importance Weighting: Weight content by providing a 0 to 1 "importance" to affect per-document vector search results.
- Chunking flexibility: Bring your own document chunking, or use the default.
- Graceful Migrations: Migrate content or whole namespaces without disruption.
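For orientation, here's a minimal setup sketch, assuming the `@convex-dev/rag` package is installed and registered in `convex.config.ts`; the embedding model, per-user namespace scheme, and importance value are illustrative choices, not requirements:

```ts
// convex/ragSetup.ts -- a minimal sketch, not the only way to configure this.
import { components } from "./_generated/api";
import { action } from "./_generated/server";
import { v } from "convex/values";
import { RAG } from "@convex-dev/rag";
import { openai } from "@ai-sdk/openai";

export const rag = new RAG(components.rag, {
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
  embeddingDimension: 1536,
});

export const addContext = action({
  args: { userId: v.string(), key: v.string(), text: v.string() },
  handler: async (ctx, { userId, key, text }) => {
    // One namespace per user isolates each user's search domain.
    // Adding with the same key again replaces the earlier content.
    await rag.add(ctx, {
      namespace: userId,
      key,
      text,
      importance: 0.8, // 0-1 weighting applied to this entry in vector search
    });
  },
});
```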
## RAG Approaches
This directory contains two different approaches to implementing RAG:
### 1. Prompt-based RAG
A straightforward implementation where the system automatically searches for relevant context for a user query.
- The message history will only include the original user prompt and the response, not the context.
- Looks up the context and injects it into the user's prompt.
- Works well if you know the user's question will always benefit from extra context.
For the full code, see `ragAsPrompt.ts`. The simplest version is:
```ts
const { thread } = await agent.continueThread(ctx, { threadId });
const context = await rag.search(ctx, {
  namespace: "global",
  query: userPrompt,
  limit: 10,
});
const result = await thread.generateText({
  prompt: `# Context:\n\n${context.text}\n\n---\n\n# Question:\n\n"""${userPrompt}\n"""`,
});
```
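To make that runnable end to end, here's one way to wrap it in a Convex action; `agent` is assumed to be an `Agent` instance defined elsewhere, and `rag` as in the setup sketch above:

```ts
// A sketch of calling the snippet above from a Convex action.
export const askQuestion = action({
  args: { threadId: v.string(), userPrompt: v.string() },
  handler: async (ctx, { threadId, userPrompt }) => {
    const { thread } = await agent.continueThread(ctx, { threadId });
    // Always search first, then inject the results into the prompt.
    const context = await rag.search(ctx, {
      namespace: "global",
      query: userPrompt,
      limit: 10,
    });
    const result = await thread.generateText({
      prompt: `# Context:\n\n${context.text}\n\n---\n\n# Question:\n\n"""${userPrompt}\n"""`,
    });
    return result.text;
  },
});
```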
### 2. Tool-based RAG
By providing a search tool, you let the LLM decide for itself when to look up context or add new information.
- The message history will include the user's original prompt and the final response.
- After a tool call and response, the message history will include the tool call and response for the LLM to reference.
- This works well if you want the Agent to dynamically decide when to search for context or add new information.
See `ragAsTools.ts` for the full code. The simplest version is:
```ts
searchContext: createTool({
  description: "Search for context related to this user prompt",
  args: z.object({
    query: z.string().describe("Describe the context you're looking for"),
  }),
  handler: async (ctx, { query }) => {
    const context = await rag.search(ctx, { namespace: userId, query });
    return context.text;
  },
}),
```
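The tool then has to be wired into the agent. A sketch, assuming tools can be passed per-call and an AI SDK v4-style `maxSteps` option (option names vary by version, so check your agent's docs):

```ts
// A sketch: pass the tool per-call; tools can also be set on the Agent itself.
const result = await thread.generateText({
  prompt: userPrompt,
  tools: { searchContext },
  maxSteps: 3, // allow a tool call (or two) followed by a final answer
});
```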
## Key Differences
| Feature | Prompt-based RAG | Tool-based RAG |
| --- | --- | --- |
| Context search | Always searches | AI decides when to search |
| Adding context | Manual via separate function | AI can add context during conversation |
| Flexibility | Simple, predictable | Intelligent, adaptive |
| Use case | FAQ systems, document search | Dynamic knowledge management |
| Predictability | Defined by code | AI may search too much or too little |
## Ingesting content
On the whole, the RAG component works with text. However, you can turn other file types into text, either by using parsing tools or by asking an LLM to do it.
### Parsing images
LLMs are surprisingly good at parsing images. You can use `generateText` to describe and transcribe an image, then use that description when searching for relevant context. By also storing the original image, you can pass the file itself around once a search retrieves its description.
```ts
const description = await thread.generateText({
  message: {
    role: "user",
    // AI SDK image parts take `image` (a URL or data) plus an optional mimeType.
    content: [{ type: "image", image: url, mimeType: blob.type }],
  },
});
```
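One possible flow for the storage side, sketched below; adding content by key is a documented feature above, while the rest of the names are illustrative:

```ts
// A sketch: store the original image, then index its description for search.
const storageId = await ctx.storage.store(blob); // Convex file storage
await rag.add(ctx, {
  namespace: "global",
  key: storageId, // use the storage ID as the key to find the image again
  text: description.text,
});
```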
### Parsing PDFs
For PDF parsing, I suggest using Pdf.js in the browser.
#### Why not server-side?
Opening a PDF can use hundreds of MB of memory, and requires downloading a big pdfjs bundle - so big it's usually fetched dynamically in practice. You probably don't want to load that bundle on every server-side function call, and serverless environments limit the memory you can use. If the browser already has the file, it's a good environment for the heavy lifting (and free!).
There's an example in the RAG demo, used in the UI here, with Pdf.js served statically.
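As a starting point, here's a minimal sketch of client-side extraction with `pdfjs-dist`; the worker setup in particular varies by bundler, so treat that line as a placeholder:

```ts
// Browser-side text extraction with pdfjs-dist -- a sketch; send the result
// to a Convex action that calls rag.add.
import * as pdfjs from "pdfjs-dist";

pdfjs.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.js"; // served statically

async function extractPdfText(file: File): Promise<string> {
  const data = await file.arrayBuffer();
  const pdf = await pdfjs.getDocument({ data }).promise;
  const pages: string[] = [];
  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const content = await page.getTextContent();
    // Text items carry a `str` field; marked-content items don't.
    pages.push(
      content.items.map((item) => ("str" in item ? item.str : "")).join(" "),
    );
  }
  return pages.join("\n\n");
}
```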
If you really want to do it server-side and aren't worried about cost or latency, you can pass the PDF to an LLM, but note that it takes a long time for big files.
### Parsing text files
Generally you can use text files directly, for code or markdown or anything with a natural structure an LLM can understand.
However, to get good embeddings, you can once again use an LLM to translate the text into a more structured format.
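For instance, a sketch of that restructuring step; the prompt wording and the `fileText`/`fileName` variables are illustrative, and `thread` is assumed from earlier:

```ts
// A sketch: have the LLM rewrite raw text so each chunk embeds well.
const { text: structured } = await thread.generateText({
  prompt:
    "Rewrite the following file as short, self-contained statements, " +
    "one per line, preserving all facts:\n\n" + fileText,
});
await rag.add(ctx, { namespace: "global", key: fileName, text: structured });
```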
## Examples in Action
To see these examples in action, check out the RAG example, which demonstrates:
- Adding text, PDF, and image content to the RAG component.
- Searching and generating text based on the context.
- Introspecting the context produced by searching.
- Browsing the chunks of documents produced.
- Searching globally, per-user, or with custom filters.
Run the example with:
```sh
git clone https://github.com/get-convex/rag.git
cd rag
npm run setup
npm run example
```