LLM Context

By default, the Agent will provide context based on the message history of the thread. This context is used to generate the next message.

The context can include recent messages, as well as messages found via text and/or vector search.

If a promptMessageId is provided, the context will include that message, as well as any other messages on the same order. More details on order are in messages.mdx, but in practice this means that if you pass the ID of the user-submitted message as the promptMessageId and there have already been some assistant and/or tool responses, those will be included in the context, allowing the LLM to continue the conversation.
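
For example, a minimal sketch: save the user's message first, then generate a response using the saved message as the prompt. (saveMessage follows the pattern described in messages.mdx; the prompt text here is just an example.)

const { messageId } = await agent.saveMessage(ctx, {
  threadId,
  prompt: "What is the weather like today?",
});
// The saved message, plus any other messages on its order, are included.
const result = await agent.generateText(
  ctx,
  { threadId },
  { promptMessageId: messageId },
);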

You can also use RAG to add extra context to your prompt.

Customizing the context

You can customize the context provided to the agent when generating messages with custom contextOptions. These can be set as defaults on the Agent, or provided at the call-site for generateText and related calls.

const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    // Values shown are the defaults.
    contextOptions: {
      // Whether to exclude tool messages from the context.
      excludeToolMessages: true,
      // How many recent messages to include. These are added after the search
      // messages, and do not count against the search limit.
      recentMessages: 100,
      // Options for searching messages via text and/or vector search.
      searchOptions: {
        limit: 10, // The maximum number of messages to fetch.
        textSearch: false, // Whether to use text search to find messages.
        vectorSearch: false, // Whether to use vector search to find messages.
        // Note: the range is applied after the limit, so it increases the
        // total. E.g. { before: 2, after: 1 } can quadruple the number of
        // messages fetched (two before and one after each search result).
        messageRange: { before: 2, after: 1 },
      },
      // Whether to search across other threads for relevant messages.
      // By default, only the current thread is searched.
      searchOtherThreads: false,
    },
  },
);
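
Since contextOptions can also be set as defaults on the Agent, with call-site options taking precedence, a sketch with other constructor options omitted:

const agentWithDefaults = new Agent(components.agent, {
  // ...model, instructions, etc.
  contextOptions: {
    recentMessages: 50,
    searchOptions: {
      limit: 20,
      vectorSearch: true, // Requires a textEmbeddingModel on the Agent.
      messageRange: { before: 1, after: 1 },
    },
  },
});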

Full context control

To have full control over which messages are passed to the LLM, you can either:

  1. Provide a contextHandler to filter, modify, or enrich the context messages.
  2. Provide all messages manually via the messages argument, and specify contextOptions that include no recent or search messages (see the sketch after this list). See below for how to fetch context messages manually.
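
A minimal sketch of option 2, assuming zero values disable the recent and search context (myMessages is a hypothetical, manually assembled list):

const result = await agent.generateText(
  ctx,
  { threadId },
  { messages: myMessages, prompt },
  {
    contextOptions: {
      recentMessages: 0, // Include no recent messages automatically.
      searchOptions: { limit: 0 }, // Fetch no search messages.
    },
  },
);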

Providing a contextHandler

The Agent combines search messages, recent messages, input messages, and, if a promptMessageId is provided, all messages on the same order as that message.

You can customize how they are combined, as well as add or remove messages, by providing a contextHandler that returns the ModelMessage[] to be passed to the LLM.

You can specify a contextHandler in the Agent constructor, or at the call-site for a single generation, which overrides any Agent default.

const myAgent = new Agent(components.agent, {
  // ...
  contextHandler: async (ctx, args) => {
    // This is the default behavior...
    return [
      ...args.search,
      ...args.recent,
      ...args.inputMessages,
      ...args.inputPrompt,
      ...args.existingResponses,
    ];
    // ...which is equivalent to:
    // return args.allMessages;
  },
});

With this callback, you can:

  1. Filter out messages you don't want to include.
  2. Add memories or other context.
  3. Add sample messages to guide the LLM on how it should respond.
  4. Inject extra context based on the user or thread.
  5. Copy in messages from other threads.
  6. Summarize messages.

For example:

// Note: when you specify it at the call-site, you can also use variables
// available in scope, e.g. if the user is at a specific step in a workflow.
const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    contextHandler: async (ctx, args) => {
      // Filter out search results that are not relevant.
      const relevantSearch = args.search.filter((m) => messageIsRelevant(m));
      // Fetch user memories to include in every prompt.
      const userMemories = await getUserMemories(ctx, args.userId);
      // Sample messages to show the LLM how it should respond.
      const sampleMessages = [
        { role: "user", content: "Generate a function that adds two numbers" },
        { role: "assistant", content: "function add(a, b) { return a + b; }" },
      ];
      // Fetch user context to include in every prompt.
      const userContext = await getUserContext(ctx, args.userId, args.threadId);
      // Fetch messages from a related / parent thread.
      const related = await getRelatedThreadMessages(ctx, args.threadId);
      return [
        // Summarize or truncate context messages if they are too long.
        ...(await summarizeOrTruncateIfTooLong(related)),
        ...relevantSearch,
        ...userMemories,
        ...sampleMessages,
        ...userContext,
        ...args.recent,
        ...args.inputMessages,
        ...args.inputPrompt,
        ...args.existingResponses,
      ];
    },
  },
);

Fetch context manually

If you want to get context messages for a given prompt, without calling the LLM, you can use fetchContextWithPrompt. This is used internally to get the context messages passed to the AI SDK generateText, streamText, etc.

As with normal generation, you can provide a prompt or messages, and/or a promptMessageId to fetch the context messages using a given pre-saved message as the prompt.

This will return recent and search messages combined with the input messages.

import { fetchContextWithPrompt } from "@convex-dev/agent";

// The result is renamed to avoid shadowing the `messages` input below.
const { messages: contextMessages } = await fetchContextWithPrompt(
  ctx,
  components.agent,
  {
    prompt,
    messages,
    promptMessageId,
    userId,
    threadId,
    contextOptions,
  },
);

Search for messages

This is what the agent does automatically, but it can be useful to do manually, e.g. to find custom context to include.

For text and vector search, you can provide a targetMessageId and/or searchText. The text is embedded for vector search; if searchText is not provided, the target message's text is used.

If targetMessageId is provided, it will only fetch search messages from before that message, and recent messages up to and including that message's "order". This enables re-generating a response for an earlier message.

import type { MessageDoc } from "@convex-dev/agent";

const messages: MessageDoc[] = await agent.fetchContextMessages(ctx, {
  threadId,
  searchText: prompt, // Optional unless you want text/vector search.
  targetMessageId: promptMessageId, // Optionally target the search.
  userId, // Optional, unless `searchOtherThreads` is true.
  contextOptions, // Optional; defaults are used if not provided.
});

Note: you can also search for messages without an agent. The main differences are that you need to create the embeddings yourself for vector search, and it will not run your usage handler.

import { fetchRecentAndSearchMessages } from "@convex-dev/agent";
import { embed } from "ai";

const { recentMessages, searchMessages } = await fetchRecentAndSearchMessages(
  ctx,
  components.agent,
  {
    threadId,
    searchText: prompt, // Optional unless you want text/vector search.
    targetMessageId: promptMessageId, // Optionally target the search.
    contextOptions, // Optional; defaults are used if not provided.
    getEmbedding: async (text) => {
      // Embed the text with the AI SDK's `embed` helper.
      const { embedding } = await embed({
        model: textEmbeddingModel,
        value: text,
      });
      return { embedding, textEmbeddingModel };
    },
  },
);

Searching other threads

If you set searchOtherThreads to true, the agent will search across all threads belonging to the provided userId. This can be useful when a user has multiple conversations that the Agent should be able to reference.

The search will use a hybrid of text and vector search.
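
For example, a sketch (userId is required here so the search knows which user's threads to include):

const result = await agent.generateText(
  ctx,
  { threadId, userId },
  { prompt },
  {
    contextOptions: {
      // Search this user's other threads in addition to the current one.
      searchOtherThreads: true,
      searchOptions: { limit: 10, textSearch: true, vectorSearch: true },
    },
  },
);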

Passing in messages as context

You can pass in messages as context to the Agent's LLM, for instance to implement Retrieval-Augmented Generation. The final messages sent to the LLM will be:

  1. The system prompt, if one is provided or the agent has instructions.
  2. The messages found via contextOptions.
  3. The messages argument passed into generateText or other function calls.
  4. If a prompt argument was provided, a final { role: "user", content: prompt } message.

This allows you to pass in messages that are not part of the thread history and will not be saved automatically, but that the LLM will receive as context.
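
For example, a minimal RAG sketch, where searchForRelevantContext is a hypothetical helper that retrieves snippets from your own data source:

// `searchForRelevantContext` is hypothetical: any retrieval you implement.
const snippets = await searchForRelevantContext(ctx, prompt);
const result = await agent.generateText(
  ctx,
  { threadId },
  {
    // Sent to the LLM as context, but not saved to the thread.
    messages: [
      { role: "user", content: "Relevant context:\n" + snippets.join("\n") },
    ],
    // Becomes the final { role: "user", content: prompt } message.
    prompt,
  },
);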

Manage embeddings manually

The textEmbeddingModel argument to the Agent constructor allows you to specify a text embedding model to use for vector search.

If you set this, the agent will automatically generate embeddings for messages and use them for vector search.

If you change models, or start or stop using embeddings for vector search, you can manage the embeddings manually.
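
For reference, enabling automatic embeddings looks like this sketch (using the @ai-sdk/openai provider as an example; any AI SDK embedding model works):

import { openai } from "@ai-sdk/openai";

const agent = new Agent(components.agent, {
  // ...model, instructions, etc.
  // With this set, messages are embedded automatically and used for
  // vector search when contextOptions enable it.
  textEmbeddingModel: openai.textEmbeddingModel("text-embedding-3-small"),
});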

Generate embeddings for a set of messages. You can optionally pass a config with a usage handler, which can be a globally shared Config.

import { embedMessages } from "@convex-dev/agent";

const embeddings = await embedMessages(
  ctx,
  { userId, threadId, textEmbeddingModel, ...config },
  [{ role: "user", content: "What is love?" }],
);

Generate and save embeddings for existing messages.

const embeddings = await supportAgent.generateAndSaveEmbeddings(ctx, {
  messageIds,
});

Get and update embeddings, e.g. for a migration to a new model.

const messages = await ctx.runQuery(components.agent.vector.index.paginate, {
  vectorDimension: 1536,
  targetModel: "gpt-4o-mini",
  cursor: null,
  limit: 10,
});

Update an embedding by ID.

await ctx.runMutation(components.agent.vector.index.updateBatch, {
  vectors: [{ model: "gpt-4o-mini", vector: embedding, id: msg.embeddingId }],
});

Note: if the vector dimension changes, you need to delete the old embeddings and insert new ones.

Delete embeddings

await ctx.runMutation(components.agent.vector.index.deleteBatch, {
  ids: [embeddingId1, embeddingId2],
});

Insert embeddings

const ids = await ctx.runMutation(components.agent.vector.index.insertBatch, {
  vectorDimension: 1536,
  vectors: [
    {
      model: "gpt-4o-mini",
      table: "messages",
      userId: "123",
      threadId: "123",
      vector: embedding,
      // Optional, if you want to update the message with the embeddingId.
      messageId: messageId,
    },
  ],
});