# LLM Context
By default, the Agent provides context based on the message history of the
thread. This context is used to generate the next message, and can include
recent messages as well as messages found via text and/or vector search.

If a `promptMessageId` is provided, the context will include that message, as
well as any other messages on that same "order". More details on order are in
messages.mdx, but in practice this means that if you pass the ID of the
user-submitted message as the `promptMessageId` and there have already been
some assistant and/or tool responses, those will be included in the context,
allowing the LLM to continue the conversation.

You can also use RAG to add extra context to your prompt.
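For example, you can save the user's message first and then generate a
response from it. A minimal sketch, assuming `agent.saveMessage` (which
returns the saved message's ID):

```ts
// Save the user's message first, e.g. in a mutation...
const { messageId } = await agent.saveMessage(ctx, {
  threadId,
  prompt: "What's the weather like?",
});

// ...then generate from it, e.g. in an action. The context will include this
// message plus any assistant/tool responses already saved on the same order.
const result = await agent.generateText(
  ctx,
  { threadId },
  { promptMessageId: messageId },
);
```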
## Customizing the context
You can customize the context provided to the agent when generating messages
with custom `contextOptions`. These can be set as defaults on the `Agent`, or
provided at the call-site for `generateText` and other generation calls.
```ts
const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    // Values shown are the defaults.
    contextOptions: {
      // Whether to exclude tool messages from the context.
      excludeToolMessages: true,
      // How many recent messages to include. These are added after the search
      // messages, and do not count against the search limit.
      recentMessages: 100,
      // Options for searching messages via text and/or vector search.
      searchOptions: {
        limit: 10, // The maximum number of messages to fetch.
        textSearch: false, // Whether to use text search to find messages.
        vectorSearch: false, // Whether to use vector search to find messages.
        // Note: this range is applied after the limit, so it can quadruple
        // the number of messages fetched (two before and one after each
        // message found in the search).
        messageRange: { before: 2, after: 1 },
      },
      // Whether to search across other threads for relevant messages.
      // By default, only the current thread is searched.
      searchOtherThreads: false,
    },
  },
);
```
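Since these options can also be set as defaults on the `Agent`, here is a
short sketch of that; the `languageModel` field name is an assumption and may
vary by version:

```ts
// Set default contextOptions once on the Agent; call-site options override
// these per generation.
const agent = new Agent(components.agent, {
  languageModel, // assumption: your configured AI SDK chat model
  contextOptions: {
    recentMessages: 50,
    searchOptions: {
      limit: 10,
      textSearch: true,
      vectorSearch: true,
      messageRange: { before: 2, after: 1 },
    },
  },
});
```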
## Full context control
To have full control over which messages are passed to the LLM, you can either:

- Provide a `contextHandler` to filter, modify, or enrich the context messages.
- Provide all messages manually via the `messages` argument and specify
  `contextOptions` that use no recent or search messages, as sketched below.
  See "Fetch context manually" below for how to fetch context messages
  yourself.
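For the second approach, a minimal sketch that zeroes out recent and search
context so only the messages you provide reach the LLM (option shapes as in
the defaults above; `myMessages` is assumed to be a `ModelMessage[]` you
assembled yourself):

```ts
const result = await agent.generateText(
  ctx,
  { threadId },
  { messages: myMessages }, // assumption: messages you built yourself
  {
    contextOptions: {
      recentMessages: 0, // no automatic recent-message context
      searchOptions: {
        limit: 0, // no search context
        textSearch: false,
        vectorSearch: false,
        messageRange: { before: 0, after: 0 },
      },
    },
  },
);
```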
## Providing a `contextHandler`
The Agent combines messages from search, recent messages, input messages, and
all messages on the same order as the `promptMessageId`, if one is provided.
You can customize how they are combined, as well as add or remove messages, by
providing a `contextHandler` that returns the `ModelMessage[]` that will be
passed to the LLM.

You can specify a `contextHandler` in the Agent constructor, or at the
call-site for a single generation, which overrides any Agent default.
```ts
const myAgent = new Agent(components.agent, {
  // ...
  contextHandler: async (ctx, args) => {
    // This is the default behavior, equivalent to `return args.allMessages;`.
    return [
      ...args.search,
      ...args.recent,
      ...args.inputMessages,
      ...args.inputPrompt,
      ...args.existingResponses,
    ];
  },
});
```
With this callback, you can:
- Filter out messages you don't want to include.
- Add memories or other context.
- Add sample messages to guide the LLM on how it should respond.
- Inject extra context based on the user or thread.
- Copy in messages from other threads.
- Summarize messages.
For example:
```ts
// Note: when you specify a contextHandler at the call-site, you can also
// leverage variables available in scope, e.g. if the user is at a specific
// step in a workflow.
const result = await agent.generateText(
  ctx,
  { threadId },
  { prompt },
  {
    contextHandler: async (ctx, args) => {
      // Filter out search results that are not relevant.
      const relevantSearch = args.search.filter((m) => messageIsRelevant(m));
      // Fetch user memories to include in every prompt.
      const userMemories = await getUserMemories(ctx, args.userId);
      // Provide sample messages to instruct the LLM on how to respond.
      const sampleMessages = [
        { role: "user", content: "Generate a function that adds two numbers" },
        { role: "assistant", content: "function add(a, b) { return a + b; }" },
      ];
      // Fetch user context to include in every prompt.
      const userContext = await getUserContext(ctx, args.userId, args.threadId);
      // Fetch messages from a related / parent thread.
      const related = await getRelatedThreadMessages(ctx, args.threadId);
      return [
        // Summarize or truncate context messages if they are too long.
        ...(await summarizeOrTruncateIfTooLong(related)),
        ...relevantSearch,
        ...userMemories,
        ...sampleMessages,
        ...userContext,
        ...args.recent,
        ...args.inputMessages,
        ...args.inputPrompt,
        ...args.existingResponses,
      ];
    },
  },
);
```
## Fetch context manually
If you want to get the context messages for a given prompt without calling the
LLM, you can use `fetchContextWithPrompt`. This is what is used internally to
get the context messages passed to the AI SDK's `generateText`, `streamText`,
etc.

As with normal generation, you can provide a `prompt` or `messages`, and/or a
`promptMessageId` to fetch the context messages using a given pre-saved
message as the prompt. It returns the recent and search messages combined with
the input messages.
```ts
import { fetchContextWithPrompt } from "@convex-dev/agent";

// The result is renamed here to avoid shadowing the `messages` input.
const { messages: contextMessages } = await fetchContextWithPrompt(
  ctx,
  components.agent,
  {
    prompt,
    messages,
    promptMessageId,
    userId,
    threadId,
    contextOptions,
  },
);
```
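If you then want to run the model yourself, you can hand the fetched context
straight to the AI SDK. A sketch, where `languageModel` is assumed to be an AI
SDK model instance you have configured:

```ts
import { generateText } from "ai";

// Run the LLM directly with the fetched context messages.
const result = await generateText({
  model: languageModel, // assumption: your AI SDK model
  messages: contextMessages,
});
```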
## Search for messages
This is what the agent does automatically, but it can be useful to do manually, e.g. to find custom context to include.
For text and vector search, you can provide a `targetMessageId` and/or
`searchText`. The text is embedded for vector search; if `searchText` is not
provided, the target message's text is used.

If `targetMessageId` is provided, only messages prior to that message are
fetched by search, and recent messages are fetched up to and including that
message's "order". This enables re-generating a response for an earlier
message.
```ts
import type { MessageDoc } from "@convex-dev/agent";

const messages: MessageDoc[] = await agent.fetchContextMessages(ctx, {
  threadId,
  searchText: prompt, // Optional, unless you want text/vector search.
  targetMessageId: promptMessageId, // Optionally target the search.
  userId, // Optional, unless `searchOtherThreads` is true.
  contextOptions, // Optional; defaults are used if not provided.
});
```
Note: you can also search for messages without an agent. The main differences
are that you need to create the embeddings yourself for vector search, and
your usage handler will not run.
```ts
import { embed } from "ai";
import { fetchRecentAndSearchMessages } from "@convex-dev/agent";

const { recentMessages, searchMessages } = await fetchRecentAndSearchMessages(
  ctx,
  components.agent,
  {
    threadId,
    searchText: prompt, // Optional, unless you want text/vector search.
    targetMessageId: promptMessageId, // Optionally target the search.
    contextOptions, // Optional; defaults are used if not provided.
    getEmbedding: async (text) => {
      // Embed the search text yourself, here via the AI SDK's `embed` helper.
      const { embedding } = await embed({
        model: textEmbeddingModel,
        value: text,
      });
      return { embedding, textEmbeddingModel };
    },
  },
);
```
## Searching other threads
If you set `searchOtherThreads` to `true`, the agent will search across all
threads belonging to the provided `userId`. This can be useful when a user has
multiple conversations that the Agent should be able to reference. The search
uses a hybrid of text and vector search.
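A sketch of enabling it at the call-site; note that a `userId` must be
available for cross-thread search:

```ts
const result = await agent.generateText(
  ctx,
  { threadId, userId }, // userId identifies whose threads to search
  { prompt },
  {
    contextOptions: {
      searchOtherThreads: true,
      searchOptions: {
        limit: 10,
        textSearch: true,
        vectorSearch: true,
        messageRange: { before: 2, after: 1 },
      },
    },
  },
);
```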
## Passing in messages as context
You can pass in messages as context to the Agent's LLM, for instance to
implement Retrieval-Augmented Generation (RAG). The final messages sent to the
LLM will be:

- The system prompt, if one is provided or the agent has `instructions`.
- The messages found via `contextOptions`.
- The `messages` argument passed into `generateText` or other function calls.
- If a `prompt` argument was provided, a final
  `{ role: "user", content: prompt }` message.

This allows you to pass in messages that are not part of the thread history
and will not be saved automatically, but that the LLM will receive as context.
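A sketch of simple RAG along these lines, where `searchMyKnowledgeBase` is a
hypothetical retrieval helper of your own:

```ts
// Hypothetical helper: look up snippets relevant to the user's question.
const snippets: string[] = await searchMyKnowledgeBase(ctx, prompt);

const result = await agent.generateText(
  ctx,
  { threadId },
  {
    // Context messages: sent to the LLM, but not saved to the thread.
    messages: [
      {
        role: "system" as const,
        content: `Relevant documents:\n${snippets.join("\n---\n")}`,
      },
    ],
    // The prompt becomes the final { role: "user", content: prompt } message.
    prompt,
  },
);
```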
## Manage embeddings manually
The `textEmbeddingModel` argument to the Agent constructor specifies a text
embedding model to use for vector search. If you set it, the agent will
automatically generate embeddings for messages and use them for vector search.
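For reference, a sketch of wiring this up, assuming the OpenAI provider (any
AI SDK embedding model works; the chat-model field name may vary by version):

```ts
import { openai } from "@ai-sdk/openai";
import { Agent } from "@convex-dev/agent";
import { components } from "./_generated/api";

const agent = new Agent(components.agent, {
  languageModel: openai.chat("gpt-4o-mini"), // assumption: your chat model
  // Messages will be embedded with this model and used for vector search.
  textEmbeddingModel: openai.embedding("text-embedding-3-small"),
});
```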
When you change models or decide to start or stop using embeddings for vector search, you can manage the embeddings manually.
Generate embeddings for a set of messages. Optionally pass a `config` with a
usage handler, which can be a globally shared `Config`:
```ts
import { embedMessages } from "@convex-dev/agent";

const embeddings = await embedMessages(
  ctx,
  { userId, threadId, textEmbeddingModel, ...config },
  [{ role: "user", content: "What is love?" }],
);
```
Generate and save embeddings for existing messages:

```ts
const embeddings = await supportAgent.generateAndSaveEmbeddings(ctx, {
  messageIds,
});
```
Get and update embeddings, e.g. for a migration to a new model:

```ts
const embeddings = await ctx.runQuery(components.agent.vector.index.paginate, {
  vectorDimension: 1536,
  targetModel: "gpt-4o-mini",
  cursor: null,
  limit: 10,
});
```
Update embeddings by ID:

```ts
await ctx.runMutation(components.agent.vector.index.updateBatch, {
  vectors: [{ model: "gpt-4o-mini", vector: embedding, id: msg.embeddingId }],
});
```
Note: if the vector dimension changes, you need to delete the old embedding
and insert a new one (a sketch of this follows the examples below).
Delete embeddings:

```ts
await ctx.runMutation(components.agent.vector.index.deleteBatch, {
  ids: [embeddingId1, embeddingId2],
});
```
Insert embeddings:

```ts
const ids = await ctx.runMutation(components.agent.vector.index.insertBatch, {
  vectorDimension: 1536,
  vectors: [
    {
      model: "gpt-4o-mini",
      table: "messages",
      userId: "123",
      threadId: "123",
      vector: embedding,
      // Optional, if you want to update the message with the embeddingId.
      messageId: messageId,
    },
  ],
});
```
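Putting the last two together, a sketch of replacing an embedding when the
dimension changes, assuming `msg` is a message document with an existing
`embeddingId` and `newEmbedding` came from your new, larger model:

```ts
// Delete the old embedding; its dimension no longer matches.
await ctx.runMutation(components.agent.vector.index.deleteBatch, {
  ids: [msg.embeddingId],
});

// Insert a replacement at the new dimension, linked back to the message.
const [newEmbeddingId] = await ctx.runMutation(
  components.agent.vector.index.insertBatch,
  {
    vectorDimension: 3072, // assumption: the new model's dimension
    vectors: [
      {
        model: "text-embedding-3-large", // assumption: the new model's id
        table: "messages",
        userId: msg.userId,
        threadId: msg.threadId,
        vector: newEmbedding,
        messageId: msg._id, // update the message with the new embeddingId
      },
    ],
  },
);
```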