Files and Images in Agent messages
You can include images and files in messages for the LLM to reference.
NOTE: Sending URLs to LLMs is much easier with the cloud backend, since it has publicly available storage URLs. To develop locally, you can use ngrok or similar to proxy the traffic.
Example code:
- files/autoSave.ts has a simple example of how to use the automatic file saving.
- files/addFile.ts has an example of how to save the file, submit a question, and generate a response in separate steps.
- files/generateImage.ts has an example of how to generate an image and save it in an assistant message.
- FilesImages.tsx has client-side code.
Running the example
git clone https://github.com/get-convex/agent.git
cd agent
npm run setup
npm run example
Sending an image by uploading first and generating asynchronously
The standard approach is to:
- Upload the file to the database (the uploadFile action). Note: this can be in a regular action or in an httpAction, depending on what's more convenient.
- Send a message to the thread (the submitFileQuestion mutation).
- Send the file to the LLM to generate / stream text asynchronously (the generateResponse action).
- Query for the messages from the thread (the listThreadMessages query).
Rationale:
It's better to submit a message in a mutation vs. an action because you can use an optimistic update on the client side to show the sent message immediately and have it disappear exactly when the message comes down in the query.
However, you can't save to file storage from a mutation, so the file needs to already exist (hence the fileId).
You can then asynchronously generate the response (with retries / etc) without the client waiting.
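The "disappear exactly when the message comes down" behavior can be sketched as a pure merge (a hypothetical helper for illustration, not the agent's API; Convex optimistic updates do this reconciliation for you):

```typescript
type Message = { id: string; text: string };

// Sketch of the reconciliation an optimistic update gives you: locally-added
// messages are dropped as soon as the server copy (same id) shows up in the
// query results, so the sent message never flickers or duplicates.
function mergeOptimistic(server: Message[], optimistic: Message[]): Message[] {
  const serverIds = new Set(server.map((m) => m.id));
  return [...server, ...optimistic.filter((m) => !serverIds.has(m.id))];
}
```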
1: Saving the file
import { storeFile } from "@convex-dev/agent";
import { components } from "./_generated/api";
const { file } = await storeFile(
ctx,
components.agent,
new Blob([bytes], { type: mimeType }),
filename,
sha256,
);
const { fileId, url, storageId } = file;
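storeFile takes a sha256 of the file contents. One way to compute it is with node:crypto (a sketch assuming a Node runtime, e.g. a "use node" Convex action; hex is a common encoding, though the exact encoding storeFile expects isn't shown above):

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the raw file bytes, to pass as the
// sha256 argument of storeFile. Requires a Node runtime.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}
```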
2: Sending the message
// in your mutation
const { filePart, imagePart } = await getFile(ctx, components.agent, fileId);
const { messageId } = await fileAgent.saveMessage(ctx, {
threadId,
message: {
role: "user",
content: [
imagePart ?? filePart, // if it's an image, prefer that kind.
{ type: "text", text: "What is this image?" },
],
},
metadata: { fileIds: [fileId] }, // IMPORTANT: this tracks the file usage.
});
3: Generating the response & querying the responses
This is done in the same way as text inputs.
// in an action
await thread.generateText({ promptMessageId: messageId });
// in a query
const messages = await agent.listMessages(ctx, { threadId, paginationOpts });
Inline saving approach
You can also pass an image or file directly when generating text, if you're in an action. Any image or file passed in the message argument will automatically be saved in file storage if it's larger than 64k, and a fileId will be saved to the message.
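The inline-saving rule boils down to a size check (the helper name here is hypothetical; the 64k cutoff is the behavior described above):

```typescript
// Illustration of the inline-saving rule: payloads over 64k are moved to
// file storage (and referenced by fileId); smaller ones stay inline in
// the message.
const INLINE_LIMIT = 64 * 1024;

function shouldSaveToFileStorage(bytes: Uint8Array): boolean {
  return bytes.byteLength > INLINE_LIMIT;
}
```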
Example:
await thread.generateText({
message: {
role: "user",
content: [
{ type: "image", image: imageBytes, mimeType: "image/png" },
{ type: "text", text: "What is this image?" },
],
},
});
Under the hood
Saving a file has three parts:
- Saving to file storage (in your app, not in the component's storage). This means you can access it directly with the storageId and generate URLs.
- Saving a reference (the storageId) to the file in the component. This automatically keeps track of how many messages reference the file, so you can vacuum files that are no longer used (see files/vacuum.ts).
- Inserting a URL in place of the data in the message sent to the LLM, along with the mimeType and other metadata provided. If not provided, it will be inferred by guessMimeType.
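That reference tracking is what makes vacuuming safe. The idea can be sketched as a pure function (a hypothetical shape for illustration; see files/vacuum.ts for the real implementation):

```typescript
type TrackedFile = { fileId: string; refcount: number };

// Sketch of the vacuum criterion: only files that no message references
// anymore are candidates for deletion from file storage.
function vacuumCandidates(files: TrackedFile[]): string[] {
  return files.filter((f) => f.refcount === 0).map((f) => f.fileId);
}
```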
Can I just store the file myself and pass in a URL?
Yes! You can always pass a URL in the place of an image or file to the LLM.
const storageId = await ctx.storage.store(blob);
const url = await ctx.storage.getUrl(storageId);
await thread.generateText({
message: {
role: "user",
content: [
      { type: "image", image: url, mimeType: blob.type },
{ type: "text", text: "What is this image?" },
],
},
});
Generating images
There's an example in files/generateImage.ts that takes a prompt, generates an image with OpenAI's DALL·E 2, then saves the image to a thread.
You can try it out with:
npx convex run files/generateImage:replyWithImage '{"prompt": "make a picture of a cat"}'