AI Prompt
Send a prompt to an AI model and get a response. Optionally attach an image so a vision-capable model can describe or classify it.
Description
The AI Prompt node is the core building block for adding intelligence to workflows. Send any text prompt with payload variable interpolation, optionally enforce structured JSON output with schema validation, and route the response downstream.
Use it to classify data, extract information, generate text, summarize content, translate languages, or perform any task that a language model can handle in a single turn. With a vision-capable model, it can also see an image — describe what's in a photo, classify a screenshot, or read a receipt.
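To make the {{variable}} interpolation concrete, here is a minimal sketch in Python. It is not Watchflows' actual implementation: the helper name `render_prompt`, the accepted variable-name pattern, and the choice to leave unknown variables untouched are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches {{name}} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_prompt(template: str, payload: dict) -> str:
    """Replace each {{name}} with the matching payload value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in payload:
            return str(payload[name])
        # Assumption: unknown variables are left exactly as written.
        return match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


payload = {"subject": "Login fails", "body": "500 error after password reset"}
print(render_prompt("Classify this ticket: {{subject}}\n\n{{body}}", payload))
```

Leaving unknown names in place makes a misspelled variable visible in the rendered prompt rather than silently turning it into an empty string; the real node may behave differently.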
Ports
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | textarea | — | User prompt with {{variable}} references to payload data. |
| image | variable picker | None | Optional. Selects which payload value holds the image to send to a vision-capable model. The image travels through the normal Input connection; there is no separate image port. Empty means no image is sent: vision is strictly opt-in. From a Photo Added trigger, pick {{filePath}} here. |
| systemPrompt | textarea | — | System instructions that set context and behavior for the model. Optional. |
| provider | provider picker | — | AI provider to use (configured in Settings). |
| model | model picker | — | Model selection. Available models depend on the selected provider. |
| temperature | number | 0.7 | Controls response randomness. Lower values produce more deterministic output. |
| maxTokens | number | 0 | Maximum response tokens. Set to 0 for unlimited (model default). |
| outputSchema | textarea | — | JSON Schema for structured output validation. When provided, the response is validated against this schema and retried if invalid. Optional. |
Output Payload
The AI Prompt node adds the following variables to the payload. imageProvided is true when an image was actually sent to the model, so downstream nodes can branch on whether vision ran.
{
"result": "The model's response text (or parsed JSON if outputSchema is set)",
"model": "llama3.1:8b",
"tokens": 142,
"provider": "Ollama",
"imageProvided": false
}
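As an illustration of how a downstream step might consume these variables, the sketch below branches on imageProvided and handles both a text result and a parsed-JSON result. The function `describe_run` and the sample values are hypothetical; only the field names come from the payload shown above.

```python
import json


def describe_run(payload: dict) -> str:
    """Summarize an AI Prompt result, noting whether vision ran."""
    result = payload["result"]
    # With outputSchema set, result is parsed JSON rather than plain text.
    if not isinstance(result, str):
        result = json.dumps(result, sort_keys=True)
    mode = "vision" if payload.get("imageProvided") else "text-only"
    return (
        f"[{payload['provider']} / {payload['model']}, {mode}, "
        f"{payload['tokens']} tokens] {result}"
    )


sample = {
    "result": "A blue bird over the ocean.",
    "model": "llama3.2-vision",
    "tokens": 142,
    "provider": "Ollama",
    "imageProvided": True,
}
print(describe_run(sample))
```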
Image / Vision Input
A vision-capable model can look at an image and answer in plain language, for example “a blue bird over the ocean.” The image field picks which payload value is the image; from a Photo Added trigger, pick {{filePath}}. Leave the field empty and no image is sent: vision is strictly opt-in, so the node never sends a file to a model without your explicit choice. The image travels through the normal Input connection; there is no separate image port.
Multiple images & PDF pages
The image field also accepts an ordered array of images. Bind {{pageImages}} from an Extract Text (PDF) node's Document outlet and every rendered page goes to the model in a single request, capped at 20 images. For one call per page instead, use the PDF node's Page outlet with {{image}}. A vision model can't read a PDF directly, so the PDF node renders each page to an image first. Non-image content bound to this field is ignored, so wiring text in by mistake won't break the request.
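The handling of single values, arrays, and non-image content can be pictured with the sketch below. It rests on two loud assumptions that the text above does not confirm: images are recognized here by file extension, and anything past the 20-image cap is dropped rather than rejected. The helper `collect_images` is hypothetical.

```python
from pathlib import Path

MAX_IMAGES = 20  # cap stated in the documentation
# Assumption: recognize images by extension; the real node may inspect content.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".heic"}


def collect_images(value) -> list:
    """Normalize a bound value to an ordered list of image paths."""
    candidates = value if isinstance(value, list) else [value]
    images = [
        item
        for item in candidates
        if isinstance(item, str) and Path(item).suffix.lower() in IMAGE_SUFFIXES
    ]
    # Assumption: extra images beyond the cap are truncated, keeping page order.
    return images[:MAX_IMAGES]


pages = [f"/tmp/page-{n}.png" for n in range(1, 26)]
print(len(collect_images(pages)))
print(collect_images("just some text"))
```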
Supported models
Vision requires a model that supports image input. Supported today:
- Local via Ollama or LM Studio, e.g. llava, llama3.2-vision, qwen2.5-vl, moondream.
- OpenAI-compatible (including OpenRouter), e.g. gpt-4o, gpt-4o-mini, gpt-4.1, o4-mini.
- Anthropic (Claude), e.g. claude-opus-4-8, claude-sonnet-4-6, claude-3-5-sonnet.
- Google (Gemini), e.g. gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro.
Watchflows asks your provider whether the model can actually see images: Ollama and LM Studio report per-model capabilities, and a provider-confirmed vision model is always accepted — even one the app has never heard of. If the provider confirms the model is text-only, the node returns a clear error (“model does not support image input”) instead of silently dropping the image. When the capability can't be verified (cloud providers and unrecognized models), the image is sent anyway with a note in the execution log — if the model truly can't read images, the provider's own error shows up on the node.
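The three outcomes described above can be summarized as a small decision function. This is only a sketch of the documented behavior, not the app's code: the `Action` enum, the function `decide`, and the use of `None` to mean "capability could not be verified" are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send image"
    SEND_WITH_NOTE = "send image and note it in the execution log"
    REJECT = "error: model does not support image input"


def decide(provider_reports_vision: Optional[bool]) -> Action:
    """Map the provider's capability answer to what the node does.

    True  -> provider confirmed vision support
    False -> provider confirmed the model is text-only
    None  -> capability could not be verified
    """
    if provider_reports_vision is True:
        return Action.SEND
    if provider_reports_vision is False:
        return Action.REJECT
    return Action.SEND_WITH_NOTE


for answer in (True, False, None):
    print(answer, "->", decide(answer).value)
```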
Privacy depends on the provider
A local model keeps the image on your Mac. A cloud provider sends the image to that provider. The node inspector shows a label stating which applies to the selected provider.
Formats & size
Apple HEIC photos are automatically transcoded to JPEG; JPEG, PNG, WebP, and GIF are sent as-is. Large images (over roughly 10 MB) are rejected with an error.
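A rough sketch of these format and size rules follows. Several details are assumptions rather than documented behavior: the exact byte threshold behind "roughly 10 MB", checking size before any transcoding, detecting format by file extension, and treating other extensions as ignored. The function `plan_image` is hypothetical.

```python
import os

# Assumption: "roughly 10 MB" is modeled as exactly 10 MiB.
MAX_BYTES = 10 * 1024 * 1024
PASS_THROUGH = {".jpg", ".jpeg", ".png", ".webp", ".gif"}


def plan_image(filename: str, size_bytes: int) -> str:
    """Return what would happen to an image before it is sent."""
    # Assumption: the size check runs on the original file, before transcoding.
    if size_bytes > MAX_BYTES:
        raise ValueError(f"{filename} is too large ({size_bytes} bytes)")
    extension = os.path.splitext(filename)[1].lower()
    if extension == ".heic":
        return "transcode-to-jpeg"
    if extension in PASS_THROUGH:
        return "send-as-is"
    # Assumption: anything else counts as non-image content and is ignored.
    return "ignore"


print(plan_image("IMG_0042.HEIC", 3_500_000))
print(plan_image("receipt.png", 800_000))
```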
Structured Output
When you provide a JSON Schema in the outputSchema field, Watchflows instructs the model to return JSON matching that schema. The response is then validated, and if it fails validation, the node automatically retries the request.
This is useful when downstream nodes need to read specific fields from the AI response. For example, a classification prompt with a schema ensures the output always contains the expected fields:
{
"type": "object",
"properties": {
"category": { "type": "string", "enum": ["bug", "feature", "question"] },
"priority": { "type": "string", "enum": ["low", "medium", "high"] },
"summary": { "type": "string" }
},
"required": ["category", "priority", "summary"]
}
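The validate-and-retry behavior can be sketched as follows. This is an illustration under stated assumptions, not Watchflows' implementation: the retry limit of three attempts is invented (the documentation does not state one), `call_model` is a stand-in for the real provider request, and `validation_errors` is a deliberately tiny validator that covers only the schema features used in the example above (object, required, string, enum).

```python
import json
from typing import Callable

SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["bug", "feature", "question"]},
        "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        "summary": {"type": "string"},
    },
    "required": ["category", "priority", "summary"],
}


def validation_errors(value, schema: dict) -> list:
    """Check a value against a small subset of JSON Schema."""
    errors = []
    if schema.get("type") == "object":
        if not isinstance(value, dict):
            return ["expected an object"]
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"missing required field: {key}")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                nested = validation_errors(value[key], sub_schema)
                errors.extend(f"{key}: {message}" for message in nested)
    elif schema.get("type") == "string" and not isinstance(value, str):
        errors.append("expected a string")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"must be one of {schema['enum']}")
    return errors


def prompt_with_retries(
    call_model: Callable[[], str], schema: dict, max_attempts: int = 3
) -> dict:
    """Call the model until its reply parses and validates, or give up."""
    last_errors = []
    for _ in range(max_attempts):  # assumption: the real retry limit is unknown
        raw = call_model()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_errors = [f"invalid JSON: {exc}"]
            continue
        last_errors = validation_errors(parsed, schema)
        if not last_errors:
            return parsed
    raise ValueError(f"no valid response after {max_attempts} attempts: {last_errors}")


# Fake model: the first reply is missing fields, the second one is valid.
replies = iter([
    '{"category": "bug"}',
    '{"category": "bug", "priority": "high", "summary": "Login fails"}',
])
result = prompt_with_retries(lambda: next(replies), SCHEMA)
print(result["priority"])  # prints "high"
```

In practice a full JSON Schema validator would replace the hand-written check; the loop structure is the part this sketch is meant to show.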
Example Workflow
A workflow that receives webhook data, builds a prompt, sends it to an AI model, and posts the result to Slack.
The Template node builds the prompt from webhook data using {{variable}} interpolation. The AI Prompt processes it and passes the result to an API Request that sends it to Slack.
A vision flow that describes every new photo and notifies you with the description. Set the AI node's image field to {{filePath}}.
Pick a vision model on the AI Prompt node and choose {{filePath}} in its Image field. The photo flows in through the Input connection, the model describes it, and the description goes to a Notification (or into a file/folder).