🧠

AI Prompt

ai.prompt

Send a prompt to an AI model and get a response. Optionally attach an image so a vision-capable model can describe or classify it.

Description

The AI Prompt node is the core building block for adding intelligence to workflows. Send any text prompt with payload variable interpolation, optionally enforce structured JSON output with schema validation, and route the response downstream.

Use it to classify data, extract information, generate text, summarize content, translate languages, or perform any task that a language model can handle in a single turn. With a vision-capable model, it can also see an image — describe what's in a photo, classify a screenshot, or read a receipt.

Ports

Inlets
Input (any)
Outlets
Result (string)

Configuration

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | textarea | | User prompt with {{variable}} references to payload data. |
| image | variable picker | None | Optional. Selects which payload value holds the image to send to a vision-capable model. The image travels through the normal Input connection; there is no separate image port. Leave it empty and no image is sent: vision is strictly opt-in. From a Photo Added trigger, pick {{filePath}} here. |
| systemPrompt | textarea | | System instructions that set context and behavior for the model. Optional. |
| provider | provider picker | | AI provider to use (configured in Settings). |
| model | model picker | | Model selection. Available models depend on the selected provider. |
| temperature | number | 0.7 | Controls response randomness. Lower values produce more deterministic output. |
| maxTokens | number | 0 | Maximum response tokens. Set to 0 for unlimited (model default). |
| outputSchema | textarea | | JSON Schema for structured output validation. When provided, the response is validated against this schema and retried if invalid. Optional. |
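To make the {{variable}} references in the prompt field concrete, here is a minimal sketch of what such interpolation amounts to. It is not Watchflows' implementation: the `render_prompt` helper is hypothetical, and leaving unknown variables untouched is an assumption made for the example.

```python
import re

# Hypothetical helper, not part of Watchflows: fills {{name}} placeholders
# from a payload dict. Matches {{name}} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, payload: dict) -> str:
    """Replace each {{name}} with the matching payload value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Assumption: a name missing from the payload is left as written.
        return str(payload[key]) if key in payload else match.group(0)

    return _PLACEHOLDER.sub(substitute, template)


prompt = render_prompt(
    "Classify this ticket: {{subject}} ({{unknown}})",
    {"subject": "App crashes on launch"},
)
```

Here `prompt` ends up with the subject filled in and the unmatched `{{unknown}}` reference still visible, which makes a misspelled variable easy to spot.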

Output Payload

The AI Prompt node adds the following variables to the payload. imageProvided is true when an image was actually sent to the model, so downstream nodes can branch on whether vision ran.

{
  "result": "The model's response text (or parsed JSON if outputSchema is set)",
  "model": "llama3.1:8b",
  "tokens": 142,
  "provider": "Ollama",
  "imageProvided": false
}
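If you post-process this payload in your own code, the two things worth checking are whether `result` is text or parsed JSON and whether vision ran. The sketch below shows one way to do that; the `summarize` function is hypothetical and only reads the fields listed above.

```python
def summarize(payload: dict) -> str:
    """Build a one-line summary of an AI Prompt output payload."""
    result = payload["result"]
    # With outputSchema set, result is parsed JSON rather than a string.
    kind = "structured" if isinstance(result, (dict, list)) else "text"
    # imageProvided is true only when an image was actually sent.
    mode = "vision" if payload.get("imageProvided") else "text-only"
    return (
        f"{kind} result from {payload['provider']}/{payload['model']} "
        f"({mode}, {payload['tokens']} tokens)"
    )


example = {
    "result": "A short answer",
    "model": "llama3.1:8b",
    "tokens": 142,
    "provider": "Ollama",
    "imageProvided": False,
}
line = summarize(example)
```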

Image / Vision Input

A vision-capable model can look at an image and answer in plain language, for example “a blue bird over the ocean.” The image field picks which payload value is the image; from a Photo Added trigger, pick {{filePath}}. Leave the field empty and no image is sent: vision is strictly opt-in, so the node never ships a file to a model behind your back. The image rides through the normal Input connection; there is no separate image port.

Multiple images & PDF pages

The image field also accepts an ordered array of images. Bind {{pageImages}} from an Extract Text (PDF) node's Document outlet and every rendered page goes to the model in a single request (capped at 20 images; use the PDF node's Page outlet with {{image}} for one call per page instead). Remember that a vision model can't read a PDF directly: the PDF node renders each page to an image first. Non-image content bound to this field is ignored, so wiring text in by mistake won't break the request.
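The handling described above (one image or an ordered array, non-image values ignored, a cap of 20) can be pictured roughly as follows. This is a sketch under stated assumptions, not the app's code: `collect_images` is a hypothetical name, recognizing images by file extension is an assumption, and so is dropping anything past the cap rather than rejecting the request.

```python
from pathlib import Path

MAX_IMAGES = 20  # cap stated above
# Assumption: images are recognized by extension; the app may inspect content.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".heic"}


def collect_images(value) -> list:
    """Normalize an image-field value to an ordered list of image paths."""
    candidates = value if isinstance(value, list) else [value]
    images = [
        item
        for item in candidates
        if isinstance(item, str) and Path(item).suffix.lower() in IMAGE_SUFFIXES
    ]
    # Assumption: extras beyond the cap are dropped, keeping page order.
    return images[:MAX_IMAGES]


pages = collect_images(["page-1.png", "some stray text", "page-2.PNG"])
```

The stray text is filtered out and the two pages keep their order, which matches the "wiring text in by mistake won't break the request" behavior.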

Supported models

Vision requires a model that supports image input. Supported today:

  • Local via Ollama or LM Studio — e.g. llava, llama3.2-vision, qwen2.5-vl, moondream.
  • OpenAI-compatible (including OpenRouter) — e.g. gpt-4o, gpt-4o-mini, gpt-4.1, o4-mini.
  • Anthropic (Claude) — e.g. claude-opus-4-8, claude-sonnet-4-6, claude-3-5-sonnet.
  • Google (Gemini) — e.g. gemini-2.5-pro, gemini-2.0-flash, gemini-1.5-pro.

Watchflows asks your provider whether the model can actually see images: Ollama and LM Studio report per-model capabilities, and a provider-confirmed vision model is always accepted — even one the app has never heard of. If the provider confirms the model is text-only, the node returns a clear error (“model does not support image input”) instead of silently dropping the image. When the capability can't be verified (cloud providers and unrecognized models), the image is sent anyway with a note in the execution log — if the model truly can't read images, the provider's own error shows up on the node.
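The three outcomes in that paragraph form a small decision table. The sketch below spells it out; the `Action` enum and `decide` function are illustrative names, not Watchflows identifiers, and `None` stands in for "the provider could not confirm either way."

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SEND = "send image"
    SEND_WITH_NOTE = "send image and note it in the execution log"
    ERROR = "model does not support image input"


def decide(provider_confirms_vision: Optional[bool]) -> Action:
    """Map the provider's capability answer to what the node does."""
    if provider_confirms_vision is True:
        # Provider-confirmed vision models are always accepted.
        return Action.SEND
    if provider_confirms_vision is False:
        # Confirmed text-only: fail clearly instead of dropping the image.
        return Action.ERROR
    # Unverifiable (cloud providers, unrecognized models): send anyway.
    return Action.SEND_WITH_NOTE
```

Treating "unknown" as its own case, rather than folding it into true or false, is what lets new cloud models work without waiting for an app update.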

Privacy depends on the provider

A local model keeps the image on your Mac. A cloud provider sends the image to that provider. The node inspector shows a label stating which applies to the selected provider.

Formats & size

Apple HEIC photos are automatically transcoded to JPEG; JPEG, PNG, WebP, and GIF are sent as-is. Large images (over roughly 10 MB) are rejected with an error.
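As a rough picture of those rules, the sketch below classifies an image by format and size. It rests on assumptions: `plan_upload` is a hypothetical name, the limit is taken as exactly 10 MiB although the text only says "roughly 10 MB", and formats the text does not mention are reported as unspecified rather than guessed at.

```python
# Assumption: "roughly 10 MB" is modeled as exactly 10 MiB.
MAX_BYTES = 10 * 1024 * 1024
PASS_THROUGH = {"jpeg", "png", "webp", "gif"}


def plan_upload(fmt: str, size_bytes: int) -> str:
    """Return how an image of this format and size would be handled."""
    if size_bytes > MAX_BYTES:
        raise ValueError("image too large (over roughly 10 MB)")
    fmt = fmt.lower()
    if fmt == "heic":
        return "transcode-to-jpeg"
    if fmt in PASS_THROUGH:
        return "send-as-is"
    # The documentation does not say what happens to other formats.
    return "unspecified"
```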

Structured Output

When you provide a JSON Schema in the outputSchema field, Watchflows instructs the model to return JSON matching that schema. The response is then validated, and if it fails validation, the node automatically retries the request.

This is useful when downstream nodes need to read specific fields from the AI response. For example, a classification prompt with a schema ensures the output always contains the expected fields:

{
  "type": "object",
  "properties": {
    "category": { "type": "string", "enum": ["bug", "feature", "question"] },
    "priority": { "type": "string", "enum": ["low", "medium", "high"] },
    "summary": { "type": "string" }
  },
  "required": ["category", "priority", "summary"]
}
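The validate-and-retry behavior can be sketched as a small loop. This is an illustration, not the node's implementation: `validate` is a deliberately tiny checker that covers only the keywords used in the schema above (type, properties, enum, required), `call_model` is a hypothetical stand-in for the request to the provider, and the limit of three attempts is an assumption because the retry count is not documented here.

```python
import json
from typing import Callable


def validate(value, schema: dict) -> bool:
    """Check a value against type/properties/enum/required only."""
    if schema.get("type") == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        return all(
            validate(value[key], sub)
            for key, sub in schema.get("properties", {}).items()
            if key in value
        )
    if schema.get("type") == "string" and not isinstance(value, str):
        return False
    return "enum" not in schema or value in schema["enum"]


def prompt_with_schema(
    call_model: Callable[[], str], schema: dict, max_attempts: int = 3
):
    """Call the model until it returns JSON that passes validation."""
    for _ in range(max_attempts):  # assumption: 3 attempts by default
        try:
            parsed = json.loads(call_model())
        except json.JSONDecodeError:
            continue  # not JSON at all: try again
        if validate(parsed, schema):
            return parsed
    raise ValueError("no schema-valid response after retries")
```

A production validator would normally come from a full JSON Schema library; the hand-rolled checker here only keeps the example self-contained.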

Example Workflow

A workflow that receives webhook data, builds a prompt, sends it to an AI model, and posts the result to Slack:

The Template node builds the prompt from webhook data using {{variable}} interpolation. The AI Prompt processes it and passes the result to an API Request that sends it to Slack.

A vision flow that describes every new photo and notifies you with the description — set the AI node's image field to {{filePath}}:

Pick a vision model on the AI Prompt node and choose {{filePath}} in its Image field. The photo flows in through the Input connection, the model describes it, and the description goes to a Notification (or into a file/folder).