Extract Text (PDF)
Pull the text out of any PDF — born-digital or scanned. Pages with no selectable text are automatically run through on-device OCR, so a scanned document reads as cleanly as an exported one. Optionally render each page to an image so a downstream AI vision model can see the layout, not just the stripped text.
Text extraction and OCR run entirely on your Mac — no network, no AI provider required for the text itself. Extract Text (PDF) pairs naturally with the Directory Changed trigger (drop a PDF in a watched folder) or the Input trigger (pick a file and run by hand), but works on any PDF file path in the incoming payload.
Two outlets, one extraction
A single extraction feeds both outlets — you choose which to wire based on whether you want one result for the whole document or one result per page.
Document: emits once with the whole document: full text, the pages array, and (with rendering on) the pageImages array. Use it for one answer about the whole PDF.
Page: fans out, running everything downstream once per page, each run carrying that page's text, index, and (with rendering on) its single image. Use it for a per-page result.
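To make the relationship between the two outlets concrete, here is a minimal Python sketch of how a Document payload maps onto the per-page payloads. It illustrates the payload shapes only and is not the node's implementation. The function name `fan_out_pages`, the sample file paths, and the idea that a file reference looks like a plain path string are all assumptions made for the example.

```python
from typing import Any, Dict, Iterator


def fan_out_pages(document: Dict[str, Any], output_key: str = "text") -> Iterator[Dict[str, Any]]:
    """Yield one Page-style payload per entry in the Document payload's pages array."""
    page_images = document.get("pageImages")  # present only when rendering is on
    # Keep upstream keys (and pageCount), but drop the document-wide results.
    shared = {
        key: value
        for key, value in document.items()
        if key not in (output_key, "pages", "pageImages", "metadata")
    }
    for position, page in enumerate(document["pages"]):
        payload = dict(shared)
        payload[output_key] = page["text"]
        payload["index"] = page["index"]
        payload["wasOCR"] = page["wasOCR"]
        if page_images is not None:
            payload["image"] = page_images[position]
        yield payload


# Hypothetical Document payload for a two-page PDF with rendering on.
document = {
    "filePath": "/tmp/invoice.pdf",
    "text": "Page one\nPage two",
    "pageCount": 2,
    "pages": [
        {"index": 0, "text": "Page one", "wasOCR": False},
        {"index": 1, "text": "Page two", "wasOCR": True},
    ],
    "pageImages": ["/tmp/scratch/page-0.png", "/tmp/scratch/page-1.png"],
}
page_payloads = list(fan_out_pages(document))
```

Each entry in `page_payloads` carries one page's text, index, wasOCR flag, and image, alongside the untouched upstream filePath.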
Ports
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| filePath | variable picker | {{filePath}} | The PDF to read. Point this at the upstream file, typically {{filePath}} from a Directory Changed or Input trigger. |
| pageRange | text | blank (all) | Leave blank for every page, or limit to a range like 1-5. Handy to stay under the vision image cap on long PDFs (see below). |
| outputKey | text | text | The payload key the extracted text is stored under. |
| renderPageImages | toggle | false | Render page images for vision. Off by default. When on, each page is drawn to an image so a vision model can see it (see How page-image rendering works). Leave off for pure text, since rendering every page is extra work. |
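The pageRange value and the page index reported in the output use different conventions, which is easy to trip over. The Python sketch below shows one plausible reading of a range like 1-5. It assumes the range is 1-based and inclusive, that a lone number means a single page, and that a range running past the end of the document is clamped. None of these details are confirmed here, and `page_indices` is a hypothetical helper, not part of the node.

```python
from typing import List


def page_indices(page_range: str, page_count: int) -> List[int]:
    """Turn a pageRange string into 0-based page indices.

    Assumes "start-end" is 1-based and inclusive, and blank means every page.
    """
    spec = page_range.strip()
    if not spec:
        return list(range(page_count))
    start_text, separator, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if separator else start  # a lone number is one page
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {page_range!r}")
    # Clamp to the document length rather than failing on a short PDF.
    return list(range(start - 1, min(end, page_count)))


first_five = page_indices("1-5", 30)
every_page = page_indices("", 3)
clamped = page_indices("2-10", 4)
```

Under these assumptions, a pageRange of 1-5 selects the pages whose index values are 0 through 4.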
Output Payload
These keys are merged onto the incoming payload, so upstream keys (like filePath) are preserved. Which keys you get depends on the outlet you wired.
| Key | Description |
|---|---|
| text | Document: the full document text. Page: that page's text. (Key name is configurable via outputKey.) |
| pages | Document: an array of {index, text, wasOCR} for every page. |
| pageCount | Total number of pages in the document. |
| metadata | Document: document attributes (Title, Author, Producer, dates, …). |
| index | Page: this page's 0-based index. |
| wasOCR | Page: true when this page's text came from OCR (a scanned page) rather than the PDF's own text layer. |
| pageImages | Document: an ordered array of rendered page-image file references, present only when renderPageImages is on. Bind this into an AI node's Image field for vision. |
| image | Page: this page's single rendered image file reference, present only when renderPageImages is on. |
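The merge behavior and the effect of outputKey are easiest to see side by side. This Python sketch models the Document payload as a dictionary merge. It is an illustration under assumptions, not the node's code: `document_payload` is a hypothetical helper, joining page text with newlines is a guess at how the full text is assembled, and metadata is omitted for brevity.

```python
from typing import Any, Dict, List


def document_payload(
    incoming: Dict[str, Any],
    pages: List[Dict[str, Any]],
    output_key: str = "text",
) -> Dict[str, Any]:
    """Merge the extraction results onto the incoming payload."""
    # Assumption: the full text is the page texts joined by newlines.
    full_text = "\n".join(page["text"] for page in pages)
    produced = {output_key: full_text, "pages": pages, "pageCount": len(pages)}
    # Upstream keys survive; produced keys are added on top.
    return {**incoming, **produced}


incoming = {"filePath": "/Users/me/Inbox/scan.pdf"}
pages = [
    {"index": 0, "text": "Cover letter", "wasOCR": False},
    {"index": 1, "text": "Signed form", "wasOCR": True},
]
payload = document_payload(incoming, pages, output_key="body")

# wasOCR makes it easy to find the pages that were scanned.
ocr_page_indices = [page["index"] for page in payload["pages"] if page["wasOCR"]]
```

With outputKey set to body, the text lands under body instead of text, and filePath from the trigger is still there for downstream nodes.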
How page-image rendering works (vision)
This is the part that surprises people, so it's worth stating plainly: an AI vision model cannot read a PDF. Vision models accept only one kind of visual input: an image (PNG/JPEG). A PDF is not an image, so to let a model look at a page, something has to turn that page into a picture first.
That something is this node. When you turn on Render page images for vision, here's what actually happens:
1. Render. Each page is rasterized (drawn out to a PNG image), the same way it would look if you exported the page as a picture. The PDF itself doesn't contain these images; the node generates them on the fly.
2. Reference, not bytes. Each PNG is written to a temporary scratch file, and only a lightweight file reference travels in the payload (pageImages on Document, image on Page). The image data never enters the payload, which is what keeps a 30-page PDF from bloating your execution history.
3. Read back at the model. When you bind those references into an AI Prompt node's Image field, that node reads the PNGs back off disk and attaches them to the request to a vision-capable model. The scratch files are disposable: they're purged automatically and never become history.
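The reference-not-bytes pattern described above can be sketched in a few lines of Python. This is a generic illustration of the idea, not the node's actual code: the function names, the page-N.png file naming, and the use of a plain path string as the file reference are assumptions, and placeholder bytes stand in for real rasterized pages.

```python
import os
import tempfile
from typing import Dict, List


def store_page_images(rendered_pages: List[bytes], scratch_dir: str) -> Dict[str, List[str]]:
    """Write each rendered page to scratch_dir and return only the file paths."""
    references = []
    for index, png_bytes in enumerate(rendered_pages):
        path = os.path.join(scratch_dir, f"page-{index}.png")
        with open(path, "wb") as handle:
            handle.write(png_bytes)
        references.append(path)
    return {"pageImages": references}


def load_page_images(payload: Dict[str, List[str]]) -> List[bytes]:
    """Read the images back at the point of use, e.g. when building a model request."""
    images = []
    for path in payload["pageImages"]:
        with open(path, "rb") as handle:
            images.append(handle.read())
    return images


# Placeholder bytes stand in for real rasterized PNGs.
fake_pages = [b"\x89PNG page 0", b"\x89PNG page 1"]
with tempfile.TemporaryDirectory() as scratch:
    payload = store_page_images(fake_pages, scratch)
    round_trip = load_page_images(payload)
    payload_holds_only_paths = all(isinstance(ref, str) for ref in payload["pageImages"])

# The scratch directory is removed on exit, mirroring the disposable files.
scratch_removed = not os.path.exists(scratch)
```

The payload stays a short list of strings however large the images are, and the image bytes exist only on disk until the step that needs them reads them back.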
When is it worth it? Born-digital PDFs (exported from Word, Excel, etc.) already have clean, extractable text — for those, text alone is cheaper and just as good. Reach for page images when the visual matters: scanned documents, forms, signatures, stamps, tables, or any layout where the stripped text loses meaning.
Vision requests are token-heavy. Each page image counts against the model's input, and a single AI request is capped at 20 images — on a longer PDF, either route the Page outlet (one image per call) or set a pageRange.
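If you would rather keep whole-document style requests on a long PDF, one option is to split it into page ranges that each stay under the cap. The Python sketch below computes such ranges. Treat it as an assumption-laden illustration: `page_ranges` is a hypothetical helper, it assumes pageRange is 1-based and inclusive, and running the extraction once per range is a suggestion for how you might use it rather than a built-in feature.

```python
from typing import List

MAX_IMAGES_PER_REQUEST = 20  # the per-request image cap stated above


def page_ranges(page_count: int, cap: int = MAX_IMAGES_PER_REQUEST) -> List[str]:
    """Split a document into pageRange strings of at most `cap` pages each."""
    ranges = []
    for start in range(1, page_count + 1, cap):
        end = min(start + cap - 1, page_count)
        ranges.append(f"{start}-{end}")
    return ranges


ranges_for_45_pages = page_ranges(45)
ranges_for_20_pages = page_ranges(20)
```

For a 45-page PDF this yields three ranges, each of which fits in a single vision request.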
Example: PDF triage with AI
Watch a folder, render each dropped PDF to page images, have a vision model read and summarize it, and notify you with the result.
Wiring that makes it work:
- On the PDF node, set PDF Path (the filePath parameter) to the upstream file and turn Render page images for vision on.
- Connect the Document outlet to the AI node, then set the AI node's Image field to {{pageImages}} so all pages go to the model in one request. (For a per-page result instead, wire the Page outlet and use {{image}}.)
- Pick a vision-capable model on the AI node. A text-only model returns a clear error rather than silently dropping the images.
Sending both the extracted {{text}} (as the prompt) and the page images is the most robust setup — the model can cross-check what it reads against what it sees. See the AI Prompt node for the image-binding details.