
The pipeline — panel by panel

How it works

A fully automated line that upgrades your PDFs one page at a time — and proves the improvement with a score.

Dusty scans in. Archival-grade, searchable pages out.

The document upgrade pipeline

  1. Upload

    Submit your PDF through the REST API, an SDK, the CLI, or the drag-and-drop portal. We hand back a secure presigned upload URL.

  2. Triage

    Every page is inspected — production type, script, quality, existing text layer — and a routing decision is made page by page.

  3. Process

    Each page is dispatched to the engines best suited to its content — fast readers for clean print, heavyweight vision models for gnarly layouts, specialists for script and handwriting.

  4. Assemble

    Processed pages are reassembled into one PDF. Layout, images, and visual appearance are preserved exactly as they came in.

  5. Verify

    A per-page Retrievability Score is computed before and after. Pages we did not meaningfully improve are flagged — and never billed.

  6. Deliver

    Your upgraded PDF is made available at a signed URL and is then queued for automatic deletion according to your retention schedule.
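
The Verify rule (pages that were not meaningfully improved are flagged and not billed) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the service's implementation: the page does not say how the Retrievability Score is computed or what gain counts as meaningful, so the 0 to 100 scale, the `MIN_GAIN` threshold of 5 points, and the `PageResult`, `VerifiedPage`, and `verify_pages` names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: minimum score gain (in points) treated as a
# "meaningful" improvement. The real criterion is not published on this page.
MIN_GAIN = 5.0


@dataclass
class PageResult:
    page: int
    score_before: float  # Retrievability Score as uploaded (assumed 0-100 scale)
    score_after: float   # score after processing (same assumed scale)


@dataclass
class VerifiedPage:
    page: int
    gain: float
    flagged: bool   # True when the page was not meaningfully improved
    billable: bool  # flagged pages are never billed


def verify_pages(results: list[PageResult], min_gain: float = MIN_GAIN) -> list[VerifiedPage]:
    """Compare before/after scores page by page and decide what to bill."""
    verified = []
    for r in results:
        gain = r.score_after - r.score_before
        improved = gain >= min_gain
        verified.append(
            VerifiedPage(page=r.page, gain=gain, flagged=not improved, billable=improved)
        )
    return verified


if __name__ == "__main__":
    report = verify_pages([
        PageResult(page=1, score_before=42.0, score_after=91.0),
        PageResult(page=2, score_before=88.0, score_after=89.5),
    ])
    for v in report:
        print(v.page, v.gain, v.flagged, v.billable)
```

With these made-up scores, page 1 (gain 49.0) is billable, while page 2 (gain 1.5) is flagged and not billed. Scoring both versions of every page is what makes the "never billed" promise checkable per page rather than per document.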

Engine routing

No single OCR engine wins on every document. Each page is dispatched to the engine — or stack of engines — most likely to nail it.

| Page type | Approach |
|---|---|
| Standard printed text | Fast print OCR |
| Complex tables / forms | Layout-aware OCR + vision model |
| Mathematical equations | Detection + math-aware model |
| Handwritten (cursive) | Detection + heavyweight vision models |
| Mixed scripts | Multilingual vision model |
| Low quality / degraded | Multi-engine + AI arbitration |
| Already searchable | Re-wrapped as archival PDF/A-3b; no OCR charge |
| Blank page | Skipped; not charged |
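
The routing table can be read as a per-page decision function over the signals gathered during Triage. The sketch below is a hypothetical illustration, not the service's router: the `Triage` fields, the 0.0 to 1.0 quality scale, the `LOW_QUALITY` cut-off, and especially the order in which the checks run are assumptions; only the page types, approaches, and the two no-charge cases come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    SKIP = "Skipped (not charged)"
    REWRAP_PDFA = "Re-wrapped as archival PDF/A-3b (no OCR charge)"
    MULTI_ENGINE = "Multi-engine + AI arbitration"
    HANDWRITING = "Detection + heavyweight vision models"
    MATH = "Detection + math-aware model"
    LAYOUT = "Layout-aware OCR + vision model"
    MULTILINGUAL = "Multilingual vision model"
    FAST_PRINT = "Fast print OCR"


@dataclass
class Triage:
    """Hypothetical per-page signals produced by the Triage step."""
    is_blank: bool = False
    usable_text_layer: bool = False  # assumed: an existing, trustworthy text layer
    quality: float = 1.0             # assumed scale: 0.0 unreadable, 1.0 clean
    handwritten: bool = False
    has_math: bool = False
    has_tables_or_forms: bool = False
    scripts: frozenset[str] = frozenset({"Latin"})


LOW_QUALITY = 0.4  # hypothetical cut-off for "low quality / degraded"


def route(page: Triage) -> Route:
    # Assumed priority: cheap exits first, then degraded pages, then the
    # hardest content types, falling back to fast print OCR.
    if page.is_blank:
        return Route.SKIP
    if page.usable_text_layer:
        return Route.REWRAP_PDFA
    if page.quality < LOW_QUALITY:
        return Route.MULTI_ENGINE
    if page.handwritten:
        return Route.HANDWRITING
    if page.has_math:
        return Route.MATH
    if page.has_tables_or_forms:
        return Route.LAYOUT
    if len(page.scripts) > 1:
        return Route.MULTILINGUAL
    return Route.FAST_PRINT


def is_charged(r: Route) -> bool:
    # Per the table, blank and already-searchable pages carry no OCR charge.
    return r not in (Route.SKIP, Route.REWRAP_PDFA)
```

For example, `route(Triage(scripts=frozenset({"Latin", "Cyrillic"})))` selects the multilingual vision model. Returning a single route per page is a simplification; a production router could just as plausibly stack engines for a page that matches several rows (a degraded handwritten page, say).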