DocumentFlowAI Docs
API Reference

Batch Processing

Submit up to 100 documents in a single request. Files are processed in parallel by background workers. The API returns a batch_id immediately — you either poll for status or receive a webhook when processing is complete.

Plan requirement: Batch processing is available on the Starter plan and above. Free plan accounts are limited to single-document extraction via /v1/extract.

Step 1 — Submit the batch

POST /v1/batch

Send files as multipart/form-data. Each file must use the field name files.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| files | File[] | Required | Up to 100 document files. Each field must be named files. Supported formats: PDF, JPG, PNG, WEBP, TIFF, BMP, HEIC, HEIF. Max 10 MB per file. |
| webhook_url | string | Optional | A public HTTPS URL we will POST results to when the batch completes. See Webhooks section below. Requires Starter plan or above. |
| callback_id | string | Optional | Your own reference ID. Echoed back in the webhook payload so you can correlate batches with your internal jobs. |
bash
curl -X POST https://api.documentflowai.com/v1/batch \
  -H "Authorization: Bearer af_YOUR_API_KEY" \
  -F "files=@invoice1.pdf" \
  -F "files=@invoice2.pdf" \
  -F "files=@receipt.png" \
  -F "webhook_url=https://yourserver.com/webhook" \
  -F "callback_id=order_123"
submit_batch.py
import requests

files = [
    ("files", open("invoice1.pdf", "rb")),
    ("files", open("invoice2.pdf", "rb")),
    ("files", open("receipt.png", "rb")),
]

response = requests.post(
    "https://api.documentflowai.com/v1/batch",
    headers={"Authorization": "Bearer af_YOUR_API_KEY"},
    files=files,
    data={
        "webhook_url": "https://yourserver.com/webhook",
        "callback_id": "order_123",
    },
)

response.raise_for_status()  # stop here on a 4xx/5xx instead of parsing an error body
batch = response.json()
print(batch["batch_id"])   # e.g. "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35"
print(batch["status"])     # "queued"

Response from Step 1:

submit_response.json
{
  "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
  "total_files": 3,
  "status": "queued",
  "estimated_seconds": 9
}
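The limits in the parameter table (100 files, 10 MB each, a fixed list of formats) can be checked on the client before uploading, which avoids sending a batch that will be rejected. The following is a minimal sketch, not part of the API: `validate_batch` is a hypothetical helper, the extension list is derived from the format names in the table, and treating "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Limits copied from the parameter table; names are illustrative.
MAX_FILES_PER_BATCH = 100
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" is read as 10 MiB
# Assumption: only the extensions named in the table; aliases such as
# .jpeg or .tif may or may not be accepted by the API.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".webp", ".tiff", ".bmp", ".heic", ".heif"}


def validate_batch(paths):
    """Return a list of problems; an empty list means the batch looks submittable."""
    problems = []
    if not paths:
        problems.append("batch is empty")
    if len(paths) > MAX_FILES_PER_BATCH:
        problems.append(f"{len(paths)} files exceeds the limit of {MAX_FILES_PER_BATCH}")
    for p in map(Path, paths):
        if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{p.name}: unsupported extension {p.suffix!r}")
        elif not p.is_file():
            problems.append(f"{p.name}: file not found")
        elif p.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{p.name}: larger than 10 MB")
    return problems
```

Calling `validate_batch(["invoice1.pdf", "invoice2.pdf", "receipt.png"])` before the POST returns an empty list when all three files exist and are within the limits.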

Step 2 — Poll for status

GET /v1/batch/{batch_id}

Poll this endpoint until status is complete. The percent field lets you show a progress bar. If you provided a webhook_url, you can skip polling entirely — we'll call you when done.

poll_status.py
import time, requests

API_KEY = "af_YOUR_API_KEY"
batch_id = "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35"

while True:
    r = requests.get(
        f"https://api.documentflowai.com/v1/batch/{batch_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    r.raise_for_status()  # surface HTTP errors instead of failing on a missing key
    data = r.json()
    print(f"{data['percent']}% complete ({data['processed']}/{data['total']} files)")

    if data["status"] == "complete":
        download_url = data["download_url"]
        break

    time.sleep(3)
status_response.json
{
  "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
  "status": "complete",
  "processed": 3,
  "total": 3,
  "percent": 100,
  "download_url": "https://..."
}
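A polling loop that never gives up can hang a job if a batch stalls. One way to bound it is a deadline, sketched below under stated assumptions: `wait_for_batch` and its parameters are hypothetical, `fetch_status` stands for any callable that returns the parsed JSON of GET /v1/batch/{batch_id}, and the 600-second default is arbitrary. Only the "complete" status shown above is treated as terminal.

```python
import time


def wait_for_batch(fetch_status, timeout_s=600.0, interval_s=3.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports "complete" or the deadline passes.

    fetch_status: hypothetical callable returning the status JSON as a dict.
    sleep/clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        data = fetch_status()
        if data["status"] == "complete":
            return data
        if clock() >= deadline:
            raise TimeoutError(f"batch still {data['status']!r} after {timeout_s} s")
        sleep(interval_s)
```

With requests, `fetch_status` could be a lambda that performs the GET from the example above and returns `r.json()`; the returned dict then carries the `download_url` needed in Step 3.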

Step 3 — Download results

The download_url is a pre-signed link valid for 24 hours. Fetch it with a plain GET — no auth header required. It returns a JSON array, one object per file, in the same order as submitted.

download_results.py
import requests

# download_url comes from the Step 2 status response (or the webhook payload)
results = requests.get(download_url).json()

for r in results:
    ext = r["extraction"]
    print(f"File {r['file_index']}: {ext['document_type']} — confidence {ext['confidence_score']:.0%}")

    if "financial" in ext:
        print(f"  Grand total: {ext['financial']['totals']['grand_total']}")
results.json
[
  {
    "file_index": 0,
    "file_key": "batch/7dfd45b2-.../input/0000.pdf",
    "extraction": {
      "document_type": "invoice",
      "document_category": "financial",
      "confidence_score": 0.97,
      "vendor": { "name": "Acme Pvt Ltd", "tax_id": "29AABCA1234K1ZX" },
      "document_meta": { "invoice_number": "INV-001", "date": "2024-07-01", "currency": "INR" },
      "financial": {
        "line_items": [
          { "description": "Consulting Services", "total": 50000.0, "tax_rate": 18.0 }
        ],
        "totals": { "subtotal": 50000.0, "cgst": 4500.0, "sgst": 4500.0, "grand_total": 59000.0 }
      }
    },
    "validation": { "math_valid": true, "errors": [], "warnings": [] },
    "tokens_used": 1284
  }
]
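Each result carries a confidence_score and a validation object, so a consumer can route doubtful extractions to manual review. The sketch below is one possible policy, not something the API prescribes: `needs_review` is a hypothetical helper, the 0.9 threshold is an arbitrary assumption, and the field names are taken from the sample results.json above.

```python
def needs_review(result, min_confidence=0.9):
    """Return the reasons a result should be checked by a person (empty list if none)."""
    extraction = result.get("extraction", {})
    validation = result.get("validation", {})
    reasons = []
    # A missing score is treated as 0.0, i.e. always flagged.
    if extraction.get("confidence_score", 0.0) < min_confidence:
        reasons.append("low confidence")
    # A missing math_valid flag is treated as valid.
    if not validation.get("math_valid", True):
        reasons.append("math_valid is false")
    if validation.get("errors"):
        reasons.append("validation errors reported")
    return reasons
```

Applied to the downloaded list, `[r["file_index"] for r in results if needs_review(r)]` gives the indices of the files worth a second look.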

Webhooks (skip polling)

Pass a webhook_url when submitting the batch and we will POST the following payload to your server the moment all files are processed. We retry up to 3 times with exponential backoff if your server returns a non-2xx response.

webhook_payload.json
{
  "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
  "status": "complete",
  "download_url": "https://...",
  "callback_id": "order_123"
}
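Because deliveries are retried after a non-2xx response, the same payload may reach your server more than once, for example if a handler finished its work but failed before replying. A minimal sketch of de-duplicating by batch_id follows. `WebhookDeduplicator` and `first_delivery` are hypothetical names, and the in-memory set is an assumption that only suits a single process; a shared database or cache would be the more likely choice in production.

```python
class WebhookDeduplicator:
    """Remember which batch_ids have already been handled (in memory only)."""

    def __init__(self):
        self._seen = set()

    def first_delivery(self, payload):
        """Return True the first time a batch_id is seen, False on repeats."""
        batch_id = payload["batch_id"]
        if batch_id in self._seen:
            return False
        self._seen.add(batch_id)
        return True
```

A handler would call `first_delivery(payload)` after verifying the signature and return a 2xx response in both cases, skipping the download when the result is False.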

Every webhook request includes an X-Webhook-Signature header. Verify it to confirm the request came from us:

verify_webhook.py
import hmac, hashlib
from fastapi import Request, HTTPException

WEBHOOK_SECRET = "your_webhook_signing_secret"

async def handle_webhook(request: Request):
    body = await request.body()
    sig_header = request.headers.get("x-webhook-signature", "")

    expected = "sha256=" + hmac.new(
        WEBHOOK_SECRET.encode(), body, hashlib.sha256
    ).hexdigest()

    if not hmac.compare_digest(expected, sig_header):
        raise HTTPException(status_code=401, detail="Invalid signature")

    payload = await request.json()
    download_url = payload["download_url"]
    # fetch and process results...
Webhook signing secret: Contact support to obtain your webhook signing secret. Keep it server-side and never expose it in client code.
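To exercise a webhook handler locally, it helps to produce a signature the same way the verification example expects it. The sketch below assumes the header value is "sha256=" followed by the hex HMAC-SHA256 of the raw request body, which is inferred from the verification code above rather than from a separate specification. `sign_payload` and `is_valid_signature` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_payload(secret: str, body: bytes) -> str:
    """Build the header value for a raw request body (assumed format: sha256=<hex>)."""
    return "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the expected and received header values."""
    return hmac.compare_digest(sign_payload(secret, body), header_value)
```

In a test, sign the exact bytes you send as the request body; re-serialising the JSON before signing can change whitespace or key order and produce a different digest.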

Limits

| Limit | Value |
| --- | --- |
| Max files per batch | 100 |
| Max single file size | 10 MB |
| Download URL validity | 24 hours |
| Webhook retries | 3 (exponential backoff) |
| Plans with batch access | Starter, Growth, Enterprise |
| Plans with webhook access | Starter, Growth, Enterprise |
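When a job has more than 100 documents, the per-batch limit means it has to be split across several requests. A minimal sketch of the split is shown here; `chunk_files` is a hypothetical helper and is not part of the API.

```python
def chunk_files(paths, batch_size=100):
    """Split a list of file paths into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]
```

Each chunk can then be submitted as its own POST /v1/batch request, with a distinct callback_id per chunk if you need to tell the resulting batches apart.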