Batch Processing
Submit up to 100 documents in a single request. Files are processed in parallel by background workers. The API returns a batch_id immediately — you either poll for status or receive a webhook when processing is complete.
Step 1 — Submit the batch
POST /v1/batch
Send files as multipart/form-data. Each file must use the field name files.
| Parameter | Type | Required | Description |
|---|---|---|---|
| files | File[] | Required | Up to 100 document files. Each field must be named files. Supported formats: PDF, JPG, PNG, WEBP, TIFF, BMP, HEIC, HEIF. Max 10 MB per file. |
| webhook_url | string | Optional | A public HTTPS URL we will POST results to when the batch completes. See Webhooks section below. Requires Starter plan or above. |
| callback_id | string | Optional | Your own reference ID. Echoed back in the webhook payload so you can correlate batches with your internal jobs. |
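
The limits in the table can be checked locally before any bytes are uploaded. The following is a minimal sketch, not part of the API: check_batch is a hypothetical helper, the format check relies on file extensions only, and it assumes "10 MB" means 10 * 1024 * 1024 bytes. Aliases such as .jpeg or .tif are left out because the table does not list them.

```python
from pathlib import Path

# Limits taken from the parameter table.
MAX_FILES = 100
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" means 10 * 1024 * 1024 bytes
SUPPORTED = {".pdf", ".jpg", ".png", ".webp", ".tiff", ".bmp", ".heic", ".heif"}


def check_batch(files):
    """Return a list of problems for (name, size_in_bytes) pairs; empty means OK."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in SUPPORTED:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 10 MB limit")
    return problems


print(check_batch([("invoice1.pdf", 120_000), ("notes.txt", 500)]))
# → ['notes.txt: unsupported format .txt']
```

Returning a list of problems rather than raising on the first one lets a caller report every rejected file in a single pass.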
With curl:

    curl -X POST https://api.documentflowai.com/v1/batch \
      -H "Authorization: Bearer af_YOUR_API_KEY" \
      -F "files=@invoice1.pdf" \
      -F "files=@invoice2.pdf" \
      -F "files=@receipt.png" \
      -F "webhook_url=https://yourserver.com/webhook" \
      -F "callback_id=order_123"

With Python:

    import requests

    # Repeat the "files" field name once per document.
    files = [
        ("files", open("invoice1.pdf", "rb")),
        ("files", open("invoice2.pdf", "rb")),
        ("files", open("receipt.png", "rb")),
    ]
    response = requests.post(
        "https://api.documentflowai.com/v1/batch",
        headers={"Authorization": "Bearer af_YOUR_API_KEY"},
        files=files,
        data={
            "webhook_url": "https://yourserver.com/webhook",
            "callback_id": "order_123",
        },
    )
    response.raise_for_status()  # stop early on an HTTP error
    batch = response.json()
    print(batch["batch_id"])  # e.g. "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35"
    print(batch["status"])    # "queued"

Response from Step 1:

    {
      "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
      "total_files": 3,
      "status": "queued",
      "estimated_seconds": 9
    }

Step 2 — Poll for status
GET /v1/batch/{batch_id}
Poll this endpoint until status is complete. The percent field lets you show a progress bar. If you provided a webhook_url, you can skip polling entirely; we will call you when processing is done.
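
A polling loop for this endpoint usually benefits from an upper time bound, so that a stalled batch does not block the caller forever. The sketch below is one hedged way to do that. wait_for_batch and its fetch_status argument are hypothetical names: fetch_status is any zero-argument callable that returns the parsed JSON of this endpoint as a dict. The 300-second default timeout is an arbitrary assumption, not a documented value.

```python
import time


def wait_for_batch(fetch_status, timeout=300.0, interval=3.0, sleep=time.sleep):
    """Call fetch_status() until it reports "complete" or the timeout elapses.

    fetch_status is a hypothetical zero-argument callable returning the
    status JSON for the batch as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "complete":
            return data
        # Give up if the next poll would start after the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"batch still {data['status']!r} after {timeout} seconds")
        sleep(interval)
```

In real use, fetch_status would wrap the authenticated GET request for the batch and return its JSON body. Passing the sleep function as a parameter keeps the loop testable without real delays.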

    import time

    import requests

    API_KEY = "af_YOUR_API_KEY"
    batch_id = "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35"

    while True:
        r = requests.get(
            f"https://api.documentflowai.com/v1/batch/{batch_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        r.raise_for_status()  # stop early on an HTTP error
        data = r.json()
        print(f"{data['percent']}% complete ({data['processed']}/{data['total']} files)")
        if data["status"] == "complete":
            download_url = data["download_url"]
            break
        time.sleep(3)  # wait before the next poll

Response once the batch is complete:

    {
      "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
      "status": "complete",
      "processed": 3,
      "total": 3,
      "percent": 100,
      "download_url": "https://..."
    }

Step 3 — Download results
The download_url is a pre-signed link valid for 24 hours. Fetch it with a plain GET — no auth header required. It returns a JSON array, one object per file, in the same order as submitted.
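
Because results come back in submission order, they can be matched to the filenames you sent. This is a minimal sketch, assuming each result object carries a zero-based file_index and an extraction object, as in the sample response for this step. label_results is a hypothetical helper, not part of the API.

```python
def label_results(filenames, results):
    """Map each submitted filename to its extraction, using file_index.

    filenames must be in the same order the files were submitted.
    Assumption: file_index is the zero-based position in that order.
    """
    labeled = {}
    for result in results:
        index = result["file_index"]
        labeled[filenames[index]] = result["extraction"]
    return labeled
```

If the same filename can appear twice in one batch, key the mapping on file_index instead, since later entries would otherwise overwrite earlier ones.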

    # download_url comes from the Step 2 response (or the webhook payload).
    results = requests.get(download_url).json()
    for r in results:
        ext = r["extraction"]
        print(f"File {r['file_index']}: {ext['document_type']} — confidence {ext['confidence_score']:.0%}")
        # Only financial documents carry a "financial" block.
        if "financial" in ext:
            print(f"  Grand total: {ext['financial']['totals']['grand_total']}")

Example result:

    [
      {
        "file_index": 0,
        "file_key": "batch/7dfd45b2-.../input/0000.pdf",
        "extraction": {
          "document_type": "invoice",
          "document_category": "financial",
          "confidence_score": 0.97,
          "vendor": { "name": "Acme Pvt Ltd", "tax_id": "29AABCA1234K1ZX" },
          "document_meta": { "invoice_number": "INV-001", "date": "2024-07-01", "currency": "INR" },
          "financial": {
            "line_items": [
              { "description": "Consulting Services", "total": 50000.0, "tax_rate": 18.0 }
            ],
            "totals": { "subtotal": 50000.0, "cgst": 4500.0, "sgst": 4500.0, "grand_total": 59000.0 }
          }
        },
        "validation": { "math_valid": true, "errors": [], "warnings": [] },
        "tokens_used": 1284
      }
    ]

Webhooks (skip polling)
Pass a webhook_url when submitting the batch and we will POST the following payload to your server the moment all files are processed. We retry up to 3 times with exponential backoff if your server returns a non-2xx response.

    {
      "batch_id": "7dfd45b2-d7b7-40ba-9bbf-2995b5108f35",
      "status": "complete",
      "download_url": "https://...",
      "callback_id": "order_123"
    }

Every webhook request includes an X-Webhook-Signature header. Verify it to confirm the request came from us:

    import hashlib
    import hmac

    from fastapi import FastAPI, HTTPException, Request

    app = FastAPI()
    WEBHOOK_SECRET = "your_webhook_signing_secret"


    @app.post("/webhook")  # path matches the webhook_url used in Step 1
    async def handle_webhook(request: Request):
        # Sign the raw request body, not the parsed JSON.
        body = await request.body()
        sig_header = request.headers.get("x-webhook-signature", "")
        expected = "sha256=" + hmac.new(
            WEBHOOK_SECRET.encode(), body, hashlib.sha256
        ).hexdigest()
        # Compare as bytes: compare_digest rejects non-ASCII str input.
        if not hmac.compare_digest(expected.encode(), sig_header.encode()):
            raise HTTPException(status_code=401, detail="Invalid signature")
        payload = await request.json()
        download_url = payload["download_url"]
        # fetch and process results...

Limits
| Limit | Value |
|---|---|
| Max files per batch | 100 |
| Max single file size | 10 MB |
| Download URL validity | 24 hours |
| Webhook retries | 3 (exponential backoff) |
| Plans with batch access | Starter, Growth, Enterprise |
| Plans with webhook access | Starter, Growth, Enterprise |
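
When a job has more documents than the per-batch maximum, the file list has to be split across several submissions. A minimal sketch of that split follows. chunk_batches is a hypothetical helper name, and each resulting group would be submitted as its own batch with its own batch_id.

```python
def chunk_batches(paths, max_files=100):
    """Split a list of file paths into consecutive groups of at most max_files."""
    if max_files < 1:
        raise ValueError("max_files must be at least 1")
    return [paths[i:i + max_files] for i in range(0, len(paths), max_files)]


groups = chunk_batches([f"doc{i}.pdf" for i in range(250)])
print([len(g) for g in groups])  # → [100, 100, 50]
```

A distinct callback_id per group is one way to tell the resulting batches apart when their webhooks arrive.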