API Reference
Extract Document
Submit any document and receive structured JSON with extracted fields, confidence scores, and built-in validation.
POST /v1/extract
Request
Send as multipart/form-data.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Required | The document to extract. Supported formats: PDF, PNG, JPG, JPEG, WEBP, TIFF, BMP, HEIC, HEIF. Max size: 10 MB. |
| doc_type | string | Optional | Type hint (e.g. invoice, payslip, resume, ticket). Defaults to auto, in which case Claude detects the type automatically. |
| lang | string | Optional | ISO 639-1 language code override (e.g. hi, ar, zh). Auto-detected if omitted. |
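Since unsupported or oversized files are rejected, it may be worth checking a file locally before uploading it. The following is a minimal sketch, not part of the API: `check_upload` is a hypothetical helper, it assumes the file extension reflects the actual format, and it assumes "10 MB" means 10 × 1024 × 1024 bytes (the reference does not say whether the limit is decimal or binary).

```python
from pathlib import Path

# Extensions corresponding to the formats listed for the `file` parameter.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp", ".heic", ".heif",
}
# Assumption: "10 MB" is interpreted as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

In practice the size would come from something like `Path(path).stat().st_size`. The server remains the authority on what it accepts, so this check only saves a round trip for obvious cases.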
Example Request
bash
curl -X POST https://api.documentflowai.com/v1/extract \
  -H "Authorization: Bearer af_YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "doc_type=invoice"
extract.py
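import requests

# Open the document in binary mode; the context manager closes it after the upload.
with open("invoice.pdf", "rb") as f:
    response = requests.post(
        "https://api.documentflowai.com/v1/extract",
        headers={"Authorization": "Bearer af_YOUR_API_KEY"},
        files={"file": f},  # sent as multipart/form-data
        data={"doc_type": "invoice"},
    )
print(response.json())
Response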
Returns a JSON object. The top-level fields are consistent across all document types; the nested extracted fields vary by document type.
| Field | Type | Required | Description |
|---|---|---|---|
| job_id | string | Optional | Unique extraction ID. Format: "extr_" + 12 hex chars. |
| status | string | Optional | "success" on completion. |
| document_type | string | Optional | Detected type: invoice, receipt, payslip, resume, ticket, identity, bank_statement, etc. |
| document_category | string | Optional | Broad category: financial, hr, travel, legal, identity, medical, etc. |
| confidence | float | Optional | Confidence score from 0.0 to 1.0. Scores above 0.85 indicate high accuracy. |
| vendor / buyer | object | Optional | Party details: name, address, tax_id, country, etc. |
| document_meta | object | Optional | Invoice number, date, currency, language, country. |
| financial | object | Optional | Line items, totals (subtotal, cgst, sgst, igst, grand_total), payment info. Present for invoices, receipts, POs. |
| payslip / resume / ticket / … | object | Optional | Document-type-specific block. Only the relevant block is present; the others are omitted. |
| validation | object | Optional | math_valid boolean and warnings array. |
| extraction_notes | string | Optional | Extra context Claude captured from the document. |
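One way to use the top-level fields is to decide whether a result can be accepted automatically or should go to a human. The sketch below is only an illustration under stated assumptions: `needs_review` is a hypothetical helper, the 0.85 threshold is taken from the confidence description, and treating any non-empty `warnings` array as a reason for review is a policy choice rather than something the API prescribes.

```python
def needs_review(result: dict, min_confidence: float = 0.85) -> bool:
    """Return True if a parsed extraction response should be checked manually."""
    if result.get("status") != "success":
        return True
    # Only scores strictly above the threshold count as high accuracy.
    if result.get("confidence", 0.0) <= min_confidence:
        return True
    validation = result.get("validation") or {}
    # A missing math_valid is treated as "not checked", not as a failure.
    if validation.get("math_valid") is False:
        return True
    # Assumption: any warning is worth a second look.
    return bool(validation.get("warnings"))
```

A caller would pass the parsed JSON body, for example `needs_review(response.json())`, and route flagged documents to a review queue.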
response.json
{
  "job_id": "extr_a1b2c3d4e5f6",
  "status": "success",
  "document_type": "invoice",
  "document_category": "financial",
  "confidence": 0.97,
  "vendor": {
    "name": "Acme Corporation",
    "tax_id": "27AABCU9603R1ZX",
    "address": "Mumbai, Maharashtra, India",
    "country": "IN"
  },
  "buyer": {
    "name": "Tech Startup Pvt Ltd",
    "tax_id": "29AAGCT1234A1Z5"
  },
  "document_meta": {
    "invoice_number": "INV-2024-0042",
    "date": "2024-03-15",
    "due_date": "2024-04-14",
    "currency": "INR",
    "language": "en",
    "country": "IN"
  },
  "financial": {
    "line_items": [
      {
        "description": "Cloud Infrastructure Services",
        "hsn_code": "998315",
        "quantity": 1,
        "unit_price": 42000.00,
        "total": 42000.00,
        "tax_rate": 18.0
      }
    ],
    "totals": {
      "subtotal": 42000.00,
      "cgst": 3780.00,
      "sgst": 3780.00,
      "grand_total": 49560.00
    }
  },
  "validation": {
    "math_valid": true,
    "warnings": []
  },
  "extraction_notes": "Payment terms: Net 30"
}
Supported File Types
| Format | Notes |
|---|---|
| PDF | Multi-page, scanned or digital |
| PNG | Single-page images |
| JPG / JPEG | Photos of documents |
| WEBP | Modern image format |
| TIFF | High-res scans |
| BMP / HEIC / HEIF | Additional image formats |

Upload limit: max 10 MB per file.
Large documents: For documents with more than 20 pages, consider splitting them or using the batch endpoint, which handles up to 100 files per request.
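If a large document has already been split into separate files, the 100-file limit means the files have to be grouped before submission. The sketch below shows only that grouping step; `batches` is a hypothetical helper, and the batch endpoint's path and request format are not described in this section, so no request is made here.

```python
from typing import Iterator, Sequence

# Limit stated above for the batch endpoint.
MAX_FILES_PER_BATCH = 100


def batches(paths: Sequence[str], size: int = MAX_FILES_PER_BATCH) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` paths, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

Each yielded group can then be sent as one batch request, keeping page order intact so results can be reassembled afterwards.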
Error Responses
| Status | Code | Description |
|---|---|---|
| 401 | invalid_api_key | Missing or invalid Authorization header |
| 422 | invalid_file | File type not supported or file too large |
| 429 | quota_exceeded | Monthly extraction quota reached |
| 429 | rate_limit_exceeded | Too many requests; slow down |
| 500 | extraction_failed | Document could not be parsed |
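The two 429 codes call for different handling: a rate limit is temporary, while an exhausted monthly quota will not clear on retry. A minimal sketch of that distinction follows. It assumes the client can read the error code from the response (the shape of the error body is not documented in this section), and `should_retry` and `backoff_delay` are hypothetical helpers. Not retrying `extraction_failed` is also an assumption, on the reasoning that an unparseable document is unlikely to parse on a second attempt.

```python
def should_retry(status: int, code: str) -> bool:
    """Retry only the temporary rate limit; every other listed error is final."""
    return status == 429 and code == "rate_limit_exceeded"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for `backoff_delay(attempt)` seconds between attempts while `should_retry` returns True, and surface all other errors immediately.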