DocumentFlowAI Docs
API Reference

Extract Document

Submit any document and receive structured JSON with extracted fields, confidence scores, and built-in validation.

POST /v1/extract

Request

Send as multipart/form-data.

Parameter | Type | Required | Description
file | File | Required | The document to extract. Supported formats: PDF, PNG, JPG, JPEG, WEBP, TIFF, BMP, HEIC, HEIF. Max size: 10 MB.
doc_type | string | Optional | Type hint (e.g. invoice, payslip, resume, ticket). Defaults to auto, in which case Claude detects the type.
lang | string | Optional | ISO 639-1 language code override (e.g. hi, ar, zh). Auto-detected if omitted.
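
A client-side pre-check can catch an unsupported format or an oversized file before any upload is attempted. The sketch below is illustrative only: `validate_upload` is a hypothetical helper, not part of the API. It assumes that "10 MB" means 10 × 1024 × 1024 bytes and that the format can be judged from the file extension alone.

```python
import os

# Formats listed in the `file` parameter description.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".webp", ".tiff", ".bmp", ".heic", ".heif",
}
# Assumption: the documented 10 MB limit is read as 10 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

A caller might run `validate_upload("invoice.pdf", os.path.getsize("invoice.pdf"))` and skip the request when the returned list is non-empty. The server remains the authority on what it accepts.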

Example Request

bash
curl -X POST https://api.documentflowai.com/v1/extract \
  -H "Authorization: Bearer af_YOUR_API_KEY" \
  -F "file=@invoice.pdf" \
  -F "doc_type=invoice"
extract.py
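import requests

# Open the document in binary mode; the context manager closes it after the upload.
with open("invoice.pdf", "rb") as document:
    response = requests.post(
        "https://api.documentflowai.com/v1/extract",
        headers={"Authorization": "Bearer af_YOUR_API_KEY"},
        files={"file": document},       # sent as multipart/form-data
        data={"doc_type": "invoice"},   # optional type hint
    )
print(response.json())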

Response

Returns a JSON object. The top-level fields are consistent across all document types; the nested extracted fields vary by document type.

Field | Type | Required | Description
job_id | string | Optional | Unique extraction ID. Format: "extr_" + 12 hex chars.
status | string | Optional | "success" on completion.
document_type | string | Optional | Detected type: invoice, receipt, payslip, resume, ticket, identity, bank_statement, etc.
document_category | string | Optional | Broad category: financial, hr, travel, legal, identity, medical, etc.
confidence | float | Optional | Confidence score from 0.0 to 1.0. Scores above 0.85 indicate high accuracy.
vendor / buyer | object | Optional | Party details: name, address, tax_id, country, etc.
document_meta | object | Optional | Invoice number, date, currency, language, country.
financial | object | Optional | Line items, totals (subtotal, cgst, sgst, igst, grand_total), payment info. Present for invoices, receipts, POs.
payslip / resume / ticket / … | object | Optional | Document-type-specific block. Only the relevant block is present; others are omitted.
validation | object | Optional | math_valid boolean and warnings array.
extraction_notes | string | Optional | Extra context Claude captured from the document.
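
The status, confidence, and validation fields lend themselves to a simple routing decision on the client side. The following is a minimal sketch under stated assumptions: `triage` and its three outcome labels are hypothetical, and the policy of sending anything without a confirmed `math_valid` to review is one conservative choice that may be too strict for document types with no arithmetic to check.

```python
HIGH_CONFIDENCE = 0.85  # threshold named in the `confidence` field description


def triage(result: dict) -> str:
    """Classify an extraction result as 'accept', 'review', or 'failed'."""
    if result.get("status") != "success":
        return "failed"
    validation = result.get("validation") or {}
    # Assumption: a missing math_valid is treated as unverified, not as valid.
    math_ok = validation.get("math_valid") is True
    has_warnings = bool(validation.get("warnings"))
    confident = result.get("confidence", 0.0) > HIGH_CONFIDENCE
    if confident and math_ok and not has_warnings:
        return "accept"
    return "review"
```

Because every response field is marked Optional, the sketch reads each one with `.get()` rather than indexing directly.
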
response.json
{
  "job_id": "extr_a1b2c3d4e5f6",
  "status": "success",
  "document_type": "invoice",
  "document_category": "financial",
  "confidence": 0.97,
  "vendor": {
    "name": "Acme Corporation",
    "tax_id": "27AABCU9603R1ZX",
    "address": "Mumbai, Maharashtra, India",
    "country": "IN"
  },
  "buyer": {
    "name": "Tech Startup Pvt Ltd",
    "tax_id": "29AAGCT1234A1Z5"
  },
  "document_meta": {
    "invoice_number": "INV-2024-0042",
    "date": "2024-03-15",
    "due_date": "2024-04-14",
    "currency": "INR",
    "language": "en",
    "country": "IN"
  },
  "financial": {
    "line_items": [
      {
        "description": "Cloud Infrastructure Services",
        "hsn_code": "998315",
        "quantity": 1,
        "unit_price": 42000.00,
        "total": 42000.00,
        "tax_rate": 18.0
      }
    ],
    "totals": {
      "subtotal": 42000.00,
      "cgst": 3780.00,
      "sgst": 3780.00,
      "grand_total": 49560.00
    }
  },
  "validation": {
    "math_valid": true,
    "warnings": []
  },
  "extraction_notes": "Payment terms: Net 30"
}
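
The arithmetic in the example response can be re-checked on the client. The rule the service applies when setting `math_valid` is not documented in this section, so the sketch below is only one plausible check: `totals_consistent` is a hypothetical helper that assumes line-item totals sum to the subtotal and that subtotal plus the tax components equals `grand_total`, within a small tolerance.

```python
import math


def totals_consistent(financial: dict, tolerance: float = 0.01) -> bool:
    """Check line items against subtotal, and subtotal plus taxes against grand_total."""
    totals = financial["totals"]
    line_sum = sum(item["total"] for item in financial["line_items"])
    # Tax components that are absent (igst in the example) count as zero.
    taxes = sum(totals.get(key, 0.0) for key in ("cgst", "sgst", "igst"))
    return (
        math.isclose(line_sum, totals["subtotal"], abs_tol=tolerance)
        and math.isclose(totals["subtotal"] + taxes, totals["grand_total"], abs_tol=tolerance)
    )


# Values copied from the example response.
example = {
    "line_items": [{"total": 42000.00}],
    "totals": {"subtotal": 42000.00, "cgst": 3780.00, "sgst": 3780.00, "grand_total": 49560.00},
}
print(totals_consistent(example))  # True: 42000 + 3780 + 3780 = 49560
```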

Supported File Types

Format | Notes
PDF | Multi-page, scanned or digital
PNG | Single-page images
JPG / JPEG | Photos of documents
WEBP | Modern image format
TIFF | High-res scans
BMP / HEIC / HEIF | Additional image formats

Maximum upload size: 10 MB per file.
Large documents: For documents with more than 20 pages, consider splitting them or using the batch endpoint, which handles up to 100 files per request.
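
When a job involves more than 100 files, they need to be grouped before being handed to the batch endpoint. The batch endpoint's path and request format are not described in this section, so the sketch below covers only the grouping step; `chunk_files` and the file names are hypothetical.

```python
def chunk_files(paths: list[str], batch_size: int = 100) -> list[list[str]]:
    """Split a list of file paths into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


# 250 hypothetical files split into groups that respect the 100-file limit.
batches = chunk_files([f"page_{n:03d}.pdf" for n in range(250)])
print([len(batch) for batch in batches])  # [100, 100, 50]
```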

Error Responses

Status | Code | Description
401 | invalid_api_key | Missing or invalid Authorization header
422 | invalid_file | File type not supported or file too large
429 | quota_exceeded | Monthly extraction quota reached
429 | rate_limit_exceeded | Too many requests; slow down
500 | extraction_failed | Document could not be parsed
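
Two different errors share status 429, so a client that retries on the status alone would keep retrying after the monthly quota is exhausted. The sketch below separates the two by error code. It is a hedged illustration: the shape of the error response body is not shown in this section, so the functions take the status and code as plain arguments, and `should_retry`, `backoff_delays`, and the delay values are all hypothetical choices.

```python
def should_retry(status: int, code: str) -> bool:
    """Retry only transient rate limiting; quota_exceeded will not clear by waiting briefly."""
    return status == 429 and code == "rate_limit_exceeded"


def backoff_delays(max_attempts: int = 5, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


print(should_retry(429, "rate_limit_exceeded"))  # True
print(should_retry(429, "quota_exceeded"))       # False
print(backoff_delays())                          # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A caller would sleep for each delay in turn between attempts and stop as soon as a request succeeds or `should_retry` returns False.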