Analyzing Invoices and Receipts

Amazon Textract extracts relevant data, such as vendor and receiver contact information, from almost any invoice or receipt without the need for templates or configuration. Invoices and receipts come in many layouts, which makes manual data extraction at scale difficult and time-consuming. Amazon Textract uses ML to understand the context of invoices and receipts, and automatically extracts data such as the invoice or receipt date, invoice or receipt number, item prices, total amount, and payment terms.

Amazon Textract also identifies vendor names that are critical for your workflows but may not be explicitly labeled. For example, Amazon Textract can find the vendor name on a receipt even if it's only indicated within a logo at the top of the page without an explicit key-value pair combination.

Amazon Textract also makes it easy to consolidate input from diverse receipts and invoices that use different words for the same concept. For example, Amazon Textract maps field names that vary between documents, such as bill number, invoice number, and receipt number, to the single standard taxonomy value INVOICE_RECEIPT_ID. As a result, Amazon Textract represents data consistently across different document types. The address fields are categorized as 'receiver', 'supplier', 'vendor', 'bill to', 'ship to', and 'remit to'. When an expense document does not have unique values for each of these categories, Amazon Textract returns only the categories with unique values.

Fields that do not align with the standard taxonomy are categorized as OTHER.
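
Because differently labeled fields share one standard type, downstream code can key on the type instead of the printed label. The following is a minimal sketch of that idea, not part of the Amazon Textract API: the helper name values_by_type and the two hand-written field lists are hypothetical, and the field dictionaries are abbreviated to the Type, LabelDetection, and ValueDetection elements.

```python
def values_by_type(summary_fields):
    """Group detected values by their standardized type (hypothetical helper)."""
    grouped = {}
    for field in summary_fields:
        # Fall back to OTHER when a field carries no standardized type.
        type_text = field.get("Type", {}).get("Text", "OTHER")
        value_text = field.get("ValueDetection", {}).get("Text", "")
        grouped.setdefault(type_text, []).append(value_text)
    return grouped


# Hand-written sample data: two documents that label the same concept differently.
invoice_fields = [
    {"Type": {"Text": "INVOICE_RECEIPT_ID"},
     "LabelDetection": {"Text": "Invoice Number"},
     "ValueDetection": {"Text": "INV-001"}},
    {"Type": {"Text": "OTHER"},
     "LabelDetection": {"Text": "Cashier"},
     "ValueDetection": {"Text": "Sam"}},
]
receipt_fields = [
    {"Type": {"Text": "INVOICE_RECEIPT_ID"},
     "LabelDetection": {"Text": "Bill Number"},
     "ValueDetection": {"Text": "B-77"}},
]

invoice_ids = values_by_type(invoice_fields)["INVOICE_RECEIPT_ID"]
receipt_ids = values_by_type(receipt_fields)["INVOICE_RECEIPT_ID"]
```

Both lookups use the same key even though the documents print "Invoice Number" and "Bill Number", and the unmapped "Cashier" field is collected under OTHER.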

The following is a list of the standard fields supported by expense analysis operations.

Invoice Receipt Date — INVOICE_RECEIPT_DATE
Invoice Receipt ID — INVOICE_RECEIPT_ID
Invoice Tax Payer ID — TAX_PAYER_ID
Customer Number — CUSTOMER_NUMBER
Account Number — ACCOUNT_NUMBER
Vendor Name — VENDOR_NAME
Receiver Name — RECEIVER_NAME
Vendor Address — VENDOR_ADDRESS
Receiver Address — RECEIVER_ADDRESS
Order Date — ORDER_DATE
Due Date — DUE_DATE
Delivery Date — DELIVERY_DATE
PO Number — PO_NUMBER
Payment Terms — PAYMENT_TERMS
Total — TOTAL
Amount Due — AMOUNT_DUE
Amount Paid — AMOUNT_PAID
Subtotal — SUBTOTAL
Tax — TAX
Service Charge — SERVICE_CHARGE
Gratuity — GRATUITY
Prior Balance — PRIOR_BALANCE
Discount — DISCOUNT
Shipping and Handling Charge — SHIPPING_HANDLING_CHARGE
Vendor ABN Number — VENDOR_ABN_NUMBER
Vendor GST Number — VENDOR_GST_NUMBER
Vendor PAN Number — VENDOR_PAN_NUMBER
Vendor VAT Number — VENDOR_VAT_NUMBER
Receiver ABN Number — RECEIVER_ABN_NUMBER
Receiver GST Number — RECEIVER_GST_NUMBER
Receiver PAN Number — RECEIVER_PAN_NUMBER
Receiver VAT Number — RECEIVER_VAT_NUMBER
Vendor Phone — VENDOR_PHONE
Receiver Phone — RECEIVER_PHONE
Vendor URL — VENDOR_URL
Line Item/Item Description — ITEM
Line Item/Quantity — QUANTITY
Line Item/Total Price — PRICE
Line Item/Unit Price — UNIT_PRICE
Line Item/ProductCode — PRODUCT_CODE
Address (Bill To, Ship To, Remit To, Supplier) — ADDRESS
Name (Bill To, Ship To, Remit To, Supplier) — NAME
Core Address (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — ADDRESS_BLOCK
Street Address (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — STREET
City (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — CITY
State (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — STATE
Country (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — COUNTRY
ZIP Code (Vendor, Receiver, Bill To, Ship To, Remit To, Supplier) — ZIP_CODE

The AnalyzeExpense API returns the following elements for a given document page:

The number of receipts or invoices within a document, represented as ExpenseIndex
The standardized name for individual fields, represented as Type
The actual name of the field as it appears on the document, represented as LabelDetection
The value of the corresponding field, represented as ValueDetection
The number of pages within the submitted document, represented as Pages
The page number on which the field, value, or line items are detected, represented as PageNumber
The geometry, which includes the bounding box and coordinates location of the individual field, value, or line items on the page, represented as Geometry
The confidence score associated with each piece of data detected on the document, represented as Confidence
The entire row of individual line items purchased, represented as EXPENSE_ROW
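
The elements listed above can be read from the response by walking its nested structure. The following is a hedged sketch that flattens summary fields and line item fields into simple rows. The container keys ExpenseDocuments, SummaryFields, LineItemGroups, LineItems, and LineItemExpenseFields are taken from the AnalyzeExpense response syntax rather than from this section. The helper names and the sample response are hypothetical and hand-written for illustration.

```python
def flatten_field(expense_index, kind, field):
    """Reduce one expense field to the elements described in the list above."""
    value = field.get("ValueDetection", {})
    return {
        "expense_index": expense_index,
        "kind": kind,  # "summary" or "line_item" (labels invented for this sketch)
        "type": field.get("Type", {}).get("Text"),
        "label": field.get("LabelDetection", {}).get("Text"),
        "value": value.get("Text"),
        "page": field.get("PageNumber"),
        "confidence": value.get("Confidence"),
    }


def iter_expense_fields(response):
    """Yield one flattened row per summary field and per line item field."""
    for doc in response.get("ExpenseDocuments", []):
        index = doc.get("ExpenseIndex")
        for field in doc.get("SummaryFields", []):
            yield flatten_field(index, "summary", field)
        for group in doc.get("LineItemGroups", []):
            for item in group.get("LineItems", []):
                for field in item.get("LineItemExpenseFields", []):
                    yield flatten_field(index, "line_item", field)


# Hand-written, heavily abbreviated response used only to exercise the helpers.
sample_response = {
    "DocumentMetadata": {"Pages": 1},
    "ExpenseDocuments": [{
        "ExpenseIndex": 1,
        "SummaryFields": [{
            "Type": {"Text": "TOTAL"},
            "LabelDetection": {"Text": "Total:"},
            "ValueDetection": {"Text": "$55.64", "Confidence": 99.85},
            "PageNumber": 1,
        }],
        "LineItemGroups": [{
            "LineItems": [{
                "LineItemExpenseFields": [
                    {"Type": {"Text": "ITEM"},
                     "ValueDetection": {"Text": "Coffee", "Confidence": 98.0},
                     "PageNumber": 1},
                    {"Type": {"Text": "PRICE"},
                     "ValueDetection": {"Text": "$3.50", "Confidence": 97.5},
                     "PageNumber": 1},
                ],
            }],
        }],
    }],
}

rows = list(iter_expense_fields(sample_response))
```

Line item fields often have no LabelDetection, so the sketch returns None for the label in that case instead of raising an error.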

The following is a portion of the API output for a receipt processed by AnalyzeExpense. It shows the text "Total: $55.64" in the document extracted as the standard field TOTAL. The actual label text on the document appears as "Total:", the label confidence score as 97.1, the page number as 1, and the total value as "$55.64". The output also includes the bounding box and polygon coordinates:

{
    "Type": {
        "Text": "TOTAL",
        "Confidence": 99.94717407226562
    },
    "LabelDetection": {
        "Text": "Total:",
        "Geometry": {
            "BoundingBox": {
                "Width": 0.09809663146734238,
                "Height": 0.0234375,
                "Left": 0.36822840571403503,
                "Top": 0.8017578125
            },
            "Polygon": [
                { "X": 0.36822840571403503, "Y": 0.8017578125 },
                { "X": 0.466325044631958, "Y": 0.8017578125 },
                { "X": 0.466325044631958, "Y": 0.8251953125 },
                { "X": 0.36822840571403503, "Y": 0.8251953125 }
            ]
        },
        "Confidence": 97.10792541503906
    },
    "ValueDetection": {
        "Text": "$55.64",
        "Currency": {
            "Code": "USD"
        },
        "Geometry": {
            "BoundingBox": {
                "Width": 0.10395314544439316,
                "Height": 0.0244140625,
                "Left": 0.66837477684021,
                "Top": 0.802734375
            },
            "Polygon": [
                { "X": 0.66837477684021, "Y": 0.802734375 },
                { "X": 0.7723279595375061, "Y": 0.802734375 },
                { "X": 0.7723279595375061, "Y": 0.8271484375 },
                { "X": 0.66837477684021, "Y": 0.8271484375 }
            ]
        },
        "Confidence": 99.85165405273438
    },
    "PageNumber": 1
}

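As a rough illustration of how the TOTAL field shown above might be consumed, the following sketch converts the value's bounding box to pixel coordinates, parses the amount, and applies a confidence threshold. It assumes the bounding box values are ratios of the page width and height, and that the amount is a plain "$" figure. The helper names, the 1000 x 1000 pixel page size, and the threshold of 90 are hypothetical choices for this example.

```python
from decimal import Decimal

# Abbreviated copy of the TOTAL field from the sample output (polygons omitted).
total_field = {
    "Type": {"Text": "TOTAL", "Confidence": 99.94717407226562},
    "LabelDetection": {"Text": "Total:", "Confidence": 97.10792541503906},
    "ValueDetection": {
        "Text": "$55.64",
        "Currency": {"Code": "USD"},
        "Geometry": {
            "BoundingBox": {
                "Width": 0.10395314544439316,
                "Height": 0.0244140625,
                "Left": 0.66837477684021,
                "Top": 0.802734375,
            },
        },
        "Confidence": 99.85165405273438,
    },
    "PageNumber": 1,
}


def box_to_pixels(box, page_width, page_height):
    """Scale a ratio-based bounding box to (left, top, width, height) in pixels."""
    return (
        round(box["Left"] * page_width),
        round(box["Top"] * page_height),
        round(box["Width"] * page_width),
        round(box["Height"] * page_height),
    )


def parse_amount(text):
    """Parse an amount such as '$1,234.50'; assumes a '$' prefix and ',' separators."""
    return Decimal(text.strip().lstrip("$").replace(",", ""))


def is_confident(field, threshold=90.0):
    """Return True when both the label and the value meet the confidence threshold."""
    label_conf = field["LabelDetection"]["Confidence"]
    value_conf = field["ValueDetection"]["Confidence"]
    return min(label_conf, value_conf) >= threshold


value = total_field["ValueDetection"]
pixel_box = box_to_pixels(value["Geometry"]["BoundingBox"], 1000, 1000)
amount = parse_amount(value["Text"])
accepted = is_confident(total_field)
```

Decimal is used instead of float so that currency amounts are not subject to binary rounding when they are summed or compared later.
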
To analyze an invoice or receipt synchronously, pass the document to the AnalyzeExpense operation. AnalyzeExpense returns the entire set of results in a single response. For more information, see Analyzing Invoices and Receipts with Amazon Textract.
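
A synchronous call might look like the following sketch, which uses the analyze_expense method of the boto3 Textract client. The helper names, the file name receipt.png, and the output format are hypothetical. The helpers take the client as a parameter so they can be exercised without AWS credentials, and main() assumes credentials and a Region are already configured.

```python
def analyze_expense_bytes(client, document_bytes):
    """Call AnalyzeExpense synchronously on raw document bytes."""
    return client.analyze_expense(Document={"Bytes": document_bytes})


def analyze_expense_s3(client, bucket, key):
    """Call AnalyzeExpense synchronously on an object stored in Amazon S3."""
    return client.analyze_expense(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )


def summary_lines(response):
    """Format each summary field as 'TYPE: value' (format chosen for this sketch)."""
    lines = []
    for doc in response.get("ExpenseDocuments", []):
        for field in doc.get("SummaryFields", []):
            type_text = field.get("Type", {}).get("Text", "OTHER")
            value_text = field.get("ValueDetection", {}).get("Text", "")
            lines.append(f"{type_text}: {value_text}")
    return lines


def main(path="receipt.png"):  # hypothetical local file name
    # Imported here so the helpers above can be used and tested without boto3.
    import boto3

    client = boto3.client("textract")
    with open(path, "rb") as handle:
        response = analyze_expense_bytes(client, handle.read())
    for line in summary_lines(response):
        print(line)
```

Calling main() sends the local file in the request body. For documents already stored in Amazon S3, analyze_expense_s3 avoids uploading the bytes from the caller.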

To analyze invoices and receipts asynchronously, use StartExpenseAnalysis to start processing an input document file. To get the results, call GetExpenseAnalysis, which returns the results for a given call to StartExpenseAnalysis. For more information and an example, see Processing Documents with Asynchronous Operations.
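
The asynchronous flow can be sketched as follows with the start_expense_analysis and get_expense_analysis methods of the boto3 Textract client. This is a simplified illustration under stated assumptions: it polls for completion rather than using a completion notification, it treats any final status other than SUCCEEDED as a failure, and the function name, polling interval, and poll limit are hypothetical choices.

```python
import time


def run_expense_job(client, bucket, key, poll_seconds=5.0, max_polls=120):
    """Start an expense analysis job on an S3 object and collect all results."""
    job = client.start_expense_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or the poll limit is hit.
    for _ in range(max_polls):
        page = client.get_expense_analysis(JobId=job_id)
        if page["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")

    if page["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Job {job_id} ended with status {page['JobStatus']}")

    # Results can span several pages; follow NextToken until it is absent.
    documents = list(page.get("ExpenseDocuments", []))
    while page.get("NextToken"):
        page = client.get_expense_analysis(JobId=job_id, NextToken=page["NextToken"])
        documents.extend(page.get("ExpenseDocuments", []))
    return documents
```

With a real client created by boto3.client("textract"), a call such as run_expense_job(client, "my-bucket", "invoices/sample.pdf") would return the combined ExpenseDocuments list. The bucket and object names here are placeholders.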