Document Extraction with SAP Intelligent RPA – Using Pre-Trained AI Model

Jan 13, 2022

Introduction

SAP Intelligent Robotic Process Automation provides convenient and smart solutions for processing large volumes of business documents that have content in headers and tables. We will use an example use case, modelled after a business case, to extract information from such documents. After you provide the document and specify its type, the extraction activity returns the results as Header Fields and Line Items.

This is the second blog in the “Document Extraction with SAP Intelligent RPA” series. The goal of the series is to empower the community with step-by-step guides showcasing the Document Extraction capabilities within SAP Intelligent RPA. Its predecessor, on “Text Operations to Ease Data Capture”, can be found here. Further information on integration touchpoints can be viewed in this blog.

Prerequisites

  • SAP Intelligent Robotic Process Automation platform (Trial / Full-Version)
  • Installation as per the instructions in Help Portal
  • Knowledge of projects and automations. Tutorials can be found under: Tutorials

Setup

  • Create a project in the Cloud Studio
  • Add the following dependencies in the respective project:
    • Document Information Extraction SDK
    • PDF SDK

Please note: Core SDK and Excel SDK are added to the project automatically when an automation is created.

Business Use-Case

An organization receives numerous invoices, purchase orders and payment advices from multiple vendors, and these documents pass through different departments. Examples include travel expense documents and equipment invoices from vendors.

Let us take an example of an invoice from an IT vendor who has provided the equipment for a new hire in the organization. Often, an organization will have multiple vendors for procuring this equipment and the document would contain information like Invoice Number, Sender Name, Item description, Net Amount etc. You can use the extracted information, for example, to automatically process payables, invoices, or payment notes while making sure that invoices and payables match.
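To make the idea of checking that invoices and payables match a little more concrete, here is a minimal sketch in JavaScript. It is not part of the product or of this automation: the helper name invoiceMatchesPayable, the record shape { documentNumber, netAmount }, the tolerance and the sample values are all assumptions made up for illustration.

```javascript
// Hypothetical helper: compares an extracted invoice with a payable record.
// The record shape { documentNumber, netAmount } is an assumption for this sketch.
function invoiceMatchesPayable(invoice, payable, tolerance = 0.01) {
  // Compare document numbers as trimmed strings.
  const sameNumber =
    String(invoice.documentNumber).trim() === String(payable.documentNumber).trim();
  // Compare amounts numerically, allowing a small rounding tolerance.
  const difference = Math.abs(Number(invoice.netAmount) - Number(payable.netAmount));
  const sameAmount = difference <= tolerance;
  return sameNumber && sameAmount;
}

// Made-up sample records, not taken from the sample invoice.
const invoice = { documentNumber: "INV-1001", netAmount: "1250.00" };
const payable = { documentNumber: "INV-1001", netAmount: 1250 };

console.log(invoiceMatchesPayable(invoice, payable));
```

Extracted values often arrive as text, which is why the sketch converts amounts to numbers before comparing them.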

Although invoices from different vendors can be structurally similar, manual data entry would cost an organization many man-hours. Automating data extraction from business documents can be challenging too, for example when a new vendor is added or an existing vendor is discontinued.

To simplify this use-case we will be using the new “Extract Data (Pre-trained model)” activity along with some pre-existing activities.

Proposed Sequence of Execution

Sample Invoice:

  1. Create an Automation
  2. Drag and drop the Extract Data (Pre-trained Model) activity. This activity accepts a machine-readable or scanned document in PDF or image format. It requires two inputs: the type of document (Invoice, Payment Advice or Purchase Order) and the path to the document for extraction. Note: There are two additional non-mandatory fields which are beyond the scope of this use case. They will be covered in future blogs in this series.
  3. We can now add a Log Message activity to view the output of the activity. The output of the activity is “extractedData” which contains various Header Fields and Line Item Fields.
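The two mandatory inputs described in step 2 can be sanity-checked before the activity runs. The sketch below is only an illustration in plain JavaScript: checkExtractionInputs is a hypothetical helper, not an SDK function, and the list of file extensions is an assumption standing in for "PDF or image format", so it should be checked against the product documentation.

```javascript
// Document types named in this post for the Extract Data (Pre-trained Model) activity.
const SUPPORTED_TYPES = ["Invoice", "Payment Advice", "Purchase Order"];

// Assumption: these extensions stand in for "PDF or image format".
const ASSUMED_EXTENSIONS = [".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"];

// Hypothetical helper: returns a list of problems found in the two inputs.
// An empty list means both inputs look plausible.
function checkExtractionInputs(documentType, documentPath) {
  const problems = [];
  if (!SUPPORTED_TYPES.includes(documentType)) {
    problems.push("Unsupported document type: " + documentType);
  }
  // Take everything from the last dot onwards as the file extension.
  const dot = documentPath.lastIndexOf(".");
  const extension = dot >= 0 ? documentPath.slice(dot).toLowerCase() : "";
  if (!ASSUMED_EXTENSIONS.includes(extension)) {
    problems.push("Unexpected file extension: " + (extension || "(none)"));
  }
  return problems;
}

console.log(checkExtractionInputs("Invoice", "C:\\docs\\sample.PDF"));
console.log(checkExtractionInputs("Receipt", "notes.txt"));
```

Returning a list of problems instead of throwing lets an automation log every issue in one pass.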

Logging data

  1. A custom message can be logged by clicking on the icon marked in red (see screenshot below). Under “Variables” we can see the output “extractedData”, which contains various Header Fields and Line Item Fields. These fields depend on the type of document selected in the previous step.
  2. Enter the following expression in the “Log Message” activity to view the extracted result: "Invoice Number: " + Step1.extractedData.headerFields.documentNumber.value
  3. Test the automation to view the extraction result in the Test Console. This result can be verified against the information provided in the sample document above.
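The log expression from step 2 assumes that the invoice number was actually extracted. As a hedged sketch, the same message can be built defensively in plain JavaScript. The function buildInvoiceNumberMessage and the mock object are hypothetical; only the path headerFields.documentNumber.value comes from the expression above. Optional chaining is avoided on the assumption that the script engine may not support newer syntax.

```javascript
// Hypothetical helper: builds the log message and tolerates a missing field.
function buildInvoiceNumberMessage(extractedData) {
  // Walk the path step by step so a missing level does not throw.
  const field =
    extractedData &&
    extractedData.headerFields &&
    extractedData.headerFields.documentNumber;
  const value = field && field.value;
  if (value === undefined || value === null || value === "") {
    return "Invoice Number: (not extracted)";
  }
  return "Invoice Number: " + value;
}

// Mock object with a made-up invoice number, mimicking only the path used above.
const mockExtractedData = {
  headerFields: { documentNumber: { value: "INV-1001" } },
};

console.log(buildInvoiceNumberMessage(mockExtractedData));
console.log(buildInvoiceNumberMessage({ headerFields: {} }));
```

With the mock data, the first call logs the invoice number and the second falls back to the placeholder text.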

Conclusion

After going through this blog post, you should be acquainted with the new Extract Data (Pre-trained Model) activity and its usage. You should also have an appreciation for how convenient the activity makes information extraction from common business documents.

In the Proposed Sequence of Execution section we were able to log the invoice number from the invoice. Similarly, we could extract other fields such as Sender Name, Item Description and Net Amount. These fields could then be transferred to a data source like MS Excel and stored in a shared location, like OneDrive. Alternatively, we could extend the automation to process the invoice using these extracted fields.
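As a rough sketch of handing the extracted fields to a spreadsheet, the JavaScript below flattens header fields and line items into CSV text, which Excel can open. It does not use the Excel SDK. Apart from headerFields.documentNumber.value, the field names (senderName, lineItems, description, netAmount) and the sample values are assumptions for illustration; check the actual names under “Variables” in the Cloud Studio.

```javascript
// Quote a cell when it contains a comma, a double quote or a line break.
function csvCell(value) {
  const text = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: one CSV row per line item, repeating the header fields.
// Field names other than documentNumber are assumed for this sketch.
function toCsv(extractedData) {
  const header = extractedData.headerFields;
  const rows = [["Invoice Number", "Sender Name", "Item Description", "Net Amount"]];
  for (const item of extractedData.lineItems) {
    rows.push([
      header.documentNumber.value,
      header.senderName.value,
      item.description.value,
      item.netAmount.value,
    ]);
  }
  return rows.map((row) => row.map(csvCell).join(",")).join("\r\n");
}

// Made-up sample data, not taken from the sample invoice.
const sample = {
  headerFields: {
    documentNumber: { value: "INV-1001" },
    senderName: { value: "Acme IT, Inc." },
  },
  lineItems: [
    { description: { value: 'Laptop 14"' }, netAmount: { value: 1100 } },
    { description: { value: "Docking station" }, netAmount: { value: 150 } },
  ],
};

console.log(toCsv(sample));
```

Repeating the invoice number and sender on every row keeps each line item self-describing when rows from several invoices end up in the same sheet.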

Thanks for reading and feel free to leave a comment with questions or feedback 🙂

Find more information on SAP Intelligent RPA:

Exchange knowledge: SAP Community | QA | Blog

Learn more: Webinars | Help Portal | openSAP

Explore: Product Information

Try SAP Intelligent RPA for Free: Trial Version | Pre-built Bots

Follow us on: LinkedIn, Twitter and YouTube
