Nintex Community Menu Bar

RPA Claim Processing - Part 3: OCR Google Cloud Vision API

5 years ago
October 6, 2019
1 reply
53 views
Translate

kkgan
Novice

In my previous blog post (i.e. RPA Claim Processing - Part 2: Simple OCR), we have learn how to use the built-in Simple OCR to read printed text from a PDF form. This helps us to process all inbox pdf and categorize the documents to different claim categories.

Let us take a step further, instead of OCR the printed form title, if we use the same technique to OCR a PDF that is filled with handwritting, we will come to realize the Simple OCR is having difficulty to get the right recognition of handwritting for us.

The results of the Simple OCR i have tested are shown in the below captures (i.e. using PDF, Image, and Cropped Image). Take note of the Preview results, the result is somehow unpredictable and unexpected.

Figure 1: Simple OCR with PDF

Figure 2: Simple OCR with JPG Image

Figure 3: Simple OCR with Cropped JPG Image

Google Vision API

Google Vision API can detect and transcribe text from PDF and TIFF files stored in Google Cloud Storage (i.e. Google Cloud Vision API ). Unfortunately, as our users concern about having to save the entire PDF files in Google Cloud Storage, we are going to convert the PDF to Image file, and take each of the "input field" of the document to be sent to Google Vision API. Google Cloud Vision API takes base64 image for OCR purpose, there is no need for us to save the Image/PDF to the Cloud Storage. By OCR input field by field, it minimizes the effort to parse data that is for the entire document.

While testing on Google Vision API, I come to realize Mathias Balslow @mbalslow of Foxtrot Alliance has already shared a great post on How-To Use Google Cloud Vision API (OCR & Image Analysis), without reinventing the wheel, we can simply follow what was shared by Mathias on how to setup and use Google Vision API.

I will be attaching my script on my testing in this article later, but without the iteration part. Below are the steps of my script:

Create a list of "Input Fields" to be OCR
Open the Image file and saved it to duplicate the file as current <fieldname.jpg>.
Open the <fieldname.jpg>
Crop the image to the area representing the input field
use the REST action to send the <fieldname.jpg> to Google Cloud Vision API endpoint.

Here is the cropped image of my Fullname field:

The Google Cloud Vision API returns the result that is very promissing to me, the returned result includes the blurry/noised field label in my case (i.e. Insured Member (Emplyee)), and the handwritten full name. The result in json format as summarized below:

{    "responses":[       {          "textAnnotations":[             {                "locale":"en",               "description":"Insured Member (Employee)GAN KOK KOON",               "boundingPoly":{...}            },            {                "description":"Insured",               "boundingPoly":{...}            },            {                "description":"Member",               "boundingPoly":{...}            },            {                "description":"(Employee)",               "boundingPoly":{...}            },            {                "description":"GAN",               "boundingPoly":{...}            },            {                "description":"KOK",               "boundingPoly":{...}            },            {                "description":"KOON",               "boundingPoly":{...}            }         ],         "fullTextAnnotation":{             "pages":[...],            "text":"Insured Member (Employee)GAN KOK KOON"         }      }   ]}

With the returned result above, it makes the parsing much easier compared to if the result consists of data of the entire document.

Up to the current stage, you might be wondering do I have to get every single machine with the capability to convert the PDF to image, or if every bots we have, to categorize the documents for processing, I will be sharing and discussing on bots deployment options for the Claim Process. After that I am also planning to revisit our python code to further explore how we can overcome the challenges on parsing the return result of Google Vision API.

Did this topic help you find an answer to your question?

ArohShukla
5 years ago
October 14, 2019

An awesome and detailed blogpost. Thanks for sharing.

Translate

Reply

Cookie policy

We use cookies to enhance and personalize your experience. If you accept you agree to our full cookie policy. Learn more about our cookies.

Cookie settings

We use 3 different kinds of cookies. You can choose which cookies you want to accept. We need basic cookies to make this site work, therefore these are the minimum you can select. Learn more about our cookies.

Basic
Functional

Normal
Functional + analytics

Complete
Functional + analytics + social media + embedded videos

2 Attachments

Reply

Related topics

Restrict via User Security to access Reassignment of Approvalsicon

Ask a Question, Get an Answer - May 6 - May 10, 2024 - Need extra help this week!

How to configure User can restrict to see his only approval documentsicon

How access rights are defined in MYOB Acumatica?icon

Row-level Security Doesn't Restrict File Attachments on Entities that are Restricted?icon

Sign up

Log in with SSO

Login to the community

Log in with SSO

Scanning file for viruses.

This file cannot be downloaded

Cookie policy

Cookie settings