Analyze PDF - Get PDF Page Text (Formatted)

November 5, 2021

abrommel

I am curious whether anyone has insight into how the PDF text extraction works behind the scenes. The results the command produces differ from those of every other method I have used to extract text from PDF files in code. In some cases that's a good thing; in others, not so much. Just curious whether anyone has any ideas.

Ayelet_Gazit
August 29, 2022

@andy Brommel, you want us to share our top secrets?

@Ivgeni Rapoport what can we share on this?

Ivgeni___Ly8v__
August 29, 2022

Kryon's "read PDF" command works on searchable PDFs, meaning PDFs that contain a text layer. The command extracts that layer into a string variable. The "formatted" option adds tabs and newlines to the string.
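
To make the "formatted" idea concrete, here is a minimal Python sketch of one way positioned text from a text layer could be turned into a string with tabs and newlines. This is not Kryon's code. The fragment shape `(x0, x1, y, text)`, the function name `layout_text`, and the `line_tol` and `tab_gap` thresholds are all assumptions made up for illustration.

```python
from typing import List, Tuple

# Hypothetical fragment shape: (x0, x1, y, text), with y growing down the page.
Fragment = Tuple[float, float, float, str]


def layout_text(fragments: List[Fragment], formatted: bool = True,
                line_tol: float = 2.0, tab_gap: float = 20.0) -> str:
    """Join positioned text fragments into a single string.

    Assumed behavior, for illustration only: fragments whose y values lie
    within line_tol of the first fragment on a line share that line. With
    formatted=True, a horizontal gap wider than tab_gap becomes a tab and
    lines are separated by newlines. With formatted=False, everything is
    joined with single spaces.
    """
    # Group fragments into lines, top to bottom.
    lines: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f[2], f[0])):
        if lines and abs(frag[2] - lines[-1][0][2]) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    # Order each line left to right.
    for line in lines:
        line.sort(key=lambda f: f[0])

    if not formatted:
        return " ".join(f[3] for line in lines for f in line)

    rendered = []
    for line in lines:
        parts = [line[0][3]]
        for prev, cur in zip(line, line[1:]):
            gap = cur[0] - prev[1]  # space between end of prev and start of cur
            parts.append("\t" if gap > tab_gap else " ")
            parts.append(cur[3])
        rendered.append("".join(parts))
    return "\n".join(rendered)


# Made-up sample: a two-column header row and one data row.
sample = [
    (10, 40, 100, "Name"), (200, 230, 100, "Total"),
    (10, 35, 120, "Acme"), (40, 60, 120.5, "Ltd"), (200, 225, 120, "42.00"),
]
print(repr(layout_text(sample, formatted=True)))
print(repr(layout_text(sample, formatted=False)))
```

Different tools pick different rules for grouping lines and columns, which may be one reason the output differs from other extraction methods.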

abrommel
Author
August 29, 2022

Thanks for the insight! After I started digging "under the hood" (through the application folders) I believe I was able to gain a better understanding of how it works. I really appreciate the assistance.
