
Analyze PDF - Get PDF Page Text (Formatted)

  • November 5, 2021
  • 3 replies
  • 43 views

I am curious whether anyone has insight into how the PDF text extraction works behind the scenes. The results this command produces differ from those of every other method I have used to extract text from PDF files in code. In some cases that's a good thing, in other cases not so much. Just curious if anyone has any ideas.

3 replies

@andy Brommel you want us to share our top secrets?

@Ivgeni Rapoport​ what can we share on this?

 


Kryon's "read PDF" command works on searchable PDFs, meaning PDFs that contain a text layer. The command extracts that layer into a string variable. The "Formatted" option adds tabs and newlines to the string.
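
To illustrate what "adding tabs and newlines" could look like in practice, here is a minimal Python sketch. It is not Kryon's implementation, which is not public. It assumes the text layer has already been read as positioned fragments, and everything in it (the `Fragment` type, `format_fragments`, and the `line_tol` and `tab_gap` thresholds) is hypothetical and made up for the example. Fragments on roughly the same baseline become one line, and a wide horizontal gap between neighbours becomes a tab.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical piece of text from a PDF text layer."""
    x: float      # left edge of the fragment
    y: float      # baseline, measured from the top of the page
    width: float  # rendered width of the fragment
    text: str


def format_fragments(fragments: List[Fragment],
                     line_tol: float = 2.0,
                     tab_gap: float = 15.0) -> str:
    """Join fragments into a string with tabs and newlines.

    line_tol and tab_gap are assumed thresholds, not values from any product.
    """
    # Group fragments into lines: same line if the baseline is within line_tol
    # of the first fragment already placed on that line.
    lines: List[List[Fragment]] = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(frag.y - lines[-1][0].y) <= line_tol:
            lines[-1].append(frag)
        else:
            lines.append([frag])

    out = []
    for line in lines:
        line.sort(key=lambda f: f.x)  # left-to-right reading order
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            # A wide gap is treated as a column break, a narrow one as a space.
            parts.append("\t" if gap > tab_gap else " ")
            parts.append(cur.text)
        out.append("".join(parts))
    return "\n".join(out)


sample = [
    Fragment(200, 100, 40, "Qty"),
    Fragment(50, 100, 30, "Item"),
    Fragment(50, 120, 35, "Widget"),
    Fragment(200, 120, 10, "3"),
    Fragment(50, 140, 30, "Total"),
    Fragment(85, 140, 20, "due"),
]
formatted = format_fragments(sample)
print(repr(formatted))
```

Different extractors pick different thresholds and reading orders, which may be one reason the output of one tool differs from another on the same file.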


  • Author
  • August 29, 2022

Thanks for the insight! After I started digging "under the hood" (through the application folders) I believe I was able to gain a better understanding of how it works. I really appreciate the assistance.