Extract text from email

  • 13 April 2016

Nigel

Hi,

I'm building a helpdesk system and have most of it working. What I'm having issues with is getting the content out of a 'reply' email that goes into the system. Currently, the full body of the email gets inserted. What I want to do is just get the most recent part of it.

I've tried using a regex action, but it isn't working - it still puts the entire email in. I use 'split' with an index of 0; using 'extract' for some reason doesn't put anything in the variable. The details are:

Pattern:    ^(.*?\s*)From: ~Service

Operation:    Split

Store results in:   collBody

Sample text:  

Plain text try?

From: ~Service Desk

Sent: 12 April 2016 15:36

If I use an online regex validator, it works the way I expect it to, but it doesn't work in the workflow. Interestingly, when I dump the collBody value into the history list, it contains a bunch of HTML code, which I assume is part of what is causing the issue.

I'm sure it's something small that I'm missing, but I have no idea what.



Marian Hatala

It sounds like you receive HTML-formatted mails back, so your sample text doesn't represent what you actually provide as the regex input, and your pattern may not appear there at all.

If split doesn't match, it returns the whole input as the first token.

If extract doesn't match, it returns nothing.
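The split/extract difference can be reproduced outside the workflow. The sketch below uses Python's re module as a stand-in for the workflow's regex engine (an assumption; the two engines are not identical), and split_first and extract_all are hypothetical helper names: token 0 of a split is the whole input when nothing matches, while an extract with no match comes back empty.

```python
import re

def split_first(text: str, pattern: str) -> str:
    """Token 0 of a regex split: everything before the first match.

    When the pattern never matches, re.split returns [text], so token 0
    is the entire input.
    """
    return re.split(pattern, text, maxsplit=1)[0]

def extract_all(text: str, pattern: str) -> list[str]:
    """Every match of the pattern; an empty list when nothing matches."""
    return [m.group(0) for m in re.finditer(pattern, text)]

plain = "Plain text try?\n\nFrom: ~Service Desk\n\nSent: 12 April 2016 15:36"

print(repr(split_first(plain, r"From: ~Service")))     # 'Plain text try?\n\n'
print(split_first(plain, r"Not in the text") == plain)  # True: whole input
print(extract_all(plain, r"Not in the text"))          # []
```

One Python-specific caveat: if the split pattern contains a capturing group, such as the (.*?\s*) in the original pattern, re.split also returns the captured text, and token 0 is whatever precedes the match (an empty string when the match starts at position 0). This thread doesn't say whether the workflow's Split action behaves the same way.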

Have you tried logging your input to the history list as well? What does it look like?

Nigel

Thanks Marian,

I was wondering if that might be the case. Yes, I've logged it to the history list, and it does what you suspected: it displays HTML code. I haven't tried much yet to get rid of the HTML code (I only tried the xmldecode inline function, but that didn't do anything).
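A likely reason xmldecode changed nothing: decoding only turns escaped entities such as &amp; back into characters and leaves real tags in place. This assumes xmldecode works roughly like Python's html.unescape, which is not confirmed here. The sketch below uses a made-up body snippet to show that decoding keeps the tags, and that removing the markup takes a separate regex step (tags first, then entities, so an escaped &lt; in the text is not mistaken for a tag).

```python
import html
import re

# Made-up fragment of an HTML mail body, for illustration only
body = "<div><div>Thanks &amp; regards <br>From: ~Service Desk</div></div>"

# Entity decoding rewrites &amp; as & but leaves every tag in place
print(html.unescape(body))
# <div><div>Thanks & regards <br>From: ~Service Desk</div></div>

# Removing the markup is a separate step: strip tags first, then decode
text = html.unescape(re.sub(r"<[^>]+>", "", body))
print(text)  # Thanks & regards From: ~Service Desk
```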

I'll do some research today when I have time to see how to remove the html, but if you have any tips on doing so, they would be greatly appreciated.

Nigel

Marian Hatala

I think the easiest thing would be if you could force your clients to reply in plain text.

Nigel

I've just sent a reply, formatted as plain text, and even so it's still failing. The history list is now showing this.

14/04/2016 11:59 Workflow Comment <div class="ExternalClassA028D408FDC64317A2B91C8192D0BDA4"><div>
Formatted as plain text. What is going to happen now? <br> <br>From: ~Service Desk  <br>Sent: 14 April 2016 08:25 <br>
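This log also suggests why the pattern works in an online validator but not in the workflow. The sketch below uses Python's re module, assumes the pattern is ^(.*?\s*)From: ~Service (the backslash before s appears to have been lost from the post), and rebuilds the logged value with a line break after the opening <div> tags, as in the history entry. By default . does not match a line break, and ^ only anchors at the very start of the input, so .*? stops at the end of the first line and the match fails; split then hands back the whole body. Adding the inline (?s) flag lets . cross line breaks; if the workflow uses the .NET regex engine (an assumption), that flag should be accepted there too. Writing [\s\S]*? instead of .*? has the same effect without any flag.

```python
import re

# Assumed pattern: the post's "s*" read as "\s*"
pattern = r"^(.*?\s*)From: ~Service"
# (?s) lets "." match line breaks as well
dotall_pattern = r"(?s)^(.*?\s*)From: ~Service"

# The plain-text sample from the question
plain = "Plain text try?\n\nFrom: ~Service Desk\n\nSent: 12 April 2016 15:36"

# Reconstructed from the history log entry, keeping its line break
logged = (
    '<div class="ExternalClassA028D408FDC64317A2B91C8192D0BDA4"><div>\n'
    "Formatted as plain text. What is going to happen now? <br> <br>"
    "From: ~Service Desk  <br>Sent: 14 April 2016 08:25 <br>"
)

print(re.search(pattern, plain) is not None)  # True: the validator case
print(re.search(pattern, logged))             # None: "." stops at the line break
match = re.search(dotall_pattern, logged)
print(match.group(1).endswith("<br> <br>"))   # True
```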

Nigel

So, I've been playing around a bit more. I saw a post with replies from Andrew Glasser, which has given me some hope.

Marian Hatala - I'd love to be able to force them to use plain text, but that isn't an option unfortunately.

I'm now trying some variations and multiple regex actions to strip out the HTML, but I think it might be a big mission, considering how much 'code' there is in an Outlook email.

A combination of these regexes is slowly getting me to where I want to be:

<style>(.|\n)*?</style>

<(.|\n)*?>

followed by this, which keeps the part I want (originally highlighted in orange; it is the first line of the sample below):

^(.*?\s*)From: ~Service

This is the text that will be the actual reply on the email. Below are the headers of the rest of the thread, which I want to discard. It's only this initial line that I want to keep.


From: ~Service Desk

Sent: 12 April 2016 15:36

To: John Smith

Subject: New call logged (ID#12345)

Hi there... Blah blah blah.
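The chain of regexes above can be sketched end to end as follows. This is a Python approximation with a hypothetical helper, latest_reply, and a made-up HTML body modeled on the sample thread. It is slightly more permissive than the posted patterns, and these changes are assumptions rather than tested workflow settings: it matches <style ...> tags that carry attributes, uses [\s\S] so blocks can span lines (. does not match line breaks by default), decodes entities such as &nbsp; after stripping tags, and allows extra whitespace between From: and ~Service, since Outlook often wraps From: in <b> tags.

```python
import html
import re

def latest_reply(body: str) -> str:
    """Hypothetical helper: the newest text in an HTML reply mail."""
    # 1. Drop <style ...>...</style> blocks; [\s\S] also matches line breaks
    text = re.sub(r"<style[\s\S]*?</style>", " ", body, flags=re.IGNORECASE)
    # 2. Replace every remaining tag with a space
    text = re.sub(r"<[^>]+>", " ", text)
    # 3. Decode entities such as &nbsp; and &amp;
    text = html.unescape(text)
    # 4. Keep token 0: everything before the first quoted "From: ~Service" header
    reply = re.split(r"From:\s*~Service", text, maxsplit=1)[0]
    # Collapse the whitespace left behind by the removed markup
    return re.sub(r"\s+", " ", reply).strip()

# Made-up body modeled on the sample thread above
sample = (
    '<html><head><style type="text/css">p {margin:0}</style></head><body>'
    "<div>This is the text that will be the actual reply on the email.</div>"
    "<div><b>From:</b>&nbsp;~Service Desk<br>"
    "<b>Sent:</b> 12 April 2016 15:36<br><b>To:</b> John Smith</div>"
    "<div>Hi there... Blah blah blah.</div></body></html>"
)

print(latest_reply(sample))
# This is the text that will be the actual reply on the email.
```

In the workflow, each step would presumably map to its own regex action (a replace-style operation for the stripping steps, then Split with index 0). Note that collapsing whitespace at the end also flattens the reply's own line breaks, which may or may not be wanted in the helpdesk record.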

Marian Hatala

OK, I understand you have to cope with HTML mails.

Just FYI, with plain text you may be hitting the following bug: Multi Text Field and Update Document Error

Have a look at which version you are on.

Apart from that, I spotted the following post today: https://community.nintex.com/message/37379#comment-37379

so you might run into other problems with HTML mails.
