Reg exp query


Badge +8

I have an email library. When an email is received, SP reads the email body and stores the text in a column. The whole of the body text is pulled in, but I only need the first few sentences. How do I set up a regex to extract text from the email body only up to a certain word?


14 replies

Badge +5

When you say a certain word, do you mean a given word, such as received?

Then this should do it

^.*received

If you mean 5 words, then this would do it

^(\w+\s+){5}

but it doesn't take into account any punctuation characters
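
As a rough illustration of both patterns, here is a minimal Python sketch. The sample sentence is made up, and Python's `re` module stands in for the workflow's regex engine, so treat it as an approximation rather than what the workflow action does.

```python
import re

# Made-up sample text; a real email body would come from the library column.
body = "The message was received by the server. More text follows here."

# Everything from the start of the string up to and including "received".
up_to_word = re.search(r"^.*received", body)
print(up_to_word.group(0))

# The first five words; \w+ skips nothing, so punctuation between words breaks the match.
first_five = re.search(r"^(\w+\s+){5}", body)
print(first_five.group(0))
```

Note that `.*` is greedy, so if the word appears more than once on the line the match runs up to the last occurrence.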

This interactive site is brilliant for testing out regexs https://regexr.com/

Note: don't use IE; Chrome or Firefox work just fine

Badge +8

Let me add some details to my query. The emails received into the library are notifications that an email was undeliverable to a list of people. I want to pull this list of people (their email addresses) into a column.

Example: The blue boxes are the email addresses.

220621_pastedImage_1.png

There is lots of text after this part of the email body which I do not need.

Badge +5

It looks like you want to capture anything that looks like an email address from the message body

This regex tester site https://www.regextester.com/19 has a pre-filled regex that appears to do exactly what you want - it matches emails and ignores any other text

Badge +8

Thanks Graham

This is what I need. The only thing is that under the text shown above is the "Diagnostic information for administrators:" section, which contains all the email addresses the email was sent to. I only need the failed ones in the "To" part of the email. If there was a way to pull only the top part of the text, this would then work.

Badge +5

That sounds straight-forward - the first recipe I showed would allow you to retrieve everything before Diagnostic

^.*Diagnostic

Then pull out the emails that are in the resulting string
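
A minimal Python sketch of that two-stage idea, assuming the body is already on a single line. The sample text, the variable names, and the shortened email pattern are all illustrative assumptions, not the regex from the tester site.

```python
import re

# Hypothetical single-line bounce message.
body_str = (
    "Delivery has failed to these recipients: alice@example.com bob@example.com "
    "Diagnostic information for administrators: carol@example.com"
)

# Stage 1: keep everything up to (and including) "Diagnostic".
head = re.search(r"^.*Diagnostic", body_str)
temp_str = head.group(0) if head else body_str

# Stage 2: pull the addresses out of the truncated text.
# Simplified pattern, used here only to keep the example short.
emails = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", temp_str)
print(emails)
```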

Badge +8

Thanks graham lattin 

What operation do I need to use?

I can't get this to work. Are you able to send a screen shot?

Thanks

Badge +5

I don't have a screenshot to share

assume BodyStr holds the email body

the first regex task would take BodyStr and extract everything up to Diagnostic into TempStr

a second regex task would then process TempStr to extract the emails and place the results into a collection

I had a quick play but it only seems to get the first email address - also, I amended the regex slightly to not match the text up to the string end (ie remove the trailing $)

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*

The issue might be that it cannot do a multiline extract, in which case there should be a previous step that replaces returns with blanks

Badge +5

OK, I've had a little more time to look at it this morning

The problem was that it cannot cope with multiline strings, so here goes

1 select everything before Diagnostic into TempStr

2 replace all the line breaks with a space (find [\r\n]+)

3 capture all the remaining email addresses into a collection using the Extract option - leave out the start-of-string and end-of-string markers (^ and $):

[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*

I used the log history to output the collection and got all of the emails I expected

I hope this helps
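
For anyone who wants to try the whole chain outside the workflow, here is a Python sketch of it. The function name and sample message are hypothetical, and Python's `re` is only a stand-in for the workflow's regex action. Line breaks are flattened first here because, in Python at least, `.` does not match a newline by default.

```python
import re

# The email pattern from the post, without the ^ and $ anchors.
EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

def failed_recipients(body_str):
    """Return the addresses that appear before the first 'Diagnostic'."""
    # Replace every run of line breaks with a single space.
    flat = re.sub(r"[\r\n]+", " ", body_str)
    # Lazy match so it stops at the first "Diagnostic", not the last.
    head = re.search(r"^.*?Diagnostic", flat)
    temp_str = head.group(0) if head else flat
    # Collect every address in the truncated text.
    return EMAIL.findall(temp_str)

# Hypothetical bounce message with Windows-style line breaks.
sample = (
    "Delivery has failed to these recipients:\r\n"
    "alice@example.com\r\n"
    "bob@example.org\r\n"
    "Diagnostic information for administrators:\r\n"
    "carol@example.com\r\n"
)
print(failed_recipients(sample))
```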

Badge +8

I can't get the ^.*Diagnostic part to work.

Workflow Details

Error in regular expression action. parsing "*(?=Diagnostic)" - Quantifier {x,y} following nothing
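
The pattern quoted in that error starts with `*`, which suggests the expression reached the engine with a quantifier and nothing in front of it to repeat. Python's `re` rejects the same input, so the following sketch (with a hypothetical helper, and not the workflow's own .NET engine) shows the difference once the leading `^.` is present:

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# A quantifier with nothing before it is rejected.
print(compiles("*(?=Diagnostic)"))
# The same pattern with "^." in front of the "*" is accepted.
print(compiles("^.*(?=Diagnostic)"))
```
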
Badge +5

I realise that steps 1 and 2 should be reversed - remove the linebreaks before doing any further processing

I ran this to extract everything up to (and including) Diagnostic:

And it produced 'this is a Diagnostic'

Badge +8

Hi

Still not working - I think it's the paragraph markers. Do you know how to remove them?

221398_pastedImage_1.png

Badge +5

Yes, introduce a regular expression task as the first task, replace in the Body

[\r\n]+

with a single space

This replaces all hard linebreaks with a space (to prevent text on a previous line from merging with a valid email address)

Then continue the processing with the result of this
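
To see why the line breaks matter, here is a small Python sketch with a made-up two-line body. Python's `re` is an assumption standing in for the workflow action, but it shows the same symptom: the match fails until the breaks are replaced.

```python
import re

# Hypothetical body where "Diagnostic" sits on the second line.
body = "failed: alice@example.com\r\nDiagnostic information for administrators:"

# "." stops at the line break, so the word on the next line is never reached.
before = re.search(r"^.*Diagnostic", body)

# Replace each run of line breaks with one space, then try again.
flat = re.sub(r"[\r\n]+", " ", body)
after = re.search(r"^.*Diagnostic", flat)

print(before)
print(after.group(0))
```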

Badge +8

Perfect - now it's working!

Now the next bit - how can I extract just the email addresses with mailto at the start?

"mailto:xxxxx.xxxx@xxxxxx.com"

Badge +5

Use a regex extract on the remaining part of the body text

[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*

 

This will capture all email addresses into a collection
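
If only the addresses that follow "mailto:" are wanted, one possible approach is a lookbehind in front of the email pattern. This is a Python sketch under that assumption: the sample text and the shortened email pattern are illustrative, and it has not been tried in the workflow action itself.

```python
import re

# Hypothetical text with one mailto address and one plain address.
text = 'See "mailto:first.last@example.com" or write to other@example.com'

# (?<=mailto:) requires "mailto:" immediately before the address without capturing it.
MAILTO_ADDRESS = re.compile(
    r"(?<=mailto:)[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)

mailto_only = MAILTO_ADDRESS.findall(text)
print(mailto_only)
```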
