
I have an email library. When an email is received, SP reads the email body and stores the text in a column. The whole of the email body is pulled, but I only need the first few sentences. How do I set up a regex to extract only the text from the email body up to a certain word?

When you say a certain word, do you mean a specific word, such as received?

If so, this should do it:

^.*received
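To see what ^.*received actually returns, here is a small sketch using Python's re module as a stand-in for the workflow's regex action (an assumption: the action appears to use .NET regexes, but this simple pattern behaves the same way in both). The helper name text_up_to and the sample text are hypothetical. Note that . does not match a line break by default, and .* is greedy, so the match runs to the last occurrence of the word on the first line.

```python
import re
from typing import Optional


def text_up_to(body: str, word: str) -> Optional[str]:
    """Return the start of body up to and including word, or None.

    Equivalent to the pattern ^.*<word>. Because '.' does not match a
    line break by default, the word has to appear on the first line.
    """
    # re.escape keeps characters such as '.' in the word literal.
    match = re.search(r"^.*" + re.escape(word), body)
    return match.group(0) if match else None


print(text_up_to("Your message was received by the server", "received"))
# prints: Your message was received
```

If you want the first occurrence rather than the last, the lazy form ^.*?received stops there.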

If you mean the first five words, this would do it:

^(\w+\s+){5}

but it doesn't take punctuation characters into account.
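A quick way to check the five-word pattern and its punctuation caveat, again sketched with Python's re module (the function name and the looser alternative pattern are illustrative assumptions). Swapping \s+ for \W+ ("one or more non-word characters") is one possible way to let commas and full stops through.

```python
import re
from typing import Optional

# The pattern from the thread: five runs of word characters, each followed by whitespace.
FIVE_WORDS = re.compile(r"^(\w+\s+){5}")
# Hypothetical variant: \W+ also accepts punctuation between words.
FIVE_WORDS_LOOSE = re.compile(r"^(\w+\W+){5}")


def first_five_words(body: str, pattern: re.Pattern = FIVE_WORDS) -> Optional[str]:
    """Return the matched prefix (including trailing separators), or None."""
    match = pattern.match(body)
    return match.group(0) if match else None


print(repr(first_five_words("Hello, this is a test message")))                    # None
print(repr(first_five_words("Hello, this is a test message", FIVE_WORDS_LOOSE)))  # 'Hello, this is a test '
```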

This interactive site is brilliant for testing regexes: https://regexr.com/

Note: don't use IE; Chrome or Firefox work just fine.


Let me add some details to my query. The emails received into the library are notifications that an email was undeliverable to a list of people. I want to pull this list of people (their email addresses) out into a column.

Example: The blue boxes are the email addresses.

[Screenshot: the undeliverable-mail body, with the failed email addresses marked in blue boxes]

There is lots of text after this part of the email body which I do not need.


It looks like you want to capture anything that looks like an email address from the message body.

This regex tester site https://www.regextester.com/19 has a pre-filled regex that appears to do exactly what you want: it matches email addresses and ignores any other text.


Thanks, Graham.

This is what I need. The only problem is that below the text shown above is the "Diagnostic information for administrators:" section, which contains all the email addresses the email was sent to. I only need the failed ones in the To part of the email. If there were a way to pull only the top part of the text, this would work.


That sounds straightforward: the first recipe I showed lets you retrieve everything up to and including Diagnostic

^.*Diagnostic

Then pull out the email addresses in the resulting string.
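Both ways of cutting at the marker can be compared side by side in this sketch (Python's re as a stand-in; the sample text is made up apart from the Diagnostic heading, and text_before is a hypothetical helper). ^.*Diagnostic keeps the word itself, while a lookahead, ^.*(?=Diagnostic), stops just before it. Both are greedy, so they run to the last Diagnostic on the line, and neither crosses a line break.

```python
import re
from typing import Optional


def text_before(body: str, marker: str = "Diagnostic", keep_marker: bool = True) -> Optional[str]:
    """Return the start of body up to the marker, with or without the marker itself."""
    marker_re = re.escape(marker)
    # ^.*Diagnostic consumes the marker; ^.*(?=Diagnostic) only checks that it follows.
    pattern = r"^.*" + marker_re if keep_marker else r"^.*(?=" + marker_re + r")"
    match = re.search(pattern, body)
    return match.group(0) if match else None


sample = "Delivery has failed: alice@example.com Diagnostic information for administrators:"
print(repr(text_before(sample)))                     # 'Delivery has failed: alice@example.com Diagnostic'
print(repr(text_before(sample, keep_marker=False)))  # 'Delivery has failed: alice@example.com '
```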


Thanks, Graham Lattin.

What operation do I need to use?

I can't get this to work. Are you able to send a screenshot?

Thanks


I don't have a screenshot to share.

Assume BodyStr holds the email body.

The first regex task would take BodyStr and extract everything up to Diagnostic into TempStr.

A second regex task would then process TempStr to extract the email addresses and place the results into a collection.

I had a quick play, but it only seems to get the first email address. Also, I amended the regex slightly so that it no longer has to match all the way to the end of the string (i.e. I removed the trailing $):

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*

The issue might be that it cannot do a multiline extract, in which case a preceding step should replace the line breaks with spaces.
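The multiline problem is easy to reproduce. In this sketch (Python's re standing in for the workflow action; the sample body and the helper cut_at_diagnostic are invented), ^.*Diagnostic finds nothing once Diagnostic sits on a later line, because . stops at the line break and ^ only anchors at the start of the whole string. Python's re.DOTALL flag (RegexOptions.Singleline in .NET) lets . cross line breaks, and the inline prefix (?s) is the same switch written into the pattern itself. Whether the workflow action accepts either is an assumption the thread does not confirm, which is why replacing the line breaks first is the safer route.

```python
import re
from typing import Optional

# Invented sample: the marker is on the third line, as in a real bounce message.
BODY = (
    "Delivery has failed to these recipients:\n"
    "alice@example.com\n"
    "Diagnostic information for administrators:"
)


def cut_at_diagnostic(body: str, dotall: bool = False) -> Optional[str]:
    """Apply ^.*Diagnostic, optionally letting '.' match line breaks."""
    flags = re.DOTALL if dotall else 0
    match = re.search(r"^.*Diagnostic", body, flags)
    return match.group(0) if match else None


print(cut_at_diagnostic(BODY))                     # None: '.' stops at the first line break
print(repr(cut_at_diagnostic(BODY, dotall=True)))  # the first three lines, up to and including Diagnostic
```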


OK, I've had a little more time to look at it this morning.

The problem was that it cannot cope with multiline strings, so here goes:

1. Select everything up to Diagnostic into TempStr.

2. Replace all the line breaks with a space (find [\r\n]+).

3. Capture all the remaining email addresses into a collection using the Extract option, leaving out the fixed start- and end-of-string markers (^ and $):

[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*
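Why dropping the start-of-string marker matters can be shown with the same address pattern (Python's re.findall standing in for the Extract option; the addresses are placeholders). With a leading ^, the pattern can only match at position 0, so at most one address comes back, and none at all if the text does not start with one; without it, every address is found.

```python
import re

# Address pattern from the thread, split into local part and domain for readability.
LOCAL = r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
DOMAIN = (
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)
EMAIL = LOCAL + "@" + DOMAIN

ANCHORED = re.compile("^" + EMAIL)  # only tries position 0 of the string
UNANCHORED = re.compile(EMAIL)      # tries every position

text = "alice@example.com bob@example.org carol@example.net"
print(ANCHORED.findall(text))    # ['alice@example.com']
print(UNANCHORED.findall(text))  # ['alice@example.com', 'bob@example.org', 'carol@example.net']
```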

I used the log history to output the collection and got all the email addresses I expected.

I hope this helps.


I can't get the ^.*Diagnostic part to work.

Workflow Details

Error in regular expression action. parsing "*(?=Diagnostic)" - Quantifier {x,y} following nothing

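I realise that steps 1 and 2 should be reversed: remove the line breaks before doing any further processing. As for the error, the pattern being parsed starts with *, which has nothing before it to repeat; make sure the leading ^. is included, e.g. ^.*(?=Diagnostic).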

I ran this to extract everything up to (and including) Diagnostic:

It produced 'this is a Diagnostic'.
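With the steps in the reversed order, the whole job looks roughly like the following pipeline, sketched in Python with re standing in for the three regex tasks. The function name failed_recipients, the sample body, and the choice to return an empty list when Diagnostic is missing are all assumptions for illustration.

```python
import re
from typing import List

EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def failed_recipients(body: str, marker: str = "Diagnostic") -> List[str]:
    # 1. Replace every run of line breaks with a single space.
    flat = re.sub(r"[\r\n]+", " ", body)
    # 2. Keep everything up to and including the marker (TempStr in the thread).
    head = re.search(r"^.*" + re.escape(marker), flat)
    if head is None:
        return []  # assumption: no marker means nothing to report
    # 3. Extract every address from the kept part into a list (the "collection").
    return EMAIL.findall(head.group(0))


BODY = (
    "Delivery has failed to these recipients or groups:\r\n"
    "alice@example.com\r\n"
    "bob@example.org\r\n"
    "Diagnostic information for administrators:\r\n"
    "Generating server: mail.example.net\r\n"
    "alice@example.com\r\n"
    "carol@example.net\r\n"
)
print(failed_recipients(BODY))  # ['alice@example.com', 'bob@example.org']
```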


Hi

Still not working. I think it's the paragraph markers. Do you know how to remove them?



Yes: introduce a regular expression task as the first task, and in the Body replace

[\r\n]+

with a single space.

This replaces all hard line breaks with a space (to prevent text on a previous line from merging with a valid email address).

Then continue processing with the result of this.
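The reason for using a space rather than deleting the line breaks outright shows up in a small sketch (Python's re.sub as a stand-in for the Replace step; the sample text and the helper addresses_after_flattening are invented). With an empty replacement, the word before a line break runs straight into the next email address, and so does the following address, so the extract returns a corrupted match.

```python
import re
from typing import List

LINE_BREAKS = re.compile(r"[\r\n]+")
EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def addresses_after_flattening(body: str, replacement: str) -> List[str]:
    """Replace line breaks with `replacement`, then extract every address."""
    return EMAIL.findall(LINE_BREAKS.sub(replacement, body))


body = "Failed recipients\r\nalice@example.com\r\nbob@example.org"
print(addresses_after_flattening(body, ""))   # ['recipientsalice@example.combob']
print(addresses_after_flattening(body, " "))  # ['alice@example.com', 'bob@example.org']
```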


Perfect, now it's working!

Now the next bit: how can I extract just the email addresses that have mailto: at the start?

"mailto:xxxxx.xxxx@xxxxxx.com"


Use a regex extract on the remaining part of the body text:

[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*

This will capture all email addresses into a collection.
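Applied to a mailto: link, the plain extract already returns the bare address, because : is not in the pattern's character set. If only the addresses written as mailto: links are wanted, one possible approach is a lookbehind, (?<=mailto:), in front of the same pattern; .NET's regex engine supports lookbehind, as does Python's re, which this sketch uses as a stand-in. The sample text and the extract helper are hypothetical.

```python
import re
from typing import List

EMAIL = (
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)
ALL_ADDRESSES = re.compile(EMAIL)
# Hypothetical variant: only match an address immediately preceded by "mailto:".
MAILTO_ADDRESSES = re.compile(r"(?<=mailto:)" + EMAIL)


def extract(pattern: re.Pattern, text: str) -> List[str]:
    """Return every match, similar to the Extract option filling a collection."""
    return pattern.findall(text)


text = 'Failed: "mailto:alice@example.com" (cc: bob@example.org)'
print(extract(ALL_ADDRESSES, text))     # ['alice@example.com', 'bob@example.org']
print(extract(MAILTO_ADDRESSES, text))  # ['alice@example.com']
```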

