Extract first three words using RegEx?

  • 15 January 2016
  • 6 replies
  • 20 views

Badge +9

I have created a workflow that extracts a certain number of characters from a string. However, what is the best way to extract the first 5 words of a string instead of a number of characters?



6 replies

Userlevel 7
Badge +11

Hi Rency,

I can't think of a way to limit the results of a Regular Expression.

\w+

This expression will return all the words from the input text and store them in a collection.

Then your workflow need only process the first 5 words.
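To illustrate the idea outside of the workflow designer, here is a minimal sketch in Python (an assumption for illustration only; the thread itself uses the workflow's Regular Expression action, and `first_n_words` is a hypothetical helper name). It collects every `\w+` match into a list and then keeps only the first five.

```python
import re


def first_n_words(text: str, n: int = 5) -> list:
    """Return the first n words of text (hypothetical helper)."""
    # \w+ matches each run of word characters; findall returns all of them.
    words = re.findall(r"\w+", text)
    # The pattern itself does not limit the count, so slice afterwards.
    return words[:n]


sample = "The quick brown fox jumps over the lazy dog"
print(first_n_words(sample))
```

If the text has fewer than five words, the slice simply returns whatever words exist.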

cheers,

Vadim

Badge +9

Thank you so much Vadim!!!

Userlevel 7
Badge +17

Vadim is right: you can EXTRACT on words and store them into a collection, or SPLIT on whitespace and store the pieces into a collection.

You can also store the result into a string by matching the first three sets of characters that are followed by a whitespace using

^(\w+\s+){3}

\w - is a word character

\s - is a whitespace character

^ - an anchor to match the start of the string

{3} - repeat the match exactly 3 times
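As a rough check of how this pattern behaves, here is a small Python sketch (Python and the helper name `first_three_words` are assumptions for illustration, not part of the workflow). Note that each word must be followed by whitespace, so a string with exactly three words and nothing after them does not match.

```python
import re
from typing import Optional

# Same pattern as above: three runs of word characters, each followed by whitespace.
FIRST_THREE = re.compile(r"^(\w+\s+){3}")


def first_three_words(text: str) -> Optional[str]:
    """Return the first three words as one string, or None if there is no match."""
    match = FIRST_THREE.match(text)
    if match is None:
        return None
    # group(0) is the whole match and includes the trailing whitespace, so trim it.
    return match.group(0).rstrip()


print(first_three_words("Extract first three words using RegEx"))
print(first_three_words("only three words"))
```

The second call returns `None` because the third word has no whitespace after it.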

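To do the opposite and keep all the other words except the first three, match the first three with the same pattern and replace the match with nothing. Adding ?: makes the group non-capturing, but on its own it does not change which text is matched.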

^(?:\w+\s+){3}
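One way to get everything after the first three words is to replace the matched prefix with an empty string. The sketch below uses Python (an assumption; `drop_first_three_words` is a hypothetical name). The non-capturing group matches the same text as the capturing version; the removal comes from the replace step.

```python
import re


def drop_first_three_words(text: str) -> str:
    """Remove the first three whitespace-terminated words, if present."""
    # (?: ... ) groups without capturing; the pattern still matches the prefix.
    return re.sub(r"^(?:\w+\s+){3}", "", text, count=1)


print(drop_first_three_words("Extract first three words using RegEx"))
```

When the input has fewer than three whitespace-terminated words, nothing matches and the text is returned unchanged.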

Badge +4

Can I apply this to extracting numbers from a date field?

E.g., ^(\d+/+){3} to extract 3042016 from 3/04/2016

\d - is a digit character

\s - is a whitespace character

^ - an anchor to match the start of the string

{3} - repeat the match exactly 3 times

Let me know if this would work
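A quick way to check a pattern like this is to run it against the sample value. The Python sketch below is only an illustration (Python is an assumption, not the workflow engine). It suggests the pattern as written would not match 3/04/2016, because the third group of digits is not followed by a slash, and that any match would still contain the slashes.

```python
import re

date = "3/04/2016"

# Three repetitions of "digits then slash(es)": the last group has no trailing slash.
strict = re.match(r"^(\d+/+){3}", date)

# Two repetitions do match, but the matched text still includes the slashes.
two_groups = re.match(r"^(\d+/+){2}", date)

print(strict)
print(two_groups.group(0))
```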

Userlevel 7
Badge +17

If you just want the numbers, you should be able to match them using only \d+.

But if you use Extract, it will put each digit match into a collection, if I'm not mistaken. So maybe you can do the opposite and use Replace text with ?:\d+, replacing it with blank, then store the result in a text variable. It should result in 3042016. I didn't test it though, so let me know if that works.
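Both approaches can be sketched in Python (an assumption for illustration). The replace variant here uses `\D`, the standard "any non-digit" class, which is a swapped-in pattern and not necessarily the one used in the workflow.

```python
import re

date = "3/04/2016"

# Extract: every run of digits becomes one item in a list (like a collection).
parts = re.findall(r"\d+", date)
digits_joined = "".join(parts)

# Replace: delete every non-digit character and keep a single string.
digits_replaced = re.sub(r"\D", "", date)

print(parts)
print(digits_joined)
print(digits_replaced)
```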

Badge +4

Hey Andrew,

That reply was definitely helpful!!

My original date ({ItemProperty:Created}) was 3/04/2016 12:00 am, and my desired result was '3042016'.

My first step was to extract the date value only (everything before the first space) using an 'Extract' Regular Expression of ^\S* and passing it into a collection variable called varCreatedDate.

\S - is a non-whitespace character (the opposite of \s, a whitespace character)

^ - an anchor to match the start of the string

* - matching the preceding element zero or more times.


This gave me 3/04/2016; as a result in varCreatedDate

The next step (getting my idea from your response) was to replace any non-digit character ("/" and ";") in my new collection variable ({WorkflowVariable:varCreatedDate}) with a blank and pass that into a string variable


giving me the result 3042016 into CreatedDateString
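The two steps described above can be sketched in Python as follows. This is an illustration under assumptions: Python stands in for the workflow actions, `created_date_digits` is a hypothetical name, `\D` is used as the non-digit pattern, and the `;` separator that the collection variable adds is not modelled.

```python
import re


def created_date_digits(created: str) -> str:
    """Keep the part before the first whitespace, then strip non-digits."""
    # Step 1: ^\S* matches everything from the start up to the first whitespace.
    # Because * allows zero characters, this match always succeeds.
    date_part = re.match(r"^\S*", created).group(0)
    # Step 2: replace every non-digit character (such as "/") with nothing.
    return re.sub(r"\D", "", date_part)


print(created_date_digits("3/04/2016 12:00 am"))
```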

YAY!

Thanks for the help Andrew Glasser
