Workflow Failure During For Each Action


Userlevel 4
Badge +10

Hi Folks,

I have a workflow (SharePoint 2013 on-premises) that queries a large list, writes all records from the previous month to a CSV file, and then PUTs the file into a library via the Web Request action's PUT method. When testing with small record sets, everything works fine. The production run will write about 5000 records to the CSV.
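In rough code terms, the logic is something like this Python sketch (the field names, URL, and helper names are invented for illustration; the real workflow is built from Nintex actions, not code):

    # Illustrative only: filter last month's records, build a CSV, PUT it.
    import csv, io
    from datetime import date, timedelta

    import requests  # assumption: any HTTP client works for the PUT

    def previous_month_range(today: date) -> tuple[date, date]:
        """First and last day of the month before `today`."""
        first_of_this_month = today.replace(day=1)
        last_of_prev = first_of_this_month - timedelta(days=1)
        return last_of_prev.replace(day=1), last_of_prev

    def export_previous_month(records: list[dict], put_url: str) -> None:
        start, end = previous_month_range(date.today())
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["Title", "Status", "Modified"],
                                extrasaction="ignore")  # field names made up
        writer.writeheader()
        for rec in records:                      # the For Each over the collection
            if start <= rec["Modified"] <= end:  # the previous-month filter
                writer.writerow(rec)             # the Build String append
        # PUT the finished CSV into the library (the Web Request action's job)
        requests.put(put_url, data=buf.getvalue().encode("utf-8"))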

When looping (For Each action) through the collection variables and writing each field value with the Build String action, the workflow fails partway through the process. After about 5 minutes of looping, the process just stops with no explicit errors. The only error is "BackUp Easterner Status Records failed to start". Mind you, it has definitely started, because verbose logging shows that it has progressed quite a way into the process. It is not failing on a particular record either: sometimes it writes a few hundred records and sometimes just over 100.

Does anyone have any ideas as to what the problem may be?

Thanks and Best Regards,

Patrick Kelligan


19 replies

Userlevel 5
Badge +14

put a pause action into your for each loop and let it pause once after X iterations.

you will have to test what value of X is suitable for your environment, but I would say 1000 might be a reasonable starting point.
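in code terms I mean a loop shaped like this (a Python sketch just to show the idea; names and numbers are placeholders, in nintex it's a For Each with a Run If and a Pause inside):

    # pause once every PAUSE_EVERY iterations to let sharepoint catch up
    import time

    PAUSE_EVERY = 1000        # tune this for your environment
    PAUSE_SECONDS = 5 * 60    # the Pause action's minimum is 5 minutes

    def append_to_csv_string(item):
        pass                  # stand-in for your Build String step

    collection = []           # stand-in for your collection variable

    for i, item in enumerate(collection, start=1):
        append_to_csv_string(item)
        if i % PAUSE_EVERY == 0:
            time.sleep(PAUSE_SECONDS)   # the in-loop Pause action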

Userlevel 4
Badge +10

Hi Marian Hatala,

Thanks for the response! I will give that a try. It will be closer to 100, as it is failing sometimes at 180 and sometimes around 350. I will let you know what happens.

Thanks and Regards,

Patrick Kelligan

Userlevel 4
Badge +10

Hi Marian,

I set it up and tried it with a pause after 100 and it failed at 96. I lowered the threshold to 50 and it is still running so far and is on its 3rd pause.

With 5000 items this looks to be over an 8 hour workflow process.
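(That is 100 pauses for 5000 items at one pause per 50; at the 5-minute minimum each, that alone is 500 minutes, roughly 8.3 hours, before any actual processing time.)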

This seems like an OK workaround, but I wonder what is causing the issue in the first place. Any ideas? While 50 may work for now, it still seems unreliable, since there is no definite breakpoint in either time or record count. Why do you think this might be happening?

Thanks!

Patrick Kelligan

Userlevel 5
Badge +14

Patrick Kelligan, I'm not able to answer this.

just my personal guess is that it is some limitation on the sharepoint side (since a similar problem appears quite regularly with long-running loops or actions that need to process bigger payloads), either some timeout or an sp batch size?

maybe someone more experienced with nintex or sp internals could clarify it.

Userlevel 5
Badge +14

seeing your timing assumption, I don't know whether it's an option for you, but downloading the data from the client/consumer (excel?) side would be much more effective

Badge +5

Patrick, is this a workflow you're running manually? Try starting it as a scheduled workflow. I've seen similar issues when processing large volumes of records and running the WF manually. If I run it scheduled, it has better luck at processing all requested items. Another thing that may help is to add a Commit Pending Changes at the bottom of your loop. This would be preferable to using the pause, because the smallest pause interval is 5 minutes. Multiply that by 5000 items and your workflow will spend more time paused than it will processing the items.
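(Taken literally, 5000 pauses at 5 minutes each would be 25,000 minutes, about 17 days, of idle time.)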

I know it seems weird relying on luck - but remember this is SharePoint by Microsoft. Seriously though, there does seem to be some workflow processing limitation, either in buffering the workflow actions or in processing them.

Good luck and post any details you might uncover.

Gerard

Userlevel 4
Badge +10

I will try this. The Commit Pending Changes is a good idea. I am still running the first test from last night with a 50-record threshold. I will also try a scheduled WF to see if it runs OK.

Userlevel 4
Badge +10

Hi Gerard Rodriguez,

I tried running it manually with the Commit action and it failed. I just set up a schedule that should kick off here in a few minutes, to see whether it matters if it is scheduled or not. Will keep you posted.

Thanks!

Patrick

Userlevel 4
Badge +10

Ok Gerard Rodriguez... It is still running after several hours as a scheduled WF instead of a manual trigger, and without the pause. This is encouraging.

Thanks for the tip!

Patrick

Userlevel 5
Badge +14

that's interesting...

Patrick Kelligan, I have a few questions.

1. do you commit in every single iteration?

2. are you able to provide some performance comparison between pausing and committing?

I mean, how much time does either approach need to process your whole data set, or how much time does it need to process a data subset of the same size?

3. could you perform one more test with following setup?

     - pause action at the very beginning of the workflow

     - no pause and no commit within the loop

     - start it manually

Userlevel 4
Badge +10

Hi Marian Hatala,

1. I do commit in every iteration

2. Committing is much faster than pausing

3. I will run the tests tomorrow after the current run completes.

I will let you know what I find.

Thanks!

Patrick

Userlevel 5
Badge +14

ok, thanks.

two more questions though, if I can:

- what delay did you configure in the Pause For action within the loop?

- what is your SP workflow timer job frequency?

Userlevel 4
Badge +10

5 min and 5 min.

Userlevel 4
Badge +10

Hi Marian Hatala,

I am running the test and it has been running for over an hour without failing. This is looking good so far. Can you offer some insight into why it succeeds this way when it did not before?

Thanks,

Patrick

Userlevel 5
Badge +14

Patrick Kelligan​, thanks for keeping me updated.

sure I can share my thoughts, but note I'm not very experienced with sharepoint internals.

so this is how I would see it:

- when you start a workflow manually, sharepoint's/IIS's worker process takes care of it; when you schedule it, the timer job takes care of it

- when you put a pause at the very beginning of the workflow, you more or less force sharepoint to execute it not via the worker process but via the timer job process; see Defensive Workflow Design Part 4 - Slow Down and Speed Up for further insights. that said, it then doesn't matter which way you start it, it's always executed by the timer job.

- when you put a pause inside a loop, you stop the workflow for a moment and give sharepoint a chance to handle all the collected payload. AFAIK this should be performed by another background job, but I don't know its name.

until the first pause it's executed by the worker process, after the pause by the timer job

- when you put a commit inside a loop, it has more or less the same impact as a pause - in order for sharepoint to process all the collected payload, it stops your workflow and lets the payload be processed.

the difference between the two, as I see it, is that with commit, sharepoint guarantees all the payload is processed before the workflow continues (so you cannot be sure how many timer job iterations pass in the meantime, i.e. how long it will be paused), whereas with the pause action your workflow continues after the configured time period plus a small wait for the next timer job turn (in which case you cannot be sure all the payload has already been processed)
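(with your settings, a 5 minute pause and a 5 minute timer job frequency, each in-loop pause therefore costs somewhere between 5 and 10 minutes of wall clock time, depending on when the next timer job run falls.)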

so if your last test run finishes successfully, it seems that all the problems (limitations) are caused by the worker process. once you force sharepoint to process your workflow via the timer job, it doesn't matter how big the payload is or how much time it needs.

that's my point of view. some sharepoint expert might correct me if I'm wrong.

Badge +5

Marian,

Thanks for providing the details. I agree with your assessment as well, based on what I know.

Regards,

Gerard

Userlevel 4
Badge +10

Hi Marian Hatala,

The processes ran over the weekend. The one with the Commit action in the loop worked. The workflow with the pause at the very start failed after about 5 hours and 10 minutes.

Userlevel 5
Badge +14

thanks for the info Patrick Kelligan, good to know.

so it looks like the timer job itself has certain limits as well, just much bigger ones.

Badge +5

Patrick, another consideration we have run into is whether you're dealing with large volumes of items versus large volumes of documents. Large volumes of list items seem to process more easily. I'm modifying one workflow now that processes hundreds of documents, and we find the Commit Pending Changes isn't sufficient to prevent the scheduled workflow from stopping somewhere in the middle.

To work around this, I've developed a method to count the number of items in the loop; when the count reaches a specified limit, say 25, the workflow pauses for several minutes before continuing the For Each loop. I don't use the Pause action (with its minimum of 5 minutes) but instead the Execute SQL action with a WAITFOR DELAY statement. That way I can set the delay to something more reasonable - usually 2 minutes. In rough terms it looks like the sketch below.
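A rough Python equivalent of that throttle (the connection string, collection, and processing step are placeholders; the real loop is built from Nintex actions):

    # Count items and issue a short SQL-side delay every BATCH_LIMIT iterations.
    import pyodbc  # assumption: any SQL Server client can run WAITFOR DELAY

    BATCH_LIMIT = 25                           # items between delays
    DELAY_SQL = "WAITFOR DELAY '00:02:00'"     # T-SQL: block for 2 minutes

    conn = pyodbc.connect("DSN=ContentDB")     # placeholder connection
    cursor = conn.cursor()

    documents = []                             # stand-in for the document set

    for n, doc in enumerate(documents, start=1):
        # ... process the document (the real loop body) ...
        if n % BATCH_LIMIT == 0:
            cursor.execute(DELAY_SQL)          # the Execute SQL action equivalent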

If you or anyone else encounters similar issues with large numbers of documents in a For Each loop, hopefully this can increase your success rate.

Gerard
