Workflows stuck at Pause

  • 23 February 2018
  • 12 replies


Hello all!

As part of my defensive workflow design (and as advised by Nintex Support), I've added a Pause before each task in a workflow. Now I sometimes see workflow instances held at that Pause for hours, even days, and some may never continue. (The Pause is set for 1 minute in business hours.) Users never find out about this unless someone specifically checks the workflow history.

In this context I have 2 questions:

1. Is there a way to find out which workflows are held at that Pause, such as a report of all In Progress workflows in which the current action is Pause?

2. Can I force the ones already stuck on the Pause to continue?

Any other ideas for investigating the core issue are welcome.

Thank you.

Dimiter


12 replies


Hi Dimiter,

Are you actually putting a "Pause" action before every "Assign Flexi task" action? Was that recommended as part of a defensive design? I have seen "Commit Pending Changes" used more often as part of a defensive design, so that anything pending at the SP level completes before control goes back to Nintex.

Maybe Andrew Glasser or Caroline Jung could shed some more light on this approach?

I have sometimes seen workflows that were stuck in a "Pause" or "Wait for Item update" action get going again after the timer jobs were restarted. Did you happen to try that?

In terms of reporting, what's available is pretty much what the Nintex reporting web parts provide OOB.

Regards,

Shrini


Dear Shrini,

Yes, that was a recommendation, since we had issues with tasks that got locked after submitting and never moved on.

Anyway, I've already tried restarting the Timer Service with this:

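# Restart the SharePoint Timer Service instance on every server in the farm
# (run from the SharePoint Management Shell)
$farm = Get-SPFarm
$farm.TimerService.Instances | ForEach-Object { $_.Stop(); $_.Start() }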

I guess it won't hurt to try again.

As for the reports, I checked all the OOTB ones and indeed none of them will help. I was hoping for a more sophisticated solution.

Thank you!

Dimiter


Hi Dimiter,

Just to clarify: you have tried restarting it and it did not work, correct? Even manually restarting the SharePoint timer job? It's one of those glitches, I guess.

The main point I wanted more thoughts on from people in the forum was actually the practice of adding a "Pause" action before every Flexi task. I haven't seen this practice before, as "Wait for Item update" and "Pause" generally create issues at times and can just wait indefinitely. If you search the forum you will find many similar issues that people have faced.

As for the history, if you are okay with querying or coding, you could go to the Nintex workflow history log (Site Settings -> Nintex workflow history), which logs all the events. A Pause is logged with Event Type 11 (similar to comments), and if you have not modified the comments in the Pause action, the default message is "Pausing for 5 mins". The Nintex workflow history log also has the columns List ID (the GUID of your SP list) and Primary Item ID (the ID of the list item on which the workflow is running). When the pause completes, it adds another event, "Pausing complete". So the data is there if you want to build something out of it.
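
If the history list is exported to CSV, the idea above could be sketched in PowerShell roughly as follows: group the pause events by List ID and Primary Item ID, and report the items whose latest pause event is not "Pausing complete". This is only a sketch under assumptions. The function name Find-StuckPause and the column names Description and Date Occurred are hypothetical and need to be matched to the actual export, and the message filter only works if the default Pause messages were not changed.

```powershell
# Sketch: find list items whose most recent pause event in an exported
# Nintex workflow history has no matching "Pausing complete".
# ASSUMPTIONS: the function name and the 'Description' / 'Date Occurred'
# column names are hypothetical; adjust them to match your export.
function Find-StuckPause {
    param(
        [Parameter(Mandatory = $true)] [object[]] $Rows,
        [datetime] $Now = (Get-Date),
        [int] $OlderThanMinutes = 60,
        [string] $MessageColumn = 'Description',
        [string] $DateColumn = 'Date Occurred'
    )

    $Rows |
        # Keep only pause-related events ("Pausing for ...", "Pausing complete")
        Where-Object { $_.$MessageColumn -like 'Pausing*' } |
        Group-Object -Property 'List ID', 'Primary Item ID' |
        ForEach-Object {
            # Latest pause-related event for this list item
            $last = $_.Group |
                Sort-Object { [datetime]($_.$DateColumn) } |
                Select-Object -Last 1
            $started = [datetime]($last.$DateColumn)
            $minutes = [int]($Now - $started).TotalMinutes

            # Report the item if the pause started long ago and never completed
            if ($last.$MessageColumn -notlike 'Pausing complete*' -and $minutes -gt $OlderThanMinutes) {
                [pscustomobject]@{
                    ListId        = $last.'List ID'
                    ItemId        = $last.'Primary Item ID'
                    PauseStarted  = $started
                    MinutesPaused = $minutes
                }
            }
        }
}
```

It could be called as `Find-StuckPause -Rows (Import-Csv .\history.csv)`, where history.csv is a hypothetical export file. The sketch also flags workflows that were terminated while paused, and it cannot tell apart several workflows running on the same item, so the result is best treated as a list of candidates to check.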

Btw, you also have the Nintex DBs (which I believe we should not touch, similar to the SP content DBs). But if you really want to get some reports, you could query them in a non-production environment, or restore a backup of the production DBs on a separate DB server and extract the history from it.


Regards,

Shrini


Shrini,

I used the above PowerShell commands to restart the Timer Service and yes, the workflows are still stuck on the Pause. I am not sure if that's the best way to do it, and it is certainly not the only one. Is this what you mean by "manually restarting"?

I went over all the workflows and removed the pauses, leaving a Pause only at the beginning of each workflow (this is recommended for sure).

Thanks for the hint on the History list. I can't code but I exported it to Excel and am trying to narrow down the results.

Regards,

Dimiter


Hi Dimiter,

Have you checked ULS logs or the event viewer of SharePoint servers?

There may be an error when the workflow is executing the pause action or trying to get the workflow back from the pause.

Have you also checked whether the SharePoint servers are consuming a lot of resources (CPU, RAM)?

Do you have a lot of workflow instances running?
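
For the ULS check, one possible starting point is a small filter that keeps only high-severity entries mentioning workflows. This is a sketch, not a documented Nintex procedure. The function name Select-WorkflowLogError, the severity list, and the 'workflow' keyword are assumptions chosen for illustration, and the Level, Category, and Message properties are those of the objects that Get-SPLogEvent returns.

```powershell
# Sketch: keep only high-severity log entries that mention workflows.
# ASSUMPTIONS: the function name, the severity list, and the 'workflow'
# keyword match are illustrative choices, not a documented Nintex filter.
function Select-WorkflowLogError {
    param(
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)] $LogEvent,
        [string[]] $Levels = @('Critical', 'Unexpected', 'High', 'Error'),
        [string] $Keyword = 'workflow'
    )
    process {
        # Search both the category and the message text (case-insensitive)
        $text = "$($LogEvent.Category) $($LogEvent.Message)"
        if (("$($LogEvent.Level)" -in $Levels) -and ($text -match [regex]::Escape($Keyword))) {
            $LogEvent
        }
    }
}

# On a SharePoint server with the SharePoint snap-in loaded, it could be fed with:
#   Get-SPLogEvent -StartTime (Get-Date).AddHours(-2) | Select-WorkflowLogError
```

The time window and severity levels are worth adjusting around the moment a workflow entered its Pause or was expected to leave it.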


Hello Caroline,

Thank you for your advice. I ended up removing the Pause and leaving only the one at the beginning of the workflow. Not the greatest solution obviously, but the whole process is split into several workflows and is currently working properly. I may try to simulate that behaviour in a test environment and explore the options you suggested.

Thank you again,

Dimiter


Great, thanks for letting us know how you managed this situation. It can be useful for others.


I think the core issue to investigate would be the RAM on the server and how many workflows you have running at once. The KnowYourWorkflow tool can help with that. If pauses are helping your workflows to function, with one at the beginning, then the workflows are being processed by the timer service (owstimer) instead of the app pool where all workflows originate (w3wp). But there are per-web-app limits, and if multiple web apps are using the same app pool, there are even more restrictions. Workflows that hit any limit on the IIS Worker service, or app pool, are queued in the database for the timer service to pick up later. And it's a FIFO queue.

IIS - Needs RAM, has a 15 workflow process bucket available at once. Add RAM, or split web apps per app pool. It is not recommended to change these system settings and manipulate sizes and timings via PowerShell. Another option is to have more WFE servers, as the user load will be split across the web app's app pools on different servers. So you get a 2x scale when adding another server.

Timer Service - does really well working on workflows, a bit faster too. Needs RAM and good database management. Timer services run on nearly all WFE and App servers.

If pausing at the right time helps, you may need to tune IIS. If pausing goes a bit crazy at times, then server tuning and database tuning can help. But note that you can over-pause workflows. Unnecessary pausing causes too many queues. Think of it like driving downtown. You get farther when there are more green lights. More red lights give you time to drink your coffee without spilling, but you don't get to your destination quickly and you back up everyone behind you.
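
As a quick way to see how much memory the two processes mentioned above (owstimer and w3wp) are using on a server, something like the following could be run. It is a minimal sketch, and the function name Get-WorkflowHostMemory and the output shape are assumptions for illustration; it only reads process information and changes no settings.

```powershell
# Sketch: report working-set memory for the processes that host workflows.
# ASSUMPTION: the function name and the output columns are illustrative only.
function Get-WorkflowHostMemory {
    param([string[]] $Name = @('OWSTIMER', 'w3wp'))

    # Processes that are not running on this server are silently skipped
    Get-Process -Name $Name -ErrorAction SilentlyContinue |
        Sort-Object WorkingSet64 -Descending |
        Select-Object ProcessName, Id,
            @{ Name = 'WorkingSetMB'; Expression = { [math]::Round($_.WorkingSet64 / 1MB) } }
}

Get-WorkflowHostMemory | Format-Table -AutoSize
```

Running it on each WFE and App server at a time when workflows are stalling gives a rough picture of whether the timer service or the app pools are short on RAM.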


Andrew,

Thank you for your thorough reply. Indeed, we ended up bumping up the RAM of the WFE server and removed unnecessary Pauses.

Your guidelines shed more light on how SharePoint manages workflows. 


I know this is already answered, but we had a similar problem of workflows getting stuck at the pause action. Our issue was only happening with one workflow. We found that our workflow task list had become corrupt so we created a new task list in the workflow and it started working again.

Kassie


Thanks, Kassie

It became obvious that there may be several reasons for this, so sharing other solutions in this thread will be helpful for others.

Dimiter


I, too, was advised by Nintex to add a Pause at the top of the workflow when using Nintex 2013. When I changed jobs and built a workflow in O365, I followed the same advice and had the same issue you had. I also had a "wait for item update" action, that would also stall and never progress. My research indicated that it was an issue with the RAM. This thread makes me feel better about that. I will direct my managers to this thread so they can read up on this.

Excellent thread and input. Thank you so much for asking the question!
