do assignees experience any error while submitting task response?
is it possible that somebody/something (workflow, webservice, ...) changes directly task list item (out of the task form)?
couldn't it be after the task was approved by the approver?
do you have enabled versioning on task list to see what changes are performed on task list item?
do you have option to 'Save and continue' or 'Save' (without submit) on task form?
do you have a 'Status' field available and changeable from the task form?
do you upload attachments with task response? any bigger ones? or any higher number of them?
would you be able to investigate (and possibly upload) ULS logs from the time the problem happens?
Thank you for your troubleshooting suggestions, @emha.
I have checked everything you asked, and found one thing. All of the tasks that have status Completed and outcome Pending show the Assigned To as the Modified By. All of the tasks where the status is Completed and the outcome is Continue show System Account for Modified By. What might cause the System Account not to modify the task outcome?
In answer to each of your questions,
Assignees do not receive any error message.
I can't imagine that anyone is editing the task other than with the form that is in the workflow task action. The task list has no workflow. I have turned on versioning on the task list so that we can check this to find out for sure, next time this happens.
There are only two buttons on the form. One has the Save and Submit action, and the other has the Cancel action.
The Status field is not on the task form.
There is only one task out of 2482 that has an attachment. This attachment is a small email file.
I would have to ask a colleague to look at the ULS logs.
I appreciate any further thoughts you may have.
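In case it helps anyone checking for the same pattern, here is a minimal sketch of tallying tasks by status, outcome and Modified By. It assumes the task list has been exported to plain rows; the field names (`status`, `outcome`, `modified_by`) and the sample rows are hypothetical, not taken from the actual list.

```python
from collections import Counter

def tally_by_editor(tasks):
    """Count tasks per (status, outcome, modified_by) combination."""
    return Counter(
        (t["status"], t["outcome"], t["modified_by"]) for t in tasks
    )

# Hypothetical rows, e.g. from a task list export; field names are assumptions.
sample_tasks = [
    {"id": 101, "status": "Completed", "outcome": "Continue", "modified_by": "System Account"},
    {"id": 102, "status": "Completed", "outcome": "Pending", "modified_by": "Jane Doe"},
    {"id": 103, "status": "Completed", "outcome": "Continue", "modified_by": "System Account"},
]

# Print each combination with its count, so an odd pairing stands out.
for combo, count in sorted(tally_by_editor(sample_tasks).items()):
    print(combo, count)
```

If every Completed/Pending row turns out to have a person rather than the system account as the last editor, that would match the observation above.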
"What might cause the System Account not to modify the task outcome?"
that's the question!
task response is processed in two steps - first the Status field is updated with the user credentials the workflow is running with, then the read-only outcome field is updated by the system.
I suspect there are some locking issues, but I do not know for sure what may cause them in your env.
I'm afraid this could only be identified from ULS logs (*could*, but need not necessarily be...)
you know your setup and application better, so maybe try to think of what might be done on the task in parallel, and what might block each other.
try to get in contact with the people who responded to the failed tasks, maybe they respond in some unexpected/unforeseen way (eg. they respond too quickly, before the task and task item are fully settled - I remember such discussions on the community)
if you could enable verbose logging it might help as well, since the verbose log gives you event timestamps at millisecond level, which helps to significantly reduce the amount of ULS log entries to investigate.
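A minimal sketch of narrowing ULS entries down to a few seconds around a known event time (for example the moment a task response was submitted). It assumes tab-separated log lines whose first column is a timestamp such as `05/14/2019 10:15:32.45`; the format string, the sample lines and the function names are assumptions for illustration only.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%m/%d/%Y %H:%M:%S.%f"  # assumed timestamp layout of the first column

def parse_ts(line):
    """Return the timestamp in the first tab-separated column, or None."""
    first = line.split("\t", 1)[0].strip().rstrip("*")
    try:
        return datetime.strptime(first, TS_FORMAT)
    except ValueError:
        return None  # header line or anything without a parsable timestamp

def lines_near(lines, event_time, seconds=5):
    """Keep only the lines logged within +/- `seconds` of event_time."""
    window = timedelta(seconds=seconds)
    kept = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and abs(ts - event_time) <= window:
            kept.append(line)
    return kept

# Made-up sample lines; real entries have more columns.
sample_lines = [
    "Timestamp\tProcess\tMessage",
    "05/14/2019 10:15:32.45\tw3wp.exe\tsample message A",
    "05/14/2019 10:16:00.00\tOWSTIMER.EXE\tsample message B",
]

event = datetime(2019, 5, 14, 10, 15, 30)
for line in lines_near(sample_lines, event):
    print(line)
```

The window size is a judgment call: the tighter the response time is known, the smaller it can be.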
@emha I posted on the other thread because it looks more like the issue we have. Ours never makes it to "Completed" and Pending, but it's similar to our issue.
Ours only happens on Lazy approvals tho, where this one appears to happen within the form. Our users who experience this issue only have it for 24 hours and it goes away with no interaction on our part at all. We're having the issue today and we've been around the bush with nintex support and they have no idea what the issue is. We've even enabled verbose logging to watch it, and all signs point to a permission issue, but one day it works, the next day it doesn't, and then it starts working again after 24 hours and will work for sometimes several weeks to several months and then one day it just decides its going to stop again for 24 hours. We never know when it will happen.
The system account can't update the workflow task for some reason. In our case we have the workflow task list locked down so people can't update it accidentally but we have a process within the workflow that will give full control permissions to the user and then take them away once the task is completed. It also waits for an hour before taking it away to make sure everything is cleared on that task. Either way it doesn't help because we've given full control to users who have the issue and it still happens.
It only happens for one workflow and one day and magically goes away for an undetermined time frame and then just happens again. No server patching, no windows patching, no nintex patching, no workflow publishing, NOTHING changes on the box from one day to the next.
Even nintex support has been scratching their heads on this one. It's been going on for almost 2 years now for us with no resolution, mostly because we never know when it will occur. It makes troubleshooting next to impossible. We've had verbose logging enabled on nintex workflows for over a year now. They even gave us a custom patch for additional logging but I can't get manager approval to put it on because of the unknown size of logging. They've tied my hands on it. Microsoft won't help either since it's all in the lazy approval section of Nintex workflow.
Just started happening again today so I figured I'd do another search to see if anyone else has been having this problem at all, but it seems really uncommon or related to things that aren't our problem as well. This post is the closest thing I've seen anywhere recently that comes close to the issue.
@waltont, if nintex support is not able to identify a problem in their product, do you expect me to do so :)
could you explain what exactly happens with the (lazy) response and the task itself? I somehow miss this bit of info in your posts. you mentioned it doesn't change the task status, but what exactly happens? you mentioned "approver comments get processed..." and you can monitor for that. so can you see in the workflow history log that the response was received and what the response content was? can you see any change in the version history of the task item? as far as I know, this should be listed as a system account change.
does the task get locked after the response is received (that's a typical symptom if processing of the response breaks somewhere in the middle; further attempts to respond result in a message like "task is currently locked by a running workflow")?
weren't you able to pick up any clue from the ULS logs over two years?
well, I believe you've tried/checked everything possible over the period. so I'm afraid I can hardly bring in any new ideas.
I'll just try to question what comes to my mind as I read through your posts
- you say you only have problems with lazy approvals. haven't you spotted any pattern in mails/responses that cause or start the problem? eg. attachments - big ones, several of them, sensitive or forbidden characters in their names, pictures or bigger objects pasted in the body ...
haven't you spotted any problems or delays in communication between exchange and sharepoint?
any pattern in regard to who responded, user group or set of users, domain, mail client or mail content/format (each mail client structures mail slightly differently...), html/rtf/plaintext content...
do they respond with a valid term? or does the task action require responses not listed among the site terms?
- you say it recovers after 24h, or the next day - do you restart anything regularly overnight that could help it recover? or do you run any maintenance/cleanup jobs?
- similarly, you say it appears irregularly, but once it appears it lasts for 24h or so - have you checked with your admins whether there is a coincidence with their irregular maintenance or other jobs/tasks? or maybe a coincidence with some batch actions - eg. to recalculate/update something in a list, cleaning/archiving at the end of the month...?
think of the whole infrastructure involved (SP, exchange, AD, SQL Server, IIS,...)
- you say it happens just for a single workflow - is it a workflow with just a single task action? or are there multiple task actions and just one of them experiences problems?
I expect you already did the "must-do" exercises like republish, delete and recreate the action, export-delete-import the workflow, or even recreate the workflow from scratch...
btw, what exact task action do you use? I can't see that you specified it
- at one place you mentioned the task status is set to "in progress", at another that the status doesn't change. does that mean the behaviour is inconsistent? or in what circumstances do you get which status?
as far as I know a task gets "in progress" status when it (the item) is updated but the response/decision is not yet submitted - like if you have 'Save' or 'Save and continue' buttons on the task form to allow users to record progress step by step without submitting the response. may that be the case?
or are there any other possibilities how the task could be updated - a workflow on the task list itself, javascript, a webservice call...
have you tried to compare the tasks that were responded to successfully, and their version history, with those that weren't, to see whether they differ in this regard? did they in both cases go through the "in progress" state?
who sets the 'In progress' status? system account, approver, somebody/something else?
I'm not fully sure on this topic and I'm not able to test it right now: I think I've read somewhere that the desktop approval form locks (or may lock?) the task once it is opened. could it be that your users first open the task form just to see the task details and then go back to the mail, without closing the task form, and send the response, effectively locking themselves out?
- you mentioned as well that once you notice it's a "bad luck day" you postpone tasks to be created/assigned to the next day. I'm not sure whether you went that deep with your investigation, but I wonder - does it really depend on when the task is created/assigned? and not on when the task is responded to?
this would imho contradict all of the other symptoms...
and that would also mean the problem happened/occurred (much?) earlier, not when the response is received...
haven't you noticed tasks already being in "in progress" status before receiving any response?
- another of your statements says that even if there are problems with processing the lazy approval response, the user can still respond to the task from the desktop task form. although lazy approval responses are processed by the system account, it still has to ensure that the response comes from the task assignee. couldn't it be that users respond from a different (mail) account? maybe they forward/resend task notification(s) to their private or another business account and respond from there? maybe they have several accounts but mails are delivered to the same mailbox and they do not notice the difference? or may they have configured their mail client so that it overrides the field showing whom the mail was sent from?
or maybe a matter of delegation? - someone configures time-limited delegation, or delegates a single task on purpose, but afterwards forgets about it and tries to respond from the original account? or the task itself gets delegated/escalated in the meantime?
well, it ended up with long list of unstructured thoughts and questions :)
I'm not sure whether anything of that would help you anyhow, or whether there is anything new for you at all.
but I can't provide anything better at the moment ...
my 2 cents at the end...
I suspect the issue is not related to permissions (itself or directly) but rather some (inter)locking issues.
as you've mentioned somewhere "system account can't update ..." - as you may know, some actions (or their parts) are processed within the sharepoint context, some within the nintex/user context. I suspect that if these are not synchronized properly in some circumstances, or they get out of sync for any reason (delays, timeouts, workload...), they may lock/block each other. but I'd expect nintex to be able to recognize, identify and fix them...
I'm going to take some time to read over the extensive response here and respond later, but for now I'd like to respond to this statement:
"if Nintex support is not able to identify a problem in their product, do you expect me to do so"
I don't EXPECT anything from you specifically. I put this out there as additional troubleshooting that I’ve taken over the past year. I put this out there in case someone else could take that information and hopefully eliminate some other commonly thought of solutions and go “oh then if this is it, then X is your solution.” Be it you, someone else or anyone in the community. Nintex wouldn’t talk to people who have SOLVED the problem, they’d only talk to people like me who continue to have the problem. If someone else knows the solution, I want to know their steps. More heads is better than a single guy at nintex. I’m literally grasping at straws for a solution here. ANYTHING new to try is worth trying now.
Just skimming over the rest quickly it appears you aren’t taking my troubleshooting seriously and questioning my steps as well. I may have misunderstood, but I’d hope that you wouldn’t be doing that. This issue has been plaguing us for the past year. The statements I’ve made are 100% true and tested over several repeat sessions. Ask Brent Read (the manager over nintex support) as I’ve worked with him extensively on the matter.
I'll get back to you on the rest after I read it over more thoroughly.
Let’s hope I can answer most questions you mentioned below in one screenshot:
"Weren't you able to pick any clue from ULS logs over two years?"
Yes the logs suggested the user assigned to the task did not have appropriate access to update the task which is why there is no outcome selected.
"Haven't you spot any pattern in mails/responses that cause or start the problem to happen?"
If I did, I'd probably have this solved. The answer is: no pattern.
"big ones," - They're all the same size. No attachments.
"several of them," - It happens with 1, 2, or 20 emails randomly. Other times we can process 50 without issue.
"sensitive or forbidden characters in their names," - All emails are identically generated, by different users. No.
"pictures or bigger objects pasted in body" - No pictures or objects. They're simple HTML emails. Keep in mind, it works one day and the next day it doesn't.
"haven't you spot any problems, delays in communication between exchange and SharePoint?"
None that I'm aware of, but in all fairness I don't maintain the network or the servers the application resides on, so I wouldn't personally know with 100% certainty that there isn't any of this. The network and server administrators state there is none. That's all I can go on.
"any pattern in regard of who responded, user group or set of users, domain, mail client resp. mail content/format (each mail client structure mail slight differently...), html/rtf/plaintext content..."
All these things are present when it works or doesn't, so I doubt it has any impact. No.
"you say it recovers after 24h resp. next day - do you restart anything regularly over the night what could help to recover? or do you run any maintenance/clean jobs?"
THIS might be why it fixes itself after 24 hours. Some overnight task runs that unlocks whatever it is that is CAUSING the permission issue perhaps. We have nothing out of the ordinary running (again as far as I know) that would cause this. If we could figure this out, when I notice the issue I could force it to execute early to fix the problem which would be REALLY nice. Any ideas here to common things I could try would be helpful. Note we have already attempted to restart the timer service with no luck.
You can see here, we went 3 months without a single incident, then BAM it hit again, and now we've had it 3 weeks in a row. The only thing we're doing is security patching on Windows, and even that has been disabled because of the issue we had with the Oct 2018 CU and the web.configs not having the appropriate permitted functions. So in actuality, we haven't done ANY updates since last October. You'll also note that ones dated back to November last year are still in progress and have not continued. (Honestly, I hope they all stay in progress, because if we do in fact resolve this I don't want people to get spammed with these tasks magically starting to work. They'll be getting over a year's worth of emails at once and it'll probably upset some directors.)
"haven't checked with your admins whether there is not a coincidence with their irregular maintenance or other jobs/tasks? [...] think of whole infrastructure involved (SP, exchange, AD, SQL server, IIS,...)"
The dates of each occurrence have no related or associated repeating or automated task at the time. Again, the 3 month gap pretty much covers that as well.
"if you say it happens just for single workflow - is it a workflow with just single task action? or are there multiple task actions and just one of them experience problems?"
Literally 1 workflow has the issue. I have 4 total workflows on separate lists that operate similarly but not EXACTLY the same. One workflow is nearly identical process-wise and it doesn't have this issue at all. One time it did, but it hasn't happened to that one in over a year now. If you look at the task IDs above you can get an idea how many we do. Between December and April of this year we processed over 5000 workflow tasks without issue. This however includes workflow tasks associated to the other 3 list workflows that weren't having issues as well, so keep that in mind. The workflow that fails to update has quite a few actions within it. It does NOT fail on the same task each time; however, when it does occur it only fails on flexi-task approvals. That's probably the ONLY consistent thing I can find in the process.
"I expect you already did "a must" exercises like republish, delete and recreate action, export-delete-import workflow, or even recreate workflow from scratch..."
We did a re-publish, but I don't think we did a complete wipe, and restore from an import process. So that MIGHT be worth a shot. I just don't know what will happen to the other tasks when I wipe out the workflow. At this point, like I said, anything is worth trying now. My problem is I cannot replicate this issue in our development environment at all, and for that matter I can't replicate it in production either. If I knew what was causing it I'd just stop that or block it or something, but we have nothing to go on.
"what exact task action do you use?"
It’s just a regular flexi-task approval.
"in what circumstances you get what status?"
When the issue occurs, it's "in progress"; otherwise it goes from "Not started" directly to "Completed" and never hits "in progress". I know this because of the status monitoring workflow I have on the workflow tasks list. It never hits "in progress" if it works properly. It's as if it's expecting another responder, as if there were multiple people assigned to the task, but there aren't.
"[...] you have 'Save' or 'Save and continue' buttons on task form to allow users to record progress [...]"
Let me stop you there, remember the form works fine even when lazy approval does not. Lazy approval has no buttons. The easier solution would be to simply turn off lazy approval, but managers love that thing so they'd whine. It works 95% of the time, so the 5% it doesn't work is our problem not theirs. They want their lazy approvals. It's worth noting that when a user's lazy approval fails, that same exact user can go to another task and use the form, and it works fine for them even while lazy approvals aren't working for them. That's where the headscratcher lies. Why would ONLY lazy approvals fail?
"[...] other possibilities how task could be updated [...]"
Lazy approval or the form. That’s all anyone uses. Nobody knows how to respond any other way even if it was made available to them.
"[...] do they in both cases "went through" "in progress" state?"
I touched on that: no, it never hits in progress if it's working properly. I even added a history monitor to see if it ever changes between one task and the other, but the workflow never sees it. It’s all contained within the flexi-task action and never leaves that action.
"whom is the 'In progress' status set by?"
This should be apparent by the screenshot above. "system account"
"I think I've read somewhere that desktop approval form locks (or may lock?) the task once it is opened. could it be that your users first open the task form just to see the task details and then return back, without closing the task form, to mail and send response, causing effectively to lock themselves?"
I could try to replicate this but I don't know if the user does this. Again, given the wide range of people doing this, the likelihood is next to zero. I could see one user doing it, but why would it lock everyone from updating their tasks too? It's kind of hard to replicate that situation but I'll give it a shot. I don't think a user can submit the form unless they actually click a response to the task. There's no "save for later" option within the flexi-task form that I'm aware of. It still doesn't explain why other users wouldn't be able to update their tasks throughout the course of the day and then magically it starts working.
"once you notice it's a "bad luck day" you postpone tasks to be created/assigned to next day. I'm not sure whether you went with your investigation so deep, but I wonder - does it really depend on when the task is created/assigned? and not when the task is responded?"
This is where I wish you'd take my information as written. We've extensively tested this aspect. If a task is assigned to the user and they respond to it ON THAT DAY ONLY, it will fail. If they wait 24 hours for that SAME TASK, and respond to it the next day, there are no issues. It’s not the TASK that's the problem, it’s the lazy approval process that is locked up somehow and won't update the task outcome.
"haven't you noticed tasks being already in "in progress" status before receiving any response yet?"
No, it’s either "Not started" or "Completed" when it’s working. Plain and simple.
"even if there are problems with processing lazy approval response, user still can respond the task from desktop task form"
No, you misunderstood. If they respond via lazy approval it DOES lock the task for editing and the whole workflow is blown dead forever. It only works if they have not yet responded to the task via lazy approval.
"or maybe matter of delegation? - someone configures time-wise delegation, or delegate single task by purpose, but afterwards forgets it or tries to respond from original account? or task itself gets delegated/escalated in the meantime"
I've tossed around ideas on this one, but every time I can find where it SHOULD happen in one case but didn't in another. It’s never directly related to delegation, but we have one user responding to the task, giving them permissions to perform actions on the list, then taking it away from THAT user only so they cannot update any other tasks in the list that they aren't assigned to. It’s a complex process. But as I stated before, we shotgunned the site and gave everyone full control and it still happened anyway. Either way, the system account is supposed to be the one making the update and it can't do it, but it’s supposed to be inherited from the workflow owner anyway.
"I suspect the issue is not related to permissions (itself or directly) but rather some (inter)locking issues."
I'd tend to agree with that assessment, but the issue is what is locking and what would cause it and how can we prevent it from happening. That's the million dollar question.
"Just skimming over the rest quickly it appears you aren’t taking my troubleshooting seriously and questioning my steps as well."
believe it or not, I spent a reasonable amount of time reading and thinking about your problem, and about more or less every single one of your statements. Maybe I went too broad, and didn't focus (just) on lazy approvals. but from experience it is sometimes better to question and clarify even quite obvious things, to check whether we all understand them the same way. sometimes people declare they did something, but after 2-3 questions it turns out they in fact did something else/similar.
"Let’s hope I can answer most questions you mentioned below in one screenshot:"
your screenshot reminded me of one of the issues I reported to support, which was classified as a bug.
similarly to your case, after a lazy approval I didn't see the response/decision selected in the nintex form. the message was processed, I could see the mail body in the comments. fortunately, in my case the task was processed/closed correctly. it seemed like just the nintex form could not pick up the decision for whatever reason, although the correct value was stored in the task list item.
then I tried to remove the nintex form and tested with just the OOTB sharepoint task form, and it didn't have any problem showing the outcome/decision.
if you're interested, it's bug #00252313; you may want to ask nintex support to check whether there are any correlations with your case.
might be a hint for you as well to test with the OOTB SP task form...
I know, the form should be just a presentation layer, which should just show what's already stored in the list. but nintex uses a custom content type for it, which might bring in some issues.
"big ones," - They're all the same size. No attachments.
"several of them," - It happens with 1, 2, or 20 emails randomly. Other times we can process 50 without issue.
"sensitive or forbidden characters in their names," - All emails are identically generated, by different users. no.
"pictures or bigger objects pasted in body" - No pictures or objects. They're simple HTML emails. Keep in mind, it works one day and the next day it doesn't.
these were all meant about attachments, not the number of responses - ie. multiple attachments within a single approval response.
the problem with multiple attachments is that they are processed/uploaded one by one, and only after the "main" item update (ie. the approval response is already being processed, and single attachment uploads are in progress as well). these are highly prone to lock issues.
I mentioned big pictures/objects within the mail response, since these need not be obvious at first sight (they need not be the response content itself), especially if HTML/RTF mail format is used. eg. one might have a big picture in a signature, or a big picture as a background, complex custom styling etc. if that's the case for a rare/irregular approver, it's worth taking into account as well.
"This however includes workflow tasks associated to the other 3 list workflows that weren't having issues as well so keep that in mind."
it's a best practice from a performance point of view to create a dedicated workflow task list and workflow history list for each workflow. if you share these lists among workflows and they grow too much, performance may suffer. if performance suffers, locks are much more likely to happen. the fact that just a single workflow experiences the problem need not be very relevant, since that may depend on several different factors.
maybe if you run a nightly job that cleans the task/history list(s), and/or reindexes them, that's the one which recovers from the problem...
so I'd recommend reviewing their sizes in your environment and considering splitting them.
see eg.
https://community.nintex.com/t5/Community-Blogs/Defensive-Workflow-Design-Part-1-Workflow-History-Lists/ba-p/83073
https://community.nintex.com/t5/How-To-s/Improving-Performance-with-Dedicated-Nintex-Workflow-History/ta-p/86769
https://community.nintex.com/t5/How-To-s/Improving-Performance-with-Dedicated-Nintex-Workflow-Task-Lists/ta-p/91941
"Let me stop you there, remember the form works fine even when lazy approval does not. Lazy approval has no buttons ...."
I took it in general - I wondered how and why the task status may get to 'In progress'. I mentioned a few scenarios that may cause this, I wasn't sure which of them might be applicable in your environment.
and to be honest, I still wonder how and why a task can get to this status...
I wonder why it is able to update the task status from 'Not started' => 'In progress' but not from 'Not started' => 'Completed' like in the other 95% of cases?
is one more step/update needed to process the approval in those 5% of cases? or is that somehow caused by user activity or manual intervention? does it change to 'In progress' immediately or later on?
could you compare (and maybe post) the version history for task items correctly completed right away with those still in progress?
I think I've already mentioned that - lazy approvals are (normally?) processed in two steps - first the response message is written and then the status is updated.
this is what it looks like in my env.
can you see any difference for the 'In progress' status?
"whom is the 'In progress' status set by?"
This should be apparent by the screenshot above. "system account"
I believe it is so, but the screenshot shows 'Last modified' info, which need not necessarily be the status change :)
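a rough sketch of the version history comparison suggested above, in case it is easier to do on exported data. it assumes each task's version history is available as rows with hypothetical `version`, `status` and `modified_by` fields; the two sample histories are made up to illustrate a "good" and a "stuck" task, they are not real data from this thread.

```python
def status_path(versions):
    """Return (status, modified_by) for each version where the status changed, oldest first."""
    path = []
    for v in sorted(versions, key=lambda v: v["version"]):
        if not path or path[-1][0] != v["status"]:
            # modified_by here is whoever saved the version that introduced the status
            path.append((v["status"], v["modified_by"]))
    return path

def went_through(versions, status):
    """True if the task ever had the given status in its history."""
    return any(s == status for s, _ in status_path(versions))

# Hypothetical histories; field names and values are assumptions.
good_task = [
    {"version": 1.0, "status": "Not Started", "modified_by": "System Account"},
    {"version": 2.0, "status": "Not Started", "modified_by": "System Account"},
    {"version": 3.0, "status": "Completed", "modified_by": "System Account"},
]
stuck_task = [
    {"version": 1.0, "status": "Not Started", "modified_by": "System Account"},
    {"version": 2.0, "status": "In Progress", "modified_by": "System Account"},
]

print(status_path(good_task))
print(status_path(stuck_task))
```

this answers who introduced each status rather than who modified the item last, which is the distinction the 'Last modified' column cannot show.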
Hello @cherylshah,
we are facing the same issue with one of our workflows. Did you find any solution to this problem? Is there a possibility to restart the stuck tasks?
Thanks a lot
Did you ever find a solution to this issue? We are experiencing intermittent failures on lazy approvals on 2 out of 4 site collections.
Since a few people have asked, no we still haven't solved this now 3 years later. Since it only happens for 24 hours on the single workflow, we just don't use lazy approvals during that time for that workflow.
The user can go into the form and approve the task without issue during this period. So they've learned to use the workaround instead of wasting more time on this clearly unsolvable problem. I spent about a year off and on going back and forth with nintex developers, managers in nintex, the highest level people inside nintex and have even followed the career path of the original tech who helped us at first through the supervisor role, and last I talked to him he was the manager over the support group. That was about 2 years ago. Nobody could fix the issue.
What we did find is that lazy approval THINKS there's another person to respond to the task even though there's not. That's why it goes into "In progress": it's waiting on another user to respond to their task for whatever reason. None of the techs could figure out why.
I've moved on to another department so I don't work with SharePoint anymore, but I do know this is still an issue for us. We just figured out how to get around it enough so people could do their jobs.