Symptoms
Suppose that you occasionally experience a situation in which the K2 host server service stops responding to, say, your custom WCF service. The WCF service starts to receive time-outs for every task, while at the same time the K2 host server appears to be running normally and no errors are being logged. Restarting the K2 service temporarily resolves the issue.
Diagnosis
This behavior may depend on several conditions coinciding, so it may not be easy to reproduce. If this is the case, you can raise the K2 host service logging level and wait for the issue to reappear, so that you can see whether any errors are being logged. The following error may be logged at the time you experience the behavior described above:
"Error","General","1","GeneralErrorMessage","K2Worker.StartProcessInstance","1 28027 Process StartRule failed","","","k2bps:C:\Program Files (x86)\K2 blackpearl\Host Server\Bin","232567161","a67f3f8b610741eab3ba923744978ba1",""
If you see the behavior described earlier together with the error message "28027 Process StartRule failed", and the error persists until the service is restarted, this means that the “StartProcessInstance” rule is not working correctly. That is, it is unable to start new process instances, most likely because no threads are available (by default, the number of threads on a K2 server is limited to 20).
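Because the error keeps appearing until the service is restarted, it can be easier to scan the host server log for it than to read the log by eye. The Python sketch below assumes that log entries look like the sample above (one quoted, comma-separated entry per line). The helper name find_startrule_failures is hypothetical and not part of any K2 tooling. Because the column meanings are not documented here, it matches the message in any field rather than at a fixed position.

```python
import csv
import io

# Message text taken from the sample log entry above.
STARTRULE_ERROR = "28027 Process StartRule failed"

# The sample entry from the article, used only to demonstrate the helper.
SAMPLE_ENTRY = r'"Error","General","1","GeneralErrorMessage","K2Worker.StartProcessInstance","1 28027 Process StartRule failed","","","k2bps:C:\Program Files (x86)\K2 blackpearl\Host Server\Bin","232567161","a67f3f8b610741eab3ba923744978ba1",""'


def find_startrule_failures(log_text: str) -> list[list[str]]:
    """Return every parsed log entry that mentions the StartRule error.

    Hypothetical helper: assumes one quoted, comma-separated entry per line,
    as in the sample above, and checks every field for the message text.
    """
    matches = []
    for fields in csv.reader(io.StringIO(log_text)):
        if any(STARTRULE_ERROR in field for field in fields):
            matches.append(fields)
    return matches


if __name__ == "__main__":
    hits = find_startrule_failures(SAMPLE_ENTRY)
    print(len(hits))   # prints 1
    print(hits[0][4])  # prints K2Worker.StartProcessInstance
```

If the same entry keeps appearing between restarts, that matches the persistent failure pattern described above.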
You may have performed load testing and run hundreds of instances of your process without any issues, but load testing may not accurately simulate the real workload. What matters is not only how many instances run in total but also the status of those instances at any given point in time: by default, only 20 of them can be in a Running state simultaneously on one K2 server.
Resolution
To resolve this, you either have to increase the number of available threads or investigate why you are running out of free threads.
First, you can scale out your K2 deployment. Each K2 server you add provides extra threads: with the default setting, 1 K2HostServer = 20 threads, 2 K2HostServers = 40 threads, 3 K2HostServers = 60 threads.
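The scale-out arithmetic is simple multiplication, and it can also be run in reverse to estimate a farm size. The Python sketch below assumes the default of 20 threads per K2 host server. Following the reasoning above, it also assumes one thread per instance in the Running state, spread evenly across servers. Both functions are hypothetical illustrations, not K2 settings or APIs.

```python
# Default per-server thread count stated above; replace it if your servers
# are configured differently.
DEFAULT_THREADS_PER_SERVER = 20


def total_threads(host_servers: int,
                  threads_per_server: int = DEFAULT_THREADS_PER_SERVER) -> int:
    """Threads available across a farm of identically configured K2 host servers."""
    if host_servers < 1:
        raise ValueError("at least one K2 host server is required")
    return host_servers * threads_per_server


def servers_needed(peak_running_instances: int,
                   threads_per_server: int = DEFAULT_THREADS_PER_SERVER) -> int:
    """Smallest farm size that gives each simultaneously Running instance a thread.

    Assumes one thread per Running instance and an even spread across servers.
    """
    # -(-a // b) is integer ceiling division for non-negative a.
    return max(1, -(-peak_running_instances // threads_per_server))


if __name__ == "__main__":
    for servers in (1, 2, 3):
        print(servers, total_threads(servers))  # prints "1 20", then "2 40", then "3 60"
    print(servers_needed(45))  # prints 3
```

Adding servers only raises the ceiling. If long-running events keep holding threads, a larger farm will eventually fill up as well, which is why the investigation below matters more.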
Next, and more importantly, you need to find out which events in the workflow “hold” the threads. For example, a workflow can contain events that do not complete and stay in the “Running” status for a long time. You can verify this in the Server.ProcInst table in the K2 database by filtering its entries on State = 1 (where 1 means the Running state). Once you have this information, you can do the following:
1. If, at the time the issue occurs, you see 20 or more entries with State 1, this confirms that the issue is caused by the absence of free threads: all threads are busy.
2. Next (if (1) is true), before restarting the K2 service, it makes sense to investigate which events are holding the threads. From the entries with State 1, you can check which process instances they belong to. You can then open View Flow for those particular instances to see which activity or event is holding the threads.
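Steps 1 and 2 can be sketched as a small check over rows fetched from Server.ProcInst. In this Python sketch, Server.ProcInst, State and the value 1 for Running come from the text above. The ID column name, the (id, state) row shape, and the diagnose helper are assumptions. Fetching the rows is left to whichever SQL Server client you already use.

```python
from collections.abc import Iterable
from dataclasses import dataclass

# Query suggested by the text; the ID column name is an assumption to check
# against your own K2 database schema.
RUNNING_INSTANCES_SQL = "SELECT ID, State FROM Server.ProcInst WHERE State = 1"

RUNNING_STATE = 1          # State = 1 means Running
DEFAULT_THREAD_LIMIT = 20  # default number of threads on one K2 server


@dataclass
class ThreadDiagnosis:
    running_ids: list[int]   # instances to inspect in View Flow (step 2)
    threads_exhausted: bool  # True when the condition in step 1 holds


def diagnose(rows: Iterable[tuple[int, int]],
             thread_limit: int = DEFAULT_THREAD_LIMIT) -> ThreadDiagnosis:
    """Count Running instances and compare the count with the thread limit.

    The rows are (instance_id, state) pairs. Filtering on State here as well
    means unfiltered rows can also be passed in.
    """
    running_ids = [instance_id for instance_id, state in rows
                   if state == RUNNING_STATE]
    return ThreadDiagnosis(running_ids=running_ids,
                           threads_exhausted=len(running_ids) >= thread_limit)


if __name__ == "__main__":
    # Hypothetical rows: 20 Running instances plus one in another state.
    rows = [(instance_id, 1) for instance_id in range(1, 21)] + [(21, 2)]
    result = diagnose(rows)
    print(result.threads_exhausted)  # prints True
    print(len(result.running_ids))   # prints 20
```

If all servers in a scaled-out farm share the same K2 database, pass the combined thread count as thread_limit (for example, 40 for two servers). The returned running_ids are the instances to open in View Flow.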