Symptoms
Consider the following scenario: you are unable to open the K2 Worklist web part placed on any SharePoint page. It stays in a loading state indefinitely. The problem is intermittent: it appears on each of the two K2 servers in the K2 farm, but not on both at the same time.
Neither the Windows Application log nor the K2 logs contain any errors that indicate a possible cause.
Diagnosis
To rule out SharePoint-related issues, try placing the K2 Worklist web part on a new SharePoint page.
If the behavior occurs on one particular server at a time, you can isolate that server from client access by means of DNS (removing the corresponding DNS entry) so that you can troubleshoot it further. You may then notice that after a while the K2 server that was working stops working with the same symptoms (the K2 Worklist web part loads indefinitely), whereas the server you isolated earlier returns to normal operation.
Taken together, these symptoms (the K2 Worklist web part does not load, no errors are logged, and a server recovers over time but later returns to the same state) indicate that the K2 servers are running out of threads and cannot process new requests. This is also why no errors are logged: the K2 server does not enter an error state, it simply waits for a free thread.
K2 server uses several different thread types, each with its own limit configured by default. Because it is the K2 Worklist web part that cannot be opened, the server has run out of worker threads, which handle service requests such as client and management communications, normal process execution (including the starting of process instances), and Worklist retrieval.
As a final confirmation, check whether you can open K2 Workspace on the affected server. It should also be unresponsive and should not show your worklist: you see only the top part of the interface with non-functioning buttons (returned by IIS) and an empty main window. This confirms that the K2 server in question has run out of worker threads. The fact that IIS returns the upper part of the interface means that IIS and authentication are not the issue here.
To verify and correct this, check the Server.ProcInst table in the K2 database and select all processes with status 1 (Running). A two-server K2 farm has 40 worker threads (20 per server), and if one of the servers has more than 20 processes in a Running state you will see the symptoms described above. The situation persists until the number of running processes falls below 20. Normally a K2 server should never be left without free threads, so when more than 20 processes stay in a Running state for a prolonged period, corrective action is necessary.
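The check described above can be sketched as a pair of read-only queries. This is a minimal sketch, not an official K2 diagnostic: it uses only the Server.ProcInst columns that appear elsewhere in this article (ID, Folio, Originator, StartDate, Version, Status), the TOP 50 limit is an arbitrary choice, and the count covers the whole K2 database rather than a single server.

```sql
-- Count process instances currently in the Running state (Status = 1).
-- Compare the result with the worker thread limit (20 per server in this article).
SELECT COUNT(*) AS RunningCount
FROM Server.ProcInst WITH (NOLOCK)
WHERE Status = 1;

-- List running instances, oldest first, as candidates for closer inspection.
-- StartDate is when the instance was started, not when it entered Running.
SELECT TOP 50
    ID,
    Folio,
    Originator,
    StartDate,
    Version
FROM Server.ProcInst WITH (NOLOCK)
WHERE Status = 1
ORDER BY StartDate ASC;
```

Running these a few times over several minutes gives a rough picture of whether the count stays above the limit and whether the same instances keep appearing at the top of the list.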
Resolution
The worker thread limit is configurable and can be increased in the K2 host server configuration settings, but it is better to review the workflows you are running and investigate why some activities within them stay in a Running state for too long.
As a temporary solution, you can stop the K2 service, select all processes in the Server.ProcInst table that constantly stay at status 1 (Running), assign them a different number as a status (anything above 5 will do), and then start the K2 service. This makes K2 Workspace and the K2 Worklist available again, because the server gets free worker threads to process worklist retrieval requests.
Note that you should have a very clear understanding of what you are doing whenever you make direct edits in the K2 database, and you should back up the database before attempting this. If in any doubt, log a ticket with K2 Support for qualified assistance and guidance.
Once you have applied the corrective measure, you can check exactly which process causes the issue and, depending on the results of the investigation, either adjust the process design or increase the maximum thread limits.
Additional details/considerations:
The number of processes with status 1 (Running) in the Server.ProcInst table is not a problem by itself, but it becomes a problem when a "misbehaving" process has multiple constantly running instances. A good check is to verify whether any processes constantly stay in a Running state (Status = 1). You can do this by running the following SQL query against the K2 database:
SELECT * FROM Server.ProcInst WHERE Status = 1
If you monitor the results returned by this query for some time, you should be able to tell whether any process constantly stays in a Running state. Once you have identified such a process (or processes):
1. Stop the K2 service.
2. Set the status of the item to 10 (or any other number greater than 5).
3. Start the K2 service.
Another good check is to verify the state size of processes. Anything with a state size above 1000000 bytes is roughly 1 MB or larger and causes huge spikes. Because such processes have been through a great number of iterations, their state size and version numbers are very high compared to the other processes on the server. The following query can be used to find processes with a large state size (StateSize is reported in MB):
SELECT TOP 200
ID,
DATALENGTH(State)/1048576.0 AS StateSize,
Version,
StartDate,
Originator,
Folio,
Status
FROM Server.ProcInst WITH(NOLOCK)
WHERE Status IN (1, 2) AND DATALENGTH(State)/1048576.0 >= 1
ORDER BY DATALENGTH(State) DESC
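The temporary status change described in the steps above could be sketched as follows. This is a cautious illustration under stated assumptions, not a K2-supplied script: the `@StuckIds` table variable and its placeholder value are hypothetical, the `INT` type for ID is assumed, and the batch ends with ROLLBACK so that nothing is changed until you have reviewed the output. Run it only with the K2 service stopped and after a database backup, ideally under guidance from K2 Support.

```sql
-- Sketch only: K2 service stopped, database backed up.
BEGIN TRANSACTION;

-- Hypothetical list of process instance IDs identified by monitoring.
-- Replace the placeholder value with the real IDs; INT is an assumed type.
DECLARE @StuckIds TABLE (ID INT PRIMARY KEY);
INSERT INTO @StuckIds (ID) VALUES (0);

-- Move only instances that are still Running (Status = 1) to status 10.
UPDATE p
SET Status = 10
FROM Server.ProcInst AS p
INNER JOIN @StuckIds AS s ON s.ID = p.ID
WHERE p.Status = 1;

-- Review the affected rows before deciding whether to keep the change.
SELECT p.ID, p.Folio, p.Status
FROM Server.ProcInst AS p
INNER JOIN @StuckIds AS s ON s.ID = p.ID;

-- Default to undoing the change; replace with COMMIT TRANSACTION once verified.
ROLLBACK TRANSACTION;
```

Restricting the update to an explicit ID list and to rows that still have Status = 1 keeps the change narrow, which matters because direct edits to the K2 database are otherwise hard to reverse.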
Related documentation:
Details of the ProcInst table structure (including a description of the process status values) can be found in the K2 blackpearl K2ServerLog Database ERD document.
K2 Server Thread Pools settings are described in the K2 Server Thread Pool settings document.