K2 Host server process crashes

  • 28 September 2009
  • 7 replies

We are experiencing stability problems with our K2 installation: the K2 host server service suddenly stops (crashes). No error or warning entries are written to the event log when this happens. I have enabled logging in HostServerLogging.config, and when the K2 process stops an error is written to the log file (the specific error is included at the bottom of this post). The process instance on which the K2 service stops is different every time. After the service restarts, that process instance is in an error state; when I retry it manually, it continues without a problem. We have been experiencing this problem for a few weeks now. On some days it occurs only once; on others, more than twenty times. We are using K2 Blackpearl 0807 Update (4.8210.2.450).

The error seems to occur in two different unrelated situations:

High workload:
When there is a high workload on one of our workflows, the problem occurs more often. During these 'high' workload periods the processor never reaches 100% and memory consumption is not abnormal; the only difference is that more workflow instances than usual are being processed.

Open TCP Connections:
The second situation in which the host process crashes is after it has been running without problems for a while (a couple of days). In Performance Monitor, the K2 counter 'TCP Connections opened' shows a high number of open TCP connections. Usually the process stops when it hits 15,000 open connections.

As a workaround I have written a process monitor which monitors the K2 host process and restarts the service after it has stopped. Although this currently allows our server to keep functioning it’s not a permanent solution.
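A watchdog like the one described above could look roughly like the following minimal sketch in Python. It is not the poster's actual monitor: `is_running` and `restart` are hypothetical callables you would implement yourself (for example by wrapping the Windows `sc query` and `sc start` commands for the K2 service), and the 10-second poll interval is an arbitrary assumption.

```python
import time
from typing import Callable, Optional


def watch_service(
    is_running: Callable[[], bool],
    restart: Callable[[], None],
    poll_seconds: float = 10.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll a service and restart it whenever it is found stopped.

    Runs forever when max_polls is None; returns the number of restarts.
    """
    restarts = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if not is_running():
            restart()  # hypothetical: start the stopped service again
            restarts += 1
        sleep(poll_seconds)
    return restarts
```

Passing the checks in as callables keeps the loop testable without a real service; `max_polls` and `sleep` exist only so the loop can be stopped and sped up in a test.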

Has anyone encountered the same problem, or does anyone know a solution? Any help is highly appreciated.

Here is the section of the log file containing the error. The username has been replaced by <domain>\<UserName> for security purposes.

----------------

"5437453751","2009-09-28 10:08:02","Error","EnvironmentServer","15100","Generic","SourceCode.Workflow.Runtime.Management [SendArchiveX [string[] names]]","15100 Error occurred, ERROR: 26023 Process instance 20272 not found for K2:<domain>\<UserName> at 192.168.200.14:10","anonymous","0.0.0.0","K2SRV01:c:\program files (x86)\k2 blackpearl\Host Server\Bin","5437453751","8ea18d89f7ae4390bb62b8c7c8200c7e",""

"5437453752","2009-09-28 10:08:02","Error","EnvironmentServer","15101","Generic","SourceCode.Workflow.Runtime.Management [GotoActivity [string[] names]]","15101 Error occurred, ERROR: 26023 Process instance 20272 not found for K2:<domain>\<UserName> at 192.168.200.14:10","anonymous","0.0.0.0","K2SRV01:c:\program files (x86)\k2 blackpearl\Host Server\Bin","5437453752","98ca976ddb924590bbfcacf927d79c48",""

"5437453753","2009-09-28 10:08:02","Error","System","2025","InternalMarshalError","SourceCode.Hosting.Server.Runtime.HostServerBroker.InternalMarshal","2025 Error Marshalling SourceCode.Workflow.Runtime.Management.WorkflowManagementHostServer.GotoActivity, 26023 Process instance 20272 not found for K2:<domain>\<UserName> at 192.168.200.14:10","system","192.168.200.14","K2SRV01:c:\program files (x86)\k2 blackpearl\Host Server\Bin","5437453753","18e7a173c01e41f9866a8ca8631611e7",""

"5437453754","2009-09-28 10:08:02","Error","System","2025","InternalMarshalError","SourceCode.Hosting.Server.Services.TCPClientSocket.InternalMarshal","2025 Error Marshalling SourceCode.Workflow.Runtime.Management.WorkflowManagementHostServer.GotoActivity, 26023 Process instance 20272 not found for K2:<domain>\<UserName> at 192.168.200.14:10","system","192.168.200.14","K2SRV01:c:\program files (x86)\k2 blackpearl\Host Server\Bin","5437453754","286926ddf56b468bac2f19f49090815f",""
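To check whether the same error codes and process instances keep appearing across crashes, log lines like these can be summarized with a short script. This is a sketch that assumes the log is plain CSV with the column order seen in the four lines above (timestamp in column 2, level in column 3, event ID in column 5, message in column 8); that layout is inferred from the sample, not taken from K2 documentation.

```python
import csv
import io
import re
from collections import Counter

# Column positions inferred from the sample lines (an assumption,
# not a documented K2 log schema).
LEVEL, EVENT_ID, MESSAGE = 2, 4, 7
INSTANCE_RE = re.compile(r"Process instance (\d+) not found")


def summarize_errors(log_text: str):
    """Count Error rows per event ID and collect referenced process instances."""
    by_event = Counter()
    instances = set()
    for row in csv.reader(io.StringIO(log_text)):
        # Skip blank lines, short rows and anything that is not an error.
        if len(row) <= MESSAGE or row[LEVEL] != "Error":
            continue
        by_event[row[EVENT_ID]] += 1
        match = INSTANCE_RE.search(row[MESSAGE])
        if match:
            instances.add(int(match.group(1)))
    return by_event, instances
```

Run over the log files from several crashes, this makes it easy to see whether event IDs such as 15100/15101 and 2025 always appear together and whether the instance IDs differ each time.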


7 replies

Not sure if this will help with the excessive TCP connections, but try reducing the TIME_WAIT timeout to 30 seconds with the registry value HKLM\System\CurrentControlSet\Services\Tcpip\Parameters\TcpTimedWaitDelay.

http://blogs.msdn.com/dgorti/archive/2005/09/18/470766.aspx
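As a rough model of why lowering TcpTimedWaitDelay can help: a connection that this machine closes keeps its local port in TIME_WAIT for the configured delay, so the sustainable rate of new connections to a single remote endpoint is roughly ports / delay, and at steady state the number of ports held is rate × delay (Little's law). The sketch assumes 16,384 ephemeral ports (the 49152-65535 dynamic range) and compares the Windows default of 240 seconds with the suggested 30 seconds; the real port range and connection pattern on a given server may differ.

```python
def max_connection_rate(ephemeral_ports: int, time_wait_seconds: float) -> float:
    """Upper bound on new connections per second before local ports run out."""
    return ephemeral_ports / time_wait_seconds


def ports_in_time_wait(connections_per_second: float, time_wait_seconds: float) -> float:
    """Ports held in TIME_WAIT at steady state (Little's law: rate x duration)."""
    return connections_per_second * time_wait_seconds


EPHEMERAL_PORTS = 16_384  # assumption: dynamic port range 49152-65535

for delay in (240, 30):  # Windows default vs. the suggested value
    rate = max_connection_rate(EPHEMERAL_PORTS, delay)
    held = ports_in_time_wait(60, delay)  # example load: 60 new connections/s
    print(f"delay={delay}s max rate={rate:.1f}/s ports held at 60/s={held:.0f}")
```

Under these assumptions the ceiling rises from about 68 to about 546 new connections per second, which is why a shorter TIME_WAIT can postpone port exhaustion; it does not explain why the connections pile up in the first place.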


If what Peter mentioned doesn't work, can you provide the following details?


I noticed that a Goto Activity is called just before the server crash. Could you provide more information about the Goto? Is it called via the API, are you doing a Goto from the Workspace, or is it an escalation Goto?

Also are you having the same behaviour when attempting to Delegate an item?


Vernon

Thanks for your reply, Vernon.

To put it more into context:

We have built our own web application that uses K2 as the runtime environment for various workflows. We don't use the default K2 Workspace for showing and processing tasks. The workflow most likely to cause the crash is a relatively simple one that processes a credit claim: a user enters a claim, and the claim is routed to a claim administrator who approves or denies it. After the approval or denial, the workflow pushes the outcome to our internal system for further processing by calling a function through a SmartObject. The flow is a little more complicated, but in a nutshell that's what it does.

It's possible to reassign a task from our application, but as far as I can see that did not happen in the cases where K2 crashed. I have tried to delegate a task manually, but this did not lead to any unexpected behavior. There are also no escalations defined. Looking at the workflow that is in the error state after the service crashes, it has stopped at the activity containing the SmartObject. I can understand that there may be a problem with (or bug in) the SmartObject, but it seems to me that K2 should not crash when a simple error occurs in it. Or is it possible that an exception thrown in the SmartObject causes the K2 host server process to crash?

So it’s definitely the SmartObject that makes the server crash, for an unknown reason. Do you use a connection string in the SmartObject, and which service do you use (Dynamic SQL from BlackMarket)? You can also see what happens when testing the SmartObject with the SmartObject tester, which you can find in the BlackPearl installation folder under service broker\SmartObject service tester.exe.

Vernon

The SmartObject on which the workflow is in error is a custom SmartObject developed in-house. It calls a web service that submits the claim workflow outcome to our back-end system. When an error occurs in the SmartObject, it is caught and rethrown to alert K2 that an error occurred; this construction has always worked well until now.
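The catch-and-rethrow construction described above can be sketched as follows. This is a Python analogy only, not the actual .NET SmartObject code: `call_service` stands in for the web-service client and `BackendSubmitError` is a hypothetical error type. The point it illustrates is that the rethrown error should keep the original exception attached, so the host log shows the real cause. In C#, the comparable practice is `throw;` or wrapping with an inner exception rather than `throw ex;`, which resets the stack trace.

```python
from typing import Callable


class BackendSubmitError(Exception):
    """Hypothetical error type surfaced to the workflow host."""


def submit_outcome(call_service: Callable[[str], str], claim_id: str) -> str:
    try:
        return call_service(claim_id)
    except Exception as exc:  # catch anything the service client raises
        # Rethrow as one descriptive error type; `from exc` keeps the
        # original exception attached as __cause__ for diagnostics.
        raise BackendSubmitError(f"submitting claim {claim_id} failed") from exc
```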

I have also done some more research. Because a workflow instance was in an error state almost every time K2 crashed, I assumed that this was the instance causing the error. Yesterday I took a deeper look and saw that the process instance reported in the log file was working again after a restart, and after the task was resubmitted the workflow continued without a problem. So there seems to be no direct relation between the workflow in the error state and the process instance reported in the log file.

However, I have discovered another strange phenomenon. I opened Performance Monitor to watch the SQL Server error counters. The first strange thing I noticed was that a lot of errors are generated while K2 is functioning normally (about 20 per second). I am not an SQL Server expert, but that seems like a lot. Just before K2 crashed, there was a sudden burst of errors: Performance Monitor counted more than 1,400 errors per second for a couple of seconds. I have not yet been able to identify these errors, but they could be relevant.
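To pin down exactly when such a burst starts, so it can be lined up with the K2 log timestamps, the counter samples can be exported and scanned for values far above the recent baseline. A minimal sketch, assuming one sample per second; the 30-sample window and the 10× factor are arbitrary choices, and which counter you export (for example Errors/sec under the SQLServer:SQL Errors object) is up to you.

```python
from statistics import median
from typing import List, Sequence


def find_bursts(samples: Sequence[float], window: int = 30, factor: float = 10.0) -> List[int]:
    """Return indexes of samples exceeding `factor` times the median of
    the preceding `window` samples (samples with a zero baseline are skipped)."""
    bursts = []
    for i in range(1, len(samples)):
        baseline = median(samples[max(0, i - window):i])
        if baseline > 0 and samples[i] > factor * baseline:
            bursts.append(i)
    return bursts
```

Using the median of the preceding samples keeps the steady background of about 20 errors per second as the baseline, so a short spike like the 1,400+ per second burst stands out even if it lasts several seconds.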

Another thing I remembered: a couple of weeks ago (about a week before the problems started) we had a disk storage problem. The transaction log of the K2ServerLog database had grown to over 280 GB, and we had to purge and shrink it. A lot of workflows were in an error state, but after we retried them, 99% continued to work without problems. Maybe this also has something to do with it...

Hi there! 

I am having the same problem under similar conditions (high workload and many TCP connections), although the memory and CPU I have provisioned for the server far exceed the recommended amounts for the K2 Blackpearl Server service. It is urgently affecting our production environment; could you share your workaround for restarting the service when it stops? Thank you so much!

This is an old thread, but seeing that there is at least one new entry in it, I will post what we have found.
If you are using a now-old version of K2 Blackpearl, you are probably running .NET 4.5.0. K2 4.7 uses .NET 4.6.

.NET 4.5.0 has a threading error that is resolved in .NET 4.5.1 and later versions. It is described in the link below and concerns how a TransactionScope behaves when the thread context changes, which is exactly what happens when a server is heavily loaded.

https://particular.net/blog/transactionscope-and-async-await-be-one-with-the-flow
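The idea of ambient state that does or does not follow work onto another thread can be illustrated outside .NET. The sketch below is a Python analogy only, not K2 or .NET code: a `threading.local` value (state tied to one thread) is lost when the continuation runs on a pool thread, while a `contextvars` value is carried over because the caller's context is captured and run explicitly. In .NET 4.5.1 and later, the comparable opt-in for TransactionScope is `TransactionScopeAsyncFlowOption.Enabled`, which the linked article discusses.

```python
import contextvars
import threading
from concurrent.futures import ThreadPoolExecutor

# "Ambient" state stored per thread vs. state captured and flowed explicitly.
thread_ambient = threading.local()
flowing_ambient = contextvars.ContextVar("flowing_ambient", default=None)


def read_ambient():
    return getattr(thread_ambient, "tx", None), flowing_ambient.get()


def run_continuation_on_pool():
    thread_ambient.tx = "tx-1"
    flowing_ambient.set("tx-1")
    ctx = contextvars.copy_context()  # capture the caller's context
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The worker is a different thread, so only the flowed value survives.
        return pool.submit(ctx.run, read_ambient).result()


print(run_continuation_on_pool())  # (None, 'tx-1')
```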
