
Hi,


We recently installed a second K2 server to help with our load and to avoid downtime if K2 or the machine crashes.  Both instances point to the same database.  Users are able to log into the second instance and view their worklists.  However, we just found out that any updates they perform there are not actually saved anywhere.


 The following error started showing up in the event viewer on the machine of the first instance:


4140 Cluster communication error : Authentication with server failed
   at Client.Throw(String s, Exception e)
   at Client.Connect(String Server, Int32 Port, String ConStr)
   at K2ClusterServer.SendAndReceive(ArchiveX ar)


We tried bouncing the K2 service, but the issue still occurred.  We then tried shutting down the K2 service on the second instance, which fixed the issue. 


We suspect that once K2 is installed on a second machine and pointed at the database, the application somehow starts to balance traffic between the two installations, and that the transactions sent through the second instance were failing.  This is only a guess at this point.  The error first appeared at 6:59pm last night and continued through 8:32pm, then a LOT more of these errors started this morning.  It is unclear why they would have started at that time.


Does anyone know if there are any additional steps that need to be performed on the second K2 server install?  Do we need to install our K2 processes on that machine as well, or is that all stored in the database?  We are now afraid to start up the second instance because we will probably start losing transactions again.  Any help would be appreciated.


 Thanks


 


 


 

There is no need to re-export processes as they are stored in the database.  I would recommend opening up a support ticket for this one.


User contacted K2 Support for assistance.


Typically this kind of behavior occurs when components point directly at a specific K2 Server instead of at the logical NLB node. The K2 servers coordinate activities between themselves whenever the logical NLB node hands instructions to them. Session state is often a factor as well: a client thread opened against one K2 Server needs to be completed by that same server (often referred to as 'sticky'). Lastly, the environment architecture is a factor to consider (hops through firewalls, proxy servers, etc.), since session state or context can be lost or dropped along the way, which affects behavior.


K2 offers a K2 2003 Infrastructure training course that is extremely valuable in learning how to configure NLB and clustered environments.


 


 


Just to explain this: K2 servers in a cluster exchange a heartbeat.  When a new version of a process is exported, the server that you export to caches the new version of the process definition in memory.  The heartbeat allows that server to tell the other servers to refresh their process definition caches.  This is one of the functions of the heartbeat mechanism that I know of so far.


In most cases, you get this error when a K2 server cannot reach port 5252 on the other K2 servers in the cluster.  You can easily verify this by opening a command prompt on one K2 server and using telnet to connect to port 5252 on the other K2 server.
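
If telnet is not available on the server, the same reachability check can be scripted. The following is only a rough sketch in Python, not a K2 tool: it attempts a plain TCP connection to port 5252, and the function names and the host name in the usage comment are made up for illustration. A successful connection only shows that the port is reachable; it does not prove that cluster authentication will succeed.

```python
import socket


def can_reach(host, port=5252, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_cluster(hosts, port=5252):
    """Map each host name to whether its port accepted a connection."""
    return {host: can_reach(host, port) for host in hosts}


# Usage (hypothetical host name, replace with your other K2 server):
#   print(check_cluster(["k2server2"]))
```

Run it from each K2 server against the other one, since a firewall may allow the connection in one direction only.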


You would also get this error if a K2 server that was previously installed on another node was not cleanly uninstalled, for example after a server crash.  The remaining K2 server keeps polling the invalid node and throws this error.  Fixing this requires removing the invalid server entry from the _Server table in the K2 database (please make a backup of your databases before making any direct modifications).

