K2 NLB cluster node failure

  • 20 November 2015



 

Symptoms


A K2 NLB cluster experienced a node failure (the VM was switched off or disconnected), but the NLB service continued to try to serve clients from the failed node.
 

Diagnoses


If logging was not enabled for Windows NLB, this issue cannot be analyzed after the fact.
K2 depends on the NLB service to distribute work according to the configured rules, affinity, and so on. Because Windows NLB is a Microsoft technology, it should be configured according to Microsoft documentation and best practices (MSDN, TechNet).
The K2 documentation describes only a limited number of K2-specific recommendations and requirements:
Network Load Balancing Setup and Configuration:
http://help.k2.com/onlinehelp/k2blackpearl/icg/current/webframe.html_deploymentconsidert2.html
Setting up NLB:
http://help.k2.com/onlinehelp/k2blackpearl/icg/current/webframe.html_set_up_nlb.html
Connection Errors with Unicast NLB:
http://help.k2.com/onlinehelp/K2blackpearl/UserGuide/current/webframe.html_Troubleshooting_UnicastNLB.html_tracksearch=NLB

From this documentation the following can be highlighted:
- The machines residing in the individual Network Load Balancing (NLB) configurations must be configured prior to K2 installation.
- In a clustered environment, the K2 Server is only supported on NLB clusters. Installing the K2 Server in a Windows server cluster environment is not supported.
- It is imperative that the NLB adapters for the Web servers are not connected to the normal server network. A separate logical or physical network, such as a VLAN, must be created so the larger amount of incoming traffic is not flooded to the network ports of other servers, thereby causing performance degradation on all servers within the network, not just the Web servers.
- Traffic to and from a SharePoint site or the K2 Workspace involves a considerable amount of communication between the Web servers and the back-end servers running SQL Server, so good connectivity between them is required. It is therefore recommended that Web servers be dual-homed:
  - One network adapter handles the incoming Web requests by using NLB.
  - One network adapter acts as a normal server adapter to communicate with the server running SQL Server and with the other servers in the infrastructure, such as domain controllers for authentication purposes.
- The 64-bit version of Network Load Balancing Manager (nlbmgr.exe) must be used for 64-bit Windows operating systems. For more information and to get the 64-bit version of the Network Load Balancing Manager tool, please refer to the following Microsoft KB Article: http://support.microsoft.com/kb/892782
- For a K2 Host Server cluster, use a Unicast operation mode and set the affinity to None. Since the K2 Host Server is a stateless machine, no affinity is necessary per session.
- For a K2 Workspace Server cluster, use a Unicast operation mode and set the affinity to Single. You will want to ensure that the web pages retain an affinity to the web server during the session.
- For a K2 for SharePoint Server cluster, use a Unicast operation mode and set the affinity to Single. You will want to ensure that the web pages retain an affinity to the web server during the session.
The same applies to all server clusters that host web-based components (such as Process Portals, web services, and web parts).
- In some cases, the Network Load Balancing Manager console will time out before the second node is configured. If that happens, just right-click on the cluster and select Refresh. You should see all the nodes in a Converged state. Make sure that your cluster is configured correctly before starting the installation.
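The operation mode and affinity recommendations in the list above can be captured as a small lookup table and compared against what you read off the Network Load Balancing Manager console. The following is a minimal sketch only: the role keys, the `RECOMMENDED` table name, and the `find_mismatches` helper are hypothetical names chosen for illustration, and the "actual" values are assumed to be entered by hand rather than queried from Windows NLB.

```python
# Recommended Windows NLB settings per K2 cluster role, taken from the list above.
# The role keys and function name are hypothetical, chosen for this sketch.
RECOMMENDED = {
    "k2-host-server": {"mode": "Unicast", "affinity": "None"},
    "k2-workspace": {"mode": "Unicast", "affinity": "Single"},
    "k2-for-sharepoint": {"mode": "Unicast", "affinity": "Single"},
}


def find_mismatches(role, actual):
    """Return (setting, expected, found) tuples where actual differs from the recommendation."""
    expected = RECOMMENDED[role]
    return [
        (name, value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    ]


if __name__ == "__main__":
    # Values as read manually from the NLB Manager console (assumed input).
    observed = {"mode": "Unicast", "affinity": "Single"}
    for setting, want, got in find_mismatches("k2-host-server", observed):
        print(f"{setting}: expected {want}, found {got}")
```

An empty result means the observed settings match the recommendation for that role. A missing setting is reported with `None` as the found value.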
As mentioned in the Network Load Balancing Setup and Configuration topic, at least two network adapters are required when the Unicast operation mode is selected.
Set up the NLB configuration to allow traffic through on the K2 Workflow (default of 5252) and K2 Host Server (default of 5555) ports.
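To confirm that the two K2 ports are reachable on each node, and to spot a node that no longer answers, a simple TCP connection test against each node's dedicated address (not the cluster virtual IP) can help. This is a rough sketch, not a K2 or Microsoft tool: the node names are placeholders, the function names are invented for this example, and the port numbers are the defaults quoted above, so adjust them if your installation uses different ones.

```python
import socket

# Default K2 ports quoted above; change these if your installation differs.
K2_PORTS = {"K2 Workflow": 5252, "K2 Host Server": 5555}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_nodes(nodes, ports=K2_PORTS):
    """Probe every port on every node; keys are (node, port name) pairs."""
    return {
        (node, name): port_is_open(node, port)
        for node in nodes
        for name, port in ports.items()
    }


if __name__ == "__main__":
    # Hypothetical node names: replace with the dedicated addresses of your nodes.
    for (node, name), ok in check_nodes(["k2node1", "k2node2"]).items():
        print(f"{node} {name}: {'reachable' if ok else 'NOT reachable'}")
```

A successful connection only shows that something is listening on the port. It does not prove that the K2 service behind it is healthy, and it does not show whether NLB has removed a failed node from the cluster.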

Hardware-based load balancing (for example, an appliance from F5 or another vendor) is also an option. Windows NLB is a software-based solution, whereas a hardware load balancer is the more expensive option. A hardware appliance externalizes the management of the nodes in the cluster, so it adds no overhead to the systems that host the clustered nodes, and it also allows for web-based management. However, a hardware unit is not only more expensive; you will likely need two units to avoid a single point of failure. With Windows NLB, each application server runs the NLB service, so this problem does not arise.

As mentioned above, it is recommended that you use a separate NIC to eliminate broadcasting on the subnet. This is mainly for performance reasons.
 

Resolution

The symptoms described are normally related to a failure of the Windows NLB service. To address this, audit the Windows NLB configuration (including the NICs and network configuration) and consider enabling NLB logging, which makes it easier to investigate the root cause if a similar issue occurs again.
For details on how to enable NLB logging, refer to the Microsoft documentation, Enable Network Load Balancing Manager logging:
https://technet.microsoft.com/en-us/library/cc784216(v=ws.10).




 
