Enterprise-level clients demand the highest quality of business continuity support for their solutions, and rightly so when they service hundreds of thousands of users. For this author, getting to work with one of them to set up Nintex Workflow 2013 in a highly available environment was as fun as it was challenging. In this blog post I offer guidance for enterprise-class, highly available on-premises Nintex Workflow 2013 deployments using one or more off-site DR farms.
Our topology is a standard two-farm solution with SharePoint Server 2013 Enterprise and SQL Server 2014 Enterprise. Farm A is our primary farm. Farm B is our disaster recovery farm. We can fail over manually to Farm B at a moment's notice if needed. Fail-over cannot happen automatically, but a planned fail-over can happen quickly, without downtime and without data loss. Here are the basics:
- Farm A and Farm B are completely separate SharePoint 2013 farms operating independently
- Each farm has two (2) SQL Server 2014 nodes running in a cluster each with two (2) listeners
- Farm A replicates content to Farm B asynchronously via SQL Server AlwaysOn
Microsoft models this topology in detail on TechNet here: Plan for SQL Server AlwaysOn and Microsoft Azure for SharePoint Server 2013 Disaster Recovery
There are 4 SQL Server Availability Groups in use.
- AG1 and AG3, shown above in blue, are the farm-local groups. Farm-specific databases belong here, for example the farm configuration database and the Central Administration content database.
- AG2 and AG4 are the content groups, shown above in dark orange. SharePoint content databases and Nintex Workflow databases go here.
Refer to the TechNet article for details on how the AGs are configured. AG2 replicates to AG4 asynchronously over the WAN, thereby allowing Farm B to take over for Farm A with zero or near-zero data loss.
When you set up off-site replication, include all of your SharePoint content databases as well as all of your Nintex Workflow databases in AG2 and replicate them all to AG4. Yes, all of them: config and content. You need to replicate all of those databases together over to Farm B. The replicas in AG4 are read-only. This is fine; it does not cause issues with Nintex Workflow.
Rule #1: All Nintex Workflow databases must be replicated to your DR farm.
Of course, for this topology to work, and our model to succeed when we fail-over to Farm B, we must implement the following constraint: SQL Server aliases for AG2 and AG4 must be the same in Farm A and Farm B.
Rule #2: AG2 and AG4 must use the same SQL client alias or host name.
Nintex Workflow stores connection strings to its own databases in the Config database. You will find these records in the Databases table. If the SQL client aliases or host names used are not the same in Farm B, Nintex Workflow will fail to connect to its own databases when you mount the AG4 secondary replicas in Farm B.
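To check Rule #2 before you ever need to fail over, it can help to look at what Nintex Workflow has actually stored. The following is a minimal sketch, not an official Nintex procedure, that dumps the Databases table with ADO.NET. It assumes `$cs` is a working connection string to the Nintex Workflow configuration database, and it selects every column because the column names of that table are not covered in this post.

```powershell
# Sketch only: dump the Databases table of the Nintex Workflow configuration database.
# Assumption: $cs is a valid connection string to that database.
$cs = "<Nintex Workflow configuration database connection string>"

$connection = New-Object System.Data.SqlClient.SqlConnection $cs
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter("select * from [Databases]", $connection)
$table = New-Object System.Data.DataTable
try {
    [void]$adapter.Fill($table)   # Fill() opens and closes the connection itself
    $table.Rows | Format-List     # one block of column/value pairs per row
}
finally {
    $connection.Dispose()
}
```

If the server names in the output are aliases or host names that do not resolve in Farm B, that is the mismatch Rule #2 warns about.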
Installing Nintex Workflow can be done using the standard procedure for Farm A and Farm B. Follow the general instructions provided by Nintex to install Nintex Workflow in both farms as you normally would.
Rule #3: Install the same version of Nintex Workflow in Farm A and Farm B.
While it may seem obvious, it's easy to overlook: you should install the same version of Nintex Workflow in Farm A as you do in Farm B so there are no compatibility issues with your solution.
Prep for High Availability and Disaster Recovery
Before you engage high availability and disaster recovery, make sure you are ready by confirming that:
- All of your web applications have been created in both Farm A and Farm B
- All of your SharePoint content databases have been created in Farm A
- All of your Nintex Workflow content databases have been created in Farm A
- All of these databases are in AG2 and being replicated to AG4
- Each SharePoint content database is mapped to a Nintex Workflow content database
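The availability group membership in this checklist can be confirmed from SQL Server itself. Below is a rough sketch that lists every database in every availability group by reading the AlwaysOn catalog views. `$server` is a placeholder for your AG2 listener or SQL client alias, and Windows authentication is an assumption.

```powershell
# Sketch only: list every database that belongs to an availability group.
# Assumption: $server is the listener or SQL client alias for AG2, and the
# account running this may read the AlwaysOn catalog views.
$server = "<AG2 listener or SQL client alias>"
$query = @"
select ag.name as AvailabilityGroup, adc.database_name as DatabaseName
from sys.availability_groups ag
join sys.availability_databases_cluster adc on adc.group_id = ag.group_id
order by ag.name, adc.database_name
"@

$connection = New-Object System.Data.SqlClient.SqlConnection "Server=$server;Database=master;Integrated Security=True"
$adapter = New-Object System.Data.SqlClient.SqlDataAdapter($query, $connection)
$table = New-Object System.Data.DataTable
try {
    [void]$adapter.Fill($table)   # Fill() opens and closes the connection itself
    $table.Rows | Format-Table AvailabilityGroup, DatabaseName -AutoSize
}
finally {
    $connection.Dispose()
}
```

Compare the output against your list of SharePoint content databases and Nintex Workflow databases; anything missing from the content group will not reach Farm B.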
Checklist: Engage HA/DR
When ready, follow these steps, in order, to engage high availability and disaster recovery support for Nintex Workflow.
Perform these steps in Farm B:
- Stop the SharePoint Workflow Timer Service in Farm B. Make sure it is stopped on all servers.
- Mount the SharePoint content database secondary replicas in Farm B
- Mount the Nintex Workflow database secondary replicas in Farm B. There are two ways to do this:
  - Edit the database settings for Nintex Workflow via Central Administration or
  - Change the "NW2007ConfigurationDatabase" farm property via PowerShell
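The two mount steps above might look roughly like this in the SharePoint Management Shell on Farm B. Treat it as a sketch under assumptions: the three placeholder values are yours to fill in, and the expected format of the "NW2007ConfigurationDatabase" value is not described here, so the snippet only reads the property and leaves the write commented out.

```powershell
# Sketch only: run in Farm B. All placeholder values are hypothetical.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$webAppUrl = "<SharePoint web application URL in Farm B>"
$dbName    = "<SharePoint content database name>"
$dbServer  = "<SQL client alias shared by AG2 and AG4>"

# Attach the replicated SharePoint content database to the Farm B web application
Mount-SPContentDatabase -Name $dbName -DatabaseServer $dbServer -WebApplication $webAppUrl

# Inspect the farm property that points Nintex Workflow at its configuration database
$farm = Get-SPFarm
$farm.Properties["NW2007ConfigurationDatabase"]

# To change it, assign the new value and persist the farm object.
# Check the current value's format first; it is not documented in this post.
# $farm.Properties["NW2007ConfigurationDatabase"] = "<value for Farm B>"
# $farm.Update()
```

Repeat the Mount-SPContentDatabase call for each content database in the content group.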
Rule #4: Don't run the SharePoint Workflow Timer Service in your DR farm.
Whether this is Farm A or Farm B, your DR farm is your standby farm. Everything users do in your live, production farm gets replicated to the DR farm. If a user starts a workflow, that workflow instance, its tasks, and all of its state information get replicated to the DR farm by AlwaysOn. You don't want the Workflow Timer Service running on any server in your DR farm; otherwise, your DR farm will try to process these workflows too and you'll get some really weird errors.
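One way to enforce Rule #4 across every server at once is to stop the service instances from PowerShell. This is a sketch, assuming the service appears in your farm under the type name shown below; check the output of Get-SPServiceInstance first if nothing matches.

```powershell
# Sketch only: run in the DR farm.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumption: this is how the Workflow Timer Service is listed in your farm
$typeName = "Microsoft SharePoint Foundation Workflow Timer Service"

Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq $typeName -and $_.Status -ne "Disabled" } |
    ForEach-Object {
        Write-Host "Stopping Workflow Timer Service on $($_.Server.Address)"
        Stop-SPServiceInstance -Identity $_ -Confirm:$false
    }
```

Start-SPServiceInstance is the counterpart for bringing the service back on the servers you choose once a farm becomes the live farm.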
You are now in High Availability mode. Congratulations.
How-to: Fail Over
When you have to fail over to Farm B, these are the steps to follow:
- Stop the SharePoint Workflow Timer Service in Farm A. Stop it on every server.
- Remember Rule #4? Farm A and Farm B are about to switch roles.
- Flip DNS to Farm B. Wait for the TTL to expire. You want your users off of Farm A right away.
- Sync up the databases. The easiest way to do this is to change AG2->AG4 to synchronous replication.
- Fail over to AG4. This makes the databases in Farm B ready for use. User activity will resume.
- Refresh the Web Application ID in the Nintex Workflow configuration database (below).
- Start the SharePoint Workflow Timer Service on the servers of your choice in Farm B.
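The sync and fail-over steps above come down to a few T-SQL statements. The helper below is a hypothetical sketch that only builds those statements so you can review them before running anything. It assumes the Farm A and Farm B replicas are members of one availability group; the function name, group name, and replica names are all illustrative.

```powershell
# Sketch only: builds the T-SQL for a planned manual fail-over. It does not run it.
# Assumption: the Farm A and Farm B replicas belong to the same availability group.
function Get-PlannedFailoverSql {
    param(
        [Parameter(Mandatory = $true)][string]$AvailabilityGroup,
        [Parameter(Mandatory = $true)][string[]]$Replicas
    )
    # One statement per replica: switch it to synchronous commit
    $sync = foreach ($replica in $Replicas) {
        "ALTER AVAILABILITY GROUP [$AvailabilityGroup] MODIFY REPLICA ON N'$replica' WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);"
    }
    [pscustomobject]@{
        RunOnPrimary   = @($sync)   # execute on the current primary (Farm A)
        RunOnSecondary = "ALTER AVAILABILITY GROUP [$AvailabilityGroup] FAILOVER;"   # execute on the Farm B replica
    }
}

# Example with made-up names
$plan = Get-PlannedFailoverSql -AvailabilityGroup "ContentAG" -Replicas "SQLA1", "SQLB1"
$plan.RunOnPrimary
$plan.RunOnSecondary
```

Before issuing the FAILOVER statement, wait until the databases on the target replica report SYNCHRONIZED (for example in the synchronization_state_desc column of sys.dm_hadr_database_replica_states); failing over earlier risks data loss.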
Refresh the Web Application ID
The ID of your SharePoint web application in Farm B differs from the one in Farm A. For Nintex Workflow to work properly after fail-over, you have to trick Nintex Workflow into thinking that it has always been connected to Farm B.
USE WITH CAUTION! The script below assumes you only have one (1) web application, but it implements the basic idea:
Rule #5: The PublishedWorkflows table in the Nintex Workflow configuration database has a column called WebApplicationId. This column needs to be updated with the System.Guid of the SharePoint web application in Farm B.
Here is a sample PowerShell script. Note that this script uses both the SharePoint PowerShell cmdlets and ADO.NET. It gets the web application ID from SharePoint and overwrites the value in every row of PublishedWorkflows in the Nintex Workflow configuration database. Do not use this script verbatim unless you have only 1 web application. The script requires two values:
$url - set this to the URL of the SharePoint web application as required by Get-SPWebApplication
$cs - set this to the connection string of the Nintex Workflow configuration database
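$url = "<SharePoint web application URL>"
$cs = "<Nintex Workflow configuration database connection string>"
# Get the ID of the web application in the farm where this script runs (Farm B)
$wa = Get-SPWebApplication $url
$waId = $wa.Id
$sql = New-Object System.Data.SqlClient.SqlConnection
$sql.ConnectionString = $cs
$sqlCommand = $sql.CreateCommand()
$sqlCommand.CommandType = [System.Data.CommandType]::Text
$sqlCommand.CommandText = "update [PublishedWorkflows] set [WebApplicationId] = @WebApplicationId"
# Bind the ID, run the update, and always close the connection
$sqlCommand.Parameters.AddWithValue("@WebApplicationId", $waId) | Out-Null
$sql.Open()
try { $sqlCommand.ExecuteNonQuery() | Out-Null }
finally { $sql.Close() }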
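For farms with more than one web application, one possible variation is to update only the rows that carry a given Farm A ID. This is a sketch under assumptions rather than a tested procedure: `$idMap` is a hypothetical hashtable you fill in by hand, pairing each Farm A web application ID (as currently stored in PublishedWorkflows) with the URL of the matching web application in Farm B.

```powershell
# Sketch only: run in Farm B after fail-over. Back up the database first.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$cs = "<Nintex Workflow configuration database connection string>"
# Hypothetical mapping: Farm A web application ID -> Farm B web application URL
$idMap = @{
    "<Farm A web application ID 1>" = "<Farm B web application URL 1>"
    "<Farm A web application ID 2>" = "<Farm B web application URL 2>"
}

$sql = New-Object System.Data.SqlClient.SqlConnection $cs
$sql.Open()
try {
    foreach ($oldId in $idMap.Keys) {
        # Look up the ID of the matching web application in this farm
        $newId = (Get-SPWebApplication $idMap[$oldId]).Id
        $cmd = $sql.CreateCommand()
        $cmd.CommandText = "update [PublishedWorkflows] set [WebApplicationId] = @NewId where [WebApplicationId] = @OldId"
        [void]$cmd.Parameters.AddWithValue("@NewId", $newId)
        [void]$cmd.Parameters.AddWithValue("@OldId", [Guid]$oldId)
        $rows = $cmd.ExecuteNonQuery()
        Write-Host "$oldId -> $newId : $rows row(s) updated"
    }
}
finally {
    $sql.Close()
}
```

The where clause is the only real difference from the single-web-application script: it keeps one web application's workflows from being reassigned to another.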
How-to: Upgrade Nintex Workflow
When you need to upgrade Nintex Workflow to a newer version, follow these steps in order:
- Upgrade the Nintex Workflow farm solutions in Farm A using the installer
- Upgrade the Nintex Workflow farm solutions in Farm B using the installer
- In Central Administration, upgrade the Nintex Workflow databases from the live farm. The live farm is the farm your users are using. AlwaysOn will automatically replicate the changes to the DR farm.