Demystifying Workflow History (Part 2)

  • 5 January 2016
  • 4 replies
  • 251 views


Now that it's 2016, we felt it was about time to revisit one of the most popular posts in community history.

 

In Part 2, we would like to go deeper into how workflow history affects the performance of your farm, and cover some best practices for keeping your Nintex database(s) clean and properly maintained.

 

Before going into the specifics of database maintenance, it is important to understand how Nintex Workflow databases work and how they can affect workflow history and performance. The database touches every aspect of workflow execution. Below is a quick description of what happens when you start a workflow:

 

  • The workflow is initiated by a user or programmatically
    • A lookup determines which database is configured to store the history (i.e. which database is mapped to the site collection where the workflow is running)

 

  • The SharePoint workflow engine compiles and starts the workflow
    • An initial entry is added to the database tables, logging the workflow instance ID, site ID, etc.

 

  • The workflow engine begins processing the workflow actions
    • Two records are created for each action that is processed inside the workflow (start and finish)
    • Records are created/referenced for tasks (if applicable)

 

  • The workflow engine finishes processing the workflow
    • Final progress records are added to the tables
    • The state of the workflow is updated to reflect the outcome (Completed, Cancelled, Error)

 

During all of the above steps, Nintex and the SharePoint workflow engine are utilising the Nintex database(s). Because both are dependent on the database, if it becomes unavailable, or if its performance degrades to the point where the back-end queries time out, the workflow will fail.

 

Issues related to the database will typically start showing up as long-running Workflow Timer Jobs, unexplained intermittent workflow failures, or errors when viewing workflow history. Sometimes these issues will go unnoticed for a while, and other times they will rapidly get worse.

 

Some of the more common errors include:

 

  • Workflow failed to start (intermittent)
  • An error occurred in workflow 'Workflow_Name' (intermittent)
  • Error: Cannot find a corresponding human workflow task ID for this task (intermittent, in workflow history as well as the ULS logs)
  • A timeout occurred (in workflow history as well as the ULS logs)

 

Please keep in mind that it will not always be an issue with the database when you see these errors. Most will happen intermittently, and restarting the workflow will sometimes allow it to complete successfully. If you are seeing errors such as "An error occurred in workflow 'Workflow_Name'" on a consistent basis, this could be a sign that there is an underlying issue with the workflow itself.

 

One of the more common questions we get asked is: "How many records can the Nintex database handle before there are performance issues?"

 

This is one of the hardest questions for support to answer, because a lot of factors go into providing an accurate number. In the past we have typically advised between 5-10 million records in the dbo.WorkflowProgress table, because that is the most common range we see when we ask for the report (see Part 1). But this raises the question: why can some environments run hundreds of millions of records before seeing any performance degradation? The answer is: 1) hardware and 2) maintenance.

 

When we see an environment with tens or hundreds of millions of records chugging along, it is typically backed by a large SQL Server (plenty of memory and high-performance I/O storage) and by processes/jobs that clean, trim, and reindex the databases.
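
If you want to see where your own environment sits without running the full report from Part 1, a rough sketch against the SQL Server catalog views is below. Replace DATABASENAME with your Nintex Workflow content database; the table list is illustrative (dbo.WorkflowInstance is assumed alongside the two tables discussed in this post), so adjust it to the tables you care about.

Use DATABASENAME
GO
-- Approximate row counts per table, read from partition metadata (no full table scan).
SELECT t.name AS TableName, SUM(p.rows) AS ApproxRows
FROM sys.tables t
INNER JOIN sys.partitions p ON p.object_id = t.object_id AND p.index_id IN (0, 1)
WHERE t.name IN ('WorkflowProgress', 'WorkflowInstance', 'WorkflowLog')
GROUP BY t.name
ORDER BY ApproxRows DESC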

 

Because a healthy database is essential to workflows running as expected, it's crucial that the database(s) be properly maintained.

 

Outside of adding hardware, below are some of the most common maintenance steps to ensure optimal database performance:

 

Clean-up Fragmented Indexes

 

Index fragmentation can occur when a large number of read/write/delete operations take place on a database.  This can cause logical pages to be out of order relative to where they should be located within the data file.  Fragmented indexes can cause slow query times and SQL performance degradation.  Index fragmentation should ideally be kept under 10%.  If fragmentation is between 5% and 30%, you can reorganize the index; if it is above 30%, you will need to rebuild it.

 

The T-SQL below will retrieve the fragmentation of the indexes for a database named DATABASENAME.  Replace 'DATABASENAME' with the name of your database.

 

Use DATABASENAME
GO
SELECT OBJECT_NAME(i.object_id) AS TableName,
       i.name AS TableIndexName,
       phystat.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') phystat
INNER JOIN sys.indexes i
    ON i.object_id = phystat.object_id
   AND i.index_id = phystat.index_id
WHERE phystat.avg_fragmentation_in_percent > 10

 

You can also use the PowerShell Script written by Aaron Labiosa at the link below:

 

How to quickly check the health of your Nintex Workflow SQL Indexes

 

Once you have identified the fragmentation levels, you can plan accordingly for reorganizing / rebuilding the indexes.  The link below provides further information on how to reorganize and rebuild these indexes.

 

MSDN: Reorganize and Rebuild Indexes
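
As a rough illustration of the two operations, the statements below show the general shape. The index name IX_Example is hypothetical; use the index and table names returned by the fragmentation query above.

-- Reorganize an index in the 5%-30% fragmentation range (online, relatively lightweight).
ALTER INDEX IX_Example ON dbo.WorkflowProgress REORGANIZE;

-- Rebuild an index above 30% fragmentation (heavier; schedule outside business hours).
ALTER INDEX IX_Example ON dbo.WorkflowProgress REBUILD;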

 

Keep a clean dbo.WorkflowLog table

 

This table can grow very rapidly when Verbose logging is enabled for the Nintex Workflow category in the Diagnostic Logging section of Central Administration.  It is recommended that Verbose logging only be enabled on this category while troubleshooting issues with Nintex Workflow.  If you have enabled Verbose logging for this category but are not actively troubleshooting an issue, disable Verbose logging and then truncate the table.

 

The exact steps for truncating this table can be found on Pavel Svetleachni's blog post below:

 

How to purge dbo.WorkflowLog table
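
For reference, once Verbose logging has been switched off, the clean-up itself boils down to a single statement; the minimal sketch below assumes DATABASENAME as a placeholder, and the linked post covers the full procedure (including taking a backup first).

Use DATABASENAME
GO
-- Removes all rows from the verbose logging table; this cannot be undone.
TRUNCATE TABLE dbo.WorkflowLog
GO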

 

Backup / Shrink transaction logs

 

If your database is in the Full or Bulk-Logged recovery model, it is important that you also back up and shrink your transaction logs in order to prevent excessive growth. When performing the transaction log backup, ensure that the default selection "Truncate the transaction log by removing inactive entries" is selected.  This removes any entries that are not currently in the tail (active portion) of the log.
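
If you prefer T-SQL over the backup dialog, a minimal sketch of an equivalent log backup is below. The backup path is illustrative only; point it at your own backup location. A standard log backup truncates the inactive portion of the log once it completes.

BACKUP LOG DATABASENAME
TO DISK = N'X:\Backups\DATABASENAME_log.trn'
WITH NAME = N'DATABASENAME transaction log backup'
GO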

 


 

Once the backup has completed, you will want to shrink the log file using the DBCC SHRINKFILE T-SQL command.  This is where it will take some legwork from your DBA, as you will want to make sure that you are shrinking the log file to a size that allows for the expected log file growth between backups.  There is no hard-set number for the log file size, but a good rule of thumb is to keep it anywhere from 10-25% of your database size (10% if there is minimal growth between backups; 25% if there is large growth between backups).  For example, for a 40GB database that works out to a log file somewhere between 4GB and 10GB.

 

As an example, the command below will shrink the log file for the DATABASENAME database to 1000MB.  Replace DATABASENAME and 1000 with values appropriate to your environment.  In order to shrink the log file, you will need to set the recovery model for the database to Simple prior to running the shrink operation.  You can set the database back to Full recovery immediately after the shrink completes.

 

Use DATABASENAME
GO
ALTER DATABASE DATABASENAME
SET RECOVERY SIMPLE;
GO

DBCC SHRINKFILE (DATABASENAME_log, 1000)
GO

ALTER DATABASE DATABASENAME
SET RECOVERY FULL;
GO
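
After the shrink completes, you can verify the new log size and how much of it is currently in use with a standard DBCC check:

-- Reports the size and percentage used of the transaction log for every database on the instance.
DBCC SQLPERF(LOGSPACE)
GO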

 

Set your file growth and database size limits appropriately

 

By default, SQL Server will have the following settings for your database and log files:

 

                    Autogrowth    Maxsize
  Database          By 1MB        Unlimited
  Transaction Log   By 10%        Limited to 2,097,152MB

 

On a SQL Server environment with large amounts of write operations, it is important that these settings are managed appropriately.  Setting autogrowth to 1MB can lead to a large amount of fragmentation on disk and can cause less-than-optimal SQL performance when accessing data. Autogrowth should be set in megabytes, and anywhere from 50-100MB would be appropriate for a database with a large amount of I/O.   The Maxsize setting will not have an impact on performance, but it is recommended to monitor the growth and overall size of these files.  Again, there are no hard limits on the maxsize of these files, as the behaviour of the Nintex databases will depend on the hardware and overall maintenance of the SQL environment.
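
If you prefer to script the change rather than use the database properties dialog in Management Studio, a sketch along these lines works. The logical file names below assume the SQL Server defaults (data file named after the database, log file with a _log suffix); check sys.database_files for yours first.

Use DATABASENAME
GO
-- List the logical file names and current growth settings before changing anything.
SELECT name, type_desc, size, growth FROM sys.database_files;
GO

-- Grow the data and log files in fixed 100MB increments instead of 1MB / 10%.
ALTER DATABASE DATABASENAME MODIFY FILE (NAME = DATABASENAME, FILEGROWTH = 100MB);
ALTER DATABASE DATABASENAME MODIFY FILE (NAME = DATABASENAME_log, FILEGROWTH = 100MB);
GO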

 

Stay tuned for Part 3 where we wake Chad Austin from his hypersleep to provide more information on the relationship between Nintex Databases and the SharePoint workflow engine.

 

MORE INFORMATION

Demystifying Workflow History (Part 1)​

How to purge dbo.WorkflowLog table​

MSDN: Reorganize and Rebuild Indexes

MSDN: Back Up a Transaction Log

Technet: Shrinking the transaction log

Technet: DBCC SHRINKFILE


4 replies


This was very informative. Has Part 3 been written? I can't find it.


Glad to hear this one has been helpful! I believe Chad Austin is working on Part III currently.


Very helpful! Thanks
