How to purge items from a large history list safely via PowerShell


Userlevel 7
Badge +10

Products: Nintex Workflow 2013, Nintex Workflow 2010

 

Occasionally a history list grows to the point where NWAdmin can no longer trim it (read this article to avoid that situation: Defensive Workflow Design Part 1 - Workflow History Lists). To delete the items without impacting the entire farm, it becomes necessary to use paging and indexing to target each item individually. Paging throttles the traffic to your SQL server by deleting only a fixed number of items per batch before pausing and starting again. Indexing targets individual items without the performance overhead of enumerating or querying a large collection of items.

 

The PowerShell script below purges large history lists using paging and indexing.

 

PowerShell Script
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

#Configure target site and list.
$list = $($(Get-SPWeb -Identity 'http://contoso.com').Lists['NintexWorkflowHistory'])

#Index counter for list items (zero-based, so start at the last item).
$index = $list.ItemCount - 1

#Index counter for paging.
$page = 0

#Configure how many items to delete per batch.
$pagesize = 1000

#Configure how many seconds to pause between batches.
$sleep = 1

#Turn verbose output on/off.
$verbose = $true

While ($index -ge 0) {
    if ($verbose) {
        $("Deleting item at index $($index).")
    }
    if ($page -lt $pagesize) {
        try {
            #Delete only items last modified before 01/01/2014.
            if ($($list.Items[$index])['Modified'] -lt [DateTime]::Parse("01/01/2014")) {
                $list.Items[$index].Delete()
                Write-Host "Found Item"
            }
        }
        catch [System.Exception] {
            if ($verbose) {
                $("Skipping item at index $($index).")
            }
        }
        $index--
        $page++
    }
    else {
        if ($verbose) {
            $("Sleeping for $($sleep) seconds.")
        }
        [System.Threading.Thread]::Sleep($sleep * 1000)
        $page = 0
    }
}
 

To use the script do the following:

  • Replace http://contoso.com with the URL of the site you wish to run the script against.
  • Replace NintexWorkflowHistory with the title of the history list you wish to target (see the example after this list).
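
For example, the configured target line might look like the following (the site URL below is hypothetical):

$list = $($(Get-SPWeb -Identity 'http://intranet.contoso.com/sites/hr').Lists['NintexWorkflowHistory'])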

 

Note: By default the script will delete 1000 items and then rest for 1 second.
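
If you have a maintenance window and want the script to move faster, the batch size and pause are the knobs to adjust. The values below are illustrative, not recommendations; larger batches and shorter pauses put more load on SQL:

#Delete 5000 items per batch and do not pause between batches.
$pagesize = 5000
$sleep = 0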

 

Filtering can be added by wrapping the Delete() call in an if statement, as shown below. In this example, the item is deleted only if it was created before 01/01/1999.

 

Filtering
try {
    if ($list.Items[$index]['Created'] -lt [DateTime]::Parse("01/01/1999")) {
        $list.Items[$index].Delete()
    }
}
catch [System.Exception] {
    #No item exists at this index; skip it.
}
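
The filter can reference any column the history list exposes. If you are unsure what a column is called, a quick sketch like this (using the same $list variable as the main script) prints the available column titles:

#Print the titles of the columns available on the history list.
$item = $list.Items[$list.ItemCount - 1]
$item.Fields | ForEach-Object { Write-Host $_.Title }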

 

Version History and other scripts can be found here: http://ALPS.Codeplex.com


21 replies

For what it is worth, I had much better performance on deleting when I filtered by date on the front end, then iterated through the list of deletes, e.g.:

$oldDate = (Get-Date).AddDays(-396)
$items = $list.Items | Where {$_["Modified"] -lt $oldDate}
$total = $items.Count
$i = 1

foreach ($item in $items) {
    Write-Host "Deleting: $i/$total"
    try {
        $item.Delete()
    } catch [System.Exception] {
        Write-Host "Had some trouble..." -ForegroundColor Yellow
    }
    $i++
}

Hope this helps someone!

Badge +9

I noticed you had a filter on Date Created. How would you correlate the workflow history with workflows that are Completed, Cancelled, or (Errored and Not Running)?

thanks,

Stephan

Badge +2

Chris,

Did you have an issue where it ran really slow? I am running it now on a site with 25k items in the WorkflowHistory list, but it only deleted 5k in the past 12 hours... real... slow. Which is good, because it is supposed to be easy on the SQL server, but in my opinion, running off-hours, a little faster would be preferred.

thanks

Badge +2

Aaron,


Thank you for the script. I was under the impression that this script deletes 1000 items at a time, but by "batch" you mean that it will delete 1000 items, then sleep, then work on the next 1000, correct? I guess I am a bit confused. I ran this script on a site with 25,000 items, and it took 4 days to delete 20,000 items. Yes, your post says it is safe on the SQL server, but did I miss something that is causing it to run REALLY slow? Or is that a given? I did see Chris' post about filtering on the front end, and that worked much faster; I'm just not sure how that affects performance. Just making sure I didn't miss anything.

thanks

Using Aaron's method, yes, it was crazy slow because, as you said, it takes it easy on the farm. I let my slightly modified version of his script run after hours to get around any performance concerns. It beat up our WFE pretty badly for a few minutes, but that, in my mind, is way better than letting a script run for a day or two. If you have the luxury of letting something run unthrottled, then I say go for it.

Badge +3

Aaron Labiosa, thanks for the script. I'm trying to use it on my history list (100,000 items), but it runs very slow; each item takes about 15-20 seconds to process. How can I speed this up?

Alex, see my post above about speeding things up.

Userlevel 7
Badge +10

Thanks for the quick explanation, Chris. It is safe to run the script with no throttle most of the time. When I originally put this together, there was a requirement for the script to have zero impact on the environment, as it was going to need to run for a very long time (even with no throttling).

Cheers!

Badge +3

Guys, I've tried running NWAdmin -o PurgeHistoryListData with the magic parameter -batchSize 200 (and -verbose too), and it really works! You still need to wait a long, long time, but it works.

Userlevel 7
Badge +10

This will work if there are not too many items in the list. NWAdmin uses CAML to query the list, whereas the PS script uses the ID index. Under ideal circumstances NWAdmin works perfectly; however, if things have gotten a bit out of hand, the PS script can be used as a last-ditch effort.
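
To make the difference concrete, here is a minimal sketch; the exact query NWAdmin builds is internal to Nintex, so the CAML below is only illustrative:

#CAML approach: SQL must evaluate the filter across the entire list,
#which can time out or get throttled on very large lists.
$query = New-Object Microsoft.SharePoint.SPQuery
$query.Query = "<Where><Lt><FieldRef Name='Modified'/><Value Type='DateTime'>2014-01-01T00:00:00Z</Value></Lt></Where>"
$items = $list.GetItems($query)

#Index approach: touches one row at a time and never issues a large
#query, at the cost of walking the list item by item.
$item = $list.Items[$index]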


Badge +3

But I tried the PS script first on my 100k-item list and ran into high CPU usage by the powershell.exe process, and, as I stated above, it ran very slow; each item took about 15-20 seconds to process.

Then I tried NWAdmin with the batchSize option, and it worked. 100k is not a small list, though. :)

So, using the index was not the best option in my case.

Anyway, thanks, I finally resolved all my troubles with history lists!

Userlevel 7
Badge +10

Yeah, that is a large list, albeit not really in the range the script was originally targeted at (millions of items, in which case any sort of CAML query would fail). I am glad to hear you were able to get your issue sorted!

I can't seem to make this work. We have a very large workflow history list (over 3 million items). The script starts to run, but when it gets to the deleting part, it says "Deleting Item at X" and immediately "Skipping Item at X" and doesn't actually delete anything...

Userlevel 7
Badge +10

It runs by index; let it keep going until it reaches indexes that contain items. It is highly likely that the early item IDs no longer have items associated with them. Let the script run for a few days and see if the item count starts to drop.
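
One way to confirm it is making progress is to watch the item count from a second PowerShell session (same site URL and list title as configured in the script):

Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$list = $($(Get-SPWeb -Identity 'http://contoso.com').Lists['NintexWorkflowHistory'])
Write-Host "Items remaining: $($list.ItemCount)"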


Hi Aaron,

I'm using the NWAdmin to delete items from both the list NintexWorkflowHistory and from the Nintex content database on the SQL server.

However, I see no drop in the number of records in the tables 'dbo.NintexWorkflowLog' and 'dbo.NintexWorkflowProgress'. The item count of the NintexWorkflowHistory list on the SharePoint site does drop. Why doesn't NWAdmin cause records to be deleted from the Nintex content database? Or am I looking at the wrong tables?

Thanks in advance!

Badge +9

Aaron Labiosa, how do we filter by status? Can you please provide an example?

There are a few issues with the above script. It works fine for history lists with up to 50,000 records, but anything over that and it fails miserably. Firstly, only around 60k items are actually loaded into the object, and it can take 15 seconds per item to delete.

 

So, I wrote the following functions. The first will go through your farm and return a list of any large NintexWorkflowHistory lists. The second uses the SPQuery object to batch-delete records, at around 10 per second.

 

Function NintexCounts([string]$url, [int]$MinSize)
{
  $Sites = Get-SPSite $url/* -Limit ALL
  ForEach ($Site in $Sites)
  {
    ForEach ($web in $Site.AllWebs)
    {
      $WFList = $web.Lists["NintexWorkflowHistory"]
      If ($WFList -ne $null -and $WFList.ItemCount -ge $MinSize)
      {
        write-host $web.url: $WFList.Title, $WFList.ItemCount
      }
    }
  }
}


Function ClearNintexHistory([string]$url, [string]$BeforeDate)
{
 $Web = Get-SPWeb $url
 $NintexList = $Web.Lists["NintexWorkflowHistory"]
 If ($NintexList -ne $null)
 {
  $Query = New-Object Microsoft.SharePoint.SPQuery
  $Query.ViewAttributes = "Scope='Recursive'"
  $Query.RowLimit = 1000
  $Query.ViewFields = "<FieldRef Name='ID'/>"
  $Query.ViewFieldsOnly = $true
  $Query.Query = "<Where> <Leq> <FieldRef Name='Created' /> <Value IncludeTimeValue='FALSE' Type='DateTime'>$BeforeDate</Value> </Leq> </Where>"

  Do
  {
     $ListItems = $NintexList.GetItems($Query)
     $Query.ListItemCollectionPosition = $ListItems.ListItemCollectionPosition
     foreach($Item in $ListItems)
     {
       Write-Host "Deleting Item - $($Item.Id)"
       $NintexList.GetItemById($Item.Id).Delete()
     }
  }
  While ($Query.ListItemCollectionPosition -ne $null)
 }
}

NintexCounts -url http://[url] -MinSize 60000

ClearNintexHistory -url http://[url] -BeforeDate "2017-01-01"

 

Hopefully these are of help to someone

Regards

Tom

Great post! This has been super helpful for environment clean-ups!

Badge +5

Tom's method worked great for me. Many thanks.

Badge +1

For the script above from tom_grieve, does something go in the brackets for $url and $int? If so, would my URL and integer go inside the brackets, or are the brackets deleted? I tried several variations for my URL and the script did not do anything.

 

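The brackets in the function definition are PowerShell type constraints ([string] and [int] just declare each parameter's type), and they stay in the definition only. When calling the function, drop the brackets and pass your own values by parameter name, for example (the URL below is a placeholder):

NintexCounts -url "http://yourserver/sites/yoursite" -MinSize 60000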

Badge +1

I am using the script at the top of this thread. I added a timestamp at the beginning and end. The script goes through the history list starting with the highest index number, so it works in order from newest created to oldest created. It definitely works to delete all WF history items older than the date entered in the 'Modified' comparison.

For a history list in our QA environment with nearly 16,000 items, it took 8.5 hours to complete, and nobody was using the site while the script was running. We have a list in production with nearly 4 million items.

Questions:

Is it recommended to take the site offline while running the script?

Will users notice degraded performance in that site or in other sites that are hosted on the same servers?

Reply