Skip to main content
Solved

How do I handle large SharePoint lists?

  • 13 August 2024
  • 6 replies
  • 46 views

I have developed a workflow which is used to extract items from a list on a monthly basis for business finance. Below is a description of the approach I have taken.

Step 1: The workflow has to query the list to capture the required items. This is done with a Query List action, filtered by the date range (first to last days in a calendar month). This can result in anything from 800 to 3000 items in the target range.

Step 2: We then need to loop through the Items collection and break it into individual collections, one per column needed. There are 14 columns that are of interest.

Step 3a: Next, another loop is needed to extract each collection item into a usable variable, which can then be used in the output stage. Within this loop is a Run-If action that batches in chunks of 200, using a row count variable

Batching in 200s

Step 3b: Now the collections are run though a parallel action and apply some date/time formatting as well as Regex to remove problematic special characters and line feeds. Then the column variables are used to build up the rows of column data into a variable varOutputData (CSV format), using Create a text string actions.

Step 3c: To prevent items from being targeted in subsequent runs, a flag is then set to true in the list, which is a Boolean column. Then to aid with management and list views, the current date is added to a date/time column in the list.

Step 3d: Next, some math actions are used to increment the loop and batch count variables by one, for the next iteration. The varOutputData variable is logged into the workflow history.

Step 4: Finally, we call a Webservice to create a CSV file in a named document library, format dates into strings and merge the varOutputData variable into the CSV, before attaching it to an email, sent to the finance team.

So, this is how I get the desired data from the SharePoint list, set each affected items as “extracted” together with run date, and then send the resulting CSV data in an email.

The Problem

However, there are actually two problems. One is the time taken to complete, the other is the unexpected workflow terminations.

  1. For a month containing a target group of 250 list items, the workflow will run for anything up to 8 hours to complete. It is related to the fact that 2 list columns are being written to (Boolean flag and the date). I guess this is just down to the latency incurred when two Cloud based entities are communicating, authenticating and then verifying each list update. I am still looking for a solution to this problem.
  2. I recently had a month that contained just shy of 1800 items in the target range. I ran the workflow in smaller date chunks to try and circumvent any runtime issues. The first run was against 150 items, and that ran through in about 2 hours. The next two runs were for 120 and 95 items. These also completed. On the fourth run, there were 254 items. That ran for about 5 hours and then terminated. There was nothing at all in the history, just terminated. Looking at the history, my log of the output data showed about 98 rows and  “truncated” at its end - not the full history. I had to abandon the run and reset the list items that had half completed. I tried again with only 150 items in range and it worked. I had two instances of terminations in a total of 23 workflow run attempts before I got all the data I needed. In all, it took nearly 2 days, which is insane.

The Plea

What am I doing wrong? Is there a better and more resilient way of handling lists with 1000s of rows? I full accept that my workflow may not be designed in the most efficient way, but I cannot believe that it is normal for such a workflow to take so long to complete.

Has anyone here had more success with this sort of situation? If I just wanted to send a CSV file of all items within range, then that would happen in minutes, so I know the time-consuming element are the list item updates. The problem is that the extract and the update of these columns needs to happen concurrently.

Thanks in advance!

6 replies

Badge +10

@SmashedMonkey , is the goal to transform the SP list items into a CSV string and send it in an email?

 

It will help if you can maybe elaborate on the business use case, what is the end goal?

 

I am always a bit sceptical when Business Process Automation platforms like NAC and K2 are used for pure SYSTEM-TO-SYSTEM data manipulation. These platforms are best suited for Human-to-Human and Human-to-System Process Automation cases.

Badge +5

@SmashedMonkey , is the goal to transform the SP list items into a CSV string and send it in an email?

 

It will help if you can maybe elaborate on the business use case, what is the end goal?

 

I am always a bit sceptical when Business Process Automation platforms like NAC and K2 are used for pure SYSTEM-TO-SYSTEM data manipulation. These platforms are best suited for Human-to-Human and Human-to-System Process Automation cases.

Deon,

Yes, a CSV file gets sent to our Finance team by email. You may scoff, but I happen to agree that it is not the best use of these tools, but I work in the public sector where money is in very short supply. My daily job is to use the tools we have to try and improve and streamline our business processes in the most cost-effective way. The solutions are seldom pretty, and often far from ideal, but this is the world I exist in.

SharePoint and Nintex are being used to provide an electronic form (it used to be an Excel spreadsheet), and a basic approval solution for overtime payments, The form collects the dates, hours and purpose of each instance of overtime (using a repeating section). Once submitted into the list, a workflow runs to pull the data in the repeater, where it then populates another list. This is done to get the repeater rows into proper list rows. It is a precursor to getting the data in shape for export, via CSV to our Oracle eBusiness Suite, used by our HR and Finance teams. Proper connectors are very costly, so this semi-auto, semi-manual process is the best we can achieve. This workflow is an important part of this process - clunk though it is - but I need to make it work, hence my reaching out here in the hope that some kind soul, who may have encountered similar problems, might be able to nudge me in the right direction.

Thank you.

Userlevel 7
Badge +16

Hi @SmashedMonkey 

This seems like a batch process to consolidate the data in a SPO List. If this is the case, using PowerShell may seem like a better and faster alternative.

What is the size of the SPO List? Below 5,000 items? Greater that 5,000 items?

Is that any calculations that has to be done first - get sum, get product, get average?

Do you need to split into smaller batches (1 small batch is 250 rows) and consolidate into a final CSV 

Badge +5

Hi @SmashedMonkey 

This seems like a batch process to consolidate the data in a SPO List. If this is the case, using PowerShell may seem like a better and faster alternative.

What is the size of the SPO List? Below 5,000 items? Greater that 5,000 items?

Is that any calculations that has to be done first - get sum, get product, get average?

Do you need to split into smaller batches (1 small batch is 250 rows) and consolidate into a final CSV 

Garrett, The list will swell to many thousands of items, but this process will only ever expect to run against no more than about 3000 items. The flow has to query the list and select items whose Start Date is on or after 1st of the month, and on or before last day of the month AND the Extracted column is set to No. I am sure that I can do this with PnP PowerShell, but the people who need to carry out this extract are not able to access tools like that, due to security and policy lockdowns. It needs to be a user based solution.

Userlevel 7
Badge +16

@SmashedMonkey ,

There are other considerations. In this case NAC is probably the better tool.

You just need to set the proper expectations with your end users “expect about 3-5 hours”
Building timely updates into the workflow “100 out of xxx records has been processed” after every 100 records processed 

Badge +5

@Garrett Yes, that’s what I am now doing. I have basically set a limit on the number of items being processed and added a column to the list that shows the rolling number of items processed.

The terminations were Nintex over-limit triggered, but support is now investigating why no termination messaging is being shown on the workflow instance history. This was the response from Nintex Support.

Looks like the affected instance has reached the limitations we have, that is why, it was terminated automatically.
Please check the below points and compare the results in the affected instance.
1. The maximum number of action executions in a workflow should not be more than 10,000. This is not the number of actions, but the number of action executions. For example, a single loop action with 100 iterations would result in 100 action executions.
2. The maximum number of iterations in a loop should not be more than 1000.
3. The maximum size of all variables in a workflow should not be more than 100MB. If a workflow contains variables totalling in excess of this value, it will terminate. Unfortunately there is no way to check the size from your end but I can see the workflow has many loop iterations so it might have crossed 100MB variable size.

Reply