
Hi,


I would like to know the recommended approaches for recovering from failures in a K2 workflow process instance.


My workflow process consists of various web service calls and DB calls. I notice that under heavy load the web service calls time out. Even the DB calls occasionally throw a SQL exception. Our web services might also have bugs that cause them to fail. I know it is possible to go to the Management Console and retry from the activity that was suspended because of an exception. While that is not a problem in a Dev or trial environment, in Production we may not have the time, and may not have easy access to the server, to do all this manual recovery.


So how do you handle this?


This problem might have been discussed before; if so, would someone kindly point me to the specific thread?


Cheers,


lyf

Hi,


The recommended approach for this type of scenario is to use the Exception Rules built into the K2 Events. Exception Rules allow you to handle a specific type of exception and then perform an action based on that type. Here you can test whether the error was a timeout and, if so, retry the connection. If you decide to retry connections, keep a counter in an activity data field and retry the connection a maximum of 3 times. Otherwise, if the Web Service/SQL Server is inaccessible, your K2 process will go into an endless loop that you cannot stop.


Also make sure that the Exception Rule applies automatic recovery only to timeout errors. Log all other exceptions to the K2 Error Log and handle them as you do currently.
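To make the counter idea concrete, here is a minimal, language-neutral sketch in Python. It is not K2 code and does not use any K2 API: `call_with_retry` and `operation` are hypothetical names, and the built-in `TimeoutError` simply stands in for whatever timeout exception your web service or SQL call raises. In K2 the counter would live in an activity data field and the check would sit in the Exception Rule.

```python
MAX_RETRIES = 3  # assumed limit, matching the "maximum of 3 times" advice above


def call_with_retry(operation, max_retries=MAX_RETRIES):
    """Run operation(), retrying only on timeouts, at most max_retries times.

    'operation' is a hypothetical zero-argument callable wrapping the
    web service or DB call. Any non-timeout exception propagates at once
    so it can be logged and handled manually.
    """
    retries = 0  # plays the role of the counter data field
    while True:
        try:
            return operation()
        except TimeoutError:
            retries += 1
            if retries > max_retries:
                # Limit reached: give up instead of looping forever.
                raise
```

With a limit of 3 the call is attempted at most four times in total (the first attempt plus three retries). Anything that is not a timeout is raised on the first attempt.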


Exception Rules are available at various levels, but I would suggest placing the exception rule at the lowest level possible, hence the recommendation to do so at the Event level. For more information, see the following Help documentation topic: Designers > Process Design Concepts > Rules > Exception Wizard (http://help.k2.com/helppages/k2blackpearl1370/page=Activity_Exception_Wizard.html)


I hope this helps!


Thanks Johan.


I'm considering the following approach but am not sure whether it is practical in a production scenario. I intend to have a timer job that periodically fetches the list of process instances in error (if any) and automatically retries them. I could schedule the job to run every 5 minutes, so that it clears the web service calls that failed because of network issues. I'm more worried about the instances that still fail after repeated retries, which would mean there are bugs in the server-side code. The job would just keep retrying them non-stop, so there must be a way to remember the number of tries for each instance and to stop retrying after a limit. Assuming I can introduce this max_retry limit into the workflow, once an instance exceeds the maximum number of retries the job should insert the error info (proc inst id, etc.) into a custom database table so that we have something easily traceable. Based on this table, we can then intervene manually to recover.
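Roughly, one pass of the job I have in mind would look like the sketch below. It is written in Python with an in-memory-style SQLite database only to illustrate the logic. Everything in it is an assumption: the table names `retry_count` and `failed_instance`, and the callables `fetch_errored` and `retry`, are placeholders for whatever K2 actually offers to list and retry errored process instances.

```python
import sqlite3

MAX_RETRIES = 3  # assumed max_retry limit


def init_db(conn):
    """Create the two hypothetical bookkeeping tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS retry_count ("
        "proc_inst_id INTEGER PRIMARY KEY, tries INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS failed_instance ("
        "proc_inst_id INTEGER PRIMARY KEY, error_info TEXT NOT NULL)"
    )


def run_job(conn, fetch_errored, retry, max_retries=MAX_RETRIES):
    """One scheduled pass (e.g. every 5 minutes).

    fetch_errored() -> iterable of (proc_inst_id, error_info)  [placeholder]
    retry(proc_inst_id) -> asks the server to retry the instance [placeholder]
    """
    for proc_inst_id, error_info in fetch_errored():
        escalated = conn.execute(
            "SELECT 1 FROM failed_instance WHERE proc_inst_id = ?",
            (proc_inst_id,),
        ).fetchone()
        if escalated:
            continue  # already handed over for human intervention

        row = conn.execute(
            "SELECT tries FROM retry_count WHERE proc_inst_id = ?",
            (proc_inst_id,),
        ).fetchone()
        tries = row[0] if row else 0

        if tries >= max_retries:
            # Limit exceeded: record the instance instead of retrying again.
            conn.execute(
                "INSERT INTO failed_instance (proc_inst_id, error_info) "
                "VALUES (?, ?)",
                (proc_inst_id, error_info),
            )
        else:
            # Remember the attempt before retrying.
            conn.execute(
                "INSERT OR REPLACE INTO retry_count (proc_inst_id, tries) "
                "VALUES (?, ?)",
                (proc_inst_id, tries + 1),
            )
            retry(proc_inst_id)
    conn.commit()
```

The retry counter is kept in a table rather than in memory so that it survives between job runs. Once an instance lands in `failed_instance`, the job leaves it alone until someone looks at it.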


Looking forward to hearing from you.


cheers


lyf

