Can Azure Durable Functions support cancel, suspend, and retry/replay of activities? - azure

We are exploring Azure Durable Functions to facilitate the following requirements. Can Durable Functions support these?
A user can suspend the workflow and resume it whenever they want.
If an activity of the workflow fails, the user can retry that particular activity, with human intervention, as many times as they want.
A user can choose the step from which the workflow is resumed; to put it briefly, if 2 of the 4 steps in the workflow have been executed, the user can re-run the workflow from the beginning (Step 1).
Thanks

It's possible to "stop" the execution by having the orchestration wait for an external event before it continues processing. You can restart the execution using the Rewind API, but it will restart the whole workflow, not particular tasks.
More info: https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-instance-management?tabs=csharp#rewind-instances-preview
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-external-events?tabs=csharp
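For illustration, here is a minimal C# orchestrator sketch of the two building blocks mentioned above; the function, activity, and event names ("OrderOrchestrator", "ProcessStep", "Approval") are made up for the example:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class OrderOrchestration
{
    [FunctionName("OrderOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // "Suspend": the orchestrator idles here (with no compute running) until a
        // client raises the "Approval" event against this instance.
        await context.WaitForExternalEvent<bool>("Approval");

        // Automatic retries are configured per activity call; replaying a single
        // failed activity on demand is not supported, only rewinding the instance.
        var retry = new RetryOptions(TimeSpan.FromSeconds(5), maxNumberOfAttempts: 3);
        await context.CallActivityWithRetryAsync("ProcessStep", retry, "step1");
    }
}

The "resume" side is a client calling IDurableOrchestrationClient.RaiseEventAsync(instanceId, "Approval", true), which is what gives you the user-driven continuation.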

Related

How does the ForEach activity declare success in Data Factory?

In the following scenario we have a ForEach activity running in an Azure Data Factory pipeline to copy data from source to destination.
The last CopyActivity took 4:10:33, but the ForEach activity declared Succeeded 36 minutes later, at 4:46:12.
The question is: why does the ForEach activity need these extra 36 minutes?
Is it the case that the ForEach also needs to consolidate results from sub-activities before declaring success or failure?
Official answer from Microsoft: The ForEach activity does wait for all inner activity runs to complete. In theory, there should not be much delay in marking the ForEach run as succeeded after the last activity run within it succeeds. However, ADF relies on a partner service to execute the runs, and it's possible that the partner service runs into failures and cannot complete the ForEach in time. They have built-in logic to retry and recover, but the visible behavior in ADF activity runs is a delay. It's also possible that the orchestration service fails and the partner service keeps retrying its calls, but usually the partner-service delay is the main cause here.
Our assumption: the duration is end-to-end for the pipeline activity. That takes into account all factors such as marshaling of your data flow script from ADF to the Spark cluster, cluster acquisition time, job execution, and I/O write time. Because ADF is serverless compute, I think the ForEach needs time to wait for all activities to acquire and release computing resources, but this is a guess because there are few official explanations.
So there will be a delay, which varies according to the internal activities.

Do Azure Batch jobs need a watcher process?

We have a very long-running operation (potentially days) that we would like to have triggered by a BLOB file written to Azure Storage. This job could be started once a year, never, or many times over a few days.
Azure Batch jobs look exactly like what we need, assuming there doesn't need to be a 'watcher' process on the batch job as it runs. For example, if we can have an Azure Function catch the BLOB event, fire up a Batch job, start the job in a "fire and forget" fashion, and then let the Function end, that is exactly what we need. We aren't really too worried about reporting progress of the job (we are using a SQL table for that); we just want to start the job and then monitor it out of band.
Is there a way to start a batch job and let the instigator process disappear while the job continues to run in the background? If not, is there any way to do this without having a constantly running process (Worker Role or Fabric Worker)? We are trying to avoid having a process (Worker/Fabric Role, Function using the App Function Plan, etc.) running all the time when 99.9% of the time it isn't doing anything.
Short answer: No, you don't need a watcher process.
Azure Batch tasks are asynchronous in nature. When you add a task (under a job), your call against the Batch service immediately returns with success or failure of the submission action itself (and not if the task completed successfully on a compute node or not). The Batch service takes care of scheduling the task among the available compute nodes in your pool, internally monitoring the progress of the task, updating stats, etc.
If you elect to do so, you can monitor the progress of your task independently of the submitting actor using any SDK, REST API or client tool. Or you can opt to monitor it out-of-band yourself as you have described if your task is updating an external monitor or data store. Or you can schedule a task and not monitor it, the service does not force you to monitor/watch the task.
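As a rough sketch of the fire-and-forget submission described above (using the Microsoft.Azure.Batch .NET SDK; the account URL, key, job ID, and command line are placeholders):

using Microsoft.Azure.Batch;
using Microsoft.Azure.Batch.Auth;

var credentials = new BatchSharedKeyCredentials(
    "https://<account>.<region>.batch.azure.com", "<account>", "<key>");

using (BatchClient batchClient = BatchClient.Open(credentials))
{
    // AddTask returns as soon as the submission is accepted; the Batch service
    // schedules, runs, and monitors the task on the pool's compute nodes by itself.
    var task = new CloudTask("long-running-task", "cmd /c myjob.exe");
    batchClient.JobOperations.AddTask("myJobId", task);
}
// The submitting process (e.g. the blob-triggered Function) can exit here.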

Windows Azure time based event processing

I am trying to design an application where a user can create a task that must be run at a certain time. To compound the problem, this task needs an associated countdown function so that, say, 1 minute before the time it must "come to life" and notify the user.
This solution will be developed in .NET using Windows Azure and Azure Service Bus. My question is: once the message/task is delivered via the bus, how do I architect the solution to handle potentially thousands of independent events that must be processed with second or sub-second accuracy? I can't have one worker role per user; that would be insanely expensive...
I was thinking of using something like Quartz.net in the backend to handle the job processing and scheduling but surely there must be a better way with Azure ...
Any guidance and assistance would be greatly appreciated
Thanks
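Not mentioned in the question, so treat it purely as an assumption worth evaluating: Service Bus can schedule a message to become visible on the queue at a specific time, which pushes the per-task countdown into the bus instead of a per-user worker. A sketch with the Azure.Messaging.ServiceBus SDK (queue name and payload are placeholders):

using System;
using Azure.Messaging.ServiceBus;

await using var client = new ServiceBusClient("<connection-string>");
ServiceBusSender sender = client.CreateSender("task-notifications");

var message = new ServiceBusMessage("notify user 42");
// The message stays invisible on the queue until this time, so a small shared pool
// of receivers can wake up many independent tasks without per-user workers.
message.ScheduledEnqueueTime = DateTimeOffset.UtcNow.AddMinutes(9); // 1 minute before a 10-minute deadline

await sender.SendMessageAsync(message);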

WF4 Affinity on Windows Azure and other NLB environments

I'm using Windows Azure and WF4, and my workflow service is hosted in a web role (with N instances). My job now is to find out how to do affinity, in a way that lets me send messages to the right workflow instance. To explain this scenario: my workflow (attached) starts with a "StartWorkflow" Receive activity, creates 3 "Person" objects and, in a parallel for-each, waits for the confirmation of these 3 people (a "ConfirmCreation" Receive activity).
I then started researching how affinity is handled in other NLB environments (mainly looking for information about how this works on Windows Server AppFabric), but I didn't find a precise answer. So how is it done in other NLB environments?
My next task is to find out how I could implement a system to handle this affinity on Windows Azure and how much this solution would cost (in price, time, and amount of work), to see whether it's viable or whether it's better to work with only one web-role instance while we wait for the WF4 host for Azure AppFabric. The only way I found was to persist the workflow instance. Are there other ways of doing this?
My third, but not last, task is to find out how WF4 handles multiple messages received at the same time. In my scenario, this means how it would behave if the 3 people confirmed at the same time and the confirmation messages were also received at the same time. Since the most logical answer to this problem seems to be a queue, I started looking for information about queues in WF4 and found people talking about MSMQ. But what is the native WF4 message-handling system? Is this handler really a queue or is it another system? How is this concurrency handled?
You shouldn't need any affinity. In fact that's kinda the whole point of durable Workflows. Whilst your workflow is waiting for this confirmation it should be persisted and unloaded from any one server.
As far as persistence goes for Windows Azure you would either need to hack the standard SQL persistence scripts so that they work on SQL Azure or write your own InstanceStore implementation that sits on top of Azure Storage. We have done the latter for a workflow we're running in Azure, but I'm unable to share the code. On a scale of 1 to 10 for effort, I'd rank it around an 8.
As far as multiple messages go, what will happen is that the messages will be received and delivered to the workflow instance one message at a time. Now, it's possible that every one of those messages goes to the same server, or maybe each one goes to a different server. No matter how it happens, the workflow runtime will attempt to load the workflow from the instance store, see that it is currently locked, and block/retry until the workflow becomes available to process the next message. So you don't have to worry about concurrent access to the same workflow instance as long as you configure everything correctly and the InstanceStore implementation is doing its job.
Here's a few other suggestions:
Make sure you use the PersistBeforeSend option on your SendReply activities
Configure the following workflow service options
<workflowIdle timeToUnload="00:00:00" />
<sqlWorkflowInstanceStore ... instanceLockedExceptionAction="AggressiveRetry" />
Using the out-of-the-box SQL instance store with SQL Azure is a bit of a problem at the moment with the Azure 1.3 SDK, as each deployment, even if you made zero code changes, results in a new service deployment, meaning that already-persisted workflows can't continue. That is a bug that will be solved, but it's a PITA for now.
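The config snippets above apply when the workflow is IIS-hosted as a WF service; purely for illustration, the self-hosted equivalent of "persist and unload as soon as the workflow is idle" looks roughly like this (MyWorkflow and the connection string are placeholders):

using System.Activities;
using System.Activities.DurableInstancing;

var app = new WorkflowApplication(new MyWorkflow());

// SQL-backed instance store (the piece you would swap for a custom Azure Storage
// implementation if SQL Azure isn't an option, as described above).
app.InstanceStore = new SqlWorkflowInstanceStore("<persistence DB connection string>");

// Code analogue of timeToUnload="00:00:00": persist and unload on idle so another
// instance/server can later load the workflow and resume it.
app.PersistableIdle = e => PersistableIdleAction.Unload;
app.Run();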
As Drew said your workflow instance should just move from server to server as needed, no need to pin it to a specific machine. And even if you could that would hurt scalability and reliability so something to be avoided.
Sending messages through MSMQ using the WCF NetMsmqBinding works just fine. Internally, WF uses a completely different mechanism called bookmarks that allows a workflow to stop and resume. Each Receive activity, as well as others like Delay, will create a bookmark and wait for it to be resumed. You can only resume existing bookmarks. Even resuming a bookmark is not a direct action but is put into an internal queue, not MSMQ, by the workflow scheduler and executed through a SynchronizationContext. You get no control over the scheduler, but you can replace the SynchronizationContext when using the WorkflowApplication and so get some control over how and where activities are executed.
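For illustration, a minimal bookmark sketch (the activity name, bookmark name, and payload are invented for the example):

using System.Activities;

public sealed class WaitForConfirmation : NativeActivity<string>
{
    // Tells the runtime this activity may make the workflow go idle (and persistable).
    protected override bool CanInduceIdle => true;

    protected override void Execute(NativeActivityContext context)
    {
        // Create the bookmark and return; the workflow idles here until it is resumed.
        context.CreateBookmark("confirm",
            (ctx, bookmark, value) => Result.Set(ctx, (string)value));
    }
}

// The host resumes it later, e.g. when a confirmation message arrives:
// workflowApplication.ResumeBookmark("confirm", "person-1-confirmed");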

SharePoint timer jobs not getting invoked

I have a timer job which has been deployed to a server with multiple Web front ends.
This timer job reads its configuration from a Hierarchical Object Store.
This timer job is scheduled to run daily on the server.
But the problem is that this timer job is not getting invoked daily. I have implemented event logging in the timer job's Execute() method, but I don't see any logs being generated.
Any ideas as to what could cause a timer job to be not picked up for execution by the SharePoint Timer Service? How can I troubleshoot this problem?
Are there any "gotchas" for running timer jobs on servers with multiple front ends? Will the timer job get executed on all the web front ends, or on any one of them arbitrarily? How do I know which machine will have my event logs?
This might be a stupid question, but does having multiple front ends for load balancing affect the way Hierarchical Object Stores behave?
EDIT:
One of the commenters, Sean McDounough (thanks, Sean!), made a very good point:
"whether or not the timer job runs on all WFEs will be a function of the SPJobLockType enum value you specified in the constructor. Using a value of "None" means that the job will run on all WFEs."
Now, my timer job is responsible for sending periodic mails to a list of users. Currently it is marked as SPJobLockType.Job.
If I change this to SPJobLockType.None, does this mean that my timer job will be executed on all the WFEs separately? (This is not desired; it would spam all the users with multiple emails.)
Or does it mean that the timer job will execute on any one of the WFEs, arbitrarily?
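For context, the lock type is picked in the job definition's constructor; a rough sketch (class name, job name, and body are illustrative, not the actual job):

using System;
using Microsoft.SharePoint.Administration;

public class PeriodicMailJob : SPJobDefinition
{
    public PeriodicMailJob() : base() { }   // parameterless constructor required for serialization

    public PeriodicMailJob(string jobName, SPWebApplication webApplication)
        // SPJobLockType.Job: the job is locked at job level and runs on only one server.
        // SPJobLockType.None: the job runs on every server where the timer service runs.
        : base(jobName, webApplication, null, SPJobLockType.Job)
    {
    }

    public override void Execute(Guid targetInstanceId)
    {
        // Send the periodic mails here; with SPJobLockType.Job this executes on a single server.
    }
}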
Try restarting the SharePoint timer service from the command-line using NET STOP SPTIMERV3 followed by a NET START SPTIMERV3. My guess is that the timer service is running with an older version of your .NET assembly. The timer service does not automatically reload assemblies when you upgrade the WSP solution.
To do this, follow these steps:
Stop (or restart) the Timer service:
Click Start, point to Administrative Tools, and then click Services.
Right-click Windows SharePoint Services Timer, and then click Stop (and then Start) or Restart.
This URL helped me.
