I have 100 rows in Azure Table Storage, but later I may add more rows or set a "disabled" property on any row in the table.
I have an Azure function - "XProcessor". I would like to have an Azure function "HostFunction" which would start a new instance of the "XProcessor" for each row from the Azure table storage.
The "HostFunction" should be able to pass details of a table row to the instance of the "XProcessor" and the "HostFunction" needs to be executed every minute.
How do I achieve this? I am looking into Azure Logic Apps, but I am not sure yet how to orchestrate "XProcessor" with the row details.
I would look into using a combination of "Durable Functions" techniques.
Eternal Orchestration - will allow you to enable your process to run, wait a set period of time after completion, and then run again.
From the docs: Eternal orchestrations are orchestrator functions that never end. They are useful when you want to use Durable Functions for aggregators and any scenario that requires an infinite loop.
Fan-out/fan-in - will allow you to call a separate function per row.
From the docs: Fan-out/fan-in refers to the pattern of executing multiple functions in parallel, and then waiting for all to finish. Often some aggregation work is done on results returned from the functions.
There is a bit of additional overhead in getting going with Durable Functions, but it gives you fine-grained control over your execution. Keep in mind that object state is serialised in Durable Functions at every await call, so thousands of rows could potentially be an issue; for the scenario you describe, though, it will work well, and I have had a lot of success with it.
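To make the shape of the pattern concrete, here is a minimal fan-out/fan-in sketch in plain Python with asyncio. This only stands in for the Durable Functions pattern; in a real orchestrator the equivalents are context.call_activity per row and a task-all over the results. The row shape and the "disabled" check mirror the question; everything else is hypothetical.

```python
import asyncio

# Stand-in for the "XProcessor" activity: processes one table row.
async def x_processor(row):
    await asyncio.sleep(0)  # simulate I/O work
    return f"processed {row['RowKey']}"

async def host(rows):
    # Fan out: one task per non-disabled row.
    tasks = [x_processor(r) for r in rows if not r.get("disabled")]
    # Fan in: wait for all tasks to finish and collect the results.
    return await asyncio.gather(*tasks)

rows = [
    {"RowKey": "1"},
    {"RowKey": "2", "disabled": True},
    {"RowKey": "3"},
]
results = asyncio.run(host(rows))
print(results)  # ['processed 1', 'processed 3']
```

In Durable Functions the orchestrator additionally checkpoints after each await, which is where the serialisation overhead mentioned above comes from.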
Good luck!
Durable Functions is your go-to option here. What you can do is:
1) Have a controller function called the orchestration function
2) Another child function which can be invoked multiple times by the orchestration
And in your orchestrator function, you wait until all the child instances give you their responses back; hence you get a fan-out/fan-in scenario.
Have a look at the following links:
https://blog.mexia.com.au/tag/azure-durable-functions and https://learn.microsoft.com/en-us/azure/azure-functions/durable-functions-overview
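The controller/child split above can be sketched roughly like this (pseudocode modelled on the Durable Functions Python programming model; the activity names "GetTableRows" and "XProcessor" are assumptions taken from the question):

```
def orchestrator(context):
    rows = yield context.call_activity("GetTableRows", None)
    # fan out: one child activity invocation per row
    tasks = [context.call_activity("XProcessor", row) for row in rows]
    # fan in: wait until all the child instances respond
    results = yield context.task_all(tasks)
    # eternal orchestration: sleep a minute, then start over
    yield context.create_timer(context.current_utc_datetime + timedelta(minutes=1))
    context.continue_as_new(None)
```

The timer plus continue_as_new at the end is what gives you the "run every minute" behaviour without a separate scheduler.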
Related
In Azure, I have created a queue of items, which are consumed by an Azure function. I would like to group these items in the queue, so that after a batch of items is consumed, I can do another action, e.g. send an email. I am new to Azure and wanted to know: is there a way to batch/manage a group of function calls so as to do an action after the batch is processed?
It is certainly possible to do so. Do look into Azure Durable Functions.
Based on the description, you can implement the Function Chaining pattern, where the output of one Function is fed to another Function. In this case the items will be processed sequentially.
You can also implement Fan out/fan in pattern if you want the items in your queue to be processed in parallel.
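As a small illustration of the fan-out/fan-in idea in plain Python (a sketch only: ThreadPoolExecutor stands in for the parallel activity functions, and send_email is a hypothetical follow-up action, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def process_item(item):
    # Stand-in for the activity function that consumes one queue item.
    return item * 2

def send_email(results):
    # Hypothetical follow-up action, run only once the whole batch is done.
    return f"email sent for {len(results)} items"

def process_batch(items):
    # Fan out: process the items in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process_item, items))
    # Fan in: all items are finished here, so trigger the batch action.
    return send_email(results)

print(process_batch([1, 2, 3]))  # email sent for 3 items
```

In Durable Functions the orchestrator plays the role of process_batch: it awaits all the activity tasks and only then calls the follow-up activity.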
I have the following setup:
An Azure Function in Python 3.6 is processing some entities, using the TableService class from the Python Table API (the new one for Cosmos DB) to check whether the currently processed entity is already in a storage table. This is done by invoking TableService.get_entity.
The get_entity method throws an exception each time it does not find an entity with the given row id (the same as the entity id) and partition key.
If no entity with this id is found, I call insert_entity to add it to the table.
With this I am trying to process only entities that haven't been processed before (i.e. are not yet logged in the table).
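The logic described above amounts to an insert-if-absent guard. A sketch of that pattern, with an in-memory dict standing in for the storage table (the real code would call TableService.insert_entity and catch the conflict exception the Table API raises):

```python
class EntityAlreadyExists(Exception):
    """Stands in for the conflict error the Table API raises on duplicates."""

table = {}  # stand-in for the Azure table: (PartitionKey, RowKey) -> entity

def insert_entity(entity):
    key = (entity["PartitionKey"], entity["RowKey"])
    if key in table:
        raise EntityAlreadyExists(key)
    table[key] = entity

def process_once(entity, work):
    """Run `work` on the entity only if it hasn't been logged before."""
    try:
        insert_entity(entity)
    except EntityAlreadyExists:
        return False  # duplicate: already processed, skip it
    work(entity)
    return True

e = {"PartitionKey": "p", "RowKey": "42"}
print(process_once(e, lambda _: None))  # True  (first time)
print(process_once(e, lambda _: None))  # False (duplicate)
```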
However, I observe consistent behaviour of the function simply freezing after exactly 10 exceptions: it halts execution on the 10th invocation and does not continue processing for another minute or two!
I even changed the implementation: instead of doing a lookup first, I simply call insert_entity and let it fail when a duplicate row key is added. Surprisingly enough, the behaviour there is exactly the same - on the 10th duplicate invocation the execution freezes and continues after a while.
What is the cause of such behaviour? Is this some sort of protection mechanism on the storage account that kicks in? Or is it something to do with the Python client? To me it looks very much like something by design.
I wasn't able to find any documentation, or any settings page on the portal, for influencing such behaviour.
I am wondering if it is possible to implement such logic using Table Storage at all. I don't find it justifiable to spin up an Azure SQL DB or Cosmos DB instance for such trivial functionality as checking whether an entity already exists in a table.
Thanks
I have a logic app with a sql trigger that gets multiple rows.
I need to split on the rows so that I have a better overview about the actions I do per row.
Now I would like that the logic app is only working on one row at a time.
What would be the best solution for that, since

"operationOptions": "singleInstance"

and

"runtimeConfiguration": {
    "concurrency": {
        "runs": 1
    }
}

are not working with splitOn?
I was also thinking about calling another logic app and having that logic app use a runtimeConfiguration, but that sounds just like an ugly workaround.
Edit:
The row is atomic, and no sorting is needed. Each row can be worked on separately and independent of other data.
As far as I can tell, I wouldn't use a foreach for that, since then one failure within a row will lead to a failed logic app.
If one dataset (row) fails, the others should still be tried, and the error should be easily visible.
Yes, you are seeing the expected behavior. Keep in mind, the split happens in the trigger, not the workflow. BizTalk works the same way except it's a bit more obvious there.
You don't want concurrent processing, you want ordered processing. Right now, the most direct way to handle this is by Foreach'ing over the collection. Though waiting ~3 weeks might be a better option.
One decision point will be whether the atomicity is at the collection or the item level. Also, you'll need to know whether overlapping batches are OK or not.
For instance, if you need to process all items in order, with batch level validation, Foreach with concurrency = 1 is what you need.
Today (as of 2018-03-06) concurrency control is not supported for split-on triggers.
Having said that, concurrency control should be enabled for all trigger types (including split-on triggers) within the next 2-3 weeks.
In the interim you could remove the splitOn property on your trigger and set its concurrency limit to 1. This will start a single run for the entire collection of items, but you can use a foreach loop in your definition to limit concurrency as well. The drawback here is that the trigger will wait until the run as a whole is completed (all items are processed), so the throughput will not be optimal.
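As a sketch of that interim approach (a workflow-definition fragment; the action name "For_each_row" is hypothetical), a foreach over the trigger body can be limited to one repetition at a time:

```json
"For_each_row": {
    "type": "Foreach",
    "foreach": "@triggerBody()",
    "runtimeConfiguration": {
        "concurrency": {
            "repetitions": 1
        }
    },
    "actions": { }
}
```

Note that foreach concurrency uses "repetitions", whereas the trigger-level setting in the question uses "runs".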
I'm trying out AWS Step Functions. Here's what I'm trying to create:
Get a list of endpoints from the dynamoDB (https://user:password#server1.com, https://user2:password#server2.com, etc..)
From each domain, I get a list of ids via /all.
For each id in the result, I want to do a series of REST calls, e.g. https://user:password#server1.com/device/{id} (only one request at a time to each domain, but the domains in parallel).
Save the result in DynamoDB and check whether it is a duplicate result or not.
I know how to make the rest call and saving to the dynamoDB etc.
But the problem, or what I'm unable to find the answer to, is:
How can I start run /all in parallel for each domain in the array I get from the dynamoDB?
AWS Step Functions state machines are immutable. Once created, they cannot be changed. Given this fact, you cannot have a dynamic number of branches in your Parallel state.
To solve for this, you'll probably want to approach your design a little differently. Instead of solving this with a single Step Function, consider breaking it apart into two different state machines, as shown below.
Step Function #1: Retrieve List of Endpoints
Start
Task: Retrieves list of endpoints from DynamoDB
Task: For each endpoint, invoke Step Function #2 and pass in endpoint
End
You could optionally combine states #2 and #3 to simplify the state machine and your task code.
Step Function #2: Perform REST API Calls
Start - takes a single endpoint as input state
Task: Perform series of REST calls against endpoint
Task: Stores result in DynamoDB via Task state
End
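In Amazon States Language, Step Function #1 might look roughly like this (a sketch only; the Lambda ARNs and state names are hypothetical placeholders):

```json
{
    "Comment": "Retrieve endpoints, then kick off Step Function #2 per endpoint",
    "StartAt": "GetEndpoints",
    "States": {
        "GetEndpoints": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:GetEndpoints",
            "Next": "InvokePerEndpoint"
        },
        "InvokePerEndpoint": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:StartStepFunction2",
            "End": true
        }
    }
}
```

The second task's Lambda would loop over the endpoints and start one execution of Step Function #2 per endpoint, which is how the dynamic fan-out happens despite the fixed state machine definition.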
Is it possible to cancel an Azure table query?
We have cases where we are making a long running query (can take 30-60 seconds), but the object gets disposed and needs to abort the query before it completes.
We are using TableServicesContext, and ExecuteQuery (synchronously). We can consider async as well if the solution requires it.
First of all, I doubt a table service query would last longer than 30 seconds. Check out this documentation on Query Timeouts and Pagination.
Also, the Windows Azure Storage Services SLA guarantees that the maximum response time for the Table Service (for a batch operation) is 30 seconds, and operations on single entities shall complete within 2 seconds.
If you are still having issues, your solution is to use the BeginExecute method, which will give you back an IAsyncResult object. You can have your own timer and call CancelRequest with the given IAsyncResult according to your own logic.
By now, if you followed all my links, you might have noticed that BeginExecute and CancelRequest are methods of the DataServiceContext class. That's why they are not covered in the documentation for TableServiceContext. But since TableServiceContext inherits directly from DataServiceContext, these methods are available on your TableServiceContext as well.
You may also want to take a look at How to: Execute Asynchronous Data Service Queries
Hope this helps!