I was wondering if anyone could help me understand some behavior regarding orchestration functions and entities.
Say I have one HttpTriggered function that starts an orchestration that calls (not signal) a Durable Entity and runs a long process (30 seconds) to update some string member. I also have another HttpTriggered function that starts an orchestration to call the same Entity and get that string.
What I want to know is this: if I trigger the first function "A" and then the second function "B" immediately afterwards, and "B" calls the Entity to get the string, does it wait for "A" to finish its call to the Entity? Or will "B" get a "dirty" value from it?
I guess what I'm trying to figure out is whether all calls to the Entity are synchronous with regard to the Control Queue.
Entities execute operations one-at-a-time in FIFO order, so if you have two orchestrations (A and B) that call some operation simultaneously, the second operation (B in your example) will be forced to wait for A's operation to finish, which could be as long as 30 seconds. Dirty reads therefore aren't possible when calling entities. This synchronization happens in memory in the Durable Functions extension itself.
This is mentioned in the documentation here:
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities#general-concepts
To prevent conflicts, all operations on a single entity are guaranteed to execute serially, that is, one after another.
This applies both to entity calls and signals.
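The serialization guarantee can be illustrated outside Azure with a toy model in Python's asyncio: one FIFO queue per entity, drained by a single worker. This is not the Durable Functions API. `StringEntity`, `call`, `slow_set` and `get` are hypothetical names, and the sketch only shows why B cannot observe a half-finished A.

```python
import asyncio

class StringEntity:
    """Toy model of an entity: all operations go through one FIFO queue
    drained by a single worker, so they can never interleave."""

    def __init__(self) -> None:
        self.value = "initial"
        self._queue: asyncio.Queue = asyncio.Queue()
        self._worker = None

    def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    def stop(self) -> None:
        self._worker.cancel()

    async def _run(self) -> None:
        while True:
            op, args, fut = await self._queue.get()
            try:
                fut.set_result(await op(*args))   # next op starts only after this one ends
            except Exception as exc:
                fut.set_exception(exc)

    async def call(self, op, *args):
        # Roughly what "calling" (not signalling) means: enqueue, then await the result.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((op, args, fut))
        return await fut

    async def slow_set(self, new_value: str, seconds: float) -> None:
        await asyncio.sleep(seconds)   # the "long process"
        self.value = new_value

    async def get(self) -> str:
        return self.value


async def main() -> str:
    entity = StringEntity()
    entity.start()
    a = asyncio.create_task(entity.call(entity.slow_set, "updated by A", 0.2))  # orchestration A
    b = asyncio.create_task(entity.call(entity.get))                            # orchestration B, right after
    await a
    result = await b
    entity.stop()
    return result


if __name__ == "__main__":
    print(asyncio.run(main()))   # prints: updated by A
```

B's `get` sits in the queue for the whole 0.2 s that A's operation takes, which mirrors the up-to-30-second wait described above.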
Related
Is it possible to cancel a scheduled operation in an Azure Functions durable entity? Below is some example code. I want to cancel the call to the "DeviceTimeout" operation after it has been scheduled.
Entity.Current.SignalEntity(Entity.Current.EntityId, DateTime.UtcNow.Add(TimeSpan.FromSeconds(30)), nameof(DeviceTimeout));
Unfortunately not. There is an open issue requesting this ability, but as far as I know it has not been implemented yet:
I can't think of a very straightforward API to provide this. One approach would be to include some sort of unique identifier for the signal so that a cancellation request can be precisely matched to the signal being cancelled. But there are some open questions on what exactly the implementation would have to guarantee, and whether that can lead to new issues. For example, would we need to store a cancellation request that we can't match to a signal until such a signal arrives? What if it never arrives?
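I suppose the next best thing is to have the operation exit immediately when it executes if it is no longer needed (for example, by checking entity state that was updated after the signal was scheduled).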
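A minimal sketch of that workaround, written as plain Python rather than against the Durable Functions API. Each scheduled DeviceTimeout carries a generation number (assuming you can pass it along as the signal's input), and "cancelling" just increments the entity's current generation, so a stale signal becomes a no-op when it finally runs. `DeviceEntity`, `timeout_generation` and `deliver_due` are hypothetical. `deliver_due` only simulates the runtime delivering signals.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class DeviceEntity:
    # Hypothetical state: the generation a DeviceTimeout must match to take effect.
    timeout_generation: int = 0
    timed_out: bool = False
    # Simulated scheduled signals: (due_time, operation_name, generation).
    scheduled: List[Tuple[float, str, int]] = field(default_factory=list)

    def schedule_timeout(self, now: float, delay: float) -> None:
        self.timeout_generation += 1
        self.scheduled.append((now + delay, "DeviceTimeout", self.timeout_generation))

    def cancel_timeout(self) -> None:
        # The signal cannot be withdrawn; bumping the generation makes it stale.
        self.timeout_generation += 1

    def device_timeout(self, generation: int) -> None:
        if generation != self.timeout_generation:
            return  # stale signal: exit immediately without doing anything
        self.timed_out = True

    def deliver_due(self, now: float) -> None:
        # Simulates the runtime delivering every signal whose time has come.
        due = [s for s in self.scheduled if s[0] <= now]
        self.scheduled = [s for s in self.scheduled if s[0] > now]
        for _, operation, generation in due:
            if operation == "DeviceTimeout":
                self.device_timeout(generation)
```

The signal still fires and still costs an operation execution. It simply finds that it has been superseded and returns.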
I have an operation/task that I need to run, which is triggered by an event getting fired (I don't think this last thing is really important).
The thing is, this task is composed of several I/O operations, mostly network calls. Also, I would like to run this task atomically, start to end, one at a time: newer tasks should not start until the current one finishes.
I would normally do this using a critical section of some kind, but I don't think there's such a concept in JS or the Node base library. How do you suggest I handle a case like this?
thanks
EDIT: I've seen the "critical sections are not needed, this is single threaded" opinion several times in different posts, and I think it is only partially true: it only applies to synchronous actions.
Consider the typical scenario for which critical sections are used. You need to do two things: A) check the validity of a condition, and B) apply an action only if A is true (or only if it is false), an action that would flip the condition. You don't want two threads to conclude at the same time that A is false and that B should be done, so you wrap A and B in a critical section to make them atomic. In Node.js, if A is synchronous you are fine: nothing else will run in between, and you can do B safely. But if A is async, then before its callback fires, another event for A might show up on the event queue, before the first one gets its B executed.
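The race described in the EDIT can be reproduced with Python's asyncio, used here only as an analogy for Node's event loop. When the check reads the condition and then awaits some I/O, two handlers can both see it as true. An `asyncio.Lock` around check-and-act restores the critical section. `check_condition` and `claim` are hypothetical names.

```python
import asyncio

async def check_condition(state: dict) -> bool:
    free = state["free"]            # A: read the condition...
    await asyncio.sleep(0.01)       # ...then do async I/O, yielding to the event loop
    return free

async def claim(state: dict, name: str, lock=None) -> None:
    async def check_then_act() -> None:
        if await check_condition(state):   # A
            state["free"] = False          # B flips the condition
            state["owners"].append(name)

    if lock is None:
        await check_then_act()             # unprotected: A and B can interleave
    else:
        async with lock:                   # critical section around A and B
            await check_then_act()

async def run(use_lock: bool) -> list:
    state = {"free": True, "owners": []}
    lock = asyncio.Lock() if use_lock else None
    await asyncio.gather(claim(state, "first", lock), claim(state, "second", lock))
    return state["owners"]

if __name__ == "__main__":
    print(asyncio.run(run(use_lock=False)))   # both handlers claimed it
    print(asyncio.run(run(use_lock=True)))    # only "first"
```
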
As mscdex noted, a queue would be preferable. The Async library has queue(), which can handle the scenario you described. To get the 'critical section' behaviour, just set concurrency: 1 for the queue.
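The same idea can be sketched with Python's asyncio, as an analogue of `async.queue(worker, 1)` rather than the Async library itself: a single worker drains a queue, so each job runs start to end before the next begins. `run_serially` and the `(name, delay)` job format are hypothetical.

```python
import asyncio

async def run_serially(jobs):
    """Drain (name, delay) jobs with exactly one worker (concurrency 1)."""
    queue: asyncio.Queue = asyncio.Queue()
    log = []

    async def worker():
        while True:
            name, delay = await queue.get()
            try:
                log.append(("start", name))
                await asyncio.sleep(delay)   # stand-in for the network calls
                log.append(("end", name))
            finally:
                queue.task_done()

    w = asyncio.create_task(worker())
    for job in jobs:          # enqueue everything; the worker takes one at a time
        queue.put_nowait(job)
    await queue.join()        # wait until every job has been processed
    w.cancel()
    return log

if __name__ == "__main__":
    print(asyncio.run(run_serially([("a", 0.02), ("b", 0.0), ("c", 0.01)])))
```

Because there is only one worker, a job's start is always preceded by the previous job's end, regardless of how long each one awaits.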
Aggregate B has calculations that need to be eventually consistent with aggregate A. Aggregate A can be mutated using eight methods and each method results in B needing to be updated. It seems an eventually consistent task, but the actual update time frame should be within seconds.
I don't want to rely on the application layer to 'remember' to trigger the update. (Jimmy Bogard says this as well.) What's the best way to model this?
1. Using a domain service with double dispatch is a pain:
   - The service will have to be a parameter on every method on A.
   - Multiple mutation methods will usually be called in a row, and I don't want to trigger an update in B each time a method is called.
2. Constructor injection is also a pain:
   - There are situations where A is not mutated, so being forced to instantiate and inject a domain service to watch for mutation that certainly won't happen feels wrong.
   - Again, multiple mutation methods will usually be called in a row, and I don't want to trigger an update in B each time a method is called.
3. Domain events sound good, but I'm not sure what that looks like. Does each mutation method raise a domain event?
   - Again, multiple mutation methods will usually be called in a row, and I don't want to trigger an update in B each time a method is called.
How do I model 'knowing' when A is finished being updated and knowing whether it has been updated so I can trigger B's update without relying on the application layer to call methods in a particular order each time?
Or is this really a repository-level or application-level concern, even though it seems to be a domain requirement?
Your option 3 is a commonly used and very straightforward technique:
Raise a domain event AChangedType1, ..., AChangedTypeN on model A updates
Let a saga/process manager listen on AChangedTypeX and issue a corresponding UpdateBTypeX command.
It's loosely coupled (neither A nor B knows about the other), it scales well (easy parallelization), and the relation between them is explicitly modeled in the long-running process.
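A minimal Python sketch of those two steps, under assumptions not in the answer. A's mutation methods only record events. A hypothetical `save` (standing in for a repository or unit-of-work commit) publishes them, so the application layer cannot forget to. A process manager turns each `AChanged` event into an `UpdateB` command. All class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class AChanged:            # hypothetical event; `kind` plays the role of AChangedTypeX
    a_id: str
    kind: str

@dataclass(frozen=True)
class UpdateB:             # hypothetical command consumed by aggregate B
    a_id: str
    kind: str

@dataclass
class AggregateA:
    a_id: str
    name: str = ""
    price: float = 0.0
    pending_events: List[AChanged] = field(default_factory=list)

    # Mutation methods only record what happened; A knows nothing about B.
    def rename(self, name: str) -> None:
        self.name = name
        self.pending_events.append(AChanged(self.a_id, "name"))

    def set_price(self, price: float) -> None:
        self.price = price
        self.pending_events.append(AChanged(self.a_id, "price"))

class ProcessManager:
    """Listens for AChanged events and issues the corresponding UpdateB commands."""
    def __init__(self) -> None:
        self.sent: List[UpdateB] = []

    def handle(self, event: AChanged) -> None:
        self.sent.append(UpdateB(event.a_id, event.kind))

def save(a: AggregateA, pm: ProcessManager) -> None:
    # Hypothetical repository hook: after persisting A, publish and clear its events.
    events, a.pending_events = a.pending_events, []
    for event in events:
        pm.handle(event)
```

Publishing on save rather than inside each mutation method means a run of mutations reaches the process manager in one batch, which is also where the delay described next can be applied.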
If you don't want to trigger an update to B on every change to A, you can delay the update by some time before you send out the UpdateBTypeX command, as is commonly done in network protocols (see, e.g., TCP's delayed ACKs).
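A sketch of that delay in Python, assuming a fixed window that starts at the first change (similar in spirit to a delayed-ACK timer): later changes inside the window are folded into the same update, so B is updated at most once per window and never later than `delay` after the first change. `DelayedUpdater` is hypothetical, and time is passed in explicitly to keep it testable.

```python
from typing import Dict, List

class DelayedUpdater:
    """Coalesces change notifications for A into one UpdateB per window."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self.deadline: Dict[str, float] = {}   # a_id -> time to send UpdateB

    def on_a_changed(self, a_id: str, now: float) -> None:
        # The first change starts the window; later changes ride along with it.
        self.deadline.setdefault(a_id, now + self.delay)

    def due_commands(self, now: float) -> List[str]:
        due = [a_id for a_id, t in self.deadline.items() if t <= now]
        for a_id in due:
            del self.deadline[a_id]
        return [f"UpdateB({a_id})" for a_id in due]
```

With `delay=2.0`, three changes at t=0, 0.5 and 1.0 produce a single `UpdateB(a1)` at t=2.0. That keeps B within the "seconds" bound from the question.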
I'm trying to build a simple orchestration engine in a functional test like the following:
object Engine {
  def orchestrate(apiSequence: Seq[Any]): Unit = {
    val execUnitList = getExecutionUnits(apiSequence) // build a specific list
    schedule(execUnitList) // call multiple APIs
  }
}
In the methods called underneath (getExecutionUnits and schedule), the pattern I've applied is one where I incrementally build a list (hence a var rather than a val), iterate over the list, call specific APIs and run some custom validation on each one.
I'm aware that an object in Scala is roughly equivalent to a singleton (so there's only one instance of Engine, in my case). I'm wondering if this is an appropriate pattern if I'm expecting hundreds of concurrent invocations of the orchestrate method. I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method. Assuming that the schedule method can take up to 10 seconds, I'm worried about the behavior under concurrent access. If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked by the current client being processed?
Is there a safer idiomatic way to handle the use-case? Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
Edit: To clarify, it is absolutely essential that the two methods (getExecutionUnits and schedule) are called in sequence. Moreover, the schedule method in turn calls multiple APIs (anywhere from 1 to 10), and it is important that they too get executed in sequence. Right now I have a simple for loop that tackles one API at a time, waits for the response, then moves on to the next one if appropriate.
I'm not managing any other internal variables within the Engine object and I'm simply acting on the provided arguments in the method.
If you are using any vars as fields of Engine, this won't work, because all concurrent calls would share them. However, from your description it seems you aren't: you have a local var in the getExecutionUnits method and (possibly) a local var in schedule, initialized with the return value of getExecutionUnits. This case should be fine.
If client1, client2 and client3 call this method at the same time, will 2 of the clients get queued up and be blocked by the current client being processed?
No, not as long as you don't add any synchronization (and if Engine itself has no state, you shouldn't need to).
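The same reasoning in a Python analogue (not Scala): a function whose mutable state is entirely local can be called from several threads at once. Each call keeps its own APIs strictly sequential, and nothing makes the callers wait for each other. `call_api` and `orchestrate` are hypothetical stand-ins for the question's methods.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List

def call_api(name: str) -> bool:
    """Hypothetical API call: sleeps to simulate latency and always succeeds."""
    time.sleep(0.05)
    return True

def orchestrate(client: str, api_sequence: List[str]) -> List[str]:
    # Everything mutable is local, so concurrent callers never share state
    # and nothing forces them to wait for each other.
    completed: List[str] = []
    for api in api_sequence:          # within one call, APIs run strictly in order
        if not call_api(api):
            break                     # stop early when a step is not appropriate
        completed.append(f"{client}:{api}")
    return completed

if __name__ == "__main__":
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(orchestrate, ["c1", "c2", "c3"], [["x", "y"]] * 3))
    # Roughly 0.1 s rather than 0.3 s: the three clients ran side by side.
    print(results, round(time.monotonic() - start, 2))
```

Wrapping `orchestrate` in a lock, or in a single actor, would serialize the three clients and bring the total back to the sum of their run times.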
Do you recommend using actors to wrap up the "orchestrate" method to handle concurrent requests?
If you wrap it in one actor, then the clients will be blocked waiting while the engine is handling one request.
I have two questions:
Q1) Can I implement an asynchronous timer in a single-threaded application? That is, I want functionality like this:
....
Timer mytimer(5,timeOutHandler)
.... //this thread is doing some other task
...
and after 5 seconds, the timeOutHandler function is invoked.
As far as I can tell, this cannot be done in a single-threaded application (correct me if I am wrong). I don't know if it can be done using select as the demultiplexer, but even if select could be used, wouldn't the event loop require a thread of its own?
I also want to know whether I can implement a timer (not a timeout) using select.
Select only waits on a set of file descriptors, but I want to have a list of timers in ascending order of their expiry times and want select to tell me when the first timer expires, and so on. So the question boils down to: can an asynchronous timer be implemented using select/poll or some other event demultiplexer?
Q2) Now let's come to my second question, which is my main one.
Currently I am using a dedicated thread for checking timeouts: I have a min-heap of timers (expiry times), and this thread sleeps until the first timer expires and then invokes the callback.
The code looks something like this:
1. Lock the mutex.
2. Check the time of the first timer.
3. Do a condition timed wait until that time (and wake up if some other thread inserts a timer with an expiry time earlier than the first timer). The condition wait unlocks the lock.
4. After the condition wait ends, we hold the lock again. So unlock it, remove the timer from the heap and invoke the callback function.
5. Go to 1.
I want to know the time complexity of such an asynchronous timer. From what I see:
- Insertion is O(lg n).
- Expiry is O(lg n).
- Cancellation (this is what makes me dizzy): I have a min-heap of timers ordered by their expiry times, and when I insert a timer I get a unique id. So when I need to cancel the timer, I need to provide this timer id, and searching for this id in the heap would take O(n) in the worst case.

Am I wrong? Can cancellation be done in O(lg n)?
Please do take multithreading issues into account. I will elaborate on what I mean by that once I get some responses.
It's definitely possible (and usually preferable) to implement timers using a single thread, if we can assume that the thread will be spending most of its time blocking in select().
You could look into using signal() and SIGALRM to implement the functionality under POSIX, but I'd recommend against it (Unix signals are ugly hacks, and when the signal callback function runs there is very little that you can do inside it safely, since it is running asynchronously to your app thread).
Your idea about using select()'s timeout to implement your timer functionality is a good one -- that is a very common technique and it works well. Basically you keep a list of pending/upcoming events that is sorted by timestamp, and just before you call select() you subtract the current time from the first timestamp in the list, and pass in that time-delta as the timeout value to select(). (note: if the time-delta is negative, pass in zero as the timeout value!) When select() returns, you compare the current time with the time of the first item in the list; if the current time is greater than or equal to the event time, handle the timer-event, pop the first item off the head of the list, and repeat.
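The loop just described can be sketched in Python with `select.select`, assuming a POSIX system: a pipe that nobody writes to stands in for the real sockets. Timers are kept in a heap ordered by deadline, the timeout passed to select is the time until the earliest deadline, clamped at zero, and after select returns every due timer is fired. `run_timer_loop` is a hypothetical name.

```python
import heapq
import os
import select
import time
from typing import Callable, List, Tuple

def run_timer_loop(timers: List[Tuple[float, str]], on_fire: Callable[[str], None]) -> None:
    """Single-threaded timer loop driven by select()'s timeout (POSIX only).
    `timers` holds (delay_seconds, name) pairs relative to the start."""
    start = time.monotonic()
    heap = [(start + delay, name) for delay, name in timers]
    heapq.heapify(heap)
    read_fd, write_fd = os.pipe()   # placeholder descriptor; a real app watches sockets
    try:
        while heap:
            # Time until the earliest timer; never pass a negative timeout.
            timeout = max(0.0, heap[0][0] - time.monotonic())
            readable, _, _ = select.select([read_fd], [], [], timeout)
            # (A real loop would service the descriptors in `readable` here.)
            now = time.monotonic()
            while heap and heap[0][0] <= now:
                _, name = heapq.heappop(heap)
                on_fire(name)
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

If select happens to return a little early, the inner `while` fires nothing and the outer loop simply waits again with the remaining time.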
As for efficiency, your big-O times will depend entirely on the data structure you use to store your ordered list of timers. If you use a priority queue (or a similar ordered tree-type structure), you can have O(log N) times for all of your operations. You can go further and store the events both in a hash table (keyed on event ID) and in a linked list sorted by timestamp. That gives O(1) cancellation and expiry, although inserting into a sorted linked list is O(N) in general (it is O(1) only when new timers always go at the end, e.g. when they all share the same duration). O(log N) is probably efficient enough, though, unless you plan to have a really large number of events pending at once.
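One common way to get cheap cancellation without searching the heap is lazy deletion, the approach described in the "Priority Queue Implementation Notes" of Python's `heapq` documentation. A dictionary maps each timer id to its heap entry. Cancelling marks that entry dead in O(1), and dead entries are discarded when they reach the top. `TimerQueue` and its method names are hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional

class TimerQueue:
    """Min-heap of timers with lazy cancellation: insert and expiry are
    O(log N); cancel is O(1) via an id -> entry map plus a tombstone."""

    def __init__(self) -> None:
        self._heap: List[list] = []            # entries: [deadline, id, callback]
        self._by_id: Dict[int, list] = {}
        self._ids = itertools.count()

    def insert(self, deadline: float, callback: Callable[[], object]) -> int:
        timer_id = next(self._ids)
        entry = [deadline, timer_id, callback]  # unique id means callbacks are never compared
        self._by_id[timer_id] = entry
        heapq.heappush(self._heap, entry)
        return timer_id

    def cancel(self, timer_id: int) -> bool:
        entry = self._by_id.pop(timer_id, None)
        if entry is None:
            return False                        # unknown, already fired or already cancelled
        entry[2] = None                         # tombstone; the entry leaves the heap later
        return True

    def pop_expired(self, now: float) -> List[Callable[[], object]]:
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, timer_id, callback = heapq.heappop(self._heap)
            if callback is not None:            # skip cancelled entries
                del self._by_id[timer_id]
                fired.append(callback)
        return fired

    def next_deadline(self) -> Optional[float]:
        # Drop cancelled entries sitting at the top before reporting.
        while self._heap and self._heap[0][2] is None:
            heapq.heappop(self._heap)
        return self._heap[0][0] if self._heap else None
```

The trade-off is that cancelled entries occupy heap space until they surface. If that matters, an indexed heap that records each entry's current position allows a true O(log N) removal instead.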
man pthread_cond_timedwait
man pthread_cond_signal
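The dedicated-thread design from Q2 can be sketched with Python's `threading.Condition`, whose `wait(timeout)` plays the role of `pthread_cond_timedwait`. There are two deliberate differences from step 4 of the question. The timer is removed from the heap while the lock is still held (only the callback runs unlocked), and the deadline is re-checked after every wake-up, because a timed wait can return early when a new, earlier timer is inserted. `TimerThread` and its methods are hypothetical names.

```python
import heapq
import itertools
import threading
import time
from typing import Callable, List, Tuple

class TimerThread:
    """A dedicated thread that sleeps on a condition variable until the
    earliest timer in a min-heap is due, then runs its callback."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, Callable[[], None]]] = []
        self._cond = threading.Condition()
        self._ids = itertools.count()
        self._stopped = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def add(self, delay: float, callback: Callable[[], None]) -> int:
        with self._cond:
            timer_id = next(self._ids)
            heapq.heappush(self._heap, (time.monotonic() + delay, timer_id, callback))
            self._cond.notify()          # wake the thread in case this is the new earliest
        return timer_id

    def stop(self) -> None:
        with self._cond:
            self._stopped = True
            self._cond.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                while not self._stopped:
                    if not self._heap:
                        self._cond.wait()                 # nothing scheduled yet
                        continue
                    remaining = self._heap[0][0] - time.monotonic()
                    if remaining <= 0:
                        break                             # earliest timer is due
                    self._cond.wait(timeout=remaining)    # may wake early: loop re-checks
                if self._stopped:
                    return
                _, _, callback = heapq.heappop(self._heap)  # removed under the lock
            callback()                                      # run without holding the lock
```

Running callbacks outside the lock lets them call `add` themselves without deadlocking.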
If you are writing a Windows app, you can use SetTimer to have a WM_TIMER message sent to you at some point in the future, which works even if your app is single-threaded. However, the timing accuracy will not be great.
If your app runs in a constant loop (like a game, rendering at 60Hz), you can simply check each time around the loop to see if triggered events need to be called.
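A minimal sketch of that polling approach in Python, with time simulated so the behaviour is deterministic. Each frame checks whether any pending events are due and fires them, so an event fires on the first frame at or after its due time. The resolution is therefore one frame (about 16.7 ms at 60 Hz). `run_frames` is a hypothetical name.

```python
from typing import List, Tuple

def run_frames(events: List[Tuple[float, str]], hz: float, frames: int) -> List[Tuple[int, str]]:
    """Fixed-rate loop that polls for due (due_time_seconds, name) events once per frame."""
    pending = sorted(events)                # earliest due time first
    fired: List[Tuple[int, str]] = []
    for frame in range(frames):
        now = frame / hz                    # simulated clock: time at the start of this frame
        while pending and pending[0][0] <= now:
            _, name = pending.pop(0)
            fired.append((frame, name))     # fires on the first frame at/after its due time
        # ... update game state and render the frame here ...
    return fired
```

For example, at 60 Hz an event due at 0.04 s fires on frame 3 (t = 0.05 s), since frame 2 starts at about 0.033 s.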
If you want your app to basically be interrupted, your function to be called, then execution to return to where it was, then you may be out of luck.
If you're using C#, System.Timers.Timer will do what you want. You specify an event handler method that the timer calls when it expires, which can be in the class that you invoke the timer from. Note that when the timer calls the event handler, it will do it on a separate thread, which you need to take into account if you're updating the user interface, or use its SynchronizingObject property to run it on the UI thread.