Where is the state in an Azure Durable Function?

I'm building an Azure Durable Function app that is triggered by a Timer once per day. For some reason, I want to persist state (e.g. a token, or an array) from the previous run. Is that possible?
A lot of the official docs start by saying that Azure Durable Functions are stateful, but all I know is that the output from one activity can be used as the input of another activity. And Chris Gillum just skips that topic in his YouTube video.

It depends a bit on how you need that state. If the object is an output from one activity function and is passed into another, then you can use the function chaining pattern as Sajeetharan described, because the inputs for the activity function are persisted by default.
But if you want to keep the state of an object over a longer time (e.g. since the last time your timer function ran), you can look into Durable Entities (still in preview at the time of writing):
Entity functions define operations for reading and updating small pieces of state, known as durable entities. Like orchestrator functions, entity functions are functions with a special trigger type, entity trigger. Unlike orchestrator functions, entity functions do not have any specific code constraints. Entity functions also manage state explicitly rather than implicitly representing state via control flow.
Chris Gillum wrote an article about it: https://medium.com/@cgillum/azure-functions-durable-entities-67db648d2f74
If you can't use this preview version or don't like the concept, you can still write activity functions with input and output bindings to Table or Blob Storage or Cosmos DB and manage the state yourself.
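For the Durable Entities option, here is a minimal sketch of what persisting a token between daily timer runs could look like. It uses the (now GA) Durable Functions 2.x syntax; the names (TokenEntity, DailyRun, the "singleton" entity key and the cron expression) are placeholders, not anything prescribed by the docs.

using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Extensions.Logging;

public static class TokenStore
{
    // Function-based entity: one small piece of state (the token) and two operations.
    [FunctionName("TokenEntity")]
    public static void TokenEntity([EntityTrigger] IDurableEntityContext ctx)
    {
        switch (ctx.OperationName)
        {
            case "set":
                ctx.SetState(ctx.GetInput<string>());
                break;
            case "get":
                ctx.Return(ctx.GetState<string>());
                break;
        }
    }

    // Daily timer: read the token persisted by the previous run, then store the new one.
    [FunctionName("DailyRun")]
    public static async Task DailyRun(
        [TimerTrigger("0 0 6 * * *")] TimerInfo timer,   // once per day at 06:00 UTC
        [DurableClient] IDurableClient client,
        ILogger log)
    {
        var entityId = new EntityId("TokenEntity", "singleton");

        var state = await client.ReadEntityStateAsync<string>(entityId);
        var previousToken = state.EntityExists ? state.EntityState : null;
        log.LogInformation($"Token from previous run: {previousToken ?? "<none>"}");

        // ... do the day's work here, e.g. start an orchestration that uses previousToken ...

        await client.SignalEntityAsync(entityId, "set", "new-token-for-tomorrow");
    }
}

Note that ReadEntityStateAsync returns the last persisted entity state, so a signal sent moments earlier may not be visible yet; for a once-a-day timer that lag is irrelevant.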

Related

What is the alternative to global variables in Azure Function Apps?

Let's say I want to have a TimerTrigger function app that executes every 10 seconds and prints an increasing count (1...2...3...). How can I achieve this WITHOUT using an environment variable?
You're already using an Azure Storage account for your function. Create a table within that storage account, and increment the counter there. This has the added benefit of persisting across function restarts.
Since you're using a TimerTrigger, it's implicit that there will only ever be one instance of the function running. If this were not the case, you could end up in a race condition with two or more instances interleaving to incorrectly increment your counter.
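As a rough sketch of that suggestion, assuming the Azure.Data.Tables SDK; the table name "Counters" and the partition/row keys are arbitrary choices here, not anything required:

using System;
using System.Threading.Tasks;
using Azure;
using Azure.Data.Tables;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class PersistentCounter
{
    [FunctionName("PersistentCounter")]
    public static async Task Run(
        [TimerTrigger("*/10 * * * * *")] TimerInfo timer,   // every 10 seconds
        ILogger log)
    {
        // Reuse the storage account the Functions runtime already requires.
        var table = new TableClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "Counters");
        await table.CreateIfNotExistsAsync();

        TableEntity entity;
        try
        {
            entity = (await table.GetEntityAsync<TableEntity>("counter", "timer")).Value;
        }
        catch (RequestFailedException ex) when (ex.Status == 404)
        {
            // First run: the counter row does not exist yet.
            entity = new TableEntity("counter", "timer") { ["Value"] = 0 };
        }

        var next = entity.GetInt32("Value").GetValueOrDefault() + 1;
        entity["Value"] = next;
        await table.UpsertEntityAsync(entity);               // survives function restarts

        log.LogInformation($"Count: {next}");
    }
}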
I suggest you look into Durable Functions. This is an extension for Azure Functions that allows state in your (orchestrator) functions.
In your case, you can have a single HTTP triggered starter function that starts a long running orchestrator function. The HTTP function passes the initial count value to the orchestrator function. You can use the Timer functionality of Durable Functions to have the orchestrator wait for the specified amount of time before continuing/restarting. After the timer expires, the count value is incremented and you can restart the orchestrator function with this new count value by calling the ContinueAsNew method.
I think this periodic work example is almost what you need. You still need to read the initial count as the orchestrator's input, and increment it before the ContinueAsNew method is called.
If you need more details about Durable Functions, I have quite a few videos that explain the concepts.
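A minimal sketch of the setup described above, assuming the Durable Functions 2.x programming model; the function names and the 10-second delay are placeholders:

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;
using Microsoft.Azure.WebJobs.Extensions.Http;
using Microsoft.Extensions.Logging;

public static class CounterOrchestration
{
    // HTTP starter passes the initial count (0) to the orchestrator.
    [FunctionName("Counter_HttpStart")]
    public static async Task<HttpResponseMessage> HttpStart(
        [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        string instanceId = await starter.StartNewAsync("Counter_Orchestrator", (object)0);
        return starter.CreateCheckStatusResponse(req, instanceId);
    }

    [FunctionName("Counter_Orchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context,
        ILogger log)
    {
        var count = context.GetInput<int>();

        if (!context.IsReplaying)
            log.LogInformation($"Count: {count}");

        // Durable timer: wait 10 seconds (use context time, never DateTime.UtcNow here).
        var due = context.CurrentUtcDateTime.AddSeconds(10);
        await context.CreateTimer(due, CancellationToken.None);

        // Restart the orchestration with the incremented count as the new input.
        context.ContinueAsNew(count + 1);
    }
}

ContinueAsNew restarts the orchestrator with the new count as its input, which keeps the history short and effectively carries the state from one iteration to the next.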

Azure durable entity or static variables?

Question: Is it thread-safe to use static variables (as shared storage between orchestrations), or is it better to save/retrieve the data via a durable entity?
There are a couple of Azure Functions in the same namespace: a hub-trigger, a durable entity, 2 orchestrations (the main process and one that monitors the whole process) and an activity.
They all need some shared variables. In my case I need to know the number of main orchestration instances (to decide whether to start a new one or hold on). That is done in another orchestration (the monitor).
I've tried both options and ask because I see different results.
Static variables: in my case there is a generic List<SomeMyType>, where SomeMyType holds the Id of the task, its state, the number of attempts, the records it processed and other info.
When I need to start a new orchestration I call List.Add(), and when I need to retrieve and modify an item I use a simple List.First(id_of_the_task). With First() I know for sure the needed task is there.
With static variables I sometimes see that tasks become duplicated for some reason. I retrieve the task with List.First(id_of_the_task), change something on the resulting variable and that is it. Not a lot of code.
Durable entity: the major difference is that I put the List on a durable entity, and each time I need to retrieve or update it I call .CallEntityAsync("getTask") and .CallEntityAsync("saveTask"), which might slow down the app.
With this approach more code and more calls are required; however, it looks more stable and I don't see any duplicates.
Please advise.
I can't say why you see duplicates with the static variable approach without seeing the code; it may be because List is not thread-safe and you need something like ConcurrentBag, but I'm not sure. Another issue with static variables is that the function app may not be always on, or may run on multiple instances: when the function host unloads (or crashes) the state is lost, and static variables are not shared across instances, so under high load (when there can be many instances) it won't work.
Durable entities seem better here. They can be shared across many concurrent function instances, and each entity executes only one operation at a time, so they are for sure the better option. The performance cost is a bit higher, but they should not be much slower than orchestrators, since they perform largely the same underlying operations: writing to Table Storage, checking for events, etc.
I can't say if it's right for you, but instead of List.First(id_of_the_task) you should be able to access each orchestration's properties through the client, which can hold custom data. Another idea, depending on the usage, is to query Table Storage directly with the CloudTable class for information about the running orchestrators.
Although not entirely related, you can also look at some settings for parallelism in Durable Functions: Azure (Durable) Functions - Managing parallelism.
Please ask any questions if I should clarify anything or if I misunderstood your question.
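To illustrate the durable entity option discussed above, a simplified class-based entity holding the task list might look like the following; TaskInfo and the operation names are assumptions standing in for SomeMyType and your getTask/saveTask operations:

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

// Hypothetical task record; the real SomeMyType fields are assumptions.
public class TaskInfo
{
    public string Id { get; set; }
    public string State { get; set; }
    public int Attempts { get; set; }
}

// Class-based durable entity that owns the shared list.
// Operations on a single entity key run one at a time, so no extra locking is needed.
public class TaskTracker
{
    public List<TaskInfo> Tasks { get; set; } = new List<TaskInfo>();

    public void SaveTask(TaskInfo task)
    {
        Tasks.RemoveAll(t => t.Id == task.Id);   // replace any existing entry with the same Id
        Tasks.Add(task);
    }

    public TaskInfo GetTask(string id) => Tasks.FirstOrDefault(t => t.Id == id);

    public int GetCount() => Tasks.Count;

    [FunctionName(nameof(TaskTracker))]
    public static Task Run([EntityTrigger] IDurableEntityContext ctx)
        => ctx.DispatchAsync<TaskTracker>();
}

An orchestration would then call, for example, context.CallEntityAsync<TaskInfo>(new EntityId(nameof(TaskTracker), "tasks"), "GetTask", id). Because operations on one entity key never run concurrently, the duplication you see with the static List cannot happen here.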

What is the best way to rehydrate aggregate roots and their associated entities in an event sourced environment

I have seen information on rehydrating aggregate roots on SO, but I am posting this question because I did not find any information about doing so in the context of an event-sourced framework.
Has a best practice been discovered or developed for how to rehydrate aggregate roots when operating on the command side of an application using the event sourcing and CQRS pattern, or is this still more of a "preference" among architects?
I have read through a number of blogs and watched a number of conference presentations on YouTube, and I seem to get different guidance depending on whom I am listening to.
On the one hand, I have found information stating fairly clearly that developers should create aggregates to hydrate themselves using "apply" methods on events obtained directly from the event store.
On the other hand, I have also seen in several places where presenters and bloggers have recommended rehydrating aggregate roots by submitting a query to the read side of the application. Some have suggested creating specific validation "buckets"/projections on the read side to facilitate this.
Can anyone help point me in the right direction on discovering whether there is a single best practice, or whether the answer primarily depends upon performance issues or some other issue I am not thinking about?
Hydrating Aggregates in an event sourced framework is a well-understood problem.
On the one hand, I have found information stating fairly clearly that developers should create aggregates to hydrate themselves using "apply" methods on events obtained directly from the event store.
This is the prescribed way of handling it. There are various ways of achieving this, but I would suggest keeping any persistence logic (reading or writing events) outside of your Aggregate. One simple way is to expose a constructor that accepts domain events and then applies those events.
On the other hand, I have also seen in several places where presenters and bloggers have recommended rehydrating aggregate roots by submitting a query to the read side of the application. Some have suggested creating specific validation "buckets"/projections on the read side to facilitate this.
You can use the concept of snapshots as a way of optimizing your reads. This will create a memoized version of your hydrated Aggregate. You can load this snapshot and then only apply events that were generated since the snapshot was created. In this case, your Aggregate can define a constructor that takes two parameters: an existing state (snapshot) and any remaining domain events that can then be applied to that snapshot.
Snapshots are just an optimization and should be considered as such. You can create a system that does not use snapshots and apply them once read performance becomes a bottleneck.
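As a sketch of both of those ideas (hydrate purely from events, plus an optional snapshot constructor), here is a hypothetical Order aggregate whose events and snapshot type are invented for illustration:

using System;
using System.Collections.Generic;

public class OrderCreated { public Guid OrderId { get; set; } }
public class ItemAdded { public decimal Price { get; set; } }
public class OrderSnapshot { public Guid Id { get; set; } public decimal Total { get; set; } }

public class Order
{
    public Guid Id { get; private set; }
    public decimal Total { get; private set; }

    // Rehydrate purely from the event history read from the event store.
    public Order(IEnumerable<object> history)
    {
        foreach (var e in history)
            Apply(e);
    }

    // Rehydrate from a snapshot plus only the events recorded after it.
    public Order(OrderSnapshot snapshot, IEnumerable<object> eventsSinceSnapshot)
    {
        Id = snapshot.Id;
        Total = snapshot.Total;
        foreach (var e in eventsSinceSnapshot)
            Apply(e);
    }

    // State mutation only; no persistence logic lives inside the aggregate.
    private void Apply(object @event)
    {
        switch (@event)
        {
            case OrderCreated e:
                Id = e.OrderId;
                break;
            case ItemAdded e:
                Total += e.Price;
                break;
        }
    }
}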
On the other hand, I have also seen in several places where presenters and bloggers have recommended rehydrating aggregate roots by submitting a query to the read side of the application.
Snapshots are not really part of the read side of the application. Data on the read side exists to satisfy use cases within the application. Those can change based on requirements even if the underlying domain does not change. As such, you shouldn't use read side data in your domain at all.
Event sourcing has developed different styles over the years. I could divide all of those into two big categories:
an event stream represents one entity (an aggregate in case of DDD)
one (partitioned) event stream for a (sub)system
When you deal with one stream per (sub)system, you aren't able to rehydrate the write-side on the fly, it is physically impossible due to the number of events in that stream. Therefore, you would rely on the projected read-side to retrieve the current entity state. As a consequence, this read-side must be fully consistent.
When going with DDD-flavoured event sourcing, there's a strong consensus in the community about how it should be done. The state of the aggregate (not just the root, but the whole aggregate) is restored by the command side before calling the domain model. You always restore using events. When snapshotting is enabled, snapshots are also stored as events in the aggregate snapshot stream, so you read the last one and then all events from the snapshot version onwards.
Concerning the Apply method: you need to clearly separate the function that adds new events to the list of changes (what you're going to save) from the functions that mutate the aggregate state when events are applied.
The first function is the one called Apply and the second one is often called When. So you call the Apply function in your aggregate code to build up the list of changes. The When function is called when restoring the aggregate state from events as you read the stream, and also from the Apply function.
You can find a simplistic example of an event-sourced aggregate in my book repo: https://github.com/alexeyzimarev/ddd-book/blob/master/chapter13/src/Marketplace.Ads.Domain/ClassifiedAds/ClassifiedAd.cs
For example:
public void Publish(UserId userId)
    => Apply(
        new V1.ClassifiedAdPublished
        {
            Id = Id,
            ApprovedBy = userId,
            OwnerId = OwnerId,
            PublishedAt = DateTimeOffset.Now
        }
    );
And for the When:
protected override void When(object @event)
{
    switch (@event)
    {
        // more code here
        case V1.ClassifiedAdPublished e:
            ApprovedBy = UserId.FromGuid(e.ApprovedBy);
            State = ClassifiedAdState.Active;
            break;
        // and more here
    }
}
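A stripped-down sketch of the Apply/When split described above (this is not the exact base class from the linked repository, just the general shape of it):

using System.Collections.Generic;

public abstract class AggregateRoot
{
    private readonly List<object> _changes = new List<object>();

    // The new events produced by the current command, ready to be saved to the stream.
    public IEnumerable<object> GetChanges() => _changes;

    // Apply: record a new event and mutate state through When.
    protected void Apply(object @event)
    {
        When(@event);
        _changes.Add(@event);
    }

    // When: pure state mutation, also used when replaying history.
    protected abstract void When(object @event);

    // Rehydrate from stored events (a snapshot, if any, is handled by the caller).
    public void Load(IEnumerable<object> history)
    {
        foreach (var @event in history)
            When(@event);
    }
}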

How to tell when the last message from an Azure Queue is removed

I'm exploring Azure Functions and queue triggers to implement a recursive solver.
The desired behavior is as follows:
1. There exists a queue called "solveme" which, when it receives an item containing state, will create a second queue with a unique name (a GUID) and send the initial state to that queue with some correlation information.
2. When a message is received into the newly created queue, it will possibly enqueue more work representing all the next possible states, so that before the message is "completed", the new work already exists in the queue. If there are no valid next states for the given state, it will just complete the message. This means that the original message is not "completed" until all new work is enqueued. During this step, I'm writing off data to Azure Tables representing the valid solutions, using the correlation information from step 1.
3. When all messages have been processed from the newly created queue, delete the queue.
I am not sure how to go about #3 without having a timer or something that checks the queue length until it is zero and deletes it. I'm trying to only use Azure Functions and Azure Storage - nothing else.
It would be nice if there was some way to do this directly from a QueueTrigger or within a function called by a QueueTrigger.
Further, is there a better / standard way to trigger completion of this kind of work? Effectively, the empty queue is the trigger that all work is done, but if there is a better way, I'd like to know about it!
There's no reliable way to do that out of the box. If your consumer is faster than the producer, the queue could stay at 0 for a long time while the producer keeps sending data.
I would suggest sending a special "End of Sequence" message, so when the consumer receives this message it knows to go and delete the queue.
P.S. Have a look at Durable Functions - it looks like you could use some of the orchestration there instead of implementing workflows yourself (just mind it's in early preview).
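To make the "End of Sequence" suggestion concrete, here is a rough consumer sketch using the Azure.Storage.Queues SDK directly (since the per-job queues are created at runtime, this sidesteps the separate question of binding a QueueTrigger to a dynamically named queue); the sentinel text and method names are made up for illustration:

using System.Threading.Tasks;
using Azure.Storage.Queues;

public static class SolverConsumer
{
    private const string EndOfSequence = "__EOS__";   // hypothetical sentinel message

    // Drains one dynamically named queue; called with the GUID queue name from step 1.
    public static async Task DrainAsync(string connectionString, string queueName)
    {
        var queue = new QueueClient(connectionString, queueName);

        while (true)
        {
            var messages = (await queue.ReceiveMessagesAsync(maxMessages: 16)).Value;
            if (messages.Length == 0)
            {
                await Task.Delay(1000);                // nothing visible right now; poll again
                continue;
            }

            foreach (var msg in messages)
            {
                if (msg.MessageText == EndOfSequence)
                {
                    await queue.DeleteAsync();         // all producers are done: remove the queue
                    return;
                }

                // ... expand the state, enqueue next states, record solutions to Tables ...
                await queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt);
            }
        }
    }
}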

How should I model 'knowing' when an aggregate is finished being updated to trigger an event?

Aggregate B has calculations that need to be eventually consistent with aggregate A. Aggregate A can be mutated using eight methods and each method results in B needing to be updated. It seems an eventually consistent task, but the actual update time frame should be within seconds.
I don't want to rely on the application layer to 'remember' to trigger the update. (Jimmy Bogard says this as well.) What's the best way to model this?
Using a domain service with double dispatch is a pain:
The service will have to be a parameter on every method on A
Multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
Constructor injection is also a pain:
There are situations where A is not mutated, so being forced to instantiate and inject a domain service to watch for mutation that certainly won't happen feels wrong.
Again, multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
Domain events sound good, but I'm not sure what that looks like. Does each mutation method raise a domain event?
Again, multiple mutation methods will usually be called in a row and I don't want to trigger an update in B each time a method is called.
How do I model 'knowing' when A is finished being updated and knowing whether it has been updated so I can trigger B's update without relying on the application layer to call methods in a particular order each time?
Or is this really a repository-level or application-level concern, even though it seems to be a domain requirement?
Your number 3 (domain events) is commonly used and a very straightforward technique:
Raise a domain event AChangedType1, ..., AChangedTypeN on model A updates
Let a saga/process manager listen on AChangedTypeX and issue a corresponding UpdateBTypeX command.
It's loosely coupled (neither A nor B know about each other) and scales well (easy parallelization), and the relation between them is explicitly modeled in the long-running process.
If you don't want to trigger an update to B on every change to A, you can delay the update by some time before you send out the UpdateBTypeX command (as is commonly done in network protocols; see, e.g., TCP's delayed ACKs).
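A minimal sketch of such a saga/process manager; ICommandBus, the event and command types, and the 5-second debounce are all hypothetical:

using System;
using System.Threading.Tasks;

public record AChanged(Guid AggregateAId, DateTimeOffset OccurredAt);
public record UpdateB(Guid AggregateAId);

// Hypothetical messaging abstraction; a real bus would come from your infrastructure.
public interface ICommandBus
{
    Task SendAsync(object command, TimeSpan? delay = null);
}

public class SyncBWithAProcessManager
{
    private static readonly TimeSpan Debounce = TimeSpan.FromSeconds(5);
    private readonly ICommandBus _bus;

    public SyncBWithAProcessManager(ICommandBus bus) => _bus = bus;

    // Subscribed to every AChangedTypeX event published when A is saved.
    public Task Handle(AChanged @event)
        // Delaying the command lets several mutations of A collapse into one update of B,
        // assuming the bus de-duplicates delayed commands or B's update is idempotent.
        => _bus.SendAsync(new UpdateB(@event.AggregateAId), delay: Debounce);
}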
