Service Fabric Reliable Actors and I/O Operations

In the Service Fabric Reliable Actors introduction (documentation provided by Microsoft), it states that actors should not "block callers with unpredictable delays by issuing I/O operations."
I'm a bit unsure on how to interpret this.
Does this imply that I/O is ok so long as the latency of the request is predictable?
or
Does this imply that the best practice is that actors should not perform any I/O operations outside of Service Fabric? For example, calling some REST API or writing to some sort of DB, data lake, or event hub.

Technically, it is a bit of both.
Because actors are single-threaded, only one operation can happen in an actor at the same time.
The SF Actor uses the Ask approach, where every call expects an answer: callers make calls and keep waiting for answers. If the actor receives too many calls from clients, and this actor depends too much on external components, it will take too long to process each call; all the other client calls will be queued and will probably fail at some point because they have waited too long and timed out.
This wouldn't be as big an issue for actors using the Tell approach, like Akka, because the caller does not wait for an answer; it just sends the message to the mailbox and receives a message back with the answer (when applicable). But the latency between request and response will still be an issue when too many messages are pending for a single actor to process. On the other hand, it can increase complexity if one command fails and two or three subsequent events have already been triggered before you know the answer to the first one (not the scope here, but you can relate this to the example below).
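Below is a minimal sketch of that queuing effect, using Python's asyncio rather than the actual Service Fabric runtime (the actor class, timings, and timeout are all invented for illustration): a turn-based actor processes one message per turn, each turn does slow external I/O, and callers waiting behind it start timing out.

```python
import asyncio
import random

# A minimal sketch of the queuing effect, NOT Service Fabric code: the actor is
# modeled as a task draining a mailbox one message at a time (turn-based), each
# turn doing slow, unpredictable external I/O; callers give up after 2 seconds.

class TurnBasedActor:
    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.worker = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            fut, payload = await self.mailbox.get()
            await asyncio.sleep(random.uniform(0.5, 1.5))   # the external call
            if not fut.done():                               # caller may have given up
                fut.set_result(f"processed {payload}")

    async def ask(self, payload, timeout=2.0):
        fut = asyncio.get_running_loop().create_future()
        await self.mailbox.put((fut, payload))
        return await asyncio.wait_for(fut, timeout)          # Ask: caller waits here

async def main():
    actor = TurnBasedActor()
    results = await asyncio.gather(*(actor.ask(i) for i in range(10)),
                                   return_exceptions=True)
    answered = sum(1 for r in results if isinstance(r, str))
    print(f"{answered} calls answered, {10 - answered} timed out waiting in line")
    actor.worker.cancel()

asyncio.run(main())
```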
Regarding the second point, the main idea of an actor is to be self-contained. If it depends too much on external dependencies, maybe you should rethink the design and evaluate whether an actor is actually the best design for the problem.
Self-contained actors are scalable: they don't depend on an external state manager to manage their own state, they don't depend on other actors to accomplish their tasks, and they can scale independently of each other.
Example:
Actor1 (of ActorTypeA) depends on Actor2 (of ActorTypeB) to execute an operation.
To make it more human-friendly, let's say:
ActorTypeA is an Ecommerce Checkout Cart
ActorTypeB is a Stock Management
Actor1 is the Cart for user 1
Actor2 is the Stock for product A
Whenever a client (user) interacts with his checkout cart, adding or removing products, he sends add and remove commands to Actor1 to manage his own cart. In this scenario the dependency is one-to-one: when another user navigates to the website, another actor is created for him to manage his own cart. In both cases, they have their own actors.
Let's now say that whenever a product is placed in the cart, it is reserved in stock to avoid selling the same product twice.
In this case, both actors will try to reserve products in Actor2, and because of the single-threaded nature of actors, only the first one will succeed; the second will wait for the first to complete and will fail if the product is not in stock anymore. Also, the second user won't be able to add or remove any products in his cart, because the first operation is still waiting to complete. Now increase these numbers to thousands and see how quickly the problem evolves and scalability breaks down (the sketch below illustrates it).
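A rough, purely illustrative simulation of that fan-in (the per-call time and client timeout are assumptions, and the classes are not Service Fabric types): every cart has to go through the single stock actor, which serves one reservation per turn, so the number of users served in time stays flat no matter how many carts there are.

```python
# Illustrative sketch only: carts all call the single "stock for product A"
# actor, which handles reservations one turn at a time, so unrelated users
# queue up behind each other. Timings are assumed values.

STOCK_CALL_MS = 50          # assumed time per reservation turn
CLIENT_TIMEOUT_MS = 1000    # assumed caller timeout

def simulate(n_carts):
    served, timed_out = 0, 0
    for position in range(n_carts):             # turn-based: one call at a time
        wait_ms = position * STOCK_CALL_MS       # time spent queued behind others
        if wait_ms + STOCK_CALL_MS <= CLIENT_TIMEOUT_MS:
            served += 1
        else:
            timed_out += 1
    return served, timed_out

for n in (10, 100, 1000):
    served, timed_out = simulate(n)
    print(f"{n:5d} carts -> {served} reservations served in time, {timed_out} time out")
```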
This is just a small and simple example, so the second point is not only about external dependencies; it also applies to internal ones. Every operation outside the actor reduces its scalability.
That said, you should avoid external (outside the actor) dependencies as much as possible. It is not a crime if they are needed, but you reduce scalability whenever an external dependency keeps the actor from scaling independently.
This other SO question I've answered might also be interesting to you.

Related

Learning DDD and CQRS

I'm new to DDD and CQRS and I'm planning to build a simple application to improve my skills a bit.
What I'm planning to do is a simple Taxi Corp application.
Requirements:
Client orders a taxi.
Client can have only one order at a time.
Driver picks an order.
Driver can have only one order at a time.
Driver goes to client.
Client enters cab.
Course starts.
Course finishes.
Client is charged and driver is paid.
And so on.
I can see there can be three aggregates: Client, Order and Driver. I want to split them into separate microservices. Do you think it's a good idea, or should I start with one microservice?
I'm currently focused on ordering a taxi. First of all I need to check that the client doesn't already have a course assigned; later on I can create an order. After the order is created, I need to assign it to the client. Since only one aggregate can be updated/created during one request, I wonder how to do it correctly. I've read something about Process Managers and I think one will be very useful in this case. I even drew a schema of the communication. Can anyone tell me if my approach is correct and give me some tips on how to go further?
[Diagram: Process of creating an order]
Do you think it's a good idea, or should I start with one microservice?
I refer you to the wisdom of John Gall:
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
Instead of worrying about microservices, give your attention to messages.
Someone said: "If you have more microservices than customers, you are doing it wrong".
And if you really follow the CQRS/ES approach, the resulting system is much easier to split apart than a traditional ORM monolith.
So focus on the domain first and start with a monolith.
Start with the microservices design even if you get it wrong at first; you gain better insight into the desired architecture, because problems in a microservices design show themselves very soon.
Client and driver are both users of the system and have some commonalities, so you can consider them one domain and use one microservice for both.
Consider an order manager microservice to assign a client and a driver to a trip by their IDs. The order database may include a trips table with two ID keys, driver-id and client-id, and some columns for the different states. After each trip finishes you can remove it from the trips table and insert it into an archive table; alternatively, you can leave it there and partition the table daily to keep database performance high.
Consider an accounting microservice for keeping payments and transactions. It's OK to use NoSQL databases for the other microservices, but do use a SQL database for your transactions.
You may need another microservice for reporting and dashboards; mirror the other databases into a new one for reporting.
You also need an API gateway to route requests to the microservices and handle authentication.
Your process is a set of events. You will definitely expand the system later on and will perhaps have some long-running tasks, so it is better to have a message broker and implement your flow as an event/task flow using patterns like event sourcing (a rough sketch follows).
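As a loose illustration of "the process is a set of events" (all event names and fields below are my own invention, not part of any framework), the trip's state can be rebuilt by folding its events:

```python
# Hypothetical event-sourced view of one trip: state is a left-fold over events.
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    data: dict

def apply(state, event):
    state = dict(state)
    if event.name == "TaxiOrdered":
        state.update(status="ordered", client_id=event.data["client_id"])
    elif event.name == "OrderPickedByDriver":
        state.update(status="assigned", driver_id=event.data["driver_id"])
    elif event.name == "CourseStarted":
        state["status"] = "in_course"
    elif event.name == "CourseFinished":
        state["status"] = "finished"
    elif event.name == "ClientCharged":
        state["paid"] = True
    return state

stream = [
    Event("TaxiOrdered", {"client_id": "client-1"}),
    Event("OrderPickedByDriver", {"driver_id": "driver-7"}),
    Event("CourseStarted", {}),
    Event("CourseFinished", {}),
    Event("ClientCharged", {"amount": 12.50}),
]

state = {}
for e in stream:
    state = apply(state, e)
print(state)   # final trip state rebuilt purely from the event stream
```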
I can see there can be three aggregates: Client, Order and Driver. I want to split them into separate microservices. Do you think it's a good idea, or should I start with one microservice?
They all belong to the same bounded context. A bounded context translates nicely to a microservice (see Eric Evans' video: https://www.infoq.com/news/2015/06/dddx-microservices-boundaries). But don't start by designing a microservice; you would be doing it in the wrong order. Design your bounded context first, then, if it makes sense, create a microservice around it following the hexagonal architecture.
After the order is created, I need to assign it to the client. Since only one aggregate can be updated/created during one request, I wonder how to do it correctly.
This is the perfect example of why you need to do it all in the same process.
But in case you want to go with multiple microservices, think about eventual consistency (https://en.wikipedia.org/wiki/Eventual_consistency) and create a message-driven architecture between your services. It might be too much work in my opinion, but for learning purposes it can be a good idea; a small sketch of the idea follows.
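Here is a minimal, assumption-laden sketch of such a message-driven flow (the in-memory broker and service classes are invented for illustration, and delivery is synchronous only to keep the example short; a real broker delivers events asynchronously, which is exactly where the eventual consistency comes from):

```python
# Hypothetical message-driven flow: OrderService emits an event, ClientService
# eventually reacts to it and records the client's active order, idempotently.

class Broker:
    def __init__(self):
        self.handlers = {}
    def subscribe(self, event_name, handler):
        self.handlers.setdefault(event_name, []).append(handler)
    def publish(self, event_name, payload):
        for handler in self.handlers.get(event_name, []):
            handler(payload)          # in reality asynchronous and possibly delayed

class OrderService:
    def __init__(self, broker):
        self.broker = broker
        self.orders = {}
    def create_order(self, order_id, client_id):
        self.orders[order_id] = {"client_id": client_id, "status": "created"}
        self.broker.publish("OrderCreated",
                            {"order_id": order_id, "client_id": client_id})

class ClientService:
    def __init__(self, broker):
        self.active_order = {}        # client_id -> order_id
        broker.subscribe("OrderCreated", self.on_order_created)
    def on_order_created(self, e):
        # Idempotent: re-delivering the same event does not change the result.
        self.active_order.setdefault(e["client_id"], e["order_id"])

broker = Broker()
orders, clients = OrderService(broker), ClientService(broker)
orders.create_order("order-42", "client-1")
print(clients.active_order)   # {'client-1': 'order-42'}, eventually consistent
```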

DDD, CQRS/ES & Microservices: Should decisions be taken on a microservice's views or aggregates?

So I'll explain the problem through the use of an example as it makes everything more concrete and hopefully will reduce ambiguity.
The architecture is pretty simple:
1 Microservice <=> 1 Aggregate <=> 1 Transactional Boundary
Each microservice will use the CQRS/ES design pattern, which implies:
Each microservice will have its own Aggregate mapping the domain of a real-world problem
The state of the aggregate will be rebuilt from an event store
Each event will signify a state change within the aggregate and will be transmitted to any service interested in the change via a message broker
Each microservice will be transactional within its own domain
Each microservice will be eventually consistent with other domains
Each microservice will build its own view models from events emitted by other microservices
So for the example, let's say we have a banking system:
The current-account microservice is responsible for mapping the customer's current account ... withdrawals, deposits
The rewards microservice will be responsible for inventory and stock-taking of any rewards offered by the bank
The air-miles microservice will be responsible for monitoring all the transactions coming from the current-account and, in doing so, awarding the customer rewards from our rewards microservice
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is updated from events coming from the current-account, and similarly when picking which reward it should give out to the customer?
Drawbacks of taking decisions on local view models:
Replicating domain logic on how to maintain these views
Bugs within the view might propagate the wrong rewards to be given out
State changes (aka events emitted) on corrupted view models could have consequences in other services which are taking their own decisions on these events
Advantages of taking a decision on local view models:
The system doesn't need to constantly query the microservice owning the domain
The system should be faster and less resource-intensive
Or should it use the events coming from the service to trigger queries to the aggregate owning the domain? In doing so we accept the fact that view models might get corrupted, but the final decision is always checked against the aggregate owning the domain.
Please note that the above problem is simply my understanding of the architecture, and the aim of this post is to get different views on how one might use this architecture effectively in a microservice environment to keep each service decoupled, yet avoid cascading corruption scenarios without too much chatter between the services.
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is updated from events coming from the current-account, and similarly when picking which reward it should give out to the customer?
Yes. In fact, you should revise your architecture and even create more microservices. What I mean is that, this being an event-driven architecture (and an event-sourced one), your microservices have two responsibilities: they need to keep two different models, the write model and the read model.
So, for each Aggregate there should be a microservice that keeps only the write model, that is, it only processes Commands, without also building a read model.
Then, for each read/query use case, you should have a microservice that builds the perfect read model. This is required if you need to keep the Aggregate microservice clean (as you should), because in general read models need data from multiple Aggregate types/bounded contexts. Read models may cross bounded context boundaries, Aggregates may not. So you see, you don't really have a choice if you need to fully respect DDD.
Some say that domain events should be hidden, local only to the owning microservice. I disagree. In an event-driven architecture the domain events are first-class citizens; they are allowed to reach other microservices. This gives the other microservices the chance to build their own interpretation of the system state. Otherwise, the emitting microservice would have the impossible additional responsibility of building a state that must match every possible need that all the other microservices would ever have(!); e.g. maybe a microservice wants to look up a deleted remote entity's title, how could it do that if the emitting microservice keeps only the list of not-yet-deleted entities? You may say: but then it will keep all the entities, deleted or not. But maybe someone needs the date an entity was deleted; you may say: but then I keep the deletedDate as well. You see what you are doing? You break the Open/Closed principle. Every time you create a microservice, you need to modify the emitting microservice.
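A tiny sketch of that last point (entity and event names are invented): the consuming microservice builds exactly the interpretation it needs from the published events, so the publisher never has to anticipate it.

```python
# Hypothetical read model owned by a consumer: it keeps titles of entities,
# including deleted ones with their deletedDate, because *it* needs them,
# without asking the emitting microservice to change anything.

class TitleLookupReadModel:
    def __init__(self):
        self.entities = {}

    def on_event(self, event):
        kind, data = event
        if kind == "EntityCreated":
            self.entities[data["id"]] = {"title": data["title"], "deletedDate": None}
        elif kind == "EntityRenamed":
            self.entities[data["id"]]["title"] = data["title"]
        elif kind == "EntityDeleted":
            # Deliberately kept: another service may still need the title.
            self.entities[data["id"]]["deletedDate"] = data["date"]

model = TitleLookupReadModel()
for e in [("EntityCreated", {"id": 1, "title": "Gold reward"}),
          ("EntityDeleted", {"id": 1, "date": "2020-01-31"})]:
    model.on_event(e)
print(model.entities[1])   # {'title': 'Gold reward', 'deletedDate': '2020-01-31'}
```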
There is also the resilience of the microservices. In The Art of Scalability, the authors speak about swimming lanes. They are a strategy for separating the components of a system into lanes of failure: a failure in one lane does not propagate to other lanes. Our microservices are lanes. Components in a lane are not allowed to access any component from another lane. One microservice going down should not bring the others down. It's not a matter of speed/optimisation, it's a matter of resilience. Domain events are the perfect way of keeping two remote systems synchronized. They also emphasize the fact that the data is eventually consistent; the events travel at a limited speed (from nanoseconds to even days). When a system is designed with that in mind, no other microservice can bring it down.
Yes, there will be some code duplication. And yes, although I said that you don't have a choice, you do have one. In order to reduce the code duplication at the cost of lower resilience, you can have some canonical read models that build a normal flat state, and other microservices could query that. This is dangerous in most cases as it breaks the swimming-lanes concept: should the canonical microservices go down, all dependent microservices go down with them. Canonical microservices work best for CRUD-like bounded contexts.
There are however valid cases when you may have some internal events that you don't want to expose. In other words, you are not required to publish all domain events.
So the problem is this: should the air-miles microservice take decisions based on its own view model, which is updated from events coming from the current-account, and similarly when picking which reward it should give out to the customer?
Each consumer uses a local replica of a representation computed by the producer.
So if air-miles needs information from current-account it should be looking at a local replica of a view calculated by the current-account service.
The key idea is this: microservices are supposed to be isolated from one another; you should be able to redesign and deploy one without impacting the others.
So try this thought experiment: suppose we had these three microservices, but all saving snapshots of current state rather than events. Everything works; then imagine that the current-account maintainer discovers that an event-sourced implementation would better serve the business.
Should the change to the current-account require a matching change in the air-miles service? If so, can we really claim that these services are isolated from one another?
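For contrast with the previous answer's sketch, here is what "a local replica of a representation computed by the producer" might look like in miniature (all names are invented): the consumer stores what the producer publishes, verbatim, so the producer can change its internals without the consumer noticing.

```python
# Hypothetical consumer-side replica: current-account publishes a summary it
# computed itself; air-miles only copies it and never interprets raw events.

class AirMilesService:
    def __init__(self):
        self.account_summaries = {}   # local replica, keyed by account id

    def on_account_summary_published(self, summary):
        # The representation was computed by current-account; we only store it.
        self.account_summaries[summary["account_id"]] = summary

air_miles = AirMilesService()
air_miles.on_account_summary_published(
    {"account_id": "acc-1", "monthly_spend": 1200.0, "currency": "EUR"})
print(air_miles.account_summaries["acc-1"]["monthly_spend"])   # 1200.0
```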
Advantages of taking a decision on local view models
I don't particularly like these "advantages"; first, they are dominated by the performance axis (please recall that the second rule of performance optimization is "not yet"). And second, they assume that the service boundaries are correctly drawn; maybe the performance issue is evidence that the separation of responsibilities needs review.

DDD - Using a Process Manager or a Domain Service

I am new to DDD and I am implementing it in a part of my application because some of the requirements of the application lead me to CQRS with Event Sourcing (I need a history of the events that occurred in the system, plus the ability to view the state of the system in the past).
One question I have after reading Vaughn Vernon's book and his Effective Aggregate Design series is: what is the difference between a Process Manager (long-running process) and a Domain Service? Especially when you have navigation properties from one Aggregate to another Aggregate.
I'll explain what I have understood:
- Domain services are made to hold logic that does not belong in any Aggregate. According to Vaughn they can also be used to pass entity references to the aggregate that contains them. They may also be used to manage transactions, as transactions cannot be handled in a Domain Object.
- Process managers are made to orchestrate modifications that are made on a system and span different aggregates. Some people are saying that a good Process Manager is actually an Aggregate Root. From my understanding it does not manage transactions, as events are published after changes are committed; it uses the eventual consistency approach. Eventually all the changes will have occurred.
Now, to put everything in context: the core of the application I am building handles a tree of Nodes that contain their own logic. We need to be able to add Nodes to the Tree and of course to create those Nodes.
We need to be able to know what happened to those Nodes, i.e. we need to be able to retrieve the events linked to a node.
Also, a modification made to one of the leaves (depending on the kind of modification) will have to be replicated to the other Nodes that are parents of this node.
What are my aggregates:
- Nodes: this is what my tree contains. In my opinion this is an aggregate for several reasons. A Node is not an invariant, therefore not a value object. Nodes have their own domain logic that lets them assign their properties and value objects, and we need to be able to access them using IDs.
- A representation of a non-binary Tree made of Nodes. Right now I have actually designed this as my Aggregate Root, and it is actually a Process Manager. The Tree contains a logical representation of the tree and holds its root. This root is actually an object (I am not sure it can be called a Value Object because it contains references towards other aggregates, the child Nodes, but it certainly sounds like one). The Node object in the Tree contains basic information like the Node name and the reference towards the actual Aggregate (this almost sounds like two bounded contexts?).
Using that approach, this is what happens:
- After executing the command to create a Node, a Node is created and committed. The NodeCreated Event is launched and caught by the correct handler, which retrieves the Tree (process manager) associated with this node and adds the node at the correct place (using the parent-id property of the Node).
- After executing the command to modify a Node, the node is modified and committed. The NodeModified Event is launched, caught by the handler. The handler then retrieves the Tree (my process manager), finds all the parent Nodes of the modified Node, and asks those Nodes to modify their own properties based on what was modified on the child Node. This all makes perfect sense and looks almost beautiful to me, showing the power of events and the separation of domain logic.
But my principal issue here is with the transaction. What happens if an error occurs while updating the Tree and the node that has to be modified or added? The event for the Node is already saved in the Event Store, because it was committed. So would I have to create a new event to revert the modifications? I know that commands have to be valid when entering the system, so it would not be a validation issue, and the chances of something like this happening are about 1 in a million. Does that mean we should not take that possibility into account?
The transaction issue is why I feel like I should use a Service, either an Application Service (here a command handler) or a Domain Service, to orchestrate the modifications and do them in a single transaction. If something fails during this transaction, nothing is created/modified, but that breaks the DDD rule saying that I should not modify several Aggregates in the same transaction. This somehow looks like a less elegant solution.
I really feel like I am missing something here but I am not quite sure what it is.
Some people are saying that a good Process Manager is actually an Aggregate Root
From my point of view this is not correct. A Process Manager or a Saga coordinates a long-running business process that spans multiple Aggregate instances; it eventually brings the system into a valid final state. It does not emit events but responds to events and creates Commands that are sent to the Aggregates (possibly through a Command handler, depending on your exact architecture). The architects who say that have failed to correctly identify their Aggregate boundaries.
A Process manager/Saga could be stateful - but just to remember the progress that it has made; it can have a Process ID; it can even be Event-sourced.
Process managers are made to orchestrate modifications that are made on a system and span different aggregates.
Yes, this is correct.
After executing the command to modify a Node, the node is modified and committed.
When you design your Aggregates you must take into consideration only the protection of invariants, of the business rules that exist on the write/command side of the architecture; this is the side that produces the state transitions, the emitting of events in the case of event-driven architectures.
The single business rule, if any, that I have identified in your specific case is that when a node is created (seems like a CRUD operation!) the NodeCreated Event is emitted; similarly for NodeModified. So these operations exist on the write/command side.
The NodeModified Event is launched, caught by the handler. The handler then retrieves the Tree (my process manager), finds all the parent Nodes of the modified Node, and asks those Nodes to modify their own properties based on what was modified on the child Node
Are there any business rules for the write side regarding the updating of the parent nodes? I don't see any. Of course, something is updated after a Node is created, but it is not an Aggregate; it is a read model. The handler that is called is in fact a read model: it projects the NodeXXX events onto a tree of Nodes.
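To make that concrete, here is a rough sketch of the Tree handler treated as a projection (the class and field names are assumptions for illustration, not a prescribed design): it folds NodeCreated/NodeModified into a parent/child structure and updates the derived data of the ancestors, with no aggregate invariant involved.

```python
# Hypothetical projection of NodeXXX events onto a tree of Nodes.

class TreeProjection:
    def __init__(self):
        self.parent = {}        # node_id -> parent_id
        self.payload = {}       # node_id -> latest projected data

    def on_node_created(self, node_id, parent_id, data):
        self.parent[node_id] = parent_id
        self.payload[node_id] = data

    def on_node_modified(self, node_id, data):
        self.payload[node_id] = data
        for ancestor in self.ancestors(node_id):
            # Derived data of the parents is recomputed here; this is purely a
            # read model, no aggregate invariant is being protected.
            self.payload[ancestor]["children_changed"] = True

    def ancestors(self, node_id):
        while self.parent.get(node_id) is not None:
            node_id = self.parent[node_id]
            yield node_id

tree = TreeProjection()
tree.on_node_created("root", None, {})
tree.on_node_created("leaf", "root", {"value": 1})
tree.on_node_modified("leaf", {"value": 2})
print(tree.payload["root"])   # {'children_changed': True}
```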
I really feel like I am missing something here but I am not quite sure what it is.
You may have over complicated your domain model.
Domain Services are typically service providers that give the domain model access to (cached) state or capabilities it wouldn't normally have. For instance, we might use a domain service to give the model access to a cached tax table, so that it can compute the tax on an order; or we might use a domain service to give the model access to a notifyCustomer capability that is delegated to the email infrastructure.
Process Managers are usually used for orchestration - they are basically state machines that look at what has happened (events) and suggest additional commands to run. See Rinat Abdullin's description.
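For contrast, a process manager in that sense can be as small as the following sketch (the invoice/shipping domain and all names are invented for illustration): it only looks at events and suggests commands; it never modifies other components directly.

```python
# Hypothetical process manager: a state machine that receives events and
# returns the commands it wants dispatched, nothing more.

class ShippingProcess:
    def __init__(self):
        self.state = "awaiting_payment"

    def handle(self, event_name, data):
        """Return the commands this process suggests in response to the event."""
        if self.state == "awaiting_payment" and event_name == "InvoicePaid":
            self.state = "awaiting_shipment"
            return [("ShipOrder", {"order_id": data["order_id"]})]
        if self.state == "awaiting_shipment" and event_name == "OrderShipped":
            self.state = "completed"
            return [("NotifyCustomer", {"order_id": data["order_id"]})]
        return []   # nothing to suggest for this event in this state

process = ShippingProcess()
print(process.handle("InvoicePaid", {"order_id": "o-7"}))    # [('ShipOrder', ...)]
print(process.handle("OrderShipped", {"order_id": "o-7"}))   # [('NotifyCustomer', ...)]
```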
What happens if an error occurs while updating the Tree and the node that has to be modified or added? The event for the Node is already saved in the Event Store, because it was committed. So would I have to create a new event to revert the modifications?
Maybe - compensating events are a common pattern.
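In miniature, a compensating event just appends a new fact that semantically undoes the earlier one (the event names below are invented for illustration):

```python
# Hypothetical compensating event: the committed event is never rolled back;
# a new event is appended that reverses its effect.

event_store = [
    ("NodeModified", {"node_id": "n-1", "value": 5}),
]

def compensate(failed_event):
    name, data = failed_event
    event_store.append(("NodeModificationReverted", {"node_id": data["node_id"]}))

compensate(event_store[-1])
print(event_store[-1])   # ('NodeModificationReverted', {'node_id': 'n-1'})
```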
The main point is this: there is no magic in orchestrating a change across multiple transactions. Think about how you would arrange a UI that displays to a human operator what's going on, what should happen next, and what the failure modes would be.
the chances of something like this happening are about 1 in a million. Does that mean we should not take that possibility into account?
Depends on the risk to the business. But as Greg Young points out in his talk Stop Over Engineering, if you can just escalate that 1 in a million problem to the human beings to sort out, you may have done enough.

How many read operations can be handled by Reliable Actor with no problems?

Aim: Suppose I have a very popular page (let's say 1 million people per 5 minutes) on my Azure Service Fabric-based web application. I want to add some kind of cache layer between the data layer and the frontend API layer.
Solution: For this purpose, I chose a Reliable Actor exposing only one method for the read-only operation: GetFrequentlyAskedPage(). This actor has volatile state and a 5-minute timeout before it is collected by the garbage collector.
Questions:
How many read operations can the actor handle before it falls over?
Should I use in this case "read from secondary replicas" option for that Actor?
Or maybe I am totally wrong in my reasoning and should change the way of implementation.
I would not recommend using actors as a cache. Actor instances force single-threaded turn-based access, meaning an actor instance can only service one request at a time. This obviously will not perform well as a cache. See here for more info: https://azure.microsoft.com/en-us/documentation/articles/service-fabric-reliable-actors-introduction/
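A quick back-of-the-envelope calculation with the numbers from the question shows why (the 1 ms per-call service time is an assumption; substitute your measured latency):

```python
# Rough arithmetic for the stated load, assuming 1 ms of work per call.

requests = 1_000_000
window_s = 5 * 60
arrival_rate = requests / window_s          # ~3333 requests per second

per_call_s = 0.001                          # assumed service time per turn
actor_capacity = 1 / per_call_s             # single-threaded actor: ~1000/s

print(f"arrival rate  ~ {arrival_rate:.0f} req/s")
print(f"one actor     ~ {actor_capacity:.0f} req/s (turn-based, one call at a time)")
print("the queue only grows" if arrival_rate > actor_capacity else "the actor keeps up")
```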
Instead I would recommend using a stateful Reliable Service with a Reliable Dictionary to cache data, or better yet, use a stateful Reliable Service as your data layer, in which case you don't need this cache at all.

Why can't sagas query the read side?

In a CQRS Domain Driven Design system, the FAQ says that a saga should not query the read side (http://cqrs.nu). However, a saga listens to events in order to execute commands, and because it executes commands, it is essentially a "client", so why can't a saga query the read models?
Sagas should not query the read side (projections) for the information they need to fulfill their task. The reason is that you cannot be sure the read side is up to date. In an eventually consistent system, you do not know when the projection will be updated, so you cannot rely on its state.
That does not mean that sagas should not hold state. Sagas do in many cases need to keep track of state, but then the saga should be responsible for creating that state. As I see it, this can be done in two ways.
It can build up its state by reading the events from the event store. When it receives an event that it should trigger on, it reads all the events it needs from the store and builds up its state in a similar manner to the way an aggregate does. This can be made performant in Event Store by creating new streams.
The other way is that it continuously listens to events from the event store, builds up state, and stores it in some data storage, like projections do. Just be careful with this approach: you cannot replay sagas in the same way you do projections. If you need to change the way you store state and want to rebuild it, make sure that you do not re-execute commands that you have already executed (see the sketch below).
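One simple way to guard against that, sketched below with invented names, is to persist the IDs of the commands the saga has already dispatched together with its state, and to check that set while replaying:

```python
# Hypothetical saga that remembers which commands it has already dispatched,
# so replaying old events during a rebuild does not send them again.

class Saga:
    def __init__(self, command_bus, dispatched=None):
        self.command_bus = command_bus
        self.dispatched = set(dispatched or [])   # persisted with the saga state

    def on_event(self, event_id, event_name, data):
        command_id = f"{event_id}:reserve"
        if command_id in self.dispatched:
            return                                # seen before: do not re-execute
        self.dispatched.add(command_id)
        self.command_bus.append(("ReserveFunds", data, command_id))

bus = []
saga = Saga(bus)
saga.on_event("e-1", "OrderPlaced", {"order_id": "o-1"})
# Rebuild from scratch, replaying the same event with the persisted set:
rebuilt = Saga(bus, dispatched=saga.dispatched)
rebuilt.on_event("e-1", "OrderPlaced", {"order_id": "o-1"})
print(len(bus))   # 1 - the command was not dispatched a second time
```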
Sagas use the command model to update the state of the system. The command model contains business rules and is able to ensure that changes are valid within a given domain. To do that, the command model has all the information available that it needs.
The read model, on the other hand, has an entirely different purpose: It structures data so that it is suitable to provide information, e.g. to display on a web page.
Since the saga has all the information it needs through the command model, it doesn't need the read model. Worse, using the read model from a saga would introduce additional coupling and increase the overall complexity of the system considerably.
This does not mean that you absolutely cannot use the read model. But if you do, be sure you understand the consequences. For me, that bar is quite high, and I have always found a different solution yet.
It's primarily about separation of concerns. Process managers (sagas) are state machines responsible for coordinating activities. If the process manager wants to effect change, it dispatches commands (asynchronously).
Also: what is the read model? It's a projection of a bunch of events that already happened. So if the processor cared about those events... shouldn't it have been subscribing to them all along? So there's a modeling smell here.
Possible issues:
The process manager should have been listening to earlier messages in the stream, so that it would be in the right state when this message arrived.
The current event should be richer (so that the data the process manager "needs" is already present).
... variation - the command handler should instead be listening for a different event, and THAT one should be richer.
The query that you want should really be a command to an aggregate that already knows the answer
and failing all else
Send a command to a service, which runs the query and dispatches events in response. This sounds weird, but it's already common practice to have a process manager dispatch a message to a scheduling service, to be "woken up" when some fixed amount of time passes.
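That scheduling pattern can be sketched in a few lines (the scheduler, the process class, and the event names are all invented for illustration): the process manager sends a command to a scheduler and later receives an ordinary event back.

```python
# Hypothetical "wake me up later" flow: the process manager asks a scheduler to
# reply with an event after some time, then reacts to that event like any other.

class Scheduler:
    def __init__(self):
        self.timers = []
    def schedule(self, wake_at, reply_to, payload):
        self.timers.append((wake_at, reply_to, payload))
    def tick(self, now):
        due = [t for t in self.timers if t[0] <= now]
        self.timers = [t for t in self.timers if t[0] > now]
        for _, reply_to, payload in due:
            reply_to(("TimeoutElapsed", payload))      # event back to the PM

class PaymentProcess:
    def __init__(self, scheduler):
        self.events = []
        scheduler.schedule(wake_at=30, reply_to=self.events.append,
                           payload={"order_id": "o-9"})

scheduler = Scheduler()
process = PaymentProcess(scheduler)
scheduler.tick(now=31)
print(process.events)   # [('TimeoutElapsed', {'order_id': 'o-9'})]
```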
