Actor model: how to reach data integrity in this specific situation? - azure

I'm new to actor model and Orleans, so any suggestions on good practices to solve the following task is very appreciated:
we have [service1] that runs some logic and store some results in relational database (legacy thing). Now somewhere in the middle we want to call Orleans actor [Actor1] which holds a list of numbers, to get the next available number. The goal of the [Actor1] is to feed the numbers sequentially and consistently, so no skip over, no duplication is allowed, so it's sort of single-threaded stack. Single-threaded not only per process, but throughout the cluster of services, exactly what we need.
[service1] -> [Actor1]
Now the only problem I see here is that [service1] can fail with exception after it takes the next number, but before it stores results in database. Number is taken from the single-threaded stack, but it's lost as calling application did not manage to store results based on the fed number in database. In other words, I do not want the actor to feed next number, unless it ensures the last fed one is in good use, and only calling application knows if it is.
How would you suggest to handle these situations? Can I somehow keep Orleans actor's job open unless calling service (or another actor) commits it to database?

This is a Byzantine problem, so there is no easy solution: there will be "holes" in the number sequence or you will use the same number twice.
I would prefer to get holes and fill them with dummy data later if this is necessary (eg. if this is a billing system, end of month enter a cancelled empty bill for each bill number that is a "hole").
Even in SQL, an Insert and a Rollback will let the sequence incremented in an auto-increment primary-key ID column, so there can be holes after a failure.

Related

Do I need FIFO SQS for jira like board view app

Currently I am running a jira like board-stage-card management app on AWS ECS with 8 tasks. When a card is moved from one column/stage to another, I look for the current stage object for that card remove card from that stage and add card to the destination stage object. This is working so far because I am always looking for the actual card's stage in the Postgres database not base on what frontend think that card belongs to.
Question:
Is it safe to say that even when multiple users move the same card to different stages, but query would still happen one after the other and data will not corrupt? (such as duplicates)
If there is still a chance data can be corrupted. Is it a good option to use SQS FIFO to send message to a lambda and handle each card movement in sequence ?
Any other reason I should use SQS in this case ? or is SQS not applicable at all here?
The most important question here is: what do you want to happen?
Looking at the state of a card in the database, and acting on that is only "wrong" if it doesn't implement the behavior you want. It's true that if the UI can get out of sync with the database, then users might not always get the result they were expecting - but that's all.
Consider likelihood and consequences:
How likely is it that two or more people will update the same card, at the same time, to different stages?
And what is the consequence if they do?
If the board is being used by a 20 person project team, then I'd say the chances were 'low/medium', and if they are paying attention to the board they'll see the unexpected change and have a discussion - because clearly they disagree (or someone moved it to the wrong stage by accident).
So in that situation, I don't think you have a massive problem - as long as the system behavior is what you want (see my further responses below). On the other hand, if your board solution is being used to help operate a nuclear missile launch control system then I don't think your system is safe enough :)
Is it safe to say that even when multiple users move the same card to
different stages, but query would still happen one after the other and
data will not corrupt? (such as duplicates)
Yes the query will still happen - on the assumption:
That the database query looks up the card based on some stable identifier (e.g. CardID), and
that having successfully retrieved the card, your logic moves it to whatever destination stage is specified - implying there's no rules or state machine that might prohibit certain specific state transitions (e.g. moving from stage 1 to 2 is ok, but moving from stage 2 to 1 is not).
Regarding your second question:
If there is still a chance data can be corrupted.
It depends on what you mean by 'corruption'. Data corruption is when unintended changes occur in data, and which usually make it unusable (un-processable, un-readable, etc) or useless (processable but incorrect). In your case it's more likely that your system would work properly, and that the data would not be corrupted (it remains processable, and the resulting state of the data is exactly what the system intended it to be), but simply that the results the users see might not be what they were expecting.
Is it a good option
to use SQS FIFO to send message to a lambda and handle each card
movement in sequence ?
A FIFO queue would only ensure that requests were processed in the order in which they were received by the queue. Whether or not this is "good" depends on the most important question (first sentence of this answer).
Assuming the assumptions I provided above are correct: there is no state machine logic being enforced, and the card is found and processed via its ID, then all that will happen is that the last request will be the final state. E.g.:
Card State: Card.CardID = 001; Stage = 1.
3 requests then get lodged into the FIFO queue in this order:
User A - Move CardID 001 to Stage 2.
User B - Move CardID 001 to Stage 4.
User C - Move CardID 001 to Stage 3.
Resulting Card State: Card.CardID = 001; Stage = 3.
That's "good" if you want the most recent request to be the result.
Any other reason I should use SQS in this case ? or is SQS not
applicable at all here?
The only thing I can think of is that you would be able to store a "history", that way users could see all the recent changes to a card. This would do two things:
Prove that the system processed the requests correctly (according to what it was told to do, and it's logic).
Allow users to see who did what, and discuss.
To implement that, you just need to record all relevant changes to the card, in the right order. The thing is, the database can probably do that on it's own, so use of SQS is still debatable, all the queue will do is maybe help avoid deadlocks.
Update - RE Duplicate Cards
You'd have to check the documentation for SQS to see if it can evaluate queue items and remove duplicates.
Assuming it doesn't, you'll have to build something to handle that separately. All I can think of right now is to check for duplicates before adding them to the queue - because once that are there it's probably too late.
One idea:
Establish a component in your code which acts as the proxy/façade for the queue.
Make it smart in that it knows about recent card actions ("recent" is whatever you think it needs to be).
A new card action comes it, it does a quick check to see if it has any other "recent" duplicate card actions, and if yes, decides what to do.
One approach would be a very simple in-memory collection, and cycle out old items as fast as you dare to. "Recent", in terms of the lifetime of items in this collection, doesn't have to be the same as how long it takes for items to get through the queue - it just needs to be long enough to satisfy yourself there's no obvious duplicate.
I can see such a set-up working, but potentially being quite problematic - so if you do it, keep it as simple as possible. ("Simple" meaning: functionally as narrowly-focused as possible).
Sizing will be a consideration - how many items are you processing a minute?
Operational considerations - if it's in-memory it'll be easy to lose (service restarts or whatever), so design the overall system in such a way that if that part goes down, or the list is flushed, items still get added to the queue and things keep working regardless.
While you are right that a Fifo Queue would be best here, I think your design isn't ideal or even workable in some situation.
Let's say user 1 has an application state where the card is in stage 1 and he moves it to stage 2. An SQS message will indicate "move the card from stage 1 to stage 2". User 2 has the same initial state where card 1 is in stage 1. User 2 wants to move the card to stage 3, so an SQS message will contain the instruction "move the card from stage 1 to stage 3". But this won't work since you can't find the card in stage 1 anymore!
In this use case, I think a classic API design is best where an API call is made to request the move. In the above case, your API should error out indicating that the card is no longer in the state the user expected it to be in. The application can then reload the current state for that card and allow the user to try again.

Why do we need total order across view changes in consensus protocols?

In their famous article, Miguel Castro and Barbara Liskov justify the commit phase of the PBFT consensus protocol like this:
This ensures that replicas agree on a total order for requests in the
same view but it is not sufficient to ensure a total order for
requests across view changes. Replicas may collect prepared
certificates in different views with the same sequence number and
different requests. The commit phase solves this problem as follows.
Each replica i multicasts <COMMIT, v, n, i>_{α_i} saying it has the
prepared certificate and adds this message to its log. Then each
replica collects messages until it has a quorum certificate with 2 f +
1 COMMIT messages for the same sequence number n and view v from
different replicas (including itself). We call this certificate the
committed certificate and say that the request is committed by the
replica when it has both the prepared and committed certificates.
But why exactly do we need to guarantee total order across view changes?
If a leader/primary replica fails and triggers a view change, wouldn't it suffice to discard everything from the previous view? What situation does the commit phase prevent that this solution does not?
Apologies if this is too obvious. I'm new to distributed systems and I haven't found any source which directly answers this question.
There is a conceptual reason for this. The system appears to a client as a black box. The whole idea of this box is to provide reliable access to some service, thus, it should mask the failures of a particular replica. Otherwise, if you discard everything at each view change, clients will constantly lose their data. So basically, your solution simply contradicts the specification. The commit phase is needed exactly to prevent such kind of situations. If the request is "accepted" only when there are 2f + 1 COMMIT messages, then, even if all f replicas are faulty, the remaining nodes can recover all committed requests, this provides durable access to the system.
There is also a technical reason. In theory the system is asynchronous, this means that you can't even guarantee that the view change will occur only as a result of a failure. Some replicas may only suspect that the leader is faulty and change the view. With your solution it is possible that the system discards everything it is accepted even if non of replicas is faulty.
If you're new to distributed systems I suggest you to have a look at the classic protocols tolerating non-Byzantine failures (e.g., Paxos), they are simpler but solves the problems in the similar way.
Edit
When I say "clients constantly lose their data" it is a bit more than it sounds. I'm talking about the impact of a particular client request to the system. Let's take a key-value store. A clinet A associates some value to some key via our "black box". The "black box" now orders this request with respect to any other concurrent (or simply parallel) requests. It then replicates it across all replicas and finally notifies A. Without commit phase there is no ordering and at two different views our "black box" can chose two different order of execution of client requests. That being said, the following is possible:
at a time t, A associates value to key and the "box" approves this,
at the time t+1, B associates value_2 to key and the "box" approves this,
at the time t+2, C reads value_2 from key,
view change (invisible to clients),
at the time t+3, D reads value from key.
Note that (5) is possible not because the "box" is not aware of value_2 (as you mentioned the value itself can be resubmitted) but because it is not aware that previously it first wrote value and then overwrote it with value_2. At the new view, the system needs somehow order those two requests but no luck, the decision is not coherent with the past.
The eventual synchrony is a way to guarantee liveness of the protocols, however, it cannot prevent the situations described above. Eventual synchrony states that eventually your system will behave much like the synchronous one, but you don't know when, before that time any kind of weird things can happen. If during the asynchronous period a safety property is violated, then obviously the whole system is not safe.
The output of PBFT should not be one log per view, but rather an ever-growing global log to which every view tries to contribute new 'blocks'.
The equivalent notion in a blockchain is that each block proposer, or block miner, must append to the current blockchain, instead of starting its new blockchain from scratch. I.e. new blocks must respect previous transactions, the same way new views must respect previous views.
If the total ordering is not consistent across views, then we lose the property above.
In fact if we force a view change after every sequence number in PBFT, it looks a lot like blockchain, but with a much more complicated recovery/safety mechanism (in part since PBFT blocks don't commit to the previous block, so we need to agree on each of them individually)

How to process Read Model in CQRS

We want to implement cqrs in our new design. We have some doubts in processing command handler and read model. We got understand that while processing commands we should take optimistic lock on aggregateId. But what approach should be considered while processing readModels. Should we take lock on entire readModel or on aggregateId or never take lock while processing read model.
case 1. when take lock on entire readmodel -> it is safest but is not good in term of speed.
case 2 - take lock on aggregateId. Here two issues may arise. if we take lock aggregateId wise -> then what if read model server restarts. It does not know from where it starts again.
case 3 - Never take lock. in ths approach, I think data may be in corrputed state. For eg say an order inserted event is generated and thorugh some workflow/saga, order updated event took place as well. what if order updated event comes first and order inserted event is not yet processed ?
Hope I am able to address my issue.
If you do not process events concurrently in the Readmodel then there is no need for a lock. This is the case when you have a single instance of the Readmodel, possible in a Microservice, that poll for events and process them sequentially.
If you have a synchronous Readmodel (i.e. in the same process as the Writemodel/Aggregate) then most probably you will need locking.
An important thing to keep in mind is that a Readmodel most probably differs from the Writemodel. There could be a lot of Writemodel types whos events are projected in the same Readmodel. For example, in an ecommerce shop you could have a ListOfProducts that projects event from Vendor and from Product Aggregates. This means that, when we speak about a Readmodel we cannot simply refer to the "Aggregate" because there is not single Aggregate involved. In the case of ecommerce, when we say "the Aggregate" we might refer to the Product Aggregate or Vendor Aggregate.
But what to lock? Here depends on the database technology. You should lock the smallest affected read entity or collection that can be locked. In a Readmodel that consist of a list of products (read entities, not aggregates!), when an event that affects only one product you should lock only that product (i.e. ProductTitleRenamed).
If an event affects more products then you should lock the entire collection. For example, VendorWasBlocked affects all the products (it should remove all the products from that vendor).
You need the locking for the events that have non-idempotent side effects, for the case where the Readmodel's updater fails during the processing of an event, if you want to retry/resume from where it left. If the event has idempotent side effects then it can be retried safely.
In order to know from where to resume in case of a failed Readmodel, you could store inside the Readmodel the sequence of the last processed event. In this case, if the entity update succeeds then the last processed event's sequence is also saved. If it fails then you know that the event was not processed.
For eg say an order inserted event is generated and thorugh some workflow/saga, order updated event took place as well. what if order updated event comes first and order inserted event is not yet processed ?
Read models are usually easier to reason about if you think about them polling for ordered sequences of events, rather than reacting to unordered notifications.
A single read model might depend on events from more than one aggregate, so aggregate locking is unlikely to be your most general answer.
That also means, if we are polling, that we need to keep track of the position of multiple streams of data. In other words, our read model probably includes meta data that tells us what version of each source was used.
The locking is likely to depend on the nature of your backing store / cache. But an optimistic approach
read the current representation
compute the new representation
compare and swap
is, again, usually easy to reason about.

EventSourcing race condition

Here is the nice article which describes what is ES and how to deal with it.
Everything is fine there, but one image is bothering me. Here it is
I understand that in distributed event-based systems we are able to achieve eventual consistency only. Anyway ... How do we ensure that we don't book more seats than available? This is especially a problem if there are many concurrent requests.
It may happen that n aggregates are populated with the same amount of reserved seats, and all of these aggregate instances allow reservations.
I understand that in distributes event-based systems we are able to achieve eventual consistency only, anyway ... How to do not allow to book more seats than we have? Especially in terms of many concurrent requests?
All events are private to the command running them until the book of record acknowledges a successful write. So we don't share the events at all, and we don't report back to the caller, without knowing that our version of "what happened next" was accepted by the book of record.
The write of events is analogous to a compare-and-swap of the tail pointer in the aggregate history. If another command has changed the tail pointer while we were running, our swap fails, and we have to mitigate/retry/fail.
In practice, this is usually implemented by having the write command to the book of record include an expected position for the write. (Example: ES-ExpectedVersion in GES).
The book of record is expected to reject the write if the expected position is in the wrong place. Think of the position as a unique key in a table in a RDBMS, and you have the right idea.
This means, effectively, that the writes to the event stream are actually consistent -- the book of record only permits the write if the position you write to is correct, which means that the position hasn't changed since the copy of the history you loaded was written.
It's typical for commands to read event streams directly from the book of record, rather than the eventually consistent read models.
It may happen that n-AggregateRoots will be populated with the same amount of reserved seats, it means having validation in the reserve method won't help, though. Then n-AggregateRoots will emit the event of successful reservation.
Every bit of state needs to be supervised by a single aggregate root. You can have n different copies of that root running, all competing to write to the same history, but the compare and swap operation will only permit one winner, which ensures that "the" aggregate has a single internally consistent history.
There are going to be a couple of ways to deal with such a scenario.
First off, an event stream would have the current version as the version of the last event added. This means that when you would not, or should not, be able to persist the event stream if the event stream is not at the version when loaded. Since the very first write would cause the version of the event stream to be increased, the second write would not be permitted. Since events are not emitted, per se, but rather a result of the event sourcing we would not have the type of race condition in your example.
Well, if your commands are processed behind a queue any failures should be retried. Should it not be possible to process the request you would enter the normal "I'm sorry, Dave. I'm afraid I can't do that" scenario by letting the user know that they should try something else.
Another option is to start the processing by issuing an update against some table row to serialize any calls to the aggregate. Probably not the most elegant but it does cause a system-wide block on the processing.
I guess, to a large extent, one cannot really trust the read store when it comes to transactional processing.
Hope that helps :)

How to deal with Command which is depend on existing records in application using CQRS and Event sourcing

We are using CQRS with EventSourcing.
In our application we can add resources(it is business term for a single item) from ui and we are sending command accordingly to add resources.
So we have x number of resources present in application which were added previously.
Now, we have one special type of resource(I am calling it as SpecialResource).
When we add this SpecialResource , id needs to be linked with all existing resources in application.
Linked means this SpecialResource should have List of ids(guids) (List)of existing resources.
The solution which we tried to get all resource ids in applcation before adding the special
resource(i.e before firing the AddSpecialResource command).
Assign these List to SpecialResource, Then send AddSpecialResource command.
But we are not suppose to do so , because as per cqrs command should not query.
I.e. command cant depend upon query as query can have stale records.
How can we achieve this business scenario without querying existing records in application?
But we are not suppose to do so , because as per cqrs command should not query. I.e. command cant depend upon query as query can have stale records.
This isn't quite right.
"Commands" run queries all the time. If you are using event sourcing, in most cases your commands are queries -- "if this command were permitted, what events would be generated?"
The difference between this, and the situation you described, is the aggregate boundary, which in an event sourced domain is a fancy name for the event stream. An aggregate is allowed to run a query against its own event stream (which is to say, its own state) when processing a command. It's the other aggregates (event streams) that are out of bounds.
In practical terms, this means that if SpecialResource really does need to be transactionally consistent with the other resource ids, then all of that data needs to be part of the same aggregate, and therefore part of the same event stream, and everything from that point is pretty straight forward.
So if you have been modeling the resources with separate streams up to this point, and now you need SpecialResource to work as you have described, then you have a fairly significant change to your domain model to do.
The good news: that's probably not your real requirement. Consider what you have described so far - if resourceId:99652 is created one millisecond before SpecialResource, then it should be included in the state of SpecialResource, but if it is created one millisecond after, then it shouldn't. So what's the cost to the business if the resource created one millisecond before the SpecialResource is missed?
Because, a priori, that doesn't sound like something that should be too expensive.
More commonly, the real requirement looks something more like "SpecialResource needs to include all of the resource ids created prior to close of business", but you don't actually need SpecialResource until 5 minutes after close of business. In other words, you've got an SLA here, and you can use that SLA to better inform your command.
How can we achieve this business scenario without querying existing records in application?
Turn it around; run the query, copy the results of the query (the resource ids) into the command that creates SpecialResource, then dispatch the command to be passed to your domain model. The CreateSpecialResource command includes within it the correct list of resource ids, so the aggregate doesn't need to worry about how to discover that information.
It is hard to tell what your database is capable of, but the most consistent way of adding a "snapshot" is at the database layer, because there is no other common place in pure CQRS for that. (There are some articles on doing CQRS+ES snapshots, if that is what you actually try to achieve with SpecialResource).
One way may be to materialize list of ids using some kind of stored procedure with the arrival of AddSpecialResource command (at the database).
Another way is to capture "all existing resources (up to the moment)" with some marker (timestamp), never delete old resources, and add "SpecialResource" condition in the queries, which will use the SpecialResource data.
Ok, one more option (depends on your case at hand) is to always have the list of ids handy with the same query, which served the UI. This way the definition of "all resources" changes to "all resources as seen by the user (at some moment)".
I do not think any computer system is ever going to be 100% consistent simply because life does not, and can not, work like this. Apparently we are all also living in the past since it takes time for your brain to process input.
The point is that you do the best you can with the information at hand but ensure that your system is able to smooth out any edges. So if you need to associate one or two resources with your SpecialResource then you should be able to do so.
So even if you could associate your SpecialResource with all existing entries in your data store what is to say that there isn't another resource that has not yet been entered into the system that also needs to be associated.
It all, as usual, will depend on your specific use-case. This is why process managers, along with their state, enable one to massage that state until the process can complete.
I hope I didn't misinterpret your question :)
You can do two things in order to solve that problem:
make a distinction between write and read model. You know what read model is, right? So "write model" of data in contrast is a combination of data structures and behaviors that is just enough to enforce all invariants and generate consistent event(s) as a result of every executed command.
don't take a rule which states "Event Store is a single source of truth" too literally. Consider the following interpretation: ES is a single source of ALL truth for your application, however, for each specific command you can create "write models" which will provide just enough "truth" in order to make this command consistent.

Resources