State machine implementation on legacy system


I am very new to this subject, please let me know if you need more context.
I have an old legacy system with lots of complicated business logic/states that we are trying to extract and re-implement in a state machine.
Let's suppose I need to do something similar:
1. Save a User to the state "Approved"
2. Reject if the user data does not satisfy some conditions
3. Accept the change otherwise, save the User and send a notification
My understanding is that between 1. and 2. I need to call the state machine providing the new data (filled from a web form).
The state machine needs to load the current state from the database to know the original state and verify whether the conditions for switching the state to "Approved" are met.
Is my understanding right?
Thank you
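One way to picture the flow described above, as a minimal sketch: the handler loads the current state from storage, checks a transition table, applies the data conditions, and only then saves and notifies. The state names other than "Approved", the `is_valid` rule, the in-memory `db` dict and the `notify` callback are all assumptions made up for illustration.

```python
# Hypothetical transition table: which target states are reachable from which state.
ALLOWED = {
    "Draft": {"Pending"},
    "Pending": {"Approved", "Rejected"},
}

def is_valid(user_data):
    # Placeholder business rule (assumption): an email address is required.
    return bool(user_data.get("email"))

def change_state(db, user_id, new_state, user_data, notify):
    current = db[user_id]["state"]                    # read the current state from storage
    if new_state not in ALLOWED.get(current, set()):
        return False                                  # transition not allowed from here
    if not is_valid(user_data):
        return False                                  # data does not satisfy the conditions
    db[user_id] = {"state": new_state, **user_data}   # save the user
    notify(user_id, new_state)                        # side effect only after a successful save
    return True
```

The notification is sent last so that a rejected change never produces one.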


Do I need FIFO SQS for jira like board view app

Currently I am running a Jira-like board/stage/card management app on AWS ECS with 8 tasks. When a card is moved from one column/stage to another, I look up the current stage object for that card, remove the card from that stage, and add it to the destination stage object. This is working so far because I always look up the card's actual stage in the Postgres database, rather than relying on which stage the frontend thinks the card belongs to.
Question:
Is it safe to say that even when multiple users move the same card to different stages, the queries would still happen one after the other and the data will not be corrupted (for example, with duplicates)?
If there is still a chance the data can be corrupted, is it a good option to use SQS FIFO to send messages to a Lambda and handle each card movement in sequence?
Is there any other reason I should use SQS in this case, or is SQS not applicable here at all?

The most important question here is: what do you want to happen?
Looking at the state of a card in the database, and acting on that is only "wrong" if it doesn't implement the behavior you want. It's true that if the UI can get out of sync with the database, then users might not always get the result they were expecting - but that's all.
Consider likelihood and consequences:
How likely is it that two or more people will update the same card, at the same time, to different stages?
And what is the consequence if they do?
If the board is being used by a 20 person project team, then I'd say the chances were 'low/medium', and if they are paying attention to the board they'll see the unexpected change and have a discussion - because clearly they disagree (or someone moved it to the wrong stage by accident).
So in that situation, I don't think you have a massive problem - as long as the system behavior is what you want (see my further responses below). On the other hand, if your board solution is being used to help operate a nuclear missile launch control system then I don't think your system is safe enough :)
Is it safe to say that even when multiple users move the same card to
different stages, the queries would still happen one after the other and
the data will not be corrupted (for example, with duplicates)?
Yes, the queries will still happen, on these assumptions:
That the database query looks up the card based on some stable identifier (e.g. CardID), and
that having successfully retrieved the card, your logic moves it to whatever destination stage is specified - implying there's no rules or state machine that might prohibit certain specific state transitions (e.g. moving from stage 1 to 2 is ok, but moving from stage 2 to 1 is not).
Regarding your second question:
If there is still a chance data can be corrupted.
It depends on what you mean by 'corruption'. Data corruption is when unintended changes occur in data, which usually make it unusable (un-processable, un-readable, etc) or useless (processable but incorrect). In your case it's more likely that your system would work properly and the data would not be corrupted (it remains processable, and the resulting state of the data is exactly what the system intended it to be); the results the users see simply might not be what they were expecting.
Is it a good option
to use SQS FIFO to send messages to a Lambda and handle each card
movement in sequence?
A FIFO queue would only ensure that requests were processed in the order in which they were received by the queue. Whether or not this is "good" depends on the most important question (first sentence of this answer).
Assuming the conditions I listed above hold (there is no state machine logic being enforced, and the card is found and processed via its ID), all that will happen is that the last request determines the final state. E.g.:
Card State: Card.CardID = 001; Stage = 1.
3 requests then get lodged into the FIFO queue in this order:
User A - Move CardID 001 to Stage 2.
User B - Move CardID 001 to Stage 4.
User C - Move CardID 001 to Stage 3.
Resulting Card State: Card.CardID = 001; Stage = 3.
That's "good" if you want the most recent request to be the result.
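The three-request example above can be replayed as a small sketch. A plain `collections.deque` stands in for the FIFO queue here; it is not SQS, and the tuple layout is an assumption for illustration.

```python
from collections import deque

def process_moves(card, queue):
    # Drain the queue in arrival order; each request simply overwrites the stage.
    while queue:
        user, card_id, stage = queue.popleft()
        if card["card_id"] == card_id:
            card["stage"] = stage
    return card

card = {"card_id": "001", "stage": 1}
requests = deque([("A", "001", 2), ("B", "001", 4), ("C", "001", 3)])
process_moves(card, requests)
print(card["stage"])  # 3
```

With no transition rules in play, ordering only decides which request is applied last.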
Is there any other reason I should use SQS in this case, or is SQS not
applicable here at all?
The only thing I can think of is that you would be able to store a "history", that way users could see all the recent changes to a card. This would do two things:
Prove that the system processed the requests correctly (according to what it was told to do, and its logic).
Allow users to see who did what, and discuss.
To implement that, you just need to record all relevant changes to the card, in the right order. The thing is, the database can probably do that on its own, so use of SQS is still debatable; all the queue will do is maybe help avoid deadlocks.
Update - RE Duplicate Cards
You'd have to check the documentation for SQS to see if it can evaluate queue items and remove duplicates.
Assuming it doesn't, you'll have to build something to handle that separately. All I can think of right now is to check for duplicates before adding them to the queue, because once they are there it's probably too late.
One idea:
Establish a component in your code which acts as the proxy/façade for the queue.
Make it smart in that it knows about recent card actions ("recent" is whatever you think it needs to be).
When a new card action comes in, it does a quick check to see if it has any other "recent" duplicate card actions, and if yes, decides what to do.
One approach would be a very simple in-memory collection, and cycle out old items as fast as you dare to. "Recent", in terms of the lifetime of items in this collection, doesn't have to be the same as how long it takes for items to get through the queue - it just needs to be long enough to satisfy yourself there's no obvious duplicate.
I can see such a set-up working, but potentially being quite problematic - so if you do it, keep it as simple as possible. ("Simple" meaning: functionally as narrowly-focused as possible).
Sizing will be a consideration - how many items are you processing a minute?
Operational considerations - if it's in-memory it'll be easy to lose (service restarts or whatever), so design the overall system in such a way that if that part goes down, or the list is flushed, items still get added to the queue and things keep working regardless.
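A minimal sketch of the proxy idea above, assuming a duplicate means "same card, same target stage, seen within a short time window". The class name, the `ttl_seconds` parameter and the plain list used as a queue are hypothetical.

```python
import time

class RecentActionFilter:
    """Remembers recent (card_id, target_stage) actions for a short time."""

    def __init__(self, ttl_seconds=5.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen = {}  # (card_id, target_stage) -> time first seen

    def is_duplicate(self, card_id, target_stage):
        now = self.clock()
        # Cycle out entries older than the TTL.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        key = (card_id, target_stage)
        if key in self._seen:
            return True
        self._seen[key] = now
        return False

def submit(action_filter, queue, card_id, target_stage):
    # If the filter loses its memory (restart, flush), actions still reach
    # the queue; only the duplicate check is weakened.
    if action_filter.is_duplicate(card_id, target_stage):
        return False
    queue.append((card_id, target_stage))
    return True
```

Passing the clock in as a parameter keeps the filter easy to test without sleeping.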

While you are right that a FIFO queue would be best here, I think your design isn't ideal, or even workable, in some situations.
Let's say user 1 has an application state where the card is in stage 1 and he moves it to stage 2. An SQS message will indicate "move the card from stage 1 to stage 2". User 2 has the same initial state where card 1 is in stage 1. User 2 wants to move the card to stage 3, so an SQS message will contain the instruction "move the card from stage 1 to stage 3". But this won't work since you can't find the card in stage 1 anymore!
In this use case, I think a classic API design is best where an API call is made to request the move. In the above case, your API should error out indicating that the card is no longer in the state the user expected it to be in. The application can then reload the current state for that card and allow the user to try again.
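As a rough sketch of that API behaviour, the move can be written as a conditional update that only succeeds if the card is still in the stage the user saw. `sqlite3` is used here purely as a stand-in for the real database, and the table layout and `StaleCardError` are assumptions.

```python
import sqlite3

class StaleCardError(Exception):
    """The card is no longer in the stage the caller expected."""

def move_card(conn, card_id, expected_stage, target_stage):
    # The WHERE clause makes the update conditional on the stage the user saw.
    cur = conn.execute(
        "UPDATE cards SET stage = ? WHERE id = ? AND stage = ?",
        (target_stage, card_id, expected_stage),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleCardError(f"card {card_id} is not in stage {expected_stage}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cards (id TEXT PRIMARY KEY, stage INTEGER)")
conn.execute("INSERT INTO cards VALUES ('001', 1)")

move_card(conn, "001", expected_stage=1, target_stage=2)      # user 1 succeeds
try:
    move_card(conn, "001", expected_stage=1, target_stage=3)  # user 2 is stale
except StaleCardError as err:
    print("rejected:", err)
```

On a rejection the client would reload the card and let the user decide again.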

CQRS Aggregate and Projection consistency

An Aggregate can use a View; this is described in Vaughn Vernon's book:
Such Read Model Projections are frequently used to expose information to various clients (such as desktop and Web user interfaces), but they are also quite useful for sharing information between Bounded Contexts and their Aggregates. Consider the scenario where an Invoice Aggregate needs some Customer information (for example, name, billing address, and tax ID) in order to calculate and prepare a proper Invoice. We can capture this information in an easy-to-consume form via CustomerBillingProjection, which will create and maintain an exclusive instance of CustomerBilling-View. This Read Model is available to the Invoice Aggregate through the Domain Service named IProvideCustomerBillingInformation. Under the covers this Domain Service just queries the document store for the appropriate instance of the CustomerBillingView
Let's imagine our application should allow creating many users, but with unique names. Commands/Events flow:
CreateUser{Alice} command sent
UserAggregate checks UsersListView, since there are no users with name Alice, aggregate decides to create user and publish event.
UserCreated{Alice} event published // By UserAggregate
UsersListProjection processed UserCreated{Alice} // for simplicity let's think UsersListProjection just accumulates users names if receives UserCreated event.
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UserCreated{Bob} event published // By UserAggregate
CreateUser{Bob} command sent
UserAggregate checks UsersListView, since there are no users with name Bob, aggregate decides to create user and publish event.
UsersListProjection processed UserCreated{Bob} .
UsersListProjection processed UserCreated{Bob} .
The problem is that UsersListProjection did not have time to process the event and contained stale data, and the aggregate used this stale data. As a result, 2 users with the same name were created.
how to avoid such situations?
how to make aggregates and projections consistent?

how to make aggregates and projections consistent?
In the common case, we don't. Projections are consistent with the aggregate at some time in the past, but do not necessarily have all of the latest updates. That's part of the point: we give up "immediate consistency" in exchange for other (higher leverage) benefits.
The duplication that you refer to is usually solved a different way: by using conditional writes to the book of record.
In your example, we would normally design the system so that the second attempt to write Bob to our data store would fail because of a conflict. Also, we prevent duplicates from propagating by ensuring that the write to the data store happens-before any events are made visible.
What this gives us, in effect, is a "first writer wins" write strategy. The writer that loses the data race has to retry/fail/etc.
(As a rule, this depends on the idea that both attempts to create Bob write that information to the same place, using the same locks.)
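A minimal sketch of such a conditional write, assuming a relational book of record where a UNIQUE constraint plays the role of the condition. `sqlite3` and the `users` table are stand-ins chosen for illustration, not something the answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint is the conditional write: the store itself rejects
# a second row with the same name.
conn.execute("CREATE TABLE users (name TEXT NOT NULL UNIQUE)")

def create_user(conn, name):
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    except sqlite3.IntegrityError:
        return False  # lost the race: retry, fail, or report the conflict
    # Only after the write has succeeded would UserCreated be made visible.
    return True

results = [create_user(conn, "Bob"), create_user(conn, "Bob")]
print(results)  # [True, False]
```

Both attempts go through the same table and the same constraint, which is what makes "first writer wins" hold.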
A common design to reduce the probability of conflict is to NOT use the "read model" of the aggregate itself, but to instead use its own data in the data store. That doesn't necessarily eliminate all data races, but you reduce the width of the window.
Finally, we fall back on Memories, Guesses and Apologies.

It's important to remember in CQRS that every write model is also a read model for the reads that are required to validate a command. Those reads are:
checking for the existence of an aggregate with a particular ID
loading the latest version of an entire aggregate
In general a CQRS/ES implementation will provide that read model for you. The particulars of how that's implemented will depend on the implementation.
Those are the only reads a command-handler ever needs to perform, and if a query can be answered with no more than those reads, the query can be expressed as a command (e.g. GetUserByName{Alice}) which when handled does not emit events. The benefit of such read-only commands is that they can be strongly consistent because they are limited to a single aggregate. Not all queries, of course, can be expressed this way, and if the query can tolerate eventual consistency, it may not be worth paying the coordination tax for strong consistency that you typically pay by making it a read-only command. (Command handling limited to a single aggregate is generally strongly consistent, but there are cases, e.g. when the events form a CRDT and an aggregate can live in multiple datacenters where even that consistency is loosened).
So with that in mind:
CreateUser{Alice} received
user Alice does not exist
persist UserCreated{Alice}
CreateUser{Alice} acknowledged (e.g. HTTP 200, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Alice}
CreateUser{Bob} received
user Bob does not exist
persist UserCreated{Bob}
CreateUser{Bob} acknowledged
CreateUser{Bob} received
user Bob already exists
command-handler for an existing user rejects the command and persists no events (it may log that an attempt to create a duplicate user was made)
CreateUser{Bob} ack'd with failure (e.g. HTTP 409, ack to *MQ, Kafka offset commit)
UserListProjection updated from UserCreated{Bob}
Note that while the UserListProjection can answer the question "does this user exist?", the fact that the write-side can also (and more consistently) answer that question does not in and of itself make that projection superfluous. UserListProjection can also answer questions like "who are all of the users?" or "which users have two consecutive vowels in their name?" which the write-side cannot answer.
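The sequence above can be sketched with an in-memory write model, assuming the user name is the aggregate ID. The class names and the `outbox` list are hypothetical, and a real store would need the existence check and the write to be atomic.

```python
class DuplicateUser(Exception):
    pass

class UserWriteModel:
    def __init__(self):
        self.streams = {}  # aggregate id (user name) -> list of events
        self.outbox = []   # persisted events not yet seen by projections

    def handle_create_user(self, name):
        # Existence check against the write side, not against a projection.
        if name in self.streams:
            raise DuplicateUser(name)
        event = ("UserCreated", name)
        self.streams[name] = [event]
        self.outbox.append(event)

class UserListProjection:
    def __init__(self):
        self.names = []

    def catch_up(self, outbox):
        # May run arbitrarily late; the write side never waits for it.
        while outbox:
            _, name = outbox.pop(0)
            self.names.append(name)

write = UserWriteModel()
projection = UserListProjection()
write.handle_create_user("Alice")
write.handle_create_user("Bob")
try:
    write.handle_create_user("Bob")  # rejected even though the projection lags
    rejected = False
except DuplicateUser:
    rejected = True
projection.catch_up(write.outbox)
print(rejected, projection.names)  # True ['Alice', 'Bob']
```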

How to use a state chart as the flow chart for an agent

I have two processes I want to juxtapose. The first is a Manual workflow that is well represented by the Process library. The second is a software System that performs the same work, but is better modelled as a state transition system (e.g. s/w component level).
Now in AnyLogic, state models are for agents, which can run through processes with animations (counts) or move across space. What if I want to use a state chart as the thing an agent runs through, so that I have a System state chart/agent and a Job state chart/agent?
I want Jobs from Population A to go through the Manual process flow chart and Jobs from Population B to go through the System state flow chart, so I can juxtapose the processing costs. I then calculate various delays and resource allocations for each of the Jobs going through and compare them.
Can anyone explain how to set up a state chart as the base process that another agent will go through? Is this even possible?
Please help
Thanks

This will not work as you would like it to, for these reasons:
You can't send an Agent into a state chart. (Not sure how AnyLogic is handling it internally, maybe a generic token, or no flow at all, just changes to the state).
In AnyLogic there can only be one state active (simple or combined state) per state chart, so you can't represent a population with several members.
Agents can't be in more than one flow at a time, so even if it were possible to insert an Agent into a state chart, this limitation would also apply.
The conclusion is: state charts are suitable for modeling individual behaviour (inside one Agent), whereas process flows can be used both for individual behaviour (inside one Agent, running a dummy Agent through) and for groups (multiple Agents running through the process).
The normal use case would be to add the state chart to the Agent type running through your process flow (as you already noted in your question), applying the changes caused by the state chart to the individual agent.

How to assign a request to different people dependent on request type and initiator in DDD

I am working on a project and want to try to adhere to DDD principles. As I've been going about it I've come across some questions that I hope someone will be able to help me with.
The project is a request system with each request having multiple request types inside it.
When a request is submitted it will be in a status of AwaitingApproval and will get routed to different people sequentially according to a set of rules as below:-
1) If the request only contains request types that don't need
intermediate approval it will be routed to a processing department
who will be the one and only approval in the chain.
2) If the initiator of the request is a Level 1 manager it will require
approvals from Level2, Level 3 and Level 4 managers
3) If the initiator is a Level 2 manager the request will be as 2) but without the need for Level 2 approval for obvious reasons
4) If the request contains a request type that increases a monetary value by, let's say, >$500 it will require the approval of a Level 4 manager
A request at any of the stages can be Approved, Rejected or Rejected With Changes. Approve takes it to the next level in the approval chain. Reject ends the process entirely.
Reject With Changes allows the user to send the request back to any of its previous approvers, as appropriate, who can then take the same actions. An Approve at that point potentially sends it back through the chain again if it was a monetary change; if the Reject With Changes came from the processing department, the request is re-assigned straight back to them.
Initially, I considered having an aggregate root of Request with a RequestStatus using the State Pattern.
So I would have something like
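interface IState {
    IState Approve();
}
class Request {
    private IState _currentState = new AwaitingApprovalState();
    private string _assignee;
    public void AssignTo(string person) {
        _assignee = person;
    }
    public void Approve() {
        // Each state decides which state follows it.
        _currentState = _currentState.Approve();
    }
}
class AwaitingApprovalState : IState {
    public IState Approve() {
        return new ApprovedState();
    }
}
class ApprovedState : IState {
    public IState Approve() {
        // Level2ManagerApprovedState is not shown here.
        return new Level2ManagerApprovedState();
    }
}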
This got me to a point but I kept getting caught in knots. I think I am missing something in my initial model.
Some questions that occur
1) Where does the responsibility of working out who the next manager in the chain is to assign the request? Does that belong in the state class implementations or somewhere else like on the Request itself?
2) Currently a new request is in AwaitingApprovalState and if I approve it goes straight to ApprovedState. Where does the logic go that determines that because I don't require any intermediate approvals it should go straight to the processing department?
3) If there is a reject with modifications how do we go back to previous levels - I have considered some sort of StatusHistory entity.
I have considered maybe that this is some sort of workflow component but want to avoid that as much as possible.
Any pointers or ideas would be very much appreciated

It often makes sense to model processes as histories of related events. You might imagine this as a log of activity related to a specific request. Imagine somebody getting messages from different departments, and writing down the messages in a book:
Request #12354 submitted.
Request #12354 received level 2 approval: details....
Request #12354 received level 3 approval: details....
To figure out what work needs to be done next, you just review what has already happened. Load all of the events, fold them into an in memory representation, and then query that structure.
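A minimal sketch of "load, fold, query", assuming events are simple `(kind, level)` tuples and that a request from a Level 1 initiator needs Level 2, 3 and 4 approvals. The event names and `REQUIRED_LEVELS` are assumptions for illustration.

```python
from functools import reduce

REQUIRED_LEVELS = [2, 3, 4]  # assumption: a Level 1 initiator needs all three

def apply(state, event):
    kind, level = event
    if kind == "Submitted":
        return {"approved_levels": []}
    if kind == "Approved":
        return {"approved_levels": state["approved_levels"] + [level]}
    return state  # ignore events this view does not care about

def next_level(state):
    # Query the folded state: the first required level with no approval yet.
    for level in REQUIRED_LEVELS:
        if level not in state["approved_levels"]:
            return level
    return None  # nothing left to approve

log = [("Submitted", None), ("Approved", 2), ("Approved", 3)]
state = reduce(apply, log, None)
print(next_level(state))  # 4
```

Nothing is stored except the log; the "current state" is recomputed whenever a question needs answering.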
Where does the responsibility of working out who the next manager in the chain is to assign the request?
Something like that would probably be implemented in a domain service; if the aggregate doesn't contain the information that it needs to do work, then it has to ask somebody else.
A common pattern for this would be a "stateless" service that knows how to find the right manager, given a set of values which describe the state of the aggregate. The aggregate knows what state it is in, so it passes the values describing its state to the service to get the answer.
Manager levelFourManager = managers.getLevelFourManager(...)
Where does the logic go that determines that because I don't require any intermediate approvals it should go straight to the processing department?
Probably into the aggregate itself, eventually.
Rinat Abdullin put together a very good tutorial on evolving process managers, which is very much in line with Greg Young's talk Stop Over Engineering.
You've got some query in your model like
request.isReadyForProcessing()
In the early versions of your model, the request might answer false until some human operator has told it that "yes, you are ready"; then, over time you start adding in the easy cases to compute.
boolean isReadyForProcessing() {
    return aHumanSaidImReadyForProcessing() || ImOneOfTheEasyCasesToProcess();
}
What "send to processing" actually means probably doesn't live in the aggregate. We might borrow the domain service idea again, this time to communicate with an external system
void notify(ProcessingClient client) {
    if (this.isReadyForProcessing()) {
        client.process(this.id);
    }
}
The processing client might be doing real work, or it might just be sending a message somewhere else -- the aggregate model doesn't really care.
Part of the point of the domain model, as a pattern, is that our domain calls for the coordination/orchestration of messages between objects in the model. If we didn't need that complexity, we'd probably look at something more straightforward, like transaction scripts. The printed version of Patterns of Enterprise Application Architecture dedicates a number of pages to describing these.
If there is a reject with modifications how do we go back to previous levels - I have considered some sort of StatusHistory entity.
Yes, that -- RejectWithModifications is just another message to write into the book, and that gives you more information to consider when answering questions.
Request #12354 submitted.
Request #12354 received level 2 approval: details....
Request #12354 received level 3 approval: details....
Request #12354 rejected with modifications: details....
I understand what you're saying and it makes great sense. I still get caught up in implementation details.
That is not your fault.
The literature is weak.
Does the log of events, let's call it ActivityLog, live on the Request aggregate, or is it its own aggregate like in the Cargo DDD samples?
Putting it into the aggregate is probably the right place to start; it might not stay there. Finding a decent local minimum for your current system is probably better than trying to find the global minimum right away.
Are there differences between domain events as per Evans in the blue book and more recent domain events?
Maybe; it's also tangled because domain events aren't necessarily the sort of thing people are talking about when they say "event sourcing".
Need to see the wood for the trees.
The only thing that has worked for me, is to regularly go back to first principles, working through solutions piece by piece, and watching like a hawk for implicit assumptions.

1) Where does the responsibility of working out who the next manager
in the chain is to assign the request? Does that belong in the state
class implementations or somewhere else like on the Request itself?
It depends. It could be in Request itself, it could be in a Domain Service.
As an aside, I would recommend, if feasible, not determining exactly who the next validator is when the Request transitions to its next state but later. Sending a notification and displaying the validation request on a dashboard are consequences of domain state changes but not state changes per se - they don't need to happen atomically with the operation on Request but can happen at a later time.
If you manage to dissociate the bit that looks up validator data for request followup from the logic that determines who the next type of validators is (Level1 manager, Level 2 manager, etc.) you will probably spare yourself some complex modelling of the Request aggregate.
2) Currently a new request is in AwaitingApprovalState and if I
approve it goes straight to ApprovedState. Where does the logic go
that determines that because I don't require any intermediate
approvals it should go straight to the processing department?
Same as 1)
3) If there is a reject with modifications how do we go back to
previous levels - I have considered some sort of StatusHistory entity.
You could either work out who the previous validation group was, using the same kind of logic as for determining the next group, or you could store a history of past states as a private member of Request alongside _currentState.
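The second option might look roughly like this, as a sketch that represents states as plain strings and keeps the history in a private list. The state names and method names are hypothetical.

```python
class Request:
    def __init__(self):
        self._current_state = "AwaitingApproval"
        self._history = []  # past states, oldest first

    def _move_to(self, new_state):
        self._history.append(self._current_state)
        self._current_state = new_state

    def approve(self, new_state):
        self._move_to(new_state)

    def reject_with_changes(self, back_to):
        # Only states the request has actually been in are valid targets.
        if back_to not in self._history:
            raise ValueError(f"{back_to} is not a previous state")
        self._move_to(back_to)

    @property
    def state(self):
        return self._current_state
```

Because the history is private, callers can only go back through `reject_with_changes`, which keeps the rule about valid targets in one place.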

To explain this, let's assume there are the following request types:
Purchase (requires manager approval, e.g. a level 2 request requires approval from level 3 and above managers)
BusinessMeet (no approval needed)
As we can see, different types of request have different approval cycles, and more request types will be added in the future.
Now let's see how we would define the current structure in DDD:
PurchaseRequest Aggregate extends RequestAgg
requestid
requested by
purchase info - description about purchase
requested by manager level
pending on managers list -- list of managers with level
approved by managers list -- list of managers with level
next manager for approval -- manager with level
status {approved , pending}
BusinessMeetRequest Aggregate extends RequestAgg
requestid
requested by
status {approved , pending} -- by default it should be approved
ApprovalRequestAgg
requestid
manager id
request type
status - (Approved , Rejected)
When a user makes a request, they hit the API with either a purchase request or a BusinessMeetRequest.
Let's say the user submits a purchase request; then a PurchaseRequestAgg will be created.
A ProcessManager listens for the PurchaseRequestCreated event and creates a new aggregate, ApprovalRequestAgg, which holds the manager id.
The manager can see the requests that need their approval from the ApprovalRequest read model. Because ApprovalRequest holds the request id and request type, the manager can fetch the actual purchase request, then either approve or reject it, which emits an ApprovalRequestApproved or ApprovalRequestRejected event.
Based on that event, a handler updates the PurchaseRequestAgg, and the PurchaseRequestAgg emits an event (let's say, after approval) PurchaseRequestAcceptedByManager.
Now something listens for that event, and the loop above repeats.
The only problem with the above solution is that adding a new type of request will take time.
Another way could be to have a single RequestAgg for all requests:
RequestAgg
- request id
- type
- info
- status
and the algorithm for routing updates to the managers is written in the ProcessManager.
I think this will help you. If you still have doubts, ping again :)

Should latest event version be queried in event sourcing?

I am developing a simple DDD + Event sourcing based app for educational purposes.
In order to set the event version before storing an event, I would have to query the event store, but my gut tells me that this is wrong because it causes concurrency issues.
Am I missing something?

There are different answers to that, depending on what use case you are considering.
Generally, the event store is a dumb, domain agnostic appliance. It's superficially similar to a List abstraction -- it stores what you put in it, but it doesn't actually do any work to satisfy your domain constraints.
In use cases where your event stream is just a durable record of things that have happened (meaning: your domain model does not get a veto; recording the event doesn't depend on previously recorded events), then append semantics are fine, and depending on the kind of appliance you are using, you may not need to know what position in the stream you are writing to.
For instance: the API for GetEventStore understands ExpectedVersion.ANY to mean append these events to the end of the stream wherever it happens to be.
In cases where you do care about previous events (the domain model is expected to ensure an invariant based on its previous state), then you need to do something to ensure that you are appending the event to the same history that you have checked. The most common implementations of this communicate the expected position of the write cursor in the stream, so that the appliance can reject attempts to write to the wrong place (which protects you from concurrent modification).
This doesn't necessarily mean that you need to query the event store to get the position. You are allowed to count the number of events in the stream when you load it, and to remember how many more events you've added, and therefore where the stream "should" be if you are still synchronized with it.
What we're doing here is analogous to a compare-and-swap operation: we get a representation of the original state of the stream, create a new representation, and then compare and swap the reference to the original to point instead to our changes
oldState = stream.get()
newState = domainLogic(oldState)
stream.compareAndSwap(oldState, newState)
But because a stream is a persistent data structure with append only semantics, we can use a simplified API that doesn't require duplicating the existing state.
events = stream.get()
changes = domainLogic(events)
stream.appendAt(count(events), changes)
If the API of your appliance doesn't allow you to specify a write position, then yes - there's the danger of a data race when some other writer changes the position of the stream between your query for the position and your attempt to write. Data obtained in a query is always stale; unless you hold a lock you can't be sure that the data hasn't changed at the source while you are reading your local copy.
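The expected-position append described above can be sketched with an in-memory stream. `InMemoryStream`, `append_at` and `WrongExpectedVersion` are hypothetical names, not the API of any particular event store, and a real appliance would perform the check and the append atomically.

```python
class WrongExpectedVersion(Exception):
    pass

class InMemoryStream:
    def __init__(self):
        self._events = []

    def get(self):
        return list(self._events)  # a copy, so readers hold a snapshot

    def append_at(self, expected_count, changes):
        # Reject the write if someone else appended since we loaded.
        if len(self._events) != expected_count:
            raise WrongExpectedVersion(
                f"expected {expected_count}, stream is at {len(self._events)}"
            )
        self._events.extend(changes)

stream = InMemoryStream()
seen_by_a = stream.get()
seen_by_b = stream.get()
stream.append_at(len(seen_by_a), ["UserCreated"])      # writer A wins
try:
    stream.append_at(len(seen_by_b), ["UserCreated"])  # writer B is stale
except WrongExpectedVersion as err:
    print("conflict:", err)
```

Writer B never queried for a position; it only counted the events it had loaded, which is enough for the store to detect the conflict.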

I think you shouldn't need to worry about the event version.
If you mean the position in the event stream, in general there's no guaranteed way to determine it at the moment of creation, only at processing time or in the event storage.
If you really mean the event version (see http://cqrs.nu/Faq, How do I version/upgrade my events?), it is hardcoded in your application. I mean the following use case:
First, you have an app generating some events. Next, you update the app and the events change (you add some fields or change the payload structure) but keep their logical meaning. So now you have old events in your ES and new events that differ significantly from the old ones. To distinguish one from the other you use an event version, e.g. 0 and 1.
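One common way to use that version number is to convert old events to the newest shape as they are read. Below is a small sketch, assuming a made-up payload change where version 0 stored a single `name` field and version 1 splits it into two; the field names are purely illustrative.

```python
def upcast(event):
    # Version 0 stored a single "name" field (hypothetical); version 1 splits it.
    if event.get("version", 0) == 0:
        first, _, last = event["name"].partition(" ")
        return {
            "type": event["type"],
            "version": 1,
            "first_name": first,
            "last_name": last,
        }
    return event  # already the current version

old = {"type": "UserCreated", "version": 0, "name": "Alice Smith"}
print(upcast(old)["last_name"])  # Smith
```

The rest of the application then only has to understand version 1 events.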
