Currently I am running a Jira-like board/stage/card management app on AWS ECS with 8 tasks. When a card is moved from one column/stage to another, I look up the current stage object for that card, remove the card from that stage, and add the card to the destination stage object. This has worked so far because I always look up the card's actual stage in the Postgres database, not based on what the frontend thinks the card belongs to.
Question:
Is it safe to say that even when multiple users move the same card to different stages, the queries would still happen one after the other and the data will not be corrupted (e.g. no duplicates)?
If there is still a chance the data can be corrupted, is it a good option to use an SQS FIFO queue to send messages to a Lambda and handle each card movement in sequence?
Any other reason I should use SQS in this case? Or is SQS not applicable at all here?
The most important question here is: what do you want to happen?
Looking at the state of a card in the database, and acting on that, is only "wrong" if it doesn't implement the behavior you want. It's true that if the UI can get out of sync with the database, then users might not always get the result they were expecting - but that's all.
Consider likelihood and consequences:
How likely is it that two or more people will update the same card, at the same time, to different stages?
And what is the consequence if they do?
If the board is being used by a 20-person project team, then I'd say the chances are low to medium, and if they are paying attention to the board they'll see the unexpected change and have a discussion - because clearly they disagree (or someone moved it to the wrong stage by accident).
So in that situation, I don't think you have a massive problem - as long as the system behavior is what you want (see my further responses below). On the other hand, if your board solution is being used to help operate a nuclear missile launch control system then I don't think your system is safe enough :)
Is it safe to say that even when multiple users move the same card to different stages, the queries would still happen one after the other and the data will not be corrupted (e.g. no duplicates)?
Yes, the query will still happen, on these assumptions:
That the database query looks up the card based on some stable identifier (e.g. CardID), and
that having successfully retrieved the card, your logic moves it to whatever destination stage is specified - implying there are no rules or state machine that might prohibit specific state transitions (e.g. moving from stage 1 to 2 is OK, but moving from stage 2 to 1 is not). (See the sketch below.)
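For illustration, a minimal sketch of those assumptions in code (psycopg2 against an assumed cards table where each card row stores its current stage - the real schema may well differ): because the card is looked up by its stable ID and the row is locked for the duration of the transaction, two simultaneous moves of the same card are serialized by Postgres and cannot produce duplicates.

import psycopg2

def move_card(conn, card_id: int, destination_stage_id: int) -> None:
    with conn:                                # one transaction: commit or roll back
        with conn.cursor() as cur:
            # Look the card up by its stable identifier, locking the row so a
            # concurrent move of the same card waits until we commit.
            cur.execute(
                "SELECT stage_id FROM cards WHERE id = %s FOR UPDATE",
                (card_id,),
            )
            if cur.fetchone() is None:
                raise LookupError(f"card {card_id} not found")
            # Move it to whatever destination stage was specified.
            cur.execute(
                "UPDATE cards SET stage_id = %s WHERE id = %s",
                (destination_stage_id, card_id),
            )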
Regarding your second question:
If there is still a chance the data can be corrupted ...
It depends on what you mean by 'corruption'. Data corruption is when unintended changes occur in data, usually making it unusable (un-processable, un-readable, etc.) or useless (processable but incorrect). In your case it's more likely that your system would work properly and the data would not be corrupted (it remains processable, and the resulting state of the data is exactly what the system intended it to be), but simply that the results users see might not be what they were expecting.
... is it a good option to use an SQS FIFO queue to send messages to a Lambda and handle each card movement in sequence?
A FIFO queue would only ensure that requests were processed in the order in which they were received by the queue. Whether or not this is "good" depends on the most important question (first sentence of this answer).
Assuming the conditions above hold - there is no state machine logic being enforced, and the card is found and processed via its ID - then all that will happen is that the last request processed determines the final state. E.g.:
Card State: Card.CardID = 001; Stage = 1.
3 requests then get lodged into the FIFO queue in this order:
User A - Move CardID 001 to Stage 2.
User B - Move CardID 001 to Stage 4.
User C - Move CardID 001 to Stage 3.
Resulting Card State: Card.CardID = 001; Stage = 3.
That's "good" if you want the most recent request to be the result.
Any other reason I should use SQS in this case? Or is SQS not applicable at all here?
The only thing I can think of is that you would be able to store a "history", that way users could see all the recent changes to a card. This would do two things:
Prove that the system processed the requests correctly (according to what it was told to do, and its logic).
Allow users to see who did what, and discuss.
To implement that, you just need to record all relevant changes to the card, in the right order. The thing is, the database can probably do that on its own, so the use of SQS is still debatable; all the queue would do is perhaps help avoid deadlocks.
Update - RE Duplicate Cards
You'd have to check the documentation for SQS to see if it can evaluate queue items and remove duplicates.
Assuming it doesn't, you'll have to build something to handle that separately. All I can think of right now is to check for duplicates before adding them to the queue - because once they are there, it's probably too late.
One idea:
Establish a component in your code which acts as the proxy/façade for the queue.
Make it smart in that it knows about recent card actions ("recent" is whatever you think it needs to be).
A new card action comes in; the proxy does a quick check to see whether there are any other "recent" duplicate card actions, and if yes, decides what to do.
One approach would be a very simple in-memory collection, and cycle out old items as fast as you dare to. "Recent", in terms of the lifetime of items in this collection, doesn't have to be the same as how long it takes for items to get through the queue - it just needs to be long enough to satisfy yourself there's no obvious duplicate.
I can see such a set-up working, but potentially being quite problematic - so if you do it, keep it as simple as possible. ("Simple" meaning: functionally as narrowly-focused as possible).
Sizing will be a consideration - how many items are you processing a minute?
Operational considerations - if it's in-memory it'll be easy to lose (service restarts or whatever), so design the overall system in such a way that if that part goes down, or the list is flushed, items still get added to the queue and things keep working regardless (a minimal sketch follows below).
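As a very rough sketch of that proxy/façade idea (the queue URL, the 60-second window, and the fingerprinting scheme are all illustrative assumptions). Note, too, that SQS FIFO queues natively deduplicate on MessageDeduplicationId within a five-minute window, which may already cover the simple cases:

import time
import json
import hashlib
import boto3

class DedupingQueueProxy:
    """Façade in front of the queue that drops 'recent' duplicate card actions.

    In-memory only: if this process restarts, the recent-actions set is lost
    and actions simply get enqueued again, so the system keeps working.
    """

    def __init__(self, queue_url: str, window_seconds: int = 60):
        self.sqs = boto3.client("sqs")
        self.queue_url = queue_url
        self.window = window_seconds
        self.recent: dict[str, float] = {}    # fingerprint -> enqueue time

    def enqueue_card_action(self, action: dict) -> bool:
        fingerprint = hashlib.sha256(
            json.dumps(action, sort_keys=True).encode()
        ).hexdigest()
        now = time.time()
        # Cycle out old items as fast as you dare to.
        self.recent = {f: t for f, t in self.recent.items() if now - t < self.window}
        if fingerprint in self.recent:
            return False                       # duplicate: here we just drop it
        self.recent[fingerprint] = now
        self.sqs.send_message(
            QueueUrl=self.queue_url,
            MessageBody=json.dumps(action),
            MessageGroupId=str(action["card_id"]),   # FIFO ordering per card
            MessageDeduplicationId=fingerprint,      # SQS's own 5-minute dedup
        )
        return True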
While you are right that a FIFO queue would be best here, I think your design isn't ideal, or even workable in some situations.
Let's say user 1 has an application state where the card is in stage 1, and he moves it to stage 2. An SQS message will indicate "move the card from stage 1 to stage 2". User 2 has the same initial state, where the card is in stage 1. User 2 wants to move the card to stage 3, so an SQS message will contain the instruction "move the card from stage 1 to stage 3". But this won't work, since you can't find the card in stage 1 anymore!
In this use case, I think a classic API design is best where an API call is made to request the move. In the above case, your API should error out indicating that the card is no longer in the state the user expected it to be in. The application can then reload the current state for that card and allow the user to try again.
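For what it's worth, that check is cheap to do atomically in Postgres. A sketch, reusing an assumed cards table where each card row stores its current stage: the move carries the stage the user thinks the card is in, and the conditional UPDATE fails cleanly when that expectation is stale.

def move_card_checked(conn, card_id, expected_stage_id, destination_stage_id) -> bool:
    with conn:
        with conn.cursor() as cur:
            # Compare-and-swap: only move the card if it is still in the
            # stage the user last saw.
            cur.execute(
                "UPDATE cards SET stage_id = %s"
                " WHERE id = %s AND stage_id = %s",
                (destination_stage_id, card_id, expected_stage_id),
            )
            # rowcount == 0 means the card moved in the meantime; the API can
            # answer 409 Conflict and the client reloads the card's state.
            return cur.rowcount == 1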
I am working on a project in which we define two aggregates: "Project" and "Task". The Project, in addition to other attributes, has a points attribute. These points are distributed to the tasks as they are defined by users. In one use case, the user assigns points to a task, but the project must have those points available.
We currently model this as follows:
1 - task.RequestPoints(points): this method creates a PointsAssignment aggregate with the attributes points and taskId, which in its constructor issues a PointsAssignmentRequested domain event.
2 - The handler of that event fetches the project related to the task, and the PointsAssignment aggregate, and calls project.assignPoints(pointsAssignment, service); that is, it passes the PointsAssignment aggregate as a parameter, along with a service to calculate the difference between the task's current points and the desired points.
3 - If points are available, the project modifies its points attribute and issues a ProjectPointsAssigned domain event that contains the pointsAssignmentId attribute (in addition to others).
4 - The handler of this last event fetches the PointsAssignment and confirms it via pointsAssignment.Confirm(); this aggregate issues a PointsAssignmentConfirmed domain event.
5 - The handler of this last event fetches the associated task and calls task.AssignPoints(pointsAssignment.points).
My question is: is it correct, in step 2, to pass the PointsAssignment aggregate into the project's method? That was the only way I found to relate the aggregates.
Note: We created the PointsAssignment aggregate so that in case of failure I could save the error via pointsAssignment.Reject(reasonText) and display it to the user, since I am using eventual consistency (one aggregate per transaction).
We thought about using a Process Manager (PointsAssignmentProcess), but in the same way we would need the third aggregate, PointsAssignment, to correlate the process.
I would do it a little bit differently (that doesn't mean more correctly).
Your project doesn't need to know anything about the PointsAssignment.
If your project is the one that has the points available for use, it can have simple methods for removing or adding points.
RemovePointsCommand -> project->removePoints(points)
AddPointsCommand -> project->addPoints(points)
Then you would have an event handler that reacts to PointsAssignmentRequested (I imagine this event carries the ID of the project, the number of points, and maybe a status field, from what you said).
This eventHandler would only do:
on(PointsAssignmentRequested) -> dispatch command (RemovePointsCommand)
// Note that it would be wise for the client to send an ID for this operation, so it can be done asynchronously.
That command can either succeed or fail, and both outcomes dispatch events:
RemovePointsSucceeded
RemovePointsFailed
// Remember that you have a correlation ID persisted from earlier
Then, you would have a final eventHandler that would do:
on(RemovePointsSucceeded) -> PointsAssignment.succeed() // Dispatches PointsAssignmentSucceeded
on(PointsAssignmentSucceeded) -> task.AssignPoints(pointsAssignment.points)
On the fail side:
on(RemovePointsFailed) -> PointsAssignment.fail() // Dispatches PointsAssignmentFailed
This way you don't have to mix aggregates together; all they know are each other's IDs, and they can work without knowing anything about the schema of the other aggregates, avoiding undesired coupling.
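As a rough sketch of that handler chain (the names mirror this answer; the command bus and repositories are assumed plumbing, not any particular framework):

from dataclasses import dataclass

@dataclass
class RemovePointsCommand:
    project_id: str
    points: int
    correlation_id: str   # the client-supplied operation ID mentioned above

def on_points_assignment_requested(event, command_bus):
    command_bus.dispatch(RemovePointsCommand(
        project_id=event.project_id,
        points=event.points,
        correlation_id=event.assignment_id,
    ))

def on_remove_points_succeeded(event, assignments):
    # Raises PointsAssignmentSucceeded internally.
    assignments.get(event.correlation_id).succeed()

def on_points_assignment_succeeded(event, assignments, tasks):
    assignment = assignments.get(event.assignment_id)
    tasks.get(assignment.task_id).assign_points(assignment.points)

def on_remove_points_failed(event, assignments):
    # Raises PointsAssignmentFailed internally.
    assignments.get(event.correlation_id).fail(event.reason)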
I see the semantics of this problem exactly as a bank transfer:
You have the bank account (project)
You have money in this bank account (points)
You are transferring money through a transfer process (pointsAssignment)
You are transferring money to an account (task)
The bank account only should have minimal operations, of withdrawing and depositing, it does not need to know anything about the transfer process.
The transfer process needs to know which account it is withdrawing from and which account it is depositing to.
I imagine your PointsAssignment looking something like:
{
"projectId":"X",
"taskId":"Y",
"points" : 10,
"status" : ["issued", "succeeded", "failed"]
}
Suppose there are two microservices: Order and Inventory. There is an API in the Order service that takes a ProductId, Qty, etc. and places the order.
Ideally an order should only be allowed to be placed if inventory exists in the Inventory service. People recommend the Saga pattern or other distributed-transaction approaches. That is fine, and eventual consistency will be utilized.
But what if somebody wants to abuse the system? They can push orders with products (ProductIds) that are either invalid or out of inventory. The system will take all these orders, place them in a queue, and the Inventory service will have to handle these invalid orders.
Shouldn't this be handled upfront (in the Order service) rather than pushing these invalid orders to the next level (especially where the ProductId is invalid)?
What are the recommendations to handle these scenarios?
What are the recommendations to handle these scenarios?
Give your order service access to the data that it needs to filter out undesirable orders.
The basic plot would be that, while the Inventory service is the authority for the state of inventory, your Orders service can work with a cached copy of the inventory to determine which orders to accept.
Changes to the Inventory are eventually replicated into the cache of the Orders service -- that's your "eventual consistency". If Inventory drops offline for a time, Orders can continue providing business value based on the information in its cache.
You may want to pay attention to the age of the data in the cache as well -- if too much time has passed since the cache was last updated, then you may want to change strategies.
Your "aggregates" won't usually know that they are dealing with a cache; you'll pass along with the order data a domain service that supports the queries that the aggregate needs to do its work; the implementation of the domain service accesses the cache to provide answers.
So long as you don't allow the abuser to provide his own instance of the domain service, or to directly manipulate the cache, then the integrity of the cached data is ensured.
(For example: when you are testing the aggregate, you will likely be providing cached data tuned to your specific test scenario; that sort of hijacking is not something you want the abuser to be able to achieve in your production environment).
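A sketch of that arrangement, where the aggregate is handed a domain service and never learns it is backed by a cache (the names, cache shape, and ten-minute staleness cutoff are illustrative assumptions):

from datetime import datetime, timedelta, timezone

class StaleInventoryError(Exception):
    pass

class CachedInventoryService:
    def __init__(self, cache: dict, max_age: timedelta = timedelta(minutes=10)):
        self.cache = cache          # replicated from the Inventory service
        self.max_age = max_age

    def is_available(self, product_id: str, qty: int) -> bool:
        entry = self.cache.get(product_id)
        if entry is None:
            return False            # unknown ProductId: reject up front
        if datetime.now(timezone.utc) - entry["as_of"] > self.max_age:
            raise StaleInventoryError(product_id)   # too old: change strategies
        return entry["on_hand"] >= qty

class Order:
    def place(self, product_id: str, qty: int, inventory: CachedInventoryService):
        # The domain service is passed in; the abuser never gets to supply
        # their own instance or to manipulate the cache directly.
        if not inventory.is_available(product_id, qty):
            raise ValueError(f"cannot order {qty} x {product_id}")
        # ... record the accepted order ...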
You most definitely would want to ensure up front that you catch as many invalid business cases as possible. There are a couple of ways to deal with this. It is the same situation as booking a seat on an airline - although airlines do over-booking, which we'll ignore for now :)
Option 1: You could reserve an inventory item as part of the order. This is a more pessimistic approach, but your item would be reserved while you wait for the order to be confirmed.
Option 2: You could accept the order only if there is an inventory item available, but not reserve it, and hope it is still available later.
You could also create a back-order if the inventory item isn't available and you want to support back-orders.
If you go with option 1 you could miss out on a customer: if an item has been reserved for customer A, and customer B comes along, B cannot order. If customer A then decides not to complete the order, the inventory item becomes available again, but customer B has now gone off somewhere else to try to source the item.
As part of the fulfillment of your order, you have to inform the inventory bounded context that you are now taking the item. However, you may now find that both customer A and customer B have accepted their quotes and created an order for the last item. One is going to lose out. At that point, the side that cannot be fulfilled will send a mail to the customer informing them of the unfortunate situation, and perhaps create a back-order, or ask the customer to try again in X days.
Your domain experts should make the call as to how to handle the scenarios and it all depends on item popularity, etc.
I will not try to convince you not to do this checking before placing an order and to rely on Sagas, as is usually done; I will treat this as a business requirement that you must implement.
This seems like a new sub-domain to me: bad-behavior prevention (or whatever you want to call it) that comes with a new responsibility: preventing abusers. You could add this responsibility to the Order microservice, but you would break the SRP. So it should be done in another microservice.
This new microservice is called from your API Gateway (if you have one) or from the Orders microservice.
If you do not want to add a new microservice (for whatever reason), then you could implement this new functionality as a module inside the Orders microservice, but I strongly recommend making it highly decoupled from its host (separate and private persistence/database/tables).
I am developing an application that has an employee time-tracking module. When an employee starts working (e.g. at some abstract machine), we need to save information about their work. Each day lots of employees work at lots of machines, and they switch between them. When they start working, they notify the system that they have started working. When they finish working, they notify the system about it as well.
I have an aggregate Machine and an aggregate Employee. These two are aggregate roots with their own behavior. Now I need a way to build reports for any given Employee or any given Machine over any given period of time. For example, I want to see which machines a given employee used over a period of time and for how long, or which employees worked at a given machine, and for how long, over a period of time.
Ideally (I think) my aggregate Machine should have methods startWorking(Employee employee) and finishWorking(Employee employee).
I created another aggregate, EmployeeWorkTime, that stores information about the Machine, the Employee, and start/finish timestamps. Now I need a way to modify one aggregate and create another at the same time (or ideally some other approach, since this way is somewhat difficult).
Also, employees have a Shift that describes how many hours a day they must work. The information from a Shift should be saved in the EmployeeWorkTime aggregate in order to remain consistent in case the Shift is changed for a given Employee.
Rephrased question
I have a Machine, I have an Employee. HOW the heck can I save this information:
This Employee worked at this Machine from 1.05.2017 15:00 to 1.05.2017 18:31.
I could do this simply using CRUD, saving multiple aggregates in one transaction, going database-first. But I want to use DDD methods to be able to manage complexity, since the overall domain is pretty complex.
From what I understand about your domain, you must model the process of an Employee working on a Machine. You can implement this using a Process Manager/Saga. Let's name it EmployeeWorkingOnAMachineSaga. It works like this (using CQRS; you can adapt it to other architectures):
When an employee wants to start working on a machine, the EmployeeAggregate receives the StartWorkingOnAMachine command.
The EmployeeAggregate checks that the employee is not working on another machine and, if not, raises the EmployeeWantsToWorkOnAMachine event and changes the employee's status to wantingToWorkOnAMachine.
This event is caught by the EmployeeWorkingOnAMachineSaga, which loads the MachineAggregate from the repository and sends it the TryToUseThisMachine command. If the machine is not vacant, it rejects the command and the saga sends the RejectWorkingOnTheMachine command to the EmployeeAggregate, which in turn changes its internal status (by raising an event, of course);
if the machine is vacant, it changes its internal status to occupiedByAnEmployee (by raising an event);
and similarly when the worker stops working on the machine (a rough sketch follows).
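A condensed sketch of that saga, with the repository and command-bus plumbing assumed, and try_to_use returning a boolean as a simplification of the command/event round-trip:

from dataclasses import dataclass

@dataclass
class RejectWorkingOnTheMachine:
    employee_id: str
    machine_id: str

class EmployeeWorkingOnAMachineSaga:
    def __init__(self, machines, command_bus):
        self.machines = machines          # MachineAggregate repository
        self.command_bus = command_bus

    def on_employee_wants_to_work_on_a_machine(self, event):
        machine = self.machines.load(event.machine_id)
        if machine.try_to_use(event.employee_id):   # the TryToUseThisMachine command
            # Vacant: the machine raised an event marking itself
            # occupiedByAnEmployee; persist it.
            self.machines.save(machine)
        else:
            # Not vacant: tell the EmployeeAggregate the request was rejected.
            self.command_bus.dispatch(RejectWorkingOnTheMachine(
                employee_id=event.employee_id,
                machine_id=event.machine_id,
            ))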
Now I need a way to build reports for any given Employee or any given Machine over any given period of time. For example, I want to see which machines a given employee used over a period of time and for how long, or which employees worked at a given machine, and for how long, over a period of time.
This should be implemented by read-models that just listen to the relevant events and build the reports that you need.
Also, employees have a Shift that describes how many hours a day they must work. The information from a Shift should be saved in the EmployeeWorkTime aggregate in order to remain consistent in case the Shift is changed for a given Employee.
Depending on how you want the system to behave, you can implement this using a Saga (if you want the system to do something when an employee works more or less than they should) or as a read-model/report, if you just want to see the employees who do not conform to their daily shift.
I am developing an application that has an employee time-tracking module. When an employee starts working (e.g. at some abstract machine), we need to save information about their work. Each day lots of employees work at lots of machines, and they switch between them. When they start working, they notify the system that they have started working. When they finish working, they notify the system about it as well.
A critical thing to notice here is that the activity you are tracking is happening in the real world. Your model is not the book of record; the world is.
Employee and Machine are real world things, so they probably aren't aggregates. TimeSheet and ServiceLog might be; these are the aggregates (documents) that you are building by observing the activity in the real world.
If event sourcing is applicable there, how can I store domain events efficiently to build reports faster? Should each important domain event be its own aggregate?
Fundamentally, yes -- your event stream is going to be the activity that you observe. Technically, you could call it an aggregate, but it's a pretty anemic one; it's easier to just think of it as a database, or a log.
In this case, it's probably just full of events like
TaskStarted {badgeId, machineId, time}
TaskFinished {badgeId, machineId, time}
Having recorded these events, you forward them to the domain model. For instance, you would take all of the events with Bob's badgeId and dispatch them to his Timesheet, which starts trying to work out how long he was at each work station.
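A sketch of what that Timesheet computation might look like, assuming the event shape above with time as a datetime:

from collections import defaultdict
from datetime import timedelta

def time_per_machine(events):
    """Fold one badgeId's TaskStarted/TaskFinished events into
    machineId -> total time spent at that work station."""
    started = {}                               # machineId -> start time
    totals = defaultdict(timedelta)            # missing keys start at zero
    for e in sorted(events, key=lambda e: e["time"]):
        if e["type"] == "TaskStarted":
            started[e["machineId"]] = e["time"]
        elif e["type"] == "TaskFinished" and e["machineId"] in started:
            totals[e["machineId"]] += e["time"] - started.pop(e["machineId"])
    return dict(totals)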
Given that Machine and Employee are aggregate roots (they have their own invariants and business logic in a complex net of interrelations; the time-shift feature is only one of the modules)
You are likely to get yourself into trouble if you assume that your digital model controls a real-world entity. Digital shopping carts and real-world shopping carts are not the same thing; the domain model running on my phone can't throw things out of my physical cart when I exceed my budget. It can only signal that, based on the information it has, the contents are not in compliance with my budgeting policy. Truth, and the book of record, are the real world.
Greg Young discusses this in his talk at DDDEU 2016.
You can also review the Cargo DDD Sample; in particular, pay careful attention to the distinction between Cargo and HandlingHistory.
Aggregates are information resources; they are documents with internal consistency rules.
Problem:
Two employees (A & B) go off-line at the same time while editing customer #123, say version #20, and while off-line continue making changes...
Scenarios:
1 - The two employees edit customer #123 and make changes to one or more identical attributes.
2 - The two employees edit customer #123 but DO NOT make the same changes (they cross each other without touching).
... they then both come back on-line; first employee A appends, thereby changing the customer to version #21, then employee B, still on version #20, appends as well.
Questions:
Whose changes do we keep in scenario 1?
Can we do a merge in scenario 2, and how?
Context:
1 - CQRS + Event Sourcing style system
2 - Use Event Sourcing Db as a Queue
3 - Eventual Consistency on Read Model
4 - RESTful APIs
EDIT-1: Clarifications based on the answers so far:
In order to perform fine-grained merging, would I need one command for each field in a form, for example?
Above, fine-grained commands for ChangeName, ChangeSupplier, ChangeDescription, etc., each with their own timestamp, would allow for auto-merging in the event that A & B both issued a ChangeName?
Edit-2: Follow-up based on the use of a particular event store:
It seems as though I'll make use of #GetEventStore for the persistence of my event streams.
They make use of Optimistic Concurrency as follows:
Each event in a stream increments stream version by 1
Writes can specify an expected version, making use of the ES-ExpectedVersion header on writes
-1 specifies stream should not already exist
0 and above specifies a stream version
Writes will fail if the stream is not at that version; you either retry with a new expected version number, or you reprocess the behavior and decide it's OK, if you so choose
If no ES-ExpectedVersion is specified, optimistic concurrency control is disabled
In this context, the optimistic concurrency is based not only on the Message ID, but also on the event number.
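To make that concrete, here is an illustrative write using that header (the endpoint path, port, and content type follow GetEventStore's HTTP API documentation, but treat them as assumptions to verify against your version):

import json
import uuid
import requests

def append_event(stream: str, event_type: str, data: dict, expected_version: int):
    resp = requests.post(
        f"http://localhost:2113/streams/{stream}",
        headers={
            "Content-Type": "application/vnd.eventstore.events+json",
            "ES-ExpectedVersion": str(expected_version),   # -1: stream must not exist
        },
        data=json.dumps([{
            "eventId": str(uuid.uuid4()),
            "eventType": event_type,
            "data": data,
        }]),
    )
    if not resp.ok:
        # Stream not at the expected version: reload, pick a new expected
        # version (or re-evaluate the behavior), and retry if appropriate.
        raise RuntimeError(f"append to {stream} failed: {resp.status_code}")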
If I understand your design picture correctly, then the occasionally connected users enqueue commands, i.e., change requests, and when a user reconnects the queued commands are sent together; there is only one database authority (which the command handlers query to load the most recent versions of their aggregates); only the view model is synced to the clients.
In this setup, Scenario 2 is trivially auto-merged by your design, if you choose your commands wisely - read: make them fine-grained. For every possible change, define one command. Then, on re-connection of the client, the commands are processed in any order, but since they only affect disjoint fields, there is no problem:
Customer is at v20.
A is offline, edits changes against stale model of v20.
B is offline, edits changes against stale model of v20.
A comes online and batch-sends a queued ChangeName command; the Customer of v20 is loaded and persisted as v21.
B comes online and batch-sends a queued ChangeAddress command; the Customer of v21 is loaded and persisted as v22.
The database contains the customer with the correct name and address, as expected.
In Scenario 1, with this setup, the later employee will overwrite the earlier employee's changes:
Customer is at v20.
A is offline, edits changes against stale model of v20.
B is offline, edits changes against stale model of v20.
A comes online and batch-sends a queued ChangeName command to "John Doe"; the Customer of v20 is loaded and persisted as v21 with name "John Doe".
B comes online and batch-sends a queued ChangeName command to "Joan d'Arc"; the Customer of v21 (named "John Doe") is loaded and persisted as v22 (with name "Joan d'Arc").
The database contains a customer with the name "Joan d'Arc".
If B comes online before A, then it's vice versa:
Customer is at v20.
A is offline, edits changes against stale model of v20.
B is offline, edits changes against stale model of v20.
B comes online and batch-sends a queued ChangeName command to "Joan d'Arc"; the Customer of v20 is loaded and persisted as v21 (with name "Joan d'Arc").
A comes online and batch-sends a queued ChangeName command to "John Doe"; the Customer of v21 is loaded and persisted as v22 with name "John Doe".
The database contains a customer with the name "John Doe".
There are two ways to enable conflict detection:
Check whether the command's creation date (i.e., the time of the employee's modification) is after the last modification date of the Customer. This disables the auto-merge feature of Scenario 2, but gives you full conflict detection against concurrent edits.
Check whether the command's creation date (i.e., the time of the employee's modification) is after the last modification date of the individual field of the Customer it is going to change. This leaves the auto-merge of Scenario 2 intact, but gives you automatic conflict detection in Scenario 1.
Both are easy to implement with event sourcing (since the timestamps of the individual events in the event stream are probably known).
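Both checks reduce to a timestamp comparison. A sketch, assuming the command records when the user made the edit, and that per-record and per-field modification times can be recovered from the event stream (all names are illustrative):

def conflicts_whole_record(command, customer) -> bool:
    # Check 1: full conflict detection; also flags Scenario 2, so no auto-merge.
    return command.created_at <= customer.last_modified_at

def conflicts_per_field(command, customer) -> bool:
    # Check 2: Scenario 2 still auto-merges; Scenario 1 is flagged because
    # both commands touch the same field.
    return command.created_at <= customer.field_modified_at[command.field]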
As for your question "Whose changes do we keep in scenario 1?" - this depends on your business domain and its requirements.
EDIT-1: To answer the clarification question:
Yes, you'll need one command for each field (or group of fields, respectively) that can be changed individually.
Regarding your mockup: What you are showing is a typical "CRUD" UI, i.e., multiple form fields and, e.g., one "Save" button. CQRS is usually and naturally combined with a "task based" UI, where there would be, say, the Status field be displayed (read-only), and if a user wants to change the status, one clicks, say, a "Change Status" button, which opens a dialog/new window or other UI element, where one can change the status (in web based systems, in-place-editing is also common). If you are doing a "task based" UI, where each task only affects a small subset of all fields, then finely grained commands for ChangeName, ChangeSupplier etc are a natural fit.
Here's a generic overview of some solutions:
Scenario 1
Someone has to decide, preferably a human. You should ask the user or show that there is a conflict.
Dropbox solves this by picking the later file and keeping a file.conflict file in the same directory for the user to delete or use.
Scenario 2
Keep the original data around and see which fields actually changed. Then you can apply employee 1's changes and then employee 2's changes without stepping on any toes.
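A sketch of that three-way merge, treating each record as a flat dict of fields (illustrative only): diff each employee's copy against the original, then apply both change sets when they touch different fields.

def changed_fields(original: dict, edited: dict) -> dict:
    return {k: v for k, v in edited.items() if original.get(k) != v}

def merge(original: dict, edit_a: dict, edit_b: dict) -> dict:
    changes_a = changed_fields(original, edit_a)
    changes_b = changed_fields(original, edit_b)
    overlap = set(changes_a) & set(changes_b)
    if overlap:
        raise ValueError(f"conflict on fields: {sorted(overlap)}")  # Scenario 1
    merged = dict(original)
    merged.update(changes_a)   # apply employee 1's changes ...
    merged.update(changes_b)   # ... then employee 2's, no toes stepped on
    return merged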
Scenario 3 (Only when the changes come online at different times)
Let the second user know that there were changes while they were offline. Attempt Scenario 2 and show the second user the new result (because this might change his inputs). Then ask him if he wants to save his changes, modify them first, or throw them out.
Aaron, where the events do actually conflict - i.e. in scenario 1 - I would expect a concurrency exception of some sort to be thrown.
The second scenario is much more interesting. Assuming your commands and events are reasonably well defined, i.e. not a wrapper for CRUD, then you would be able to test whether the events committed since your command was issued actually conflict. I use a concurrency conflict registry for this purpose. Essentially, when I detect a potential conflict, I grab the events that have been committed since the version I currently have and ask the registry to check whether any of them actually conflict.
If you want to see a code example and a bit more detail on this, I put together a post outlining my approach. Take a look at it here: handling concurrency issues in cqrs es systems
Hope this helps!
In this case, maybe you can use the "aggregate root" concept for the Item, powered by a CEP engine (Complex Event Processing engine), to perform these complex operations.