We have an event sourced system using GetEventStore where the command-side and denormalisers are running in two separate processes.
I have an event handler which sends emails as the result of a user saving an application (an ApplicationSaved event), and I need to change this so that the email is sent only once for a given application.
I can see a few ways of doing this but I'm not really sure which is the right way to proceed.
1) I could look in the read store to see if there's a matching application; however, there's no guarantee that the data will be there when my email handler is processing the event.
2) I could attach something to my ApplicationSaved event, maybe Revision which gets incremented on each subsequent save. I then only send the email if Revision is 1.
3) In my event handler I could load the events for the matching customer from my event store using a repository, and build up a kind of aggregate separate from the one in my domain. It could contain a list of applications which I can use to make my decision.
My thoughts:
1) This seems a no-go as the data may or may not be in the read store
2) If the data can be derived from a stream of events then it doesn't need to be on the event itself.
3) I'm leaning towards this, but there's meant to be a clear separation between read and write sides which this feels like it violates. Is this permitted?
I can see a few ways of doing this but I'm not really sure which is the right way to proceed.
There's no perfect answer - in most cases, externally observable side effects are independent of your book of record; you're always likely to have some failure mode where an email is sent but the system doesn't know, or where the system records that an email was sent but there was actually a failure.
For a pretty good answer: you're normally going to start with a facility that sends an email and reports as an event that the email was sent successfully, or not. That's fundamentally an event stream - your model doesn't get to veto whether or not the email was sent.
With that piece in place, you effectively have a query to run, which asks "what emails do I need to send now?" You fold the ApplicationSaved events with the EmailSent events, compute from that what new work needs to be done.
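A minimal sketch of that fold, in Python. The event classes and their fields are assumptions made for illustration; they are not part of GetEventStore's API, and a real handler would read both event types from the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApplicationSaved:
    application_id: str


@dataclass(frozen=True)
class EmailSent:
    application_id: str


def emails_to_send(events):
    """Fold the history into the applications that still need an email."""
    saved = []    # application ids, in first-seen order
    sent = set()  # application ids with a recorded EmailSent
    for event in events:
        if isinstance(event, ApplicationSaved):
            if event.application_id not in saved:
                saved.append(event.application_id)
        elif isinstance(event, EmailSent):
            sent.add(event.application_id)
    return [app_id for app_id in saved if app_id not in sent]


history = [
    ApplicationSaved("app-1"),
    EmailSent("app-1"),
    ApplicationSaved("app-1"),  # saved again: no second email is due
    ApplicationSaved("app-2"),
]
print(emails_to_send(history))  # ['app-2']
```

The function only answers the query; actually sending the email and appending the resulting EmailSent (or a failure event) is a separate step.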
Rinat Abdullin, writing Evolving Business Processes a la Lokad, suggested using a human operator to drive the process. Imagine building a screen, that shows what emails need to be sent, and then having buttons where the human says to actually do "it", and the work of sending an email happens when the human clicks the button.
What the human is looking at is a view, or projection, which is to say a read model of the state of the system computed from the recorded events. The button click sends a message to the "write model" (the button clicked event tells the system to try to send the email and write down the outcome).
When all of the information you need to act is included in the representation of the event you are reacting to, it is normal to think in terms of "pushing" data to the subscribers. But when the subscriber needs information about prior state, a "pull" based approach is often easier to reason about. The delivered event signals the projection to wake up (reducing latency).
Greg Young covers push vs pull in some detail in his Polyglot Data talk.
I'm wondering how to update a bunch of data at once for an aggregate in Event Sourcing.
In a traditional application I would take some data such as name, date of birth, etc. and put it into an existing object; as I understand it, in ES this approach is wrong, so should I raise different events to update different parts of the aggregate root? If so, how do I build the REST API? How do I handle validation?
In a traditional application I would take some data such as name, date of birth, etc. and put it into an existing object; as I understand it, in ES this approach is wrong,
Short answer: that approach is fine -- what changes in event sourcing is how you keep track of the changes in your service.
A way to think of a stream of events is a sequence of patch-documents. There's nothing wrong with changing multiple fields in a single patch document, and that is fine in events as well.
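To make the patch-document picture concrete, here is a small sketch. The event shape (a dict with a `changes` mapping) and the event names are assumptions for illustration only.

```python
def apply(state, event):
    """Return a new state with the event's changes merged in."""
    return {**state, **event["changes"]}


def current_state(events):
    """Replay the stream from the beginning to get the current state."""
    state = {}
    for event in events:
        state = apply(state, event)
    return state


events = [
    # One event may change several fields at once, like one patch document.
    {"type": "PersonRegistered",
     "changes": {"name": "Ann", "date_of_birth": "1990-01-01"}},
    {"type": "PersonDetailsCorrected",
     "changes": {"name": "Anne", "date_of_birth": "1990-01-02"}},
]
print(current_state(events))
# {'name': 'Anne', 'date_of_birth': '1990-01-02'}
```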
This question is really too broad for SO. You should google “event sourcing basics in azure” to find detailed articles, github projects, videos, and other responses to these questions.
In general, in Event Sourcing there are two main ideas you need – Messages and Events. A typical process (not the only option, but a common one) is as follows. A message is created by your UI which makes a request for a change to be made to an AR. Validation for that message is done at the source that creates the message.
The message is then sent to an API, where it is validated again since you can't trust all possible senders. The request is processed, resulting in changes made to an AR. An event is then created describing the changes made, and that event is placed on an event source (Azure Event Hub, Kafka, Kinesis, a DB, whatever). This list of events is kept forever and describes each and every change made to that AR throughout time, including the initial creation request. To get the current state of the AR, just add up all the events.
The key idea that is confusing when learning Event Sourcing is the two different types of “events”. Messages ask for a change to be made, Events record that a change has been made.
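The distinction can be sketched in a few lines. Everything here (the `ChangeName` message, the `NameChanged` event, the in-memory list standing in for the event source) is a hypothetical example, not any particular product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeName:   # message: asks for a change, may be rejected
    person_id: str
    new_name: str


@dataclass(frozen=True)
class NameChanged:  # event: records a change that has been made
    person_id: str
    new_name: str


class Rejected(Exception):
    pass


def handle(command, log):
    # Validate again on the server side; the sender cannot be trusted.
    if not command.new_name.strip():
        raise Rejected("name must not be empty")
    event = NameChanged(command.person_id, command.new_name.strip())
    log.append(event)  # the list of events is append-only
    return event


def current_name(person_id, log):
    """'Add up' the events to get the current state."""
    name = None
    for event in log:
        if isinstance(event, NameChanged) and event.person_id == person_id:
            name = event.new_name
    return name
```

A rejected message leaves the log untouched; only accepted changes ever become events.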
As already answered, the batch update approach is fine.
I suggest focusing on the event consumption code. If all you have on your read side is a complete representation of the aggregate, then a generic *_UPDATED event is OK.
But if parts of your system are interested only in a particular part of your aggregate, you might want to update that part separately, so that those parts don't have to analyze all events and dig for particular data.
For example, some demographic analysis system is only interested in the birthdate. It would be much easier for this system to have a BIRTHDATE_SET event that it would listen to, and ignore all others.
Fine-grained events like this also reduce coupling, because consumers need less knowledge of the internal event data structure.
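As a rough illustration, a consumer that only cares about birthdates can subscribe to that one event type and never see the rest. The tiny `EventBus` below and the payload fields are assumptions for the sketch, not a real library.

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Only handlers registered for this exact type are called.
        for handler in self._handlers[event_type]:
            handler(payload)


birthdates = {}  # the demographic system's own data


def record_birthdate(event):
    birthdates[event["person_id"]] = event["birthdate"]


bus = EventBus()
bus.subscribe("BIRTHDATE_SET", record_birthdate)

bus.publish("NAME_SET", {"person_id": "p1", "name": "Ann"})  # ignored
bus.publish("BIRTHDATE_SET", {"person_id": "p1", "birthdate": "1990-01-01"})
print(birthdates)  # {'p1': '1990-01-01'}
```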
It feels like you still have an active record way of looking at things.
You should model the things that happen to your entity as events rather than the impact of things happening.
So to my mind all of that data might be gathered in a "Person was registered" event but an "Address added" event might also exist - in which case your single command might end up appending two events to the event stream.
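Sketched in code, one command can decide on more than one event. The command fields and event names below are made up for the example.

```python
def register_person(command):
    """Decide which events a single registration command produces."""
    events = [{
        "type": "PersonWasRegistered",
        "person_id": command["person_id"],
        "name": command["name"],
        "date_of_birth": command["date_of_birth"],
    }]
    # The address is a separate thing that happened, so it gets its own event.
    if command.get("address"):
        events.append({
            "type": "AddressAdded",
            "person_id": command["person_id"],
            "address": command["address"],
        })
    return events
```

Both events would then be appended to the stream together, so the command is still handled as one unit.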
I am trying to implement another DDD bounded context with CQRS and ES.
Say there is a CreateUserCommand that creates a User in my domain model (not a word about saving). It then fires a UserCreatedEvent.
I have two event handlers for that event:
PersistUserEventHandler (updates state of app) and
SendWelcomeEmailEventHandler (sends welcome email to user)
Now, I know, that:
Order of processing event in Event Handlers should not matter
Saving state should be detail, because source of truth is in my event store.
But what if I do not want to send the welcome email until my read model is fully updated? What if, for example, the process is delayed or some error occurs and I am not able to persist that user into the read model right now? Then I do not want to send the welcome email yet, because if the user clicked, for example, a link to his profile in the mail, he would see "user does not exist".
I saw people are persisting changes through repository directly in command handlers (which would solve this problem), but that does not make sense with Event Sourcing, because I want to be able to replay all events (with event handlers for persisting only to prevent all other side effects) and get actual state of application in persistence layer.
Or should I listen to UserCreatedEvent only with event handler that actually persists it into read model and then raise in this event handler another event CreatedUserSavedEvent and all emails etc. would have been sent by their handlers?
I suppose the answer is no here too, because it reminds me of some kind of event hell, and if I inject the EventBus into an event handler I run into a circular reference problem, which is just a symptom of violating the rule that every dependency should point down to lower components of my system and not the other way.
So, how is this usually solved, or am I missing something?
PersistUserEventHandler (updates state of app)
You might be mistaking Read Models for a homogeneous whole that accurately represents the current state of an application, i.e. a second source of absolute truth besides the event log.
I tend to see them more as a bunch of partial, opinionated parcels of state that may not all be updated at the same time and may reflect different truths.
I don't recommend taking read models as a source of data in another context than the use case they were designed for. In your example, SendWelcomeEmail should probably not rely on the User read model but only on the data contained in the UserCreated event.
Now you can share code between read model projectors and other types of event handlers to avoid duplication, but sharing data seems risky.
If users have random UUIDs then it should not be a problem. If a user arrives at a URL and the read model is not up to date, you could show a "loading in progress, please wait" message.
If you really want to know whether the user exists - for example, you want to tell the difference between "user does not exist" and "read model is not synchronized yet" - then you could send a special command that doesn't generate any events (or just test a command, if your command dispatcher supports dry runs) and throws an exception if the user does not exist.
I'm building a service using the familiar event sourcing pattern:
A request is received.
The aggregate's history is loaded.
The aggregate is rebuilt (from its history).
New events are prepared and the aggregate is updated in response to the incoming request from Step 1.
These events are written to the log, and are made available (published) to any subscribers.
In my case, Step 5 is accomplished in two parts. The events are written to the event log. A background process reads from the event log and publishes all events starting from an offset.
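For reference in the answers that follow, the flow described above might look roughly like this. It is an in-memory sketch under stated assumptions: `EventLog`, `handle_deposit` and `Publisher` are hypothetical names, and the "aggregate" is just an account balance.

```python
class ConcurrencyError(Exception):
    pass


class EventLog:
    def __init__(self):
        self._events = []  # (stream_id, event) pairs in global order

    def load(self, stream_id):
        return [e for s, e in self._events if s == stream_id]

    def append(self, stream_id, new_events, expected_version):
        # Optimistic concurrency: fail if the stream moved on in the meantime.
        if len(self.load(stream_id)) != expected_version:
            raise ConcurrencyError(stream_id)
        self._events.extend((stream_id, e) for e in new_events)

    def read_from(self, offset):
        return self._events[offset:]


def handle_deposit(log, account_id, amount):                 # step 1
    history = log.load(account_id)                           # step 2
    balance = sum(e["amount"] for e in history)              # step 3
    new_events = [{"type": "Deposited", "amount": amount}]   # step 4
    log.append(account_id, new_events,
               expected_version=len(history))                # step 5, part one
    return balance + amount


class Publisher:
    """Step 5, part two: publish everything after the last offset."""

    def __init__(self, log, publish):
        self._log, self._publish, self._offset = log, publish, 0

    def run_once(self):
        for stream_id, event in self._log.read_from(self._offset):
            self._publish(stream_id, event)
            self._offset += 1
```

Because publishing only ever reads what was successfully appended, anything that must reach subscribers has to pass through the log first; that is the constraint the options below are trying to work around.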
In some cases, I need to publish side effects in addition to events related to the aggregate. As far as the system is concerned, these are events too because they are consumed by and affect the state of other services. However, they don't affect the history of the aggregate in this service and are not needed to rebuild it.
How should I handle these in the code?
Option 1-
Don't write side-effecting events to the event log. Publish these in the main process prior to Step 5.
Option 2-
Write everything to the event log and ignore side-effecting events when the history is loaded. (These aren't part of the history!)
Option 3-
Write side-effecting events to a dummy aggregate so they are published, but never loaded.
Option 4-
?
In the first option, there may be trouble if there is a concurrency violation. If the write fails in Step 5, the side effect cannot be easily rolled back. The second option writes events that are not part of the aggregate's history. When loading in Step 2, these side-effecting events would have to be ignored. The third option feels like a hack.
Which of these seems right to you?
Name events correctly
Events are "things that happened". So if you are able to name the events that only trigger side effects in a "X happened" fashion, they become a natural part of the event history.
In my experience, this is always possible, because side-effects don't happen out of thin air. Sometimes the name becomes a bit artificial, but it is still better to name events that way than to call them e.g. "send email to that client event".
In terms of your list of alternatives, this would be option 2.
Example
Instead of calling an event "send status email to customer event", call it "status email triggered event". Of course, if there is a better name for the actual trigger, use that one :-)
Option 4 - Have some other service subscribe to the events and produce the side effects, and any additional events related to them.
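A sketch of what that could look like, assuming a hypothetical `OrderShipped` event and an injected `send_email` function. The subscriber performs the side effect and records the outcome in its own log, so nothing extra lands in the original aggregate's history.

```python
def status_email_service(event, send_email, own_log):
    """Separate subscriber: reacts to a published event, performs the side
    effect, and records what happened as its own events."""
    if event["type"] != "OrderShipped":
        return  # not interested in other events
    ok = send_email(event["customer_email"], "Your order has shipped")
    own_log.append({
        "type": "StatusEmailSent" if ok else "StatusEmailFailed",
        "order_id": event["order_id"],
    })
```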
Events should be fine-grained.
Option 1- Don't write side-effecting events to the event log. Publish these in the main process prior to Step 5.
What if you later need this part of the history by building a new bounded context?
Option 2- Write everything to the event log and ignore side-effecting events when the history is loaded. (These aren't part of the history!)
How to ignore the effect of something which does not have any effect? :D
Option 3- Write side-effecting events to a dummy aggregate so they are published, but never loaded.
Why do you need consistency boundary around something which you will never change?
What you are talking about is the most common form of domain events, which you use to communicate with other BC-s. Ofc. you need to save them.
Example: A business rule states that the customer should get a confirmation message (email or similar) when an order has been placed.
Let's say that a NewOrderRegisteredEvent is dispatched from the domain and is picked up by an event listener that sends off the confirmation message. After that, some other event handler throws an exception or something else goes wrong, and the unit of work is rolled back. We've now sent the user a confirmation message for something that was rolled back.
What is the "cqrs" way of solving problems like this where you want to do something after a unit of work has been committed? Another complicating factor is replaying of events. I don't want old confirmation messages to be re-sent whenever I replay recorded events in order to build a new view / projection.
My best theory so far: I've just started to look into the fascinating world of cqrs and was wondering whether this is something that would be implemented as a saga? If a saga is like a state machine where each transition only can take place a single time then I guess that would solve this problem? I just have a hard time visualizing how this will fit together with the command bus and domain events..
An Event should only occur after the transaction has been completed. If anything goes wrong and there's a rollback, then the event didn't occur from an external point of view. Therefore it shouldn't be published at all. Though an OrderRegistrationFailed event could be published if necessary.
You wouldn't want the mail to be sent unless the command has been executed successfully.
First a few reasons why the command handler -- as proposed in another answer -- would be the wrong place: Under some circumstances the command handler wouldn't be able to tell whether the command will eventually succeed or not. Having the command handler invoke the mail sending would also put process knowledge inside the command handler, which would break the SRP and couple business rules too tightly to the application layer.
The mail should be sent after the fact, i.e. from an event handler.
To prevent this handler from firing during replay, you can simply not register it. This works much like testing your application: you only register the handlers that you actually need.
Production system -> register all event handlers
Tests -> register only the tested event handlers
Replay -> register only the projection/denormalization handlers
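A minimal sketch of that registration scheme. The handler names, the `state` dict and the mode strings are assumptions for the example.

```python
def project_order(event, state):
    """Projection: safe to run any number of times."""
    state["views"].append(event["order_id"])


def send_confirmation(event, state):
    """Side effect: stands in for actually sending the mail."""
    state["outbox"].append(event["order_id"])


HANDLERS = {
    "production": [project_order, send_confirmation],
    "replay": [project_order],  # projections only, no side effects
}


def dispatch(events, mode, state):
    for event in events:
        for handler in HANDLERS[mode]:
            handler(event, state)
```

Replaying the same events in `"replay"` mode rebuilds the views and leaves the outbox empty.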
Another - even more loosely coupled, though a bit more complex - possibility would be to have a Saga handle the NewOrderRegisteredEvent and issue a SendMail command to the appropriate bounded context (thanks, Yves Reynhout, for pointing this out in the question's comments).
There are two likely solutions
1) The publishing of the event and the handling of the event (i.e. the email) are part of a single transaction. In this case, your transaction framework takes care of it for you. If the email fails, then the event is rolled back. You'll likely retry the command. This is conceptually clean and easy to think about. No event is finished publishing until everyone that has something to say about it has had their say. However practically speaking, this can be painful, as it typically involves distributed transactions. These are hard to come by. Can your email client enroll in the same transaction as the database which is holding your events?
2) The publishing of the event is transactional, but the event handlers each deal with transactions in their own way. The event handler which sends emails could keep track of which events it had seen. If it crashed, it would request old events and process them. You could make a business decision as to how big a deal it would be if people had missing or duplicate emails. (For money-related transactions, the answer is probably you shouldn't allow it.)
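For option 2, the email handler's bookkeeping could be sketched like this. The `id` and `to` fields on the event are assumptions, and the in-memory set stands in for whatever durable storage the handler would really use.

```python
class EmailHandler:
    def __init__(self, send):
        self._send = send
        self._seen = set()  # assumption: durable storage in a real system

    def handle(self, event):
        if event["id"] in self._seen:
            return False  # already processed: skip the duplicate delivery
        self._send(event["to"])
        # Recorded after sending: a crash between these two lines means the
        # email goes out again on retry (at-least-once delivery).
        self._seen.add(event["id"])
        return True
```

Recording the event id before sending instead flips the failure mode to a possibly missing email, which is exactly the business decision mentioned above.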
Solution (2) is typically what you see promoted in DDD/CQRS circles as it's the more loosely coupled solution. Solution (1) is quite practical in a small system where the event store and the projections are in a single database and the projections don't change often. Solution (2) allows a diversity of event handlers to work in their own way. Solution (1) can cause lots of non-overlapping concerns to become entangled. In this case your order business rules don't complete until the many bizarre things that happen in emailing are taken care of. For one thing, it may slow you down quite a bit.
If the sending of the email were more interesting than "saw the event, sent the email", then you're right, you might have a saga or workflow on your hands. Email in large operations is often a complex system in its own right which you're unlikely to have to implement much of. You just need to be sure you put your email into a request queue of some sort (using approach (2)), and the email system is likely to do retries/batching/spam avoidance/working overnight/etc.
I've been looking at CQRS but I find it restricting when it comes to showing the result of commands in, let's say, a web application.
It seems to me that using CQRS, one is forced to refresh the whole view or parts of it to see the changes (using a second request) because the original command request will only store an event which is to be processed in future.
In a Web Application, is it possible that a Command request could carry the result of the event it creates back to the browser?
The answer to the headline of this question is quite simple: nothing, void, or from a web browser/REST point of view, 200 OK with an empty body.
Commands applied to the system (if the change is successfully committed) do not yield a result. And if you wish to keep the business logic on the server side, then yes, you do need to refresh the data by executing yet another request (a query) to the server.
However most often you can get rid of the 2nd roundtrip to the server. Take a table where you modify a row and press a save button. Do you really need to update the table? Or in the case a user submits a comment on a blog post just append the comment to the other comments in the dom without the round trip.
If you find yourself wanting the modified state returned from the server you need to think hard about what you are trying to achieve. Most scenarios can be changed so that a simple 200 OK is more than enough.
Update: Regarding your question about queuing incoming commands. It's not recommended to queue incoming commands, since this can produce false positives (a command was successfully received and queued, but when it later tries to modify the state of the system it fails). There is one exception to the rule: a system whose state is an append-only model. Then it is safe to queue the mutation of the system state until later, provided the command is valid.
Udi Dahan's article Clarified CQRS is always a good read on this topic: http://www.udidahan.com/2009/12/09/clarified-cqrs/
Async commands are a strange thing to do in CQRS considering that commands can be accepted or rejected.
I wrote about it, mentioning the debate between Udi Dahan's vision and Greg Young's vision on my blog: https://www.sunnyatticsoftware.com/blog/asynchronous-commands-are-dangerous
Answering your question, if you strive to design the domain objects (aggregates?) in a transactional way, where every command initiates a transaction that ends in zero, one or more events (regardless of whether some process managers later pick up an event and initiate another transaction), then I see no reason to have an empty command result. It's extremely useful for the external actor that initiates the use case to receive a command result indicating things like whether the command was accepted or not, which events it produced, or which specific state the domain is now in (e.g. the aggregate version).
When you design a system in CQRS with asynchronous commands, it's a fallacy to expect that the command will succeed and that there will be a quick state change that you'll be notified about.
Sometimes the domain needs to communicate with external services (domain services?) asynchronously, depending on those services' APIs. That does not mean that the domain cannot synchronously produce meaningful domain events that say what's going on and which changes have occurred in the domain. For example, the following flow makes a lot of sense:
Actor sends a sync command PurchaseBasket
Domain uses an external service to MakePayment and knows that the payment is being processed
Domain produces the events BasketPurchaseAttempted and/or PaymentRequested or similar
Still, synchronously, the command returns the result 200 Ok with a payload indicating some information about what has happened. Even if the payment hasn't completed because the payment platform is asynchronous, at least the actor has a meaningful knowledge about the result of the transaction it initiated.
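As a sketch of what such a payload could carry, assuming a hypothetical `CommandResult` type and a basket represented as a plain dict (this is not the Eventuous API or any other library's):

```python
from dataclasses import dataclass, field


@dataclass
class CommandResult:
    accepted: bool
    events: list = field(default_factory=list)
    version: int = 0
    reason: str = ""


def purchase_basket(basket, request_payment):
    """Handle PurchaseBasket synchronously and report what happened."""
    if not basket["items"]:
        return CommandResult(accepted=False, version=basket["version"],
                             reason="basket is empty")
    # The payment platform is asynchronous: we only know it was requested.
    request_payment(basket["id"])
    events = ["BasketPurchaseAttempted", "PaymentRequested"]
    return CommandResult(accepted=True, events=events,
                         version=basket["version"] + len(events))
```

The caller learns immediately whether the command was accepted, which events it produced and the new aggregate version, even though the payment itself completes later.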
Compare this design with an asynchronous one
Actor sends an async command PurchaseBasket
The system returns a 202 Accepted with a transaction Id (indicating "thanks for your interest, we'll call you, this is the ticket number")
In a separate process, the domain initiates a process manager or similar with the payment platform, and when the process completes (if it completes, assuming the command is accepted and there are no business rules that forbid the purchase basket), then the system can start the notifying process to the actor.
Think about how to test both scenarios. Think about how to design UX to accommodate this. What would you show in the second scenario in the UI? Would you assume the command was accepted? Would you display the transaction Id with a thank you message and "please wait"? Would you take a big faith leap and keep the user waiting with a loading screen waiting for the async process to finish and be notified with a web socket or polling strategy for XXX seconds?
Async commands in CQRS are a dangerous thing and make us lazy domain designers.
UPDATE: the accepted answer suggests not returning anything, and I fully disagree. Check out the Eventuous library and you'll see that returning a result is extremely helpful.
Also, if an async command can't be rejected it's... because it's not really a command but a fact.
UPDATE: I am surprised my answer got negative votes, especially because Greg Young, the creator of the CQRS term, literally says in his book about CQRS:
One important aspect of Commands is that they are always in the imperative tense; that is they are telling the Application Server to do something. The linguistics with Commands are important. A situation could for with a disconnected client where something has already happened such as a sale and could want to send up a “SaleOccurred” Command object. When analyzing this, is the domain allowed to say no that this thing did not happen? Placing Commands in the imperative tense linguistically shows that the Application Server is allowed to reject the Command, if it were not allowed to, it would be an Event for more information on this see “Events”.
While I understand certain authors are biased towards the solutions they sell, I'd go to the main source of information on CQRS, regardless of how many hundreds of implementations out there return void when they could return something to inform the requester as soon as possible. It's just an implementation detail, but thinking that way will help you model the solution better.
Greg Young, again, the guy who coined the CQRS term, also says
CQRS and Event Sourcing describe something inside a single system or component.
The communication between different components/bounded contexts (which ideally should be event driven and asynchronous, although that's not a requirement either) is outside the scope of CQRS.
PS: ignoring an event is not the same as rejecting a command. Rejection implies a direct answer to the command sender. Something "difficult" if you return nothing to the sender (not even a correlation ID?)
Source:
https://gregfyoung.wordpress.com/tag/cqrs/
https://cqrs.files.wordpress.com/2010/11/cqrs_documents.pdf