What happens when an event fails in a state machine?

We've just started a new project. The people who started it were not aware that it is a state machine application. After having a look at those states, I am wondering what happens when an event fails. Take this online shopping application state machine sample: what if the deliver event or the payment-received event fails? Are abandon, failed, or retry part of the states?

When an event doesn't occur, whether through failure or otherwise, nothing happens.
To detect and handle this, timeouts can be introduced: a timeout creates a state transition, which can in turn trigger a corrective action (a retry, a move to a failed or abandoned state, etc.).
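As an illustration, here is a minimal Kotlin sketch of that idea, using hypothetical state and event names for the online shopping example; the timeout is modelled as just another event that drives a transition:

import java.time.Instant

// Hypothetical states and events for an online shopping order.
enum class OrderState { AWAITING_PAYMENT, PAID, ABANDONED }
enum class OrderEvent { PAYMENT_RECEIVED, PAYMENT_TIMEOUT }

class Order(private val paymentDeadline: Instant) {
    var state = OrderState.AWAITING_PAYMENT
        private set

    fun on(event: OrderEvent) {
        state = when (state to event) {
            OrderState.AWAITING_PAYMENT to OrderEvent.PAYMENT_RECEIVED -> OrderState.PAID
            OrderState.AWAITING_PAYMENT to OrderEvent.PAYMENT_TIMEOUT -> OrderState.ABANDONED
            else -> state // an event that never arrives changes nothing
        }
    }

    // A scheduler would call this periodically to fire the timeout event.
    fun checkTimeout(now: Instant) {
        if (state == OrderState.AWAITING_PAYMENT && now.isAfter(paymentDeadline)) {
            on(OrderEvent.PAYMENT_TIMEOUT)
        }
    }
}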

Related

Axon creating aggregate inside saga

I'm not sure how to phrase this question properly, but here it is:
I'm starting the saga on a specific event, then dispatching a command which is supposed to create an aggregate and then send another event, which will be handled by the saga to proceed with the logic.
However, each time I restart the application I get an error saying that an event for the aggregate at sequence x was already inserted, which I suppose is because the saga has not yet finished, and on restart it starts again by trying to create a new aggregate.
The question is: is there any way in Axon to track the progress of the saga? Should I set some flags when I receive an event and wrap the aggregate creation in ifs?
Maybe there is another way I'm not seeing; I just don't want the saga to be replayed from the start.
Thanks
The solution you've posted would definitely work.
Let me explain the scenario you've hit here though, for other people's reference too.
In an Axon Framework 4.x application, any Event Handling Component, and thus also your Saga instances, is backed by a TrackingEventProcessor.
The Tracking Event Processor "keeps track of" the point in the event stream at which it is handling events. It stores this information in a TrackingToken, and the TokenStore is the component to which that work is delegated.
If you haven't specified a TokenStore, however, you will have in-memory TrackingTokens for every Tracking Event Processor.
This means that on a restart, your Tracking Event Processor thinks "oh, I haven't done any event handling yet, let me start from the beginning of time".
Because of this, your Saga instances will start anew every time, trying to recreate the given Aggregate instance.
Hence, specifying the TokenStore as you did resolves the problem you had.
Note that in a Spring Boot environment with, for example, the Spring Data starter present, Axon will automatically create a JpaTokenStore for you.
I've solved my issue by simply adding a token store configuration; it does exactly what I require - tracking processed events.
Basic Spring config:
// Persists TrackingTokens in MongoDB so processor progress survives restarts.
@Bean
fun tokenStore(client: MongoClient): TokenStore = MongoTokenStore.builder()
    .mongoTemplate(DefaultMongoTemplate.builder().mongoDatabase(client).build())
    .serializer(JacksonSerializer.builder().build())
    .build()
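With this bean in place, each Tracking Event Processor resumes from its stored TrackingToken after a restart instead of replaying from the beginning of the stream, so the saga is no longer restarted.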

DomainEventPublisher consistency

Having just read Vaughn Vernon's Effective Aggregate Design, I'm wondering about failures related to event publishing.
In the example given on page 9 (page 3 of the PDF), we call DomainEventPublisher.publish(). The event being published allows other aggregates to execute their behaviours.
What I'm wondering is: what happens if DomainEventPublisher.publish() fails? And what happens if DomainEventPublisher.publish() succeeds, but the transaction fails?
How do implementations handle these two cases?
DomainEventPublisher.publish() is synchronous. You'd set up a generic handler (one that handles all events) which stores the events in the same database transaction as the business process. This means your event storage must be able to be transactional with whatever other storage mechanism you rely on to store the state of your aggregates.
Once events have been written to disk transactionally, you can then put them on a message queue for asynchronous delivery.
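A minimal Kotlin sketch of that idea (often called the transactional outbox pattern), assuming a hypothetical outbox table and plain JDBC:

import java.sql.Connection

// Hypothetical event type; all names here are illustrative only.
data class DomainEvent(val type: String, val payload: String)

fun placeOrder(conn: Connection, orderId: String, events: List<DomainEvent>) {
    conn.autoCommit = false
    try {
        // 1. Persist the aggregate's new state.
        conn.prepareStatement("UPDATE orders SET status = 'PLACED' WHERE id = ?").use {
            it.setString(1, orderId)
            it.executeUpdate()
        }
        // 2. Store the events in an outbox table inside the SAME transaction,
        //    so they are durable only if the business change commits.
        for (event in events) {
            conn.prepareStatement("INSERT INTO outbox (type, payload) VALUES (?, ?)").use {
                it.setString(1, event.type)
                it.setString(2, event.payload)
                it.executeUpdate()
            }
        }
        conn.commit()
    } catch (e: Exception) {
        conn.rollback() // neither the state change nor the events survive
        throw e
    }
    // 3. A separate relay reads the outbox and puts the rows on the message
    //    queue for asynchronous delivery, deleting them once acknowledged.
}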
Are there other known ways to do it?
Well, rather than using a static DomainEventPublisher you could record events in a collection on the aggregate root, just as in event sourcing, and then implement a centralised mechanism to store them (e.g. transaction hooks, aspects, etc.).
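For illustration, a minimal Kotlin sketch of that recording approach (the names and the commit-hook wiring are hypothetical and framework-specific):

// Aggregates record events instead of publishing them directly.
abstract class AggregateRoot {
    private val pendingEvents = mutableListOf<Any>()

    protected fun record(event: Any) {
        pendingEvents.add(event)
    }

    // Called by the repository/unit of work just before commit, so the
    // events are stored in the same transaction as the aggregate's state.
    fun dequeuePendingEvents(): List<Any> {
        val events = pendingEvents.toList()
        pendingEvents.clear()
        return events
    }
}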
What happens if DomainEventPublisher.publish() succeeds, but the transaction fails?
In this case I am against Vernon's approach. I prefer to return the events to the application service. That way I can persist the changes performed by the aggregate in a transaction (if needed) and, if everything is OK, publish the events. This also helps keep the business layer entirely clean and pure.
In a few words: if the transaction fails, then no event is raised.
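A Kotlin sketch of that application-service flow, with hypothetical types, following the publish-only-after-a-successful-commit rule:

// Hypothetical collaborators: a repository whose save() is transactional,
// an aggregate that returns the events describing what it changed, and a
// publisher that is invoked only after the commit succeeds.
interface OrderRepository {
    fun load(id: String): Order
    fun save(order: Order)
}
interface EventPublisher { fun publish(event: Any) }
object OrderPlaced
class Order { fun place(): List<Any> = listOf(OrderPlaced) }

class PlaceOrderService(
    private val orders: OrderRepository,
    private val publisher: EventPublisher
) {
    fun placeOrder(orderId: String) {
        val order = orders.load(orderId)
        val events = order.place()          // aggregate returns its domain events
        orders.save(order)                  // commit the transaction first
        events.forEach(publisher::publish)  // no commit, no events raised
    }
}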
What happens if DomainEventPublisher.publish() fails?
A domain event never fails by business rules, because it's a notification of things that have happened. If an aggregate said yes to the operation and returned an event expressing the business changes, then nothing in the world should say that this operation cannot be done or has to be undone.
If the event fails for infrastructure reasons, then you need the tools to re-raise it (automatically or manually) when the outage is fixed and eventually achieve consistency in your system. Take a look at NServiceBus. It provides retries, error queues, logs and so on, so you never lose events.
If the message system is down, you at least have event logs that you can use to re-raise the events into the message system.

Asana API Sync Error

I currently have an application running that passes data between Asana and Zendesk.
I have webhooks created for all my projects in Asana, and all project events are sent to my webhook endpoint, which verifies the request, tries to identify the event, and updates Zendesk with relevant data depending on the event type (some events aren't required).
However, I have recently been receiving the following request from the webhooks:
"events": [
{
"action": "sync_error",
"message": "There was an error with the event queue, which may have resulted in missed events. If you are keeping resources in sync, you may need to manually re-fetch them.",
"created_at": "2017-05-23T16:29:13.994Z"
}
]
Now, because I don't poll the API for event updates but instead react when the events arrive, I haven't considered using a sync key; the docs suggest it is only required when polling for events. Do I need to use one when using webhooks as well?
What am I missing?
Thanks in advance for any suggestions.
You're correct: you don't need to track a sync key for webhooks - we proactively try to reach out with them when something changes in Asana, and we track the events that haven't yet been delivered across webhooks (essentially akin to us updating the sync key server-side whenever webhooks have been successfully delivered).
Basically, what's happening here is that for some reason our event queues detect a problem with their internal state. This means that events didn't get recorded, or webhooks didn't get delivered for a long time. Our events and webhooks track changes on a best-effort basis, and some things that can happen to our production machines can cause these sorts of issues, like a machine dying at an inopportune time.
Unfortunately, then, the only way to get back to a good state is to do a full scan of the projects you're tracking, which is what is meant by "you may need to manually re-fetch them". Basically, a robust implementation of syncing Asana to external resources looks like this (a sketch follows the list):
A diff function that, given a particular task and external resource, detects what state is out of date or different between the two and chooses a merge/patch resolution (i.e. "make Zendesk look like Asana").
Receiving a webhook runs that diff/patch process for that one task in a "live" fashion.
Periodically (on script startup, say, or when webhooks/events are missed and you get an error message like this), update all resources that might have been missed by scanning the entire project and doing the diff/patch for every task. This is more expensive, but should be significantly rarer.
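A rough Kotlin skeleton of that structure, with hypothetical task snapshots and a hypothetical Zendesk client standing in for the real API calls:

// Hypothetical snapshots of a task in each system.
data class AsanaTask(val id: String, val name: String, val completed: Boolean)
data class ZendeskTicket(val externalId: String, var subject: String, var solved: Boolean)

interface ZendeskClient {
    fun findByExternalId(id: String): ZendeskTicket?
    fun save(ticket: ZendeskTicket)
}

class Syncer(private val zendesk: ZendeskClient) {
    // Diff + patch for a single task: "make Zendesk look like Asana".
    fun diffAndPatch(task: AsanaTask) {
        val ticket = zendesk.findByExternalId(task.id) ?: return
        if (ticket.subject != task.name || ticket.solved != task.completed) {
            ticket.subject = task.name
            ticket.solved = task.completed
            zendesk.save(ticket)
        }
    }

    // Live path: a webhook names one task; patch just that task.
    fun onWebhook(task: AsanaTask) = diffAndPatch(task)

    // Recovery path: on startup or after a sync_error, rescan everything.
    fun fullScan(allTasks: List<AsanaTask>) = allTasks.forEach(::diffAndPatch)
}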

WF state machine workflow service and exceptions in triggers

I have a state machine working as a workflow service, with receive/send-reply activities as triggers for transitions.
Before sending the replies back, I have to do some work.
Problems arise when exceptions occur in that work before the reply is sent. In such cases, if I don't handle the exception, the whole workflow is suspended; in any case, I shouldn't move to the next state if the request wasn't properly handled.
Would it be enough to wrap the whole state machine in a Try/Catch? Will the state machine recover from the last persisted state (I'm using SQL persistence)?
Are there other solutions?
Remark: the workflows are hosted in IIS.
Thanks

Domain driven design and domain events

I'm new to DDD and I'm reading articles now to get more information. One of the articles focuses on domain events (DEs). For example, sending an email is a domain event raised after some criterion is met while executing a piece of code.
The code example shows one way of handling domain events and is followed by this paragraph:
Please be aware that the above code will be run on the same thread within the same transaction as the regular domain work so you should avoid performing any blocking activities, like using SMTP or web services. Instead, prefer using one-way messaging to communicate to something else which does those blocking activities.
My questions are:
Is this a general problem in handling DEs, or is it just a concern of the solution in the mentioned article?
If domain events are raised in a transaction and the system does not handle them synchronously, how should they be handled?
When I decide to serialize these events and let a scheduler (or some other mechanism) execute them, what happens when the transaction is rolled back? (In the article, the event is raised in code executed within the transaction.) Who will cancel them (when they are not persisted to the database)?
Thanks
It's a general problem, period, never mind DDD.
In general, in any system which is required to respond in a performant manner (e.g. a web server), any long-running activities should be handled asynchronously to the triggering process.
This means a queue.
Rolling back your transaction should remove the item from the queue.
Of course, you now need additional mechanisms to handle the situation where the item on the queue fails to process - i.e. the email isn't sent - and you also need to allow for this in your triggering code: having a subsequent process RELY on the earlier process having already occurred is going to cause issues at some point.
In short, your queueing mechanism should itself be transactional and allow for retries, and you need to think about the whole chain of events as a workflow.
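A minimal Kotlin sketch of that shape, assuming a hypothetical database-backed email_queue table so that enqueueing participates in the business transaction (a rollback removes the row), with a retry counter and a dead-letter status:

import java.sql.Connection

// Enqueue inside the business transaction: rolling it back removes the row too.
fun enqueueEmail(conn: Connection, recipient: String) {
    conn.prepareStatement(
        "INSERT INTO email_queue (recipient, attempts, status) VALUES (?, 0, 'PENDING')"
    ).use {
        it.setString(1, recipient)
        it.executeUpdate()
    }
}

// A worker processes the queue asynchronously, with retries and a DEAD status
// for items that keep failing (e.g. the email cannot be sent).
fun processOne(conn: Connection, maxAttempts: Int, send: (String) -> Unit) {
    conn.prepareStatement(
        "SELECT id, recipient, attempts FROM email_queue WHERE status = 'PENDING' LIMIT 1"
    ).use { stmt ->
        val rs = stmt.executeQuery()
        if (!rs.next()) return
        val id = rs.getLong("id")
        val recipient = rs.getString("recipient")
        val attempts = rs.getInt("attempts")
        try {
            send(recipient) // the blocking work happens outside the original transaction
            conn.prepareStatement("UPDATE email_queue SET status = 'SENT' WHERE id = ?")
                .use { it.setLong(1, id); it.executeUpdate() }
        } catch (e: Exception) {
            conn.prepareStatement(
                "UPDATE email_queue SET attempts = attempts + 1, status = ? WHERE id = ?"
            ).use {
                it.setString(1, if (attempts + 1 >= maxAttempts) "DEAD" else "PENDING")
                it.setLong(2, id)
                it.executeUpdate()
            }
        }
    }
}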
