Cirqus: "Expected an event with sequence number 0." exception

Every once in a while I receive exceptions in Cirqus when trying to process commands. It happens with different types of commands, but it always happens with one specific aggregate root type (let's say it's a registration form). We haven't deleted events or modified the Events table in any way, so I'm wondering what else can cause the issue.
The exact (but anonymized) error message is: Tried to apply event with sequence number 12 to aggregate root of type RegistrationForm with ID d863ac79-6bc0-480d-9d83-30b7696e7ea1 with current sequence number -1. Expected an event with sequence number 0.
So for example to debug the latest instance of the exception I queried the database for this aggregate id and got 37 events in return. I then checked the sequences and the sequences seemed correct. I also checked that the global sequences were at least also chronologically correct. Then I checked to see if the "meta" column had a different global sequence than the record, but that also checked out OK.
What I find most confusing is that other registration forms are able to go through. Looking at our logs there's no pattern I can identify, and also it only happens about 3-5% of the time.
I guess what I'm wondering is: What can cause this issue? How can I debug it? How can I prevent it from happening in the future?
System specifics: We're running under .NET 4.5, using Cirqus 0.63.12 (and then also tested on 0.66.4), using Postgres 9.4 as the database (and using v0.63.12 of the Cirqus.Postgres package).

I found the issue! It seems that the PostgreSQL event source's SQL code was missing an Order By clause and in some cases my events were being returned out of order. I submitted this pull request as a proposed fix to the problem: https://github.com/d60/Cirqus/pull/75
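To illustrate why a missing Order By produces exactly this error, here is a minimal, language-agnostic sketch in Python. It is not Cirqus code: the `hydrate` function, the `seq` field name and the sample rows are assumptions made up for the example. It only mimics the strict "next sequence number must be current + 1" check, and shows that the same set of rows fails when returned in arbitrary order and succeeds once sorted.

```python
def hydrate(events):
    """Apply events in the order given, enforcing gap-free sequence numbers."""
    current = -1  # same starting point as in the error message above
    for event in events:
        expected = current + 1
        if event["seq"] != expected:
            raise ValueError(
                f"Tried to apply event with sequence number {event['seq']} "
                f"with current sequence number {current}. "
                f"Expected an event with sequence number {expected}."
            )
        current = event["seq"]
    return current


def load_ordered(rows):
    # Stands in for adding an ORDER BY on the sequence column (assumed name "seq").
    return sorted(rows, key=lambda row: row["seq"])


# Hypothetical result of a query without ORDER BY: all rows present, wrong order.
unordered_rows = [{"seq": 12}] + [{"seq": n} for n in range(0, 12)]
```

Calling `hydrate(unordered_rows)` raises with a message like the one in the question, while `hydrate(load_ordered(unordered_rows))` succeeds. Without an explicit ordering clause SQL does not guarantee row order, which would explain why the failure was intermittent.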

Related

Can I track unexpected lack of changes using change feeds, cosmos db and azure functions?

I am trying to understand change feeds in Azure. I see I can trigger an event when something changes in Cosmos DB. This is useful. However, in some situations, I expect a document to be changed after a while. A question should have a status change showing that it has been answered. After a while an order should have a status change to "confirmed", and a problem should have a status change to "resolved" or a priority change (to "low"). It is useful to trigger an event when such a change happens for a certain document. However, it is even more useful to trigger an event when such a change does not happen within a specified time (like 1 hour). A problem needs to be resolved after a while, an order needs to be confirmed after a while, etc. Can I use change feeds and Azure Functions for that too? Or do I need something different? It is great that I can visualize changes (for example in Power BI) once they happen, but I am also interested in visualizing changes that do not occur within the time in which they are expected.
Achieving that with Change Feed doesn't sound possible, because as you describe it, Change Feed is reacting based on operations/events that happen.
In your case it sounds as if you need an agent that runs every X amount of time (maybe an Azure Function with a TimerTrigger?) and executes a query to find items in state X that have not been modified in the past Y pre-defined interval (possibly the time interval associated with the TimerTrigger). This could be done by checking the _ts field of the state documents or your own timestamp field, see https://stackoverflow.com/a/39214165/5641598.
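The check such a periodic agent performs could look roughly like the sketch below. It is plain Python over in-memory dictionaries, not Cosmos DB SDK code; the `status` field, the function name and the sample values are assumptions for illustration. The only Cosmos DB detail it relies on is that `_ts` holds the last-modified time in epoch seconds.

```python
import time


def find_stale(items, status, max_age_seconds, now=None):
    """Return items still in `status` whose last modification is too old."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return [
        item
        for item in items
        if item["status"] == status and item["_ts"] < cutoff
    ]


# Hypothetical documents; _ts is seconds since the Unix epoch.
orders = [
    {"id": "order-1", "status": "pending", "_ts": 1_000},
    {"id": "order-2", "status": "pending", "_ts": 9_000},
    {"id": "order-3", "status": "confirmed", "_ts": 1_000},
]

# Orders that have stayed "pending" for more than one hour at time 10_000.
overdue = find_stale(orders, "pending", max_age_seconds=3_600, now=10_000)
```

In a real deployment the filter would normally be pushed into the database query rather than applied in memory, so that only the overdue documents are transferred.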
If your goal is to just deploy it on a dashboard, you could query using Power BI too.
As long as you don't need too much time precision (the Change Feed notifications are usually delayed by a few seconds) for this task, the Azure CosmosDB Change Feed could be easily used as a solution, but it would require some extra work from the Microsoft team to also support capturing deletion TTL expiration events.
A potential solution, if the Change Feed were to capture such TTL expiration events, would be: whenever you insert (or in your use case: change priority of) a document for which you want to monitor lack of changes, you also insert another document (possibly in another collection) that acts as a timer, specifying a TTL of 1h.
You would delete the timer document manually or by consuming the Change Feed for changes, in case a change actually happened.
You could also easily consume from the Change Feed the TTL expiration event and assert that if the TTL expired then there were no changes in the specified time window.
If you'd like this feature, you should consider voting for issues such as this one: https://github.com/Azure/azure-cosmos-dotnet-v2/issues/402 and feature requests such as this one: https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/14603412-execute-a-procedure-when-ttl-expires, which would make the Change Feed a perfect fit for scenarios such as yours. Sadly it is not available yet :(
TL;DR No, the Change Feed as it stands would not be a right fit for your use case. It would need some extra functionalities that are planned but not implemented yet.
PS. In case you'd like to know more about the Change Feed and its main use cases anyways, you can check out this article of mine :)

How to avoid concurrency on aggregates status using Rebus in a server cluster

I have a web service that uses Rebus as its service bus.
Rebus is configured as explained in this post.
The web service is load balanced across a two-server cluster.
These services are for a production environment and each production machine sends commands to save the produced quantities and/or to update its state.
In the BL I've modelled an Aggregate Root for each machine and it executes the commands emitted by the real machine. To preserve the correct status, the Aggregate needs to receive the commands in the same sequence as they were emitted, and, since there is no concurrency for that machine, that is the same order they are saved on the bus.
E.g. machine XX sends the command 'add new piece done' and then the command 'Set stop for maintenance'. Executing these commands in sequence should leave Aggregate XX in state 'Stop', but with multiple servers/worker roles both commands could be executed at the same time against the same version of the aggregate. This means that, depending on who saves the aggregate first, Aggregate XX can end up in state 'Stop' or 'Producing pieces', which is not the same thing.
I've introduced a service bus to be able to scale out as the number of machines grows, and for resilience (if a server fails, command processing only slows down).
Currently I'm using the name of the aggregate like a "topic" or "destinationAddress" with the IAdvancedApi, so the name of the aggregate is saved as the recipient in the transport. Then I've created a custom Transport class that:
1. does not remove the messages in progress but sets them to the state InProgress.
2. when retrieving messages, selects only those whose recipient has no message InProgress.
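The two rules above can be sketched as follows. This is a simplified, single-process Python illustration, not Rebus transport code; the message dictionaries, the `Pending` state name and the `next_message` function are assumptions for the example.

```python
def next_message(queue):
    """Pick the oldest pending message whose recipient has nothing in progress."""
    # Recipients (aggregate names) that already have a message being handled.
    busy = {m["recipient"] for m in queue if m["state"] == "InProgress"}
    for message in queue:  # the queue is assumed to be ordered by arrival
        if message["state"] == "Pending" and message["recipient"] not in busy:
            message["state"] = "InProgress"  # rule 1: mark instead of removing
            return message
    return None


queue = [
    {"id": 1, "recipient": "machine-XX", "state": "Pending"},
    {"id": 2, "recipient": "machine-XX", "state": "Pending"},
    {"id": 3, "recipient": "machine-YY", "state": "Pending"},
]

first = next_message(queue)   # message 1 for machine-XX
second = next_message(queue)  # machine-XX is busy, so message 3 is picked
third = next_message(queue)   # nothing eligible until message 1 completes
```

With several servers reading the same table, the select and the state change would have to happen atomically in the data store, otherwise two nodes could pick messages for the same recipient at the same time.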
I'm wondering: is this the best way to guarantee that the bus executes the commands for an aggregate in the same sequence as they arrived?
The solution would be to have some kind of locking of your aggregate root, which needs to happen at the data store level.
E.g. by using optimistic locking (probably implemented with some kind of revision number or something like that), you would be sure that you would never accidentally overwrite another node's edits.
This would allow for your aggregate to either
a) accept the changes in either order (which is generally preferable – makes your system more tolerant), or
b) reject an invalid change
If the aggregate rejects the change, this could be implemented by throwing an exception. And then, in the Rebus handler that catches this exception, you can e.g. await bus.Defer(TimeSpan.FromSeconds(5), theMessage) which will cause it to be delivered again in five seconds.
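A minimal sketch of the revision-number idea, in Python with an in-memory store. None of this is Rebus or any particular database API; `InMemoryStore`, `ConcurrencyError` and the revision layout are assumptions for illustration.

```python
class ConcurrencyError(Exception):
    """Raised when another writer saved the aggregate first."""


class InMemoryStore:
    def __init__(self):
        self._rows = {}  # aggregate id -> (revision, state)

    def load(self, aggregate_id):
        # Unknown aggregates start at revision 0 with no state.
        return self._rows.get(aggregate_id, (0, None))

    def save(self, aggregate_id, expected_revision, new_state):
        current_revision, _ = self.load(aggregate_id)
        if current_revision != expected_revision:
            raise ConcurrencyError(
                f"expected revision {expected_revision}, found {current_revision}"
            )
        self._rows[aggregate_id] = (current_revision + 1, new_state)


store = InMemoryStore()

# Two nodes load the same version of the aggregate.
rev_a, _ = store.load("machine-XX")
rev_b, _ = store.load("machine-XX")

store.save("machine-XX", rev_a, "Producing pieces")  # first writer wins
try:
    store.save("machine-XX", rev_b, "Stop")  # stale revision is rejected
    second_write_rejected = False
except ConcurrencyError:
    second_write_rejected = True
```

The rejected writer can then reload the aggregate and retry, or have the message delivered again later as described above. In a relational store the same check is commonly expressed as an update that includes the expected revision in its WHERE clause and verifies that exactly one row was affected.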
You should never rely on message order in a service bus / queuing / messaging environment.
When you do find yourself in this position you may need to re-think your design. Firstly, a service bus is most certainly not an event store and attempting to use it like one is going to lead to pain and suffering :) --- not that you are attempting this but I thought I'd throw it in there.
As for your design, in order to manage this kind of state you may want to look at a process manager. If you are not generating those commands then even this will not help.
However, given your scenario it seems as though the calls are sequential but perhaps it is just your example. In any event, as mookid8000 said, you either want to:
discard invalid changes (with the appropriate feedback),
allow any order of messages as long as they are valid,
ignore out-of-sequence messages till later.
Hope that helps...
"exactly the same sequence as they were saved on the bus"
Just... why?
Would you rely on your HTTP server logs to know which command actually reached an aggregate first? No, because that is totally unreliable, just like ordering is under at-least-once delivery guarantees, and it's also irrelevant.
It is your event store and/or normal persistence state that should be the source of truth when it comes to knowing the sequence of events. The order of commands shouldn't really matter.
Assuming optimistic concurrency, if the aggregate is not allowed to transition from A to C, then it should guard this invariant: when a TransitionToStateC command hits it in state A, the command simply gets rejected.
If on the other hand, A->C->B transitions are valid and that is the order received by your aggregate well that is what happened from the domain perspective. It really shouldn't matter which command was published first on the bus, just like it doesn't matter which user executed the command first from the UI.
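As a sketch of an aggregate guarding its own transitions, consider the following Python example. The class, the exception and the transition tables are hypothetical; which transitions are valid is a domain decision.

```python
class InvalidTransition(Exception):
    """Raised when a command asks for a transition the aggregate forbids."""


class Aggregate:
    def __init__(self, state, allowed):
        self.state = state
        self.allowed = allowed  # mapping: state -> set of states reachable from it
        self.history = [state]

    def transition_to(self, target):
        if target not in self.allowed.get(self.state, set()):
            raise InvalidTransition(f"{self.state} -> {target} is not allowed")
        self.state = target
        self.history.append(target)


# Case 1: A -> C is not allowed, so the command is rejected in state A.
strict = Aggregate("A", {"A": {"B"}, "B": {"C"}})
try:
    strict.transition_to("C")
    rejected = False
except InvalidTransition:
    rejected = True

# Case 2: A -> C -> B is valid, so that is simply what happened.
lenient = Aggregate("A", {"A": {"B", "C"}, "C": {"B"}})
lenient.transition_to("C")
lenient.transition_to("B")
```

In both cases the aggregate's own recorded history, not the order in which messages were put on the bus, determines what happened.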
"In my scenario the calls for a specific aggregate are absolutely
sequential and I must guarantee that are executed in the same order"
Why are you executing them asynchronously and potentially concurrently by publishing on a bus then? What you are basically saying is that calls are sequential and cannot be processed concurrently. That means everything should be synchronous because there is no potential benefit from parallelism.
Why:
executeAsync(command1)
executeAsync(command2)
executeAsync(command3)
When you want:
execute(command1)
execute(command2)
execute(command3)
You should have a single command message and the handler of this message executes multiple commands against the aggregate. Then again, in this case I'd just create a single operation on the aggregate that performs all the transitions.

Prevent certain optionset changes in CRM via plugin

Is it possible to have a plugin intervene when someone is editing an optionset?
I would have thought crm would prevent the removal of optionset values if there are entities that refer to them, but apparently this is not the case (there are a number of orphaned fields that refer to options that no longer exist). Is there a message/entity pair that I could use to check if there are entities using the value that is to be deleted/modified and stop it if there are?
Not sure if this is possible, but you could attempt to create a plugin on the Execute message and check the input parameters in the context to determine which request type is being processed. Pretty sure you'll want to look for either UpdateAttributeRequest for local OptionSets, or potentially UpdateOptionSetRequest for both. Then you could run additional logic to determine which values are changing and ensure the database values are correct.
The big caveat is that if you have even a moderate amount of data, I'm guessing you'll hit the 2 minute limit for plugin execution and it will fail.

How to skip sequence numbers in Cirqus

I deleted a few events from my event store but now I get exceptions like this:
System.ApplicationException: Tried to apply event with sequence number 180 to aggregate root with ID 55b43b9e-cd9a-4db9-9b86-78feb7043051 with current sequence number 15. Expected an event with sequence number 16.
How can I ignore exceptions like this?
You can't..... Cirqus is very very strict about its sequence numbers, which is to guarantee that aggregate roots are hydrated to a correct state and are impossible to hydrate into some state that they never actually were in.
This means that if you need to "delete" events, you cannot just remove the events from the event store.
I once had some logic go awry and accidentally generate 50000 events while the system was running and the users kept working. We removed them by replicating the events (using EventReplicator) to a new event store, decorating the destination IEventStore with a "rewriter" that would ignore certain events and rewrite all sequence numbers as necessary.
It was pretty hard to get right though.
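The core of such a rewriter can be sketched as below. This is a Python illustration of the idea only, not the Cirqus EventReplicator or IEventStore API; the event dictionaries and the field names `aggregate_id`, `seq` and `global_seq` are assumptions.

```python
def rewrite(events, should_skip):
    """Copy events, dropping unwanted ones and renumbering without gaps."""
    next_seq = {}  # aggregate id -> next per-aggregate sequence number
    rewritten = []
    for event in events:  # events are assumed to arrive in global order
        if should_skip(event):
            continue
        aggregate_id = event["aggregate_id"]
        seq = next_seq.get(aggregate_id, 0)
        next_seq[aggregate_id] = seq + 1
        # Keep the payload, but assign fresh per-aggregate and global numbers.
        rewritten.append({**event, "seq": seq, "global_seq": len(rewritten)})
    return rewritten


source = [
    {"aggregate_id": "a", "seq": 0, "global_seq": 0, "type": "Created"},
    {"aggregate_id": "a", "seq": 1, "global_seq": 1, "type": "Bogus"},
    {"aggregate_id": "b", "seq": 0, "global_seq": 2, "type": "Created"},
    {"aggregate_id": "a", "seq": 2, "global_seq": 3, "type": "Updated"},
]

clean = rewrite(source, should_skip=lambda e: e["type"] == "Bogus")
```

After the rewrite, aggregate "a" has sequence numbers 0 and 1 with no gap, and the global numbering is contiguous again. The sketch ignores everything that made the real operation hard, such as events that depend on the removed ones and any metadata that also records sequence numbers.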
Do you really need to delete the events? Can't you e.g. append some corrective events to the problematic aggregate roots' event streams?

NCA R12 with LoadRunner 12.02 - nca_get_top_window returns NULL

The connection is successfully established by nca_connect_server(), but when I try to capture the currently open window using nca_get_top_window(), it returns NULL. Because of this, all subsequent requests fail.
It depends on how you obtained your script: whether it was recorded or written manually.
If the script was written manually, there is no guarantee that it can be replayed, since the sequence of API calls (and/or their parameters) may not be valid. If the script was recorded, there might be a missed correlation or something similar. The common way to spot the issue is to compare recording and replay behavior (by comparing the log files from these two stages; make sure you are using fully extended logging) to find out what goes wrong on replay, why, and how it diverges from the recorded activity.
