How to persist Saga instances using storage engines and avoid race conditions

I tried persisting saga instances using RedisSagaRepository. I want to run the saga in a load-balanced setup, so I cannot use InMemorySagaRepository.
However, after I switched, I noticed that some of the events published by consumers were not being processed by the saga. I checked the queue and did not see any messages.
What I noticed is that this is likely to occur when the consumer takes little to no time to process the command and publish the event.
The issue does not occur if I use InMemorySagaRepository or add Task.Delay() in Consumer.Consume().
Am I using it incorrectly?
Also, suppose I want to run the saga in a load-balanced setup, and the saga needs to send multiple commands of the same type, using a dictionary to track completeness (similar logic to "Handling transition to state for multiple events"). When multiple consumers publish events at the same time, would I have a race condition if two saga instances are processing two different events at the same time? In that case, would the dictionary in the state object be set correctly?
The code is available here
SagaService.ConfigureSagaEndPoint() is where I switch between InMemorySagaRepository and RedisSagaRepository
private void ConfigureSagaEndPoint(IRabbitMqReceiveEndpointConfigurator endpointConfigurator)
var stateMachine = new MySagaStateMachine();
var redisConnectionString = "";
var redis = ConnectionMultiplexer.Connect(redisConnectionString);
///If we switch to RedisSagaRepository and Consumer publish its response too quick,
///It seems like the consumer published event reached Saga instance before the state is updated
///When it happened, Saga will not process the response event because it is not in the "Processing" state
//var repository = new RedisSagaRepository<SagaState>(() => redis.GetDatabase());
var repository = new InMemorySagaRepository<SagaState>();
endpointConfigurator.StateMachineSaga(stateMachine, repository);
catch (Exception ex)
LeafConsumer.Consume is where we add the Task.Delay()
public class LeafConsumer : IConsumer<IConsumerRequest>
public async Task Consume(ConsumeContext<IConsumerRequest> context)
///If MySaga project is using RedisSagaRepository, uncomment await Task.Delay() below
///Otherwise, it seems that the Publish message from Consumer will not be processed
///If using InMemorySagaRepository, code will work without needing Task.Delay
///Maybe I am doing something wrong here with these projects
///Or in real life, we probably have code in Consumer that will take a few milliseconds to complete
///However, we cannot predict latency between Saga and Redis
//await Task.Delay(1000);
Console.WriteLine($"Consuming CorrelationId = {context.Message.CorrelationId}");
await context.Publish<IConsumerProcessed>(new

When you have events published in this manner, and are using multiple service instances with a non-transactional saga repository (such as Redis), you need to design your saga such that a unique identifier is used and enforced by Redis. This prevents multiple instances of the same saga from being created.
You also need to accept the events in more than the "expected" state. For instance, expecting to receive a Start, which puts the saga into a processing state, before receiving another event only in processing, is likely to fail. Allowing the saga to be started (Initially, in Automatonymous) by any of the sequence of events is recommended, to avoid out-of-order message delivery issues. As long as the events all move the dial from the left to the right, the eventual state will be reached. If an earlier event is received after a later event, it shouldn't move the state backwards (or to the left, in this example) but only add information to the saga instance and leave it at the later state.
If two events are processed on separate service instances, both will try to insert the saga instance into Redis, and one of the inserts will fail as a duplicate. The failed message should then be retried (add UseMessageRetry() to your receive endpoint); the retry picks up the now-existing saga instance and applies the event.
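The "move the dial only from left to right" advice can be illustrated with a small sketch. It is written in TypeScript rather than C#, and it does not use MassTransit, Automatonymous, or Redis: `SagaStore`, `applyEvent`, and the state and event names are hypothetical stand-ins, and the in-memory map only imitates a repository that rejects duplicate correlation ids.

```typescript
// Hypothetical saga states, ordered from "left" to "right".
type SagaStateName = "Initial" | "Processing" | "Completed";
const ORDER: Record<SagaStateName, number> = { Initial: 0, Processing: 1, Completed: 2 };

interface SagaInstance {
  correlationId: string;
  state: SagaStateName;
  seen: string[]; // names of the events applied so far
}

type SagaEvent = { correlationId: string; name: "Start" | "ConsumerProcessed" };

// Stand-in for a repository that enforces a unique correlation id on insert.
class SagaStore {
  private rows = new Map<string, SagaInstance>();

  insert(instance: SagaInstance): void {
    if (this.rows.has(instance.correlationId)) {
      throw new Error("duplicate saga instance");
    }
    this.rows.set(instance.correlationId, instance);
  }

  find(id: string): SagaInstance | undefined {
    return this.rows.get(id);
  }
}

// State each event moves the saga towards.
const TARGET: Record<SagaEvent["name"], SagaStateName> = {
  Start: "Processing",
  ConsumerProcessed: "Completed",
};

function applyEvent(store: SagaStore, event: SagaEvent): SagaInstance {
  let saga = store.find(event.correlationId);
  if (!saga) {
    // Any event may create the instance, not only Start.
    saga = { correlationId: event.correlationId, state: "Initial", seen: [] };
    store.insert(saga);
  }
  saga.seen.push(event.name); // always keep the information
  const target = TARGET[event.name];
  if (ORDER[target] > ORDER[saga.state]) {
    saga.state = target; // move right only, never back to the left
  }
  return saga;
}

// Out-of-order delivery: the consumer's response arrives before Start.
const store = new SagaStore();
applyEvent(store, { correlationId: "42", name: "ConsumerProcessed" });
const result = applyEvent(store, { correlationId: "42", name: "Start" });
console.log(result.state, result.seen); // the state stays "Completed"
```

In a multi-instance deployment, two instances can both reach `insert` for the same correlation id; the second call throws, and retrying that message then takes the `find` branch and applies the event to the existing instance.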


How to avoid a memory leak when using Pub/Sub to call a function?

I am stuck on a performance issue when using Pub/Sub to trigger a function.
// this is called from index.ts
export function downloadService() {
  // References an existing subscription
  const subscription = pubsub.subscription("DOWNLOAD-sub");
  // Create an event handler to handle messages
  // let messageCount = 0;
  const messageHandler = async (message: any) => {
    console.log(`Received message ${message.id}:`);
    console.log(`\tData: ${message.data}`);
    console.log(`\tAttributes: ${message.attributes.type}`);
    // "Ack" (acknowledge receipt of) the message
    message.ack();
    await exportExcel(message); // my function
    // messageCount += 1;
  };
  // Listen for new messages until timeout is hit
  subscription.on("message", messageHandler);
}
async function exportExcel(message : any) {
//get data from database
const movies = await Sales.findAll({
attributes: [
raw: true,
... processing to excel// 800k rows
... bucket.upload to gcs
The function above works fine if I trigger ONLY one Pub/Sub message.
However, it runs into a memory leak or a database connection timeout if I trigger many Pub/Sub messages in a short period of time.
The problem I found is that the first run has not finished yet, but further Pub/Sub messages call the function again straight away, so several runs are processing at the same time.
I have no idea how to resolve this. Would implementing a queue worker or using Google Cloud Tasks solve the problem?
As mentioned by @chovy in the comments, the exportExcel calls need to be queued, since the function's execution is not keeping up with the rate of invocation. One module that can be used to queue function calls is async. Please note that the async module is not officially supported by Google.
As an alternative, you can employ flow control features on the subscriber side. Data pipelines often receive sporadic spikes in published traffic, which can overwhelm subscribers as they try to catch up. The usual response to high publish throughput on a subscription is to autoscale subscriber resources dynamically so that they consume more messages. However, this can incur unwanted costs (for instance, you may need more VMs) and can require additional capacity planning. Flow control on the subscriber side addresses this by letting the subscriber regulate the rate at which messages are ingested. Please refer to this blog for more information on flow control features.
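The queueing idea can be sketched without any extra module. The following is a minimal illustration, not the async library and not the Pub/Sub client: `SerialQueue` and `fakeExport` are hypothetical names, and `fakeExport` merely simulates a slow export with a timer.

```typescript
// Runs queued async jobs one at a time, in arrival order.
class SerialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(job: () => Promise<void>): Promise<void> {
    // Chain the job after whatever is already queued.
    const run = this.tail.then(job);
    // Swallow errors on the chain itself so one failed job
    // does not block the jobs queued behind it.
    this.tail = run.catch(() => {});
    return run;
  }
}

// Hypothetical stand-in for the expensive export; records start/end order.
const log: string[] = [];
async function fakeExport(id: string): Promise<void> {
  log.push(`start ${id}`);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  log.push(`end ${id}`);
}

const queue = new SerialQueue();
// Three "messages" arrive at once but are processed strictly one after another.
const done = Promise.all(
  ["a", "b", "c"].map((id) => queue.enqueue(() => fakeExport(id))),
);
done.then(() => console.log(log.join(", ")));
// prints "start a, end a, start b, end b, start c, end c"
```

In the handler from the question, this would amount to calling `queue.enqueue(() => exportExcel(message))` instead of awaiting `exportExcel` directly. Note that this only serializes the work; messages that have already been delivered still sit in memory while they wait. To limit how many messages are delivered at once, the Node.js Pub/Sub client accepts a `flowControl` setting (for example `maxMessages`) in the subscription options; check the client documentation for its exact semantics.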

IMediatr with Autofac in Domain Objects DDD

I have set my Domain Model objects to be independent of any service and infrastructure logic.
I am also using Domain Events to react to some changes in Domain Models.
Now my problem is how to raise those events from the Domain Model objects themselves.
Currently I am using Udi Dahan's DomainEvents static class for this (I need events to be handled exactly when they happen and not at a later time).
The events are used for many things, like logging, updating the data in related services and other Domain Model objects and db, publishing messages to the MassTransit bus etc.
The DomainEvents static class uses Autofac scope that I inject at some point in it, to find the IMediatr instance and to publish the events, like this:
public static class DomainEvents
{
    private static ILifetimeScope Scope;
    public async static Task RaiseAsync<TDomainEvent>(TDomainEvent @event) where TDomainEvent : IDomainEvent
    {
        var mediator = Scope?.Resolve<IMediatorBus>();
        if (mediator != null)
            await mediator!.Publish(@event).ConfigureAwait(false);
        else
            Debug.WriteLine("Mediator not set for DomainEvents!");
    }
    public static void SetScope(ILifetimeScope scope)
    {
        Scope = scope;
    }
}
This all works fine in a single-threaded environment, but the DomainEvents.SetScope() method is a potential race condition in a multi-threaded environment.
I.e., when I introduce MassTransit and create message consumers, each message consumer sets the current LifetimeScope on DomainEvents through that method, and here is the problem: each consumer overwrites the lifetime scope with a new one.
Why I use DomainEvents static class? Because I don't want to pollute my Domain Model Objects with infrastructure stuff.
I thought about making DomainEvents non static (define an interface), but then I need them injected in every Domain Model Object and I'm still thinking about this, but maybe there is a better way.
I want to know if there is a better way to handle this?
Maybe some change in the DomainEvents class? Or maybe remove the DomainEvents static class and use an interface or a DomainService to do this.
The problem is that I don't like static classes, but I also don't like pushing non-domain-specific dependencies into Domain Model Objects.
Please help.
To better clarify the process and for what I use DomainEvents...
I have a long-running process that can take from a few minutes to a few hours or days to complete.
So the process goes like this:
I receive a message from MassTransit, e.g. ProcessStartMessage(processId).
Get the ProcessData for (processId) from the DB.
Construct an in-memory Domain Model ProcessTracker (singleton) and put all the data I loaded from the DB into it (in-memory cache).
I receive another message from MassTransit, e.g. ProcessStatusChanged(processId, data).
Forward this message data to the in-memory singleton ProcessTracker to process.
ProcessTracker processes the data.
To be able to process this data, ProcessTracker instantiates many Domain Model Objects, each responsible for processing some part of the data. (Note that there are NO more DB calls and no entity hydration from the DB; it all happens in memory. The Domain Model is also not mapped to any entity and is not connected to any DB object.)
At some point I need to log what a Domain Model object in the chain has done: whether its work has started or finished, whether it has reached some milestone, etc. This is done by raising DomainEvents. I also need to notify the GUI of those events, so they are used to send MassTransit messages too.
I.e. (pseudo code):
public class ProcessTracker
private Step _currentStep;
public void ProcessData(data)
DomainEvents.Raise(new ProcesTrackerDataProcessed());
public class Step
public Phase _currentPhase;
public void ProcessData(data)
if (data.IsManual && _someOtherCondition())
DomainEvents.Raise(new StepDataEvent1());
DomainEvents.Raise(new TransitionToNewPhase(this, data));
DomainEvents.Raise(new StepDataProcessed(this, data));
About DB updates: they are not transactional and not important to the process. The Domain Model Object state is kept only in memory; if the process crashes, it MUST begin again from the start (there is NO recovery).
To end the process:
I receive ProcessEnd from MassTransit.
The message data is forwarded to the ProcessTracker.
ProcessTracker handles the data and nets a result of the process.
The result of the process is saved to the DB.
A message is sent to the other parties in the process to notify them of its completion.
Ask yourself first what are you going to do when you raise an event from your domain model?
Normally it works like this:
Get a command
Load a domain object from a repository
Execute behaviour
(here probably) Raise an event
Persist the new domain object state
So, where would your extra domain event handlers fit? Are you going to execute other database calls, or send an email? Remember that it all happens now, before you have even persisted the changed state of your domain object. What if your persistence fails? The failure will happen after all the domain event handlers have already executed.
You should not execute more than one transaction when you handle a single command. The Aggregate pattern clearly tells you that the aggregate is the transaction boundary. You should raise domain events after you complete the transaction, or within the same technical transaction, but it should only persist the aggregate state and the event. Domain event reactions potentially trigger transactions for other domain objects, and it should be done outside of the scope of handling the current command.
The issue is not at all technical, it is a design problem.
If you use MassTransit, you can only make it (relatively) reliable if you handle the command in a message consumer. Then you can use the in-memory outbox, which will not send an event unless the consumer succeeds. It is still not guaranteed that the event will be published if the broker fails.
Unless you go to Event Sourcing, you have two 100% reliable options:
Use a transactional outbox pattern (NServiceBus has one and it's quite complex). It has limitations on what type of database you use.
Store the event to the same database as the domain object, in a different table, within the same transaction. Poll the table with DELETE INTO and emit events to the broker from there.
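The ordering argued for above (execute behaviour, persist, and only then publish) can be sketched while keeping the domain object free of infrastructure. This is an illustration in TypeScript, not MediatR, Autofac, or MassTransit code: `Step`, `pullEvents`, `handleCommand`, `save`, and `publish` are hypothetical names, and persistence is faked.

```typescript
interface DomainEvent {
  readonly name: string;
}

// Domain object: records events, knows nothing about mediators or containers.
class Step {
  private readonly pending: DomainEvent[] = [];

  processData(data: { isManual: boolean }): void {
    if (data.isManual) {
      this.pending.push({ name: "StepDataEvent1" });
    }
    this.pending.push({ name: "StepDataProcessed" });
  }

  // Hands over the recorded events and clears the buffer.
  pullEvents(): DomainEvent[] {
    return this.pending.splice(0, this.pending.length);
  }
}

type Publish = (event: DomainEvent) => void;

// Application-layer handler: persist first, then publish.
function handleCommand(
  step: Step,
  data: { isManual: boolean },
  save: (step: Step) => void,
  publish: Publish,
): void {
  step.processData(data);
  save(step); // if this throws, nothing below runs and no event is published
  for (const event of step.pullEvents()) {
    publish(event);
  }
}

const published: string[] = [];
const step = new Step();
handleCommand(
  step,
  { isManual: true },
  () => { /* pretend to persist */ },
  (e) => { published.push(e.name); },
);
console.log(published); // StepDataEvent1, then StepDataProcessed
```

Because the domain object only buffers events, no static scope has to be shared between consumers; each message consumer can pass its own `publish` function. The trade-off is that handlers run after the behaviour completes rather than at the exact moment the event is recorded.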

Azure WebJobs getting initialized randomly

We have WebJobs consisting of several methods in a single Functions.cs file. They have Service Bus triggers on topics/queues, so they keep listening to the topic/queue for a BrokeredMessage. As soon as a message arrives, our processing logic does a lot of work. However, we find that sometimes all the WebJobs are suddenly reinitialized. I found a few articles saying that WebJobs do get reinitialized and that this is normal.
But I am not sure whether that is unavoidable. Can we prevent the reinitialization? We call brokeredMessage.Complete as soon as we receive the BrokeredMessage, since we do not want it to be processed again and again.
Also, we have a few WebJobs in one App Service and a few in another, and we find that all of the WebJobs in both App Services are reinitialized at the same time. I am not sure why.
You should design your process to be able to deal with occasional disconnects and failures, since this is a "feature" of applications living in the cloud.
Use a transaction to manage the critical area of your code.
Pseudo/commented code below, and a link to the Microsoft documentation is here.
var msg = receiver.Receive();
using (var scope = new TransactionScope())
{
    // Do whatever work is required
    // Starting with computation and business logic.
    // Finishing with any persistence or new message generation,
    // giving your application the best chance of success.
    // Keep in mind that all BrokeredMessage operations are enrolled in
    // the transaction. They will all succeed or fail.
    // If you have multiple data stores to update, you can use brokered messages
    // to send new individual messages to do the operation on each store,
    // giving eventual consistency.
    msg.Complete(); // mark the message as done
    scope.Complete(); // declare the transaction done
}

Setup webjob ServiceBusTriggers or queue names at runtime (without hard-coded attributes)?

Is there any way to configure triggers without attributes? I cannot know the queue names ahead of time.
Let me explain my scenario here.. I have one service bus queue, and for various reasons (complicated duplicate-suppression business logic), the queue messages have to be processed one at a time, so I have ServiceBusConfiguration.OnMessageOptions.MaxConcurrentCalls set to 1. So processing a message holds up the whole queue until it is finished. Needless to say, this is suboptimal.
This 'one at a time' policy isn't so simple. The messages could be processed in parallel, they just have to be divided into groups (based on a field in message), say A and B. Group A can process its messages one at a time, and group B can process its own one at a time, etc. A and B are processed in parallel, all is good.
So I can create a queue for each group, A, B, C, ... etc. There are about 50 groups, so 50 queues.
I can create a queue for each, but how to make this work with the Azure Webjobs SDK? I don't want to copy-paste a method for each queue with a different ServiceBusTrigger for the SDK to discover, just to enforce one-at-a-time per queue/group, then update the code with another copy-paste whenever another group is needed. Fetching a list of queues at startup and tying to the function is preferable.
I have looked around and I don't see any way to do what I want. The ITypeLocator interface is pretty hard-set to look for attributes. I could probably abuse the INameResolver, but it seems like I'd still have to have a bunch of near-duplicate methods around. Could I somehow create what the SDK is looking for at startup/runtime?
(To be clear, I know how to use INameResolver to get queue name as at How to set Azure WebJob queue name at runtime? but though similar this isn't my problem. I want to setup triggers for multiple queues at startup for the same function to get the one-at-a-time per queue processing, without using the trigger attribute 50 times repeatedly. I figured I'd ask again since the SDK repo is fairly active and it's been a year..).
Or am I going about this all wrong? Being dumb? Missing something? Any advice on this dilemma would be welcome.
The Azure WebJobs host discovers and indexes the functions with the ServiceBusTrigger attribute when it starts, so there is no way to set up the queues to trigger on at runtime.
The simplest solution for you is to create a long-running job and implement the receive loop manually:
public class Program
private static void Main()
var host = new JobHost();
public static async Task Process(TextWriter log, CancellationToken token)
var connectionString = "myconnectionstring";
// You can also get the queue names from app settings or an Azure table
var queueNames = new[] {"queueA", "queueB" };
var messagingFactory = MessagingFactory.CreateFromConnectionString(connectionString);
foreach (var queueName in queueNames)
var receiver = messagingFactory.CreateMessageReceiver(queueName);
receiver.OnMessage(message =>
// do something
// Complete the message
catch (Exception ex)
// Log the error
// Abandon the message so that it can be retried.
}, new OnMessageOptions() { MaxConcurrentCalls = 1});
// await until the job stop or restart
await Task.Delay(Timeout.InfiniteTimeSpan, token);
Otherwise, if you don't want to deal with multiple queues, you can have a look at Azure Service Bus topics/subscriptions and create a SqlFilter to route each message to the right subscription.
Another option is to create your own trigger: the Azure WebJobs SDK provides extensibility points for creating your own trigger binding:
Binding Extensions Overview
Good Luck !
Based on my understanding, you seem to need a system that processes batches of messages in parallel. @Thomas's solution is good, but I think the Azure Batch service with Table storage may be a better fit and could replace the more complex combination of Service Bus queues and WebJobs with a trigger.
Using Azure Batch with Table storage, you can control task creation, execute tasks in parallel and at scale, and even monitor them; please refer to the tutorial for details.

Weird behaviour with Task Parallel Library Framework and Azure Instances

I need some help solving a problem involving the Task Parallel Library with Azure instances. Below is code for my Worker Role.
Whenever I upload multiple files, a request is inserted into the queue, and the worker process continuously queries the queue and gets the message. Once a message is retrieved, I run a long-running process. I used the task scheduler so that multiple requests are served by multiple tasks on multiple instances.
Now the question is: if one instance takes a message from the queue and assigns it to a task for processing, I see that another instance also retrieves the same message from the queue and processes it. Because of that, my tasks are executed multiple times.
Please help me with this problem. My requirement is that each task operation is handled by only one core of one Azure instance, not by multiple tasks.
public override void Run()
//Step1 : Get the message from Queue
//Step 2:
Task<string>.Factory.StartNew(() =>
//Message delete from Queue
PopulateBlobtoTable(uri, localStoragePath);
catch (Exception ex)
return "Finished!";
catch (AggregateException ae)
foreach (var exception in ae.InnerExceptions)
I'm assuming you are using Windows Azure Storage queues, which have a default invisibility timeout of 90 seconds, when using the storage client APIs. If your message is not completely processed and explicitly deleted within that time period, it will reappear on the queue.
While you can increase this invisibility timeout to up to seven days when you add the message to the queue, you should be using operations that are idempotent, meaning it doesn't matter if the message is processed multiple times. It's your job to ensure idempotence, perhaps by recording a unique id (in table storage, SQL database, etc.) associated with each message and ignoring the message if you see it a second time and you find it's already been marked complete.
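The "record a unique id and ignore repeats" approach can be sketched as follows. This is a TypeScript illustration, not Azure SDK code: `QueueMessage`, `handle`, and the in-memory `completed` set are hypothetical, and in practice the set of completed ids would live in durable storage such as table storage or a SQL database.

```typescript
interface QueueMessage {
  id: string;
  body: string;
}

// Hypothetical record of completed message ids (in memory for the sketch only).
const completed = new Set<string>();
let timesProcessed = 0;

// Returns true if the work ran, false if the message was a duplicate.
function handle(message: QueueMessage, work: (body: string) => void): boolean {
  if (completed.has(message.id)) {
    return false; // duplicate delivery: skip the work
  }
  work(message.body);
  completed.add(message.id); // mark complete only after the work succeeded
  return true;
}

const msg: QueueMessage = { id: "m-1", body: "blob-uri" };
const first = handle(msg, () => { timesProcessed += 1; });
// The same message reappears after its invisibility timeout expires.
const second = handle(msg, () => { timesProcessed += 1; });
console.log(first, second, timesProcessed); // true false 1
```

The check-then-mark sequence here is not atomic, so two instances that receive the same message at nearly the same moment could both pass the check. With a shared store, an insert that fails when the id already exists closes that gap.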
You might also look at Windows Azure Queues and Windows Azure Service Bus Queues - Compared and Contrasted. You'll note Service Bus queues have some additional constructs you can use to guarantee at-most-once (and at-least-once) delivery.
"Now the question is: if one instance takes a message from the queue and assigns it to a task for processing, I see that another instance also retrieves the same message from the queue and processes it. Because of that, my tasks are executed multiple times."
Are you getting the messages via "GET" semantics? If so, what visibility timeout have you set for your messages? When you "GET" a message, it should become invisible to other callers (read "instances" in your case) for a period of time that you specify with the visibility timeout. Check out the documentation here for this:
