Is it possible to stop MassTransit service after a saga/consumer completes - azure

So I know this runs a bit counter to MassTransit style, but I want to take advantage of some key features of MT such as message broker connection management, sagas, scheduled messages.
However, I know the service will be rarely used. This is a fairly large data take from an API which has a throttle of 12,000 requests per hour. Once every 24 hours a saga will start to take data and move it into Data Lake. The service will run for some minutes until the throttle is hit, then start again where it left off (state) when enough time has passed, maybe something like 30 minutes later. The amount of data means this will repeat for several hours (2 to 4).
The fit for a saga and and scheduled message seems pretty good. But it would be better if the service did not have incur operating costs for being awake 24x7. There will only ever be one request at a time for one set of API credentials. There may come a time when we might have multiple sets of credentials.
Is there a way to nicely close down the service when the saga completes?
As this is likely to be implemented with a container instance I propose to start an instance from a queue triggered function or similar.

Assuming that this is the approach you want to take (versus just an Azure Web Job, triggered by Azure Scheduler), there are a number of options:
Publish an event when the saga completes, consume that event, use Task.Run() or whatever to stop the bus.
Use a receive observer to keep track of in-flight messages and when it reaches zero and stays there for n seconds, stop the bus, exit the function.
Though I wonder why not just use a scheduled job via Azure, seems easier unless MassTransit is being used for more than just scheduling.

Related

Azure Service Bus Queue Performance

I am using the Azure service bus queue for one of my requirements. The requirement is simple, an azure function will act as an API and creates multiple jobs in the queue. The function is scalable and on-demand new instance creation. The job which microservice creates will be processed by a windows service. So the sender is Azure function and the receiver is windows service. Since the azure function is scalable, there will be multiple numbers of functions will be executed in parallel. So, the number of jobs getting created into the queue will be in parallel, and probably one job in every 500MS. Windows service is a single instance that is a Queue listener listens to this Queue and executes in parallel. So, the number of senders might be more, the receiver is one instance. And each job can run in parallel must be limited(4, since it takes more time and CPU) Right now, I am using Aure Service Bus Queue with the following configuration. My doubt is which configuration produces the best performance for this particular requirement.
The deletion of the Job in the queue will not be an issue for me. So, Can I use Delete instead of Peek-Lock?
Also, right now, the number of items receiving by the listener is not in order. I want to maintain an order in which it got created. My requirement is maximum performance. The job is done by the windows service is a CPU intensive task, that's why I have limited to 4 since the system is a 4 Core.
Max delivery count: 4, Message lock duration: 5 min, MaxConcurrentCalls: 4 (In listener). I am new to the service bus, I need a suggestion for this.
One more doubt is, let's consider the listener got 4 jobs in parallel and start execution. One job completed its execution and became a completed status. So the listener will pick the next item immediately or wait for all the 4 jobs to be completed (MaxConcurrentCalls: 4).
The deletion of the Job in the queue will not be an issue for me. So, Can I use Delete instead of Peek-Lock?
Receiving messages in PeekLock receive mode will less performant than ReceiveAndDelete. You'll be saving roundtrips to the broker to complete messages.
Max delivery count: 4, Message lock duration: 5 min, MaxConcurrentCalls: 4 (In listener). I am new to the service bus, I need a suggestion for this.
MaxDeliveryCount is how many times a message can be attempted before it's dead-lettered. It appears to be equal to the number of cores, but it shouldn't. Could be just a coincidence.
MessageLockDuration will only matter if you use PeekLock receive mode. For ReceiveAndDelete it won't matter.
As for Concurrency, even though your work is CPU bound, I'd benchmark if higher concurrency would be possible.
An additional parameter on the message receiver to look into would be PrefetchCount. It can improve the overall performance by making fewer roundtrips to the broker.
One more doubt is, let's consider the listener got 4 jobs in parallel and start execution. One job completed its execution and became a completed status. So the listener will pick the next item immediately or wait for all the 4 jobs to be completed (MaxConcurrentCalls: 4).
The listener will immediately start processing the 5th message as your concurrency is set to 4 and one message processing has been completed.
Also, right now, the number of items receiving by the listener is not in order. I want to maintain an order in which it got created.
To process messages in the order they were sent in you will need to send and receive messages using sessions.
My requirement is maximum performance. The job is done by the windows service is a CPU intensive task, that's why I have limited to 4 since the system is a 4 Core.
There are multiple things to take into consideration. The location of your Windows Service location would impact the latency and message throughput. Scaling out could help, etc.

Time triggered Azure Function and queue processing

I have a Function called once a day processing all messages in a queue. But I would like to also have the retries and poison messages logic as with the Queue Trigger. Is this somehow possible?
So at that point you function is purely a timer triggered function and from there you are no different than a console app in terms of how you would have to process messages from a queue with your own client connection, message loop, retrying and dead lettering (poison) logic. It's honestly just not the right tool for that job.
One approach I suppose you could consider if you wanted to be creative so that you could benefit from using an Azure Function's built in queue trigger behavior while still controlling what time the queue is processed is actually starting and stopping the function instance itself via something like Azure Scheduler. Scheduling the starting of the function is pretty straightforward and, once started, it will immediately begin draining the queue. The challenge is knowing how to stop it. The Azure Function runtime with the queue binding won't ever stop on its own as it's reading off the queue using a pull model so it's just gonna sit there waiting for new messages to arrive, right? So stopping it is really a question of understanding the need of the business. Do you stop once there's no more messages left for that day? Do you stop at a specific time of day? Etc. There's no correct answer here, it's totally domain specific, but whatever that answer is will dictate the exact approach taken.
Honestly, as I said earlier on, I'm not sure this is the right tool for the job. Yeah, it's nice that you get the retry and poison handling, but you're really going against the grain of the runtime. I would personally suggest you look into scheduling this simple console executable, executed in a task-like fashion using Azure Container Instances (some docs on that approach here). If you were counting on the auto-scale of Azure Functions, that's totally something you can get from Azure Container Instances as well.

How to handle long running jobs that are posted to a service bus with only 5min peek lock

What do people tend to do when they have a design that put jobs on a service queue or topic that takes longer then the 5min max of peeklock?
I have been using the OnMessage(...) async messagepump of service bus and is wondering if thats not such a good idea after also since if I start moving the jobs to a table while processing them, then the messagepump will just empty the queue and I just have the problem elsewhere of making sure my jobs are scheduled even between servers.
If you have a long running message processing workflow the you can check the lockedUntilUtc property of the message and call RenewLock at the appropriate time.
http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.servicebus.messaging.brokeredmessage.renewlock.aspx
in the next release of SDK the OnMessage processing loop will automatically do that for you so that convenience API is always a good idea to use.

Is it acceptable to use ThreadPool.GetAvailableThreads to throttle the amount of work a service performs?

I have a service which polls a queue very quickly to check for more 'work' which needs to be done. There is always more more work in the queue than a single worker can handle. I want to make sure a single worker doesn't grab too much work when the service is already at max capacity.
Let say my worker grabs 10 messages from the queue every N(ms) and uses the Parallel Library to process each message in parallel on different threads. The work itself is very IO heavy. Many SQL Server queries and even Azure Table storage (http requests) are made for a single unit of work.
Is using the TheadPool.GetAvailableThreads() the proper way to throttle how much work the service is allowed to grab?
I see that I have access to available WorkerThreads and CompletionPortThreads. For an IO heavy process, is it more appropriate to look at how many CompletionPortThreads are available? I believe 1000 is the number made available per process regardless of cpu count.
Update - Might be important to know that the queue I'm working with is an Azure Queue. So, each request to check for messages is made as an async http request which returns with the next 10 messages. (and costs money)
I don't think using IO completion ports is a good way to work out how much to grab.
I assume that the ideal situation is where you run out of work just as the next set arrives, so you've never got more backlog than you can reasonably handle.
Why not keep track of how long it takes to process a job and how long it takes to fetch jobs, and adjust the amount of work fetched each time based on that, with suitable minimum/maximum values to stop things going crazy if you have a few really cheap or really expensive jobs?
You'll also want to work out a reasonable optimum degree of parallelization - it's not clear to me whether it's really IO-heavy, or whether it's just "asynchronous request heavy", i.e. you spend a lot of time just waiting for the responses to complicated queries which in themselves are cheap for the resources of your service.
I've been working virtually the same problem in the same environment. I ended up giving each WorkerRole an internal work queue, implemented as a BlockingCollection<>. There's a single thread that monitors that queue - when the number of items gets low it requests more items from the Azure queue. It always requests the maximum number of items, 32, to cut down costs. It also has automatic backoff in the event that the queue is empty.
Then I have a set of worker threads that I started myself. They sit in a loop, pulling items off the internal work queue. The number of worker threads is my main way to optimize the load, so I've got that set up as an option in the .cscfg file. I'm currently running 35 threads/worker, but that number will depend on your situation.
I tried using TPL to manage the work, but I found it more difficult to manage the load. Sometimes TPL would under-parallelize and the machine would be bored, other times it would over-parallelize and the Azure queue message visibility would expire while the item was still being worked.
This may not be the optimal solution, but it seems to be working OK for me.
I decided to keep an internal counter of how many message are currently being processed. I used Interlocked.Increment/Decrement to manage the counter in a thread-safe manner.
I would have used the Semaphore class since each message is tied to its own Thread but wasn't able to due to the async nature of the queue poller and the code which spawned the threads.

Controlling azure worker roles concurrency in multiple instance

I have a simple work role in azure that does some data processing on an SQL azure database.
The worker basically adds data from a 3rd party datasource to my database every 2 minutes. When I have two instances of the role, this obviously doubles up unnecessarily. I would like to have 2 instances for redundancy and the 99.95 uptime, but do not want them both processing at the same time as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
Thanks
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a followup message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
There is not a super easy way to do this, I dont think.
You can use a semaphore as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any amount of instances running, each inspecting the semaphore record and only acting out if semaphore allows it.
However, the caveat here is that what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasnt been an unlock for X amount of time.
Alternatively, you can use a third party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the amount of "Ready" instances is under 1. This will save you can save some money by not having to have 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A Semaphor as suggested would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob store.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
Small catch on David's solution. Re-posting the message to the queue would happen as the last thing on the current execution so that if the machine crashes along the way the current message would expire and re-surface on the queue. That assumes that the message was originally peeked and requires a de-queue operation to remove from the queue. The de-queue must happen before inserting the new message to the queue. If the role crashes in between these 2 operations, then there will be no tokens left in the system and will come to a halt.
The ESB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either since the bus can only check for identical messages currently existing in a queue. But if one of the messages comes in right after the previous one was de-queued, there is a chance to end up with 2 processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before combined with ASB dup check should work (since messages would never move from the queue).

Resources