I have a Function called once a day processing all messages in a queue. But I would like to also have the retries and poison messages logic as with the Queue Trigger. Is this somehow possible?

So at that point your function is purely a timer-triggered function, and from there you are no different from a console app: you have to process messages from the queue with your own client connection, message loop, retry logic, and dead-lettering (poison) logic. It's honestly just not the right tool for that job.
One approach you could consider, if you wanted to be creative and benefit from an Azure Function's built-in queue trigger behavior while still controlling what time the queue is processed, is starting and stopping the function instance itself via something like Azure Scheduler. Scheduling the start of the function is pretty straightforward and, once started, it will immediately begin draining the queue. The challenge is knowing how to stop it. The Azure Functions runtime with the queue binding won't ever stop on its own: it reads off the queue using a pull model, so it's just going to sit there waiting for new messages to arrive. So stopping it is really a question of understanding the needs of the business. Do you stop once there are no more messages left for that day? Do you stop at a specific time of day? There's no correct answer here, it's totally domain specific, but whatever that answer is will dictate the exact approach taken.
Honestly, as I said earlier on, I'm not sure this is the right tool for the job. Yeah, it's nice that you get the retry and poison handling, but you're really going against the grain of the runtime. I would personally suggest you look into scheduling a simple console executable, executed in a task-like fashion using Azure Container Instances (some docs on that approach here). If you were counting on the auto-scale of Azure Functions, that's totally something you can get from Azure Container Instances as well.
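To make the "your own message loop, retrying and poison logic" part concrete, here is a minimal sketch of what a timer-triggered job would have to do by hand. It is written in Python against an in-memory deque rather than a real queue client, and drain_queue, handler and the poison deque are all hypothetical names. The threshold of 5 is an assumption that happens to match the queue trigger's default maxDequeueCount.

```python
from collections import deque

MAX_DEQUEUE_COUNT = 5  # assumed threshold for giving up on a message


def drain_queue(queue, poison_queue, handler, max_dequeue_count=MAX_DEQUEUE_COUNT):
    """Drain the queue once; park messages that keep failing in poison_queue."""
    processed = 0
    while queue:
        body, dequeue_count = queue.popleft()
        dequeue_count += 1
        try:
            handler(body)
            processed += 1
        except Exception:
            if dequeue_count >= max_dequeue_count:
                poison_queue.append(body)  # give up: treat it as a poison message
            else:
                queue.append((body, dequeue_count))  # put it back for another attempt
    return processed


def handler(body):
    # Hypothetical handler: one message can never be processed.
    if body == "bad":
        raise ValueError("cannot process")


queue = deque([("a", 0), ("bad", 0), ("b", 0)])
poison = deque()
count = drain_queue(queue, poison, handler)
```

A real version would also need visibility timeouts and back-off between attempts, which is exactly the plumbing the queue trigger gives you for free.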


Is it possible to stop MassTransit service after a saga/consumer completes

So I know this runs a bit counter to MassTransit style, but I want to take advantage of some key features of MT such as message broker connection management, sagas, scheduled messages.
However, I know the service will be rarely used. This is a fairly large data take from an API which has a throttle of 12,000 requests per hour. Once every 24 hours a saga will start to take data and move it into Data Lake. The service will run for some minutes until the throttle is hit, then start again where it left off (state) when enough time has passed, maybe something like 30 minutes later. The amount of data means this will repeat for several hours (2 to 4).
The fit for a saga and a scheduled message seems pretty good. But it would be better if the service did not have to incur operating costs for being awake 24x7. There will only ever be one request at a time for one set of API credentials. There may come a time when we might have multiple sets of credentials.
Is there a way to nicely close down the service when the saga completes?
As this is likely to be implemented with a container instance I propose to start an instance from a queue triggered function or similar.
Assuming that this is the approach you want to take (versus just an Azure Web Job, triggered by Azure Scheduler), there are a number of options:
Publish an event when the saga completes, consume that event, use Task.Run() or whatever to stop the bus.
Use a receive observer to keep track of in-flight messages and when it reaches zero and stays there for n seconds, stop the bus, exit the function.
Though I wonder why not just use a scheduled job via Azure; it seems easier unless MassTransit is being used for more than just scheduling.
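The second option (count in-flight messages and stop once the count has stayed at zero for n seconds) boils down to a small piece of bookkeeping. Below is a rough Python sketch of only that logic, not MassTransit code: IdleTracker and its method names are hypothetical, loosely modeled on the pre/post receive hooks an observer would get, and time is passed in explicitly so the behavior is easy to follow.

```python
class IdleTracker:
    """Tracks in-flight messages and reports when the bus has been idle long enough."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds
        self.in_flight = 0
        self.idle_since = None  # nothing has completed yet, so not considered idle

    def pre_receive(self):
        # A message arrived: we are busy again.
        self.in_flight += 1
        self.idle_since = None

    def post_receive(self, now):
        # A message finished (successfully or with a fault).
        self.in_flight -= 1
        if self.in_flight == 0:
            self.idle_since = now

    def should_stop(self, now):
        return self.idle_since is not None and now - self.idle_since >= self.idle_seconds


tracker = IdleTracker(idle_seconds=30)
before_any_message = tracker.should_stop(now=0.0)
tracker.pre_receive()
tracker.post_receive(now=100.0)
too_early = tracker.should_stop(now=110.0)
idle_long_enough = tracker.should_stop(now=130.0)
```

Whatever polls should_stop would then stop the bus and let the process exit. Note that n has to be chosen with the saga's own pauses in mind.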

Azure Function Event Hub Trigger reliability

I'm a bit confused regarding the EventHubTrigger for Azure functions.
I've got an IoT Hub, and am using its eventhub-compatible endpoint to trigger an Azure function that is going to process and store the received data.
However, if my function fails (= throws an exception), that message (or messages) being processed during that function call will get lost. I actually would expect the Azure function runtime to process the messages at a later time again. Specifically, I would expect this behavior because the EventHubTrigger is keeping checkpoints in the Function Apps storage account in order to keep track of where in the event stream it has to continue.
The documentation of the EventHubTrigger even states that
If all function executions succeed without errors, checkpoints are added to the associated storage account
But still, even when I deliberately throw exceptions in my function, the checkpoints will get updated and the messages will not get received again.
Is my understanding of the EventHubTrigger's documentation wrong, or is the EventHubTrigger's implementation (or its documentation) wrong?
This piece of documentation seems confusing indeed. I guess they mean the errors of the Function App host itself, not of your code. An exception inside a function execution doesn't stop processing or checkpointing.
The fact is that Event Hubs are not designed for individual message retries. The processor works in batches, and it can either mark the whole batch as processed (i.e. create a checkpoint after it), or retry the whole batch (e.g. if the process crashed).
See this forum question and answer.
If you still need to re-process failed events from Event Hub (and errors don't happen too often), you could implement such mechanism yourself. E.g.
Add an output Queue binding to your Azure Function.
Add try-catch around processing code.
If exception is thrown, add the problematic event to the Queue.
Have another Function with Queue trigger to process those events.
Note that the downside of this is that you will lose the ordering guarantee provided by Event Hubs (since the Queue message will be processed later than its neighbors).
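The four steps above can be sketched roughly like this. It is plain Python with a list standing in for the output Queue binding; process_batch, handle and retry_queue are hypothetical names, and the "negative reading" failure is made up for illustration.

```python
def process_batch(events, handle, retry_queue):
    """Handle each event; route failures to retry_queue so the batch can still be checkpointed."""
    succeeded = 0
    for event in events:
        try:
            handle(event)
            succeeded += 1
        except Exception as exc:
            # Keep the failed event and the reason, for the queue-triggered function to pick up.
            retry_queue.append({"event": event, "error": str(exc)})
    return succeeded


def handle(event):
    # Hypothetical processing step that rejects negative readings.
    if event["reading"] < 0:
        raise ValueError("negative reading")


retry_queue = []
batch = [{"id": 1, "reading": 20}, {"id": 2, "reading": -5}, {"id": 3, "reading": 22}]
ok_count = process_batch(batch, handle, retry_queue)
```

Event 2 ends up on the retry queue and is processed after events 1 and 3, which is the loss of ordering mentioned above.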
Quick fix: a retry policy will not help if the downstream system is down for a few hours. You can call Process.GetCurrentProcess().Kill(); in the exception handler, which stops the checkpoint from moving forward. I have tested this with a Consumption-plan function app. You will not see anything in the logs, but I added an email notification so I know that something went wrong and that I killed the function instance to avoid data loss.
Hope this helps.
I will write a blog post about this, and about the other part of the workflow, where I use a Logic App to stop the function in case of continuous failures on a down system.

How to handle long running jobs that are posted to a service bus with only 5min peek lock

What do people tend to do when they have a design that puts jobs on a Service Bus queue or topic and the jobs take longer than the 5 min maximum of the peek lock?
I have been using the OnMessage(...) async message pump of Service Bus and am wondering if that's not such a good idea after all: if I start moving the jobs to a table while processing them, the message pump will just empty the queue, and I have the same problem elsewhere of making sure my jobs are scheduled, even between servers.
If you have a long-running message processing workflow, then you can check the LockedUntilUtc property of the message and call RenewLock at the appropriate time.
In the next release of the SDK the OnMessage processing loop will automatically do that for you, so that convenience API is always a good idea to use.
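As an illustration of "call RenewLock at the appropriate time", here is a small Python simulation of the idea. FakeMessage is a hypothetical stand-in for a peek-locked message (not the Service Bus SDK), and the lock duration, step length and safety margin are assumed numbers. Time is simulated, so nothing actually sleeps.

```python
class FakeMessage:
    """Stand-in for a peek-locked message; lock_seconds is an assumed lock duration."""

    def __init__(self, now, lock_seconds=60):
        self.lock_seconds = lock_seconds
        self.locked_until = now + lock_seconds
        self.renewals = 0

    def renew_lock(self, now):
        self.locked_until = now + self.lock_seconds
        self.renewals += 1


def run_long_job(message, steps, step_seconds, start=0, margin=15):
    """Run the job in steps, renewing the lock when it is within `margin` seconds of expiry."""
    now = start
    for _ in range(steps):
        if message.locked_until - now <= margin:
            message.renew_lock(now)
        now += step_seconds  # pretend this unit of work took step_seconds
    return now


job = FakeMessage(now=0, lock_seconds=60)
finished = run_long_job(job, steps=20, step_seconds=10)
```

The margin has to be larger than the longest single step, otherwise the lock can lapse between two checks.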

Correct code pattern for recurrent events in Azure worker roles with sizable delays between each event

I have an Azure worker role whose job is to periodically run some code against a SQL Azure database. Here's my current code:
    const int oneHour = 60 * 60 * 1000; // 3,600,000 milliseconds
    while (true)
    {
        var numConversions = SaveSeedsToSQL.ConvertRemainingPotentialQueryURLsToSeeds();
        SaveLogEntryToSQL.Save(new LogEntry { Count = numConversions });
        Thread.Sleep(oneHour); // wait an hour before the next pass
    }
Is Thread.Sleep(oneHour) the best way of programming such regular but infrequent events, or is there some kind of wake-up-and-run-again mechanism for Azure worker roles that I should be utilizing?
This code works of course, but there are some problems:
1. You can fail somewhere and this schedule gets all thrown off. That is important if you must actually do it at a specific time.
2. There is no concurrency control here. If you want something only done once, you need a mechanism such that a single instance will perform the work and the other instances won't.
There are a few solutions to this problem:
Run the Windows Scheduler on the role (built in). That solves problem 1, but not 2.
Run Quartz.NET and schedule things. That solves #1 and depending on how you do it, also #2.
Use future scheduled queue messages in either Service Bus or Windows Azure queues. That solves both.
The first two options work with caveats, so I think the last option deserves more attention. You can simply create one or more messages that your roles will understand and post them to the queue with a future visibility time. Once the time comes, the message becomes visible and your normally polling roles will see it and can work on it. The benefit here is that it is time accurate and that only a single instance operates on it, since it is a queue message. When the instance has completed the work, it can schedule the next message and post it to the queue. We use this technique all the time. You only have to be careful: if for some reason your role fails before scheduling the next one, the whole system stops. You should have some sanity checks and safeguards there.
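Here is a rough sketch of that self-rescheduling message pattern, using an in-memory Python queue with a simulated clock instead of Service Bus or Azure queues. DelayedQueue and poll_once are hypothetical names, and scheduling the next run from the planned time (rather than the finish time) is my own choice to avoid drift.

```python
import heapq


class DelayedQueue:
    """In-memory stand-in for a queue whose messages become visible at a set time."""

    def __init__(self):
        self._heap = []

    def post(self, body, visible_at):
        heapq.heappush(self._heap, (visible_at, body))

    def receive(self, now):
        """Hand one due message to exactly one caller, or return None."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)
        return None


def poll_once(queue, now, work, interval):
    """One polling pass of a role instance; returns True if it did the work."""
    message = queue.receive(now)
    if message is None:
        return False
    scheduled_for, body = message
    work(body)
    queue.post(body, scheduled_for + interval)  # schedule the next occurrence
    return True


runs = []
queue = DelayedQueue()
queue.post("hourly-job", visible_at=0)

instance_a = poll_once(queue, now=0, work=runs.append, interval=3600)
instance_b = poll_once(queue, now=0, work=runs.append, interval=3600)
too_early = poll_once(queue, now=3599, work=runs.append, interval=3600)
next_run = poll_once(queue, now=3600, work=runs.append, interval=3600)
```

This sketch removes the message before doing the work and does not model a message reappearing after a crash, so it shows the scheduling idea only, not the safeguards.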

Controlling azure worker roles concurrency in multiple instance

I have a simple worker role in Azure that does some data processing on a SQL Azure database.
The worker basically adds data from a 3rd party datasource to my database every 2 minutes. When I have two instances of the role, this obviously doubles up unnecessarily. I would like to have 2 instances for redundancy and the 99.95% uptime, but do not want them both processing at the same time as they will just duplicate the same job. Is there a standard pattern for this that I am missing?
I know I could set flags in the database, but am hoping there is another easier or better way to manage this.
As Mark suggested, you can use an Azure queue to post a message. You can have the worker role instance post a followup message to the queue as the last thing it does when processing the current message. That should deal with the issue Mark brought up regarding the need for a semaphore. In your queue message, you can embed a timestamp marking when the message can be processed. When creating a new message, just add two minutes to current time.
And... in case it's not obvious: in the event the worker role instance crashes before completing processing and fails to repost a new queue message, that's fine. In this case, the current queue message will simply reappear on the queue and another instance is then free to process it.
There is not a super easy way to do this, I don't think.
You can use a semaphore, as Mark has mentioned, to basically record the start and the stop of processing. Then you can have any number of instances running, each inspecting the semaphore record and only acting if the semaphore allows it.
However, the caveat here is: what happens if one of the instances crashes in the middle of processing and never releases the semaphore? You can implement a "timeout" value after which other instances will attempt to kick-start processing if there hasn't been an unlock for X amount of time.
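A minimal sketch of such a semaphore with a timeout, in Python and with time passed in explicitly. Lease is a hypothetical class, the 300 second timeout is an assumed value, and a real version would keep this record in the database or blob storage and need an atomic check-and-set there.

```python
class Lease:
    """Single shared record: who holds the semaphore and since when."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.owner = None
        self.acquired_at = None

    def try_acquire(self, instance, now):
        # A held lease counts as free again once it is older than the timeout.
        expired = self.owner is not None and now - self.acquired_at >= self.timeout
        if self.owner is None or expired:
            self.owner, self.acquired_at = instance, now
            return True
        return False

    def release(self, instance):
        if self.owner == instance:
            self.owner = None


lease = Lease(timeout=300)
a_got_it = lease.try_acquire("instance-A", now=0)
b_blocked = lease.try_acquire("instance-B", now=60)
# instance-A crashes here and never calls release()
b_takes_over = lease.try_acquire("instance-B", now=300)
```

The timeout must be comfortably longer than a normal processing run, otherwise a slow instance gets its work duplicated.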
Alternatively, you can use a third-party monitoring service like AzureWatch to watch for unresponsive instances in Azure and start a new instance if the number of "Ready" instances is under 1. This can save you some money by not having to have 2 instances up and running all the time, but there is a slight lag between when an instance fails and when a new one is started.
A semaphore as suggested would be the way to go, although I'd probably go with a simple timestamp heartbeat in blob storage.
The other thought is, how necessary is it? If your loads can sustain being down for a few minutes, maybe just let the role recycle?
Small catch on David's solution. Re-posting the message to the queue would happen as the last thing on the current execution so that if the machine crashes along the way the current message would expire and re-surface on the queue. That assumes that the message was originally peeked and requires a de-queue operation to remove it from the queue. The de-queue must happen before inserting the new message into the queue. If the role crashes in between these 2 operations, then there will be no tokens left in the system and it will come to a halt.
The ESB dup check sounds like a feasible approach, but it does not sound like it would be deterministic either since the bus can only check for identical messages currently existing in a queue. But if one of the messages comes in right after the previous one was de-queued, there is a chance to end up with 2 processes running in parallel.
An alternative solution, if you can afford it, would be to never de-queue and just lease the message via Peek operations. You would have to ensure that the invisibility timeout never goes beyond the processing time in your worker role. As far as creating the token in the first place, the same worker role startup strategy described before combined with ASB dup check should work (since messages would never move from the queue).
