I'm using Azure webjobs with queue-triggered functions (which rely on the Azure webjobs sdk) to perform some background processing work. Within the webjobs I make various connections to a SQL Azure database (using PetaPoco, which uses System.Data.SqlClient).
I want to be purposeful in my database connection strategy - specifically because there are some concurrency issues inherent to the environment.
One concurrency scenario is with the SDK's BatchSize property that you can set for queue-triggered webjobs. It's my understanding that setting BatchSize > 1 results in multiple instances of the queue-triggered function running within the same webjob process.
The second concurrency scenario is the website scale-out scenario where you're running multiple instances of the webjob itself. These of course are in different processes.
In my website I have a database connection per request (the machine handles connection pooling by default). No problems there.
How should I treat connections in the webjob scenario, accounting for the concurrency scenarios described above? Webjobs are of course just long-lived console processes (these are continuous webjobs). Should I create a database connection when my webjob starts and simply re-use that connection through the webjob's lifetime? Should I instantiate and close connections per function when I need them?
These are the types of things I'm trying to understand.
Webjobs are of course just long-lived console processes (these are continuous webjobs).
The main process is long-lived, but the sub-process for a triggered function is released after the function has executed. This means a connection opened there is also released automatically. As a best practice, though, you should still close it explicitly before the function exits.
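A minimal sketch of the "open and close per function" approach, written in Python for illustration only. It uses the standard-library `sqlite3` module as a stand-in for a SQL Azure connection, and the names `process_queue_message`, `DB_PATH`, and the `processed` table are hypothetical. With System.Data.SqlClient the same shape is normally cheap, because closing a connection returns it to the pool rather than tearing it down.

```python
import sqlite3
from contextlib import closing

DB_PATH = "jobs.db"  # hypothetical database location, for illustration only


def process_queue_message(message: str, db_path: str = DB_PATH) -> int:
    """Handle one queue message using a connection scoped to this call.

    Returns the number of messages recorded so far.
    """
    # closing() guarantees conn.close() runs even if the body raises,
    # so no connection outlives the function invocation.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS processed (body TEXT)")
        conn.execute("INSERT INTO processed (body) VALUES (?)", (message,))
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM processed").fetchone()[0]
```

Because every invocation owns its connection, parallel invocations in the same process never share connection state.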
The second concurrency scenario is the website scale-out scenario where you're running multiple instances of the webjob itself. These of course are in different processes.
The WebJobs SDK queue trigger automatically prevents a queue message from being processed by multiple instances.
If your web app runs on multiple instances, a continuous WebJob runs on each machine, and each machine will wait for triggers and attempt to run functions. The WebJobs SDK queue trigger automatically prevents a function from processing a queue message multiple times; functions do not have to be written to be idempotent. However, if you want to ensure that only one instance of a function runs even when there are multiple instances of the host web app, you can use the Singleton attribute.
It's my understanding that setting BatchSize > 1 results in multiple instances of the queue-triggered function running within the same webjob process
BatchSize is the number of queue messages that can be picked up simultaneously and executed in parallel within a WebJob.
For more information, see the documentation on how to use Azure queue storage with the WebJobs SDK, which covers parallel execution and multiple instances.
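To make the effect of BatchSize concrete, here is a rough Python simulation (not the WebJobs SDK itself). It assumes a thread pool whose size plays the role of BatchSize; `run_batch` and `ConcurrencyProbe` are hypothetical names introduced for this sketch. The probe records how many handler invocations overlap inside a single process.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(messages, batch_size, handler):
    """Run handler over messages with at most batch_size invocations in flight."""
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        # pool.map preserves the input order of the results.
        return list(pool.map(handler, messages))


class ConcurrencyProbe:
    """Handler that records the highest number of overlapping invocations."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, message):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # stand-in for real work such as a database call
        with self._lock:
            self._active -= 1
        return message.upper()
```

With a batch size of 1 the peak stays at 1; with a larger batch size several invocations can overlap, which is why each invocation should use its own connection rather than a shared one.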
Related
I have this .NET long running API process/function that usually runs 30 mins in one execution that is hosted in AKS. This API is usually executed from the users coming from the front end of the app.
Concurrent executions from users are exhausting the app, so I'm planning to implement some sort of queueing mechanism with the help of one or more schedulers.
Which Azure service could execute my API in AKS on a schedule (let's say every minute) and check the database for some flag values?
I need a way to check a table for a flag that says whether a process is currently running or has completed, so that the next one can be processed; otherwise the call should be ignored until the current one is complete.
I was looking into Azure Web Apps, Web Jobs or Batch Jobs but kinda confused which is applicable with my case.
Please advise thank you in advance.
There are a couple of options here.
Hangfire
Hangfire is an open-source library that can run background jobs in queues. In your case, you can enqueue each request from the client in a queue. Then Hangfire server will process them one by one (even with retry if the job fails). Hangfire supports SQL Server or Redis. You can query the storage to see the status of the queued jobs.
Hangfire can also run scheduled jobs, and it can take care of ensuring that only one job runs at a time.
Azure Service Bus
A more expensive option is to use Azure Service Bus for your queueing capability. For scheduled jobs, you can use AKS CronJobs, but you will have to implement the check yourself to see if there is a job already running.
Overall, I would recommend Hangfire, which can meet your requirements and is cheaper.
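If you do end up implementing the "is a job already running?" check yourself, one possible shape is a single-row flag that is claimed with an atomic UPDATE. The sketch below is in Python with `sqlite3` as a stand-in database; the `job_lock` table and the functions `init`, `try_start`, and `finish` are hypothetical names, not part of Hangfire, Service Bus, or AKS.

```python
import sqlite3


def init(conn: sqlite3.Connection) -> None:
    """Create the single-row flag table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_lock ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "running INTEGER NOT NULL)"
    )
    conn.execute("INSERT OR IGNORE INTO job_lock (id, running) VALUES (1, 0)")
    conn.commit()


def try_start(conn: sqlite3.Connection) -> bool:
    """Claim the flag; returns False if another run already holds it."""
    # The WHERE clause makes check-and-set a single statement, so two
    # schedulers firing at the same time cannot both claim the flag.
    cur = conn.execute(
        "UPDATE job_lock SET running = 1 WHERE id = 1 AND running = 0"
    )
    conn.commit()
    return cur.rowcount == 1


def finish(conn: sqlite3.Connection) -> None:
    """Release the flag so the next scheduled call can start a run."""
    conn.execute("UPDATE job_lock SET running = 0 WHERE id = 1")
    conn.commit()
```

A scheduled call would invoke `try_start`, return immediately when it gets `False`, and call `finish` in a `finally` block once the long-running work ends. A real implementation would probably also store a start time, so that a crashed run does not hold the flag forever.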
We have an Azure Function App (timer triggered) on a Consumption plan for testing purposes. The app first fires a bunch of stored procedures on a SQL Server. We use Task.Run(), and inside it there is just a synchronous operation that runs an SP on the server. These are fire-and-forget tasks, and the exceptions/errors from SQL are logged to a table inside the SQL Server. This Function App is part of a plan to migrate our SQL Agent Jobs to the cloud (as we are moving towards a PaaS database). Moreover, the Function App triggers an SP across multiple databases, so there is a single Task.Run for each DB.
The thing is, the execution of the SP might take around 20 minutes to complete. I see that at around 19 minutes the connection is dropped. For example, an SP started at 5:00 AM and, according to the logging inside the SP, ran until 5:19 AM and then stopped (no success log). So I believe the SqlConnection from C# is dropped. The Consumption plan default timeout is 5 minutes, so if this is a timeout issue, why can the SP continue for 19 minutes before it is dropped? I have observed this behavior for some days now.
I cannot arrive at a feasible explanation of the above behavior.
The maximum timeout for Azure Functions on the Consumption plan is 10 minutes.
Change plan to support longer timeout or you can use Durable functions (intended for long-running tasks).
Durable Functions is an extension of Azure Functions that lets you
write stateful functions in a serverless compute environment. The
extension lets you define stateful workflows by writing orchestrator
functions and stateful entities by writing entity functions using the
Azure Functions programming model. Behind the scenes, the extension
manages state, checkpoints, and restarts for you, allowing you to
focus on your business logic.
Refs:
https://learn.microsoft.com/pl-pl/azure/azure-functions/functions-scale#function-app-timeout-duration
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-overview?tabs=csharp
https://learn.microsoft.com/en-us/learn/modules/create-long-running-serverless-workflow-with-durable-functions/
How does the concept of storage queue polling apply when an Azure Function is hosted under the consumption plan?
I get the principle of polling with classic hosted WebJob functions and I understand that the maximum polling interval of 1 minute can be overridden. However, in the case of consumption plan hosting there is no app-level memory-resident process, so I assume that Azure internals spin up a FunctionApp via some other trigger beyond my control.
The motivation for this question is that I am trying to understand typical E2E function invocation propagation delays when an Azure hosted WebApp adds a message to a storage queue. In my case the WebApp, StorageQueue and pre-compiled function DLL will run in the same Azure region.
I need to cap Azure Function invocation delays to under 10 seconds with an average of <3 seconds.
Unfortunately this isn't possible on the consumption plan with the current polling model, as we poll your trigger resource every 10s to determine if there are new events requiring a function instance to be loaded/started.
If your function app runs frequently enough that it always has active instances (a new queue message every 5 min, for example) you can get the invocation delays that you want, as the instances themselves handle the polling.
The worst case (no function instances running) is ~10s polling + ~5s instance startup time to process a new event.
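The arithmetic behind those figures can be written out as a small Python sketch. The 10 s polling interval and ~5 s startup time come from the numbers above; the average-case estimate is an added assumption (a message arriving at a uniformly random point within the polling interval), and all function names are hypothetical.

```python
POLL_INTERVAL_S = 10.0  # platform polls the trigger resource every ~10 s
COLD_START_S = 5.0      # approximate instance startup time


def worst_case_delay(poll: float = POLL_INTERVAL_S,
                     startup: float = COLD_START_S) -> float:
    """Message arrives just after a poll: wait a full interval, then start up."""
    return poll + startup


def expected_cold_delay(poll: float = POLL_INTERVAL_S,
                        startup: float = COLD_START_S) -> float:
    """Assumes arrival is uniform within the interval, so the mean wait is poll / 2."""
    return poll / 2 + startup


def meets_target(max_allowed: float, avg_allowed: float) -> bool:
    """Compare the cold-path estimates against the required latency limits."""
    return (worst_case_delay() <= max_allowed
            and expected_cold_delay() <= avg_allowed)
```

Under these assumptions, `meets_target(10.0, 3.0)` is false for the cold path, which matches the conclusion that the stated limits are only reachable while instances stay active.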
I have an AzureWorker that receives SMTP messages from TCP ports and pushes them to queues. Other threads pick up these messages from the queues and process them. Currently, the processing threads have their own queue polling logic: they simply check the queues and increase the wait interval if the queues are empty.
I want to simplify the queue logic and make use of other Webjobs functionalities in this AzureWorker.
Is it possible to start a WebJobs thread in this AzureWorker and let that thread handle the details? Are there any limitations that I need to know?
Azure Worker Roles are a feature of Azure Cloud Services. Azure WebJobs are a feature of Azure App Service. Both are built to provide a similar ability to run background processing tasks within the context of your application. However, since they are features of different Azure services, they can't be run together in the nested fashion you are asking about.
Is it possible to start a WebJobs thread in this AzureWorker and let that thread handle the details?
I agree with Chris Pietschmann: it is not possible to start a WebJobs thread directly in an Azure Worker Role.
Other threads pick up these messages from the queues and process them. Currently, process threads have their queue polling logic. Simply they check the queues and increase wait interval if the queues are empty.
I want to simplify the queue logic and make use of other Webjobs functionalities in this AzureWorker.
If you'd like to complete this task by using WebJobs, you could write a program and run it as a WebJob in your Azure App Service. The WebJobs API also provides a way to dynamically start/stop WebJobs via a REST API, which you could use to manage your WebJobs from your Worker Role.
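For reference, the hand-rolled polling logic described in the question (check the queue, lengthen the wait while it is empty) can be sketched in a few lines of Python. This only illustrates the pattern and is not the WebJobs SDK's implementation; `poll_with_backoff` and its parameters are hypothetical, and the loop stops after a fixed number of consecutive empty polls purely so that the example terminates.

```python
import queue
import time


def poll_with_backoff(q, handler, min_wait=0.01, max_wait=0.08,
                      max_empty_polls=4, sleep=time.sleep):
    """Drain q, doubling the wait after each empty poll up to max_wait.

    Returns the list of waits that were applied, which is handy for tests.
    """
    wait = min_wait
    empty_polls = 0
    waits = []
    while empty_polls < max_empty_polls:
        try:
            item = q.get_nowait()
        except queue.Empty:
            # Queue is empty: sleep, then back off for the next attempt.
            empty_polls += 1
            waits.append(wait)
            sleep(wait)
            wait = min(wait * 2, max_wait)
            continue
        handler(item)
        # A message arrived, so return to the shortest polling interval.
        wait = min_wait
        empty_polls = 0
    return waits
```

A long-running worker would loop indefinitely instead of stopping after `max_empty_polls`; the cap on the wait keeps the delay bounded when messages resume after a quiet period.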
Can anybody explain the difference between Azure Web Jobs and Azure Scheduler?
Azure Web Jobs
Only available on Azure Websites
It is used to run code at particular intervals. E.g. a console application every day
Used to trigger and run workloads.
Mainly recommended for workloads that either scale with the website or are relatively small.
Can be persistently running if "Always On" selected, otherwise you will get the 20 min timeout.
The code that needs to be run and schedule are defined together.
Azure Scheduler
Is not tied to Websites or Cloud Services
It allows you to call a website or add a message to a storage queue
Used for triggering events or triggering small workloads (e.g. add to queue), usually to trigger larger workloads
Mainly recommended for triggering more complex workloads.
This is only a trigger; a separate function that listens for trigger events (e.g. on queues) needs to be coded separately.
For many instances I prefer to use the scheduler to push to a storage queue and a worker role on each instance takes off the queue. This keeps tasks controlled granularly and can also move up or down in scale outside of your website.
WebJobs scale up and down with your site, so your background tasks can become overtaxed if your website is experiencing low traffic and has been scaled down.
Azure Scheduler - Provides a way to easily schedule http calls in a well-defined schedule, like every hour, every Friday at 9:00 am, Once a day, ...
Azure WebJobs - Provides a way to run small to medium work load (in the form of a script: .exe, .cmd, .sh, .js, ...) at the same context of an Azure Website (but can be hosted even with an empty website).
A WebJob can also run continuously (as a process with a while loop), and Azure will make sure it is always running (with "Always On" set).
There is also an integration between Azure Scheduler and Azure WebJobs, where a WebJob runs some finite piece of work and the scheduler is responsible for scheduling that work (invoking the WebJob).
So in summary, the scheduler is about scheduling work and WebJobs is about running work load.