I am struggling to answer this question for myself so can anyone answer it for me. How, exactly, do webjobs get run? By this I mean does the Azure framework manage access to the WebJob and are multiple instances run in separate processes?
I am under the impression that I get 12 or 16 parrallel instances by default. Is this correct? So if three messages are placed on a queue, that my webjob is triggered by, they will all run in parrallel.
AFAIK, when you scale out multiple instances, the webjobs will run on the instances in parallel in separate processes. But there are prerequisites of it,
The webjob should be continous and not Manual/Scheduled.
For this to happen correctly, you need to be running in Standard mode, and have the Always On setting enabled.
Note: If you use TimerTrigger in your webjob, it will not scale out. Refer to this article.
Behind the scenes, TimerTrigger uses the Singleton feature of the WebJobs SDK to ensure that only a single instance of your triggered function is running at any given time. When the JobHost starts up, for each of your TimerTrigger functions a blob lease (the Singleton Lock) is taken. This distrubuted lock ensures that only a single instance of your scheduled function is running at any time.
For more details about the issue, here are two similar posts for you to refer, 1 and 2.
Related
I have an azure function which contain PnP SDK Core code to integrate with SharePoint. the code loop through list items newly created >> add a folder structure to them. i configure the Azure Function to run each 10 minutes. but the azure function execution might exceed 10 minutes if the user added many list items. so is there a way to prevent a new instance of the azure function from running if there is a already a running instance of the azure function that has not finished yet?
Thanks
AFAIK that is the default behavior, see this doc:
Behind the scenes, TimerTrigger uses the Singleton feature of the WebJobs SDK to ensure that only a single instance of your triggered function is running at any given time.
If any process runs longer than the scheduled timer, the new incoming process waits for the older process to finish and then uses the same instance.
Let's say I have an Azure Function named func running. In the middle of func running, I deploy some new changes to the func. Will func finish the current run and then start with the new changes or will the current run just end?
Maybe this can help you:
https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-zero-downtime-deployment
The reliable execution model of Durable Functions requires that orchestrations be deterministic, which creates an additional challenge to consider when you deploy updates. When a deployment contains changes to activity function signatures or orchestrator logic, in-flight orchestration instances fail. This situation is especially a problem for instances of long-running orchestrations, which might represent hours or days of work.
To prevent these failures from happening, you have two options:
Delay your deployment until all running orchestration instances have completed.
Make sure that any running orchestration instances use the existing versions of your functions.
I have a continuous Webjob running on my Azure Website. It is responsible for doing some work after retrieving items from a QueueTrigger. I am attempting to increase the rate in which the items are processed off the Queue. As I scale out my App Service Plan, the processing rate increases as expected.
My concern is that it seems wasteful to pay for additional VMs just to run additional instances of my Webjob. I am looking for options/best practices to run multiple instances of the same Webjob on a single server.
I've tried starting multiple JobHosts in individual threads within Main(), but either that doesn't work or I was doing something wrong... the Webjob would fail to run due to what looks like each thread trying to access 'WebJobSdk.marker'. My current solution is to publish my Webjob multiple times, each time modifying 'webJobName' slightly in 'webjob-publish-settings.json' so that the same project is considered a different Webjob at publish time. This works great so far, expect that it creates a lot of additional work each time I need to make any update.
Ultimately, I'm looking for some advice on what the recommended way of accomplishing this would be. Ideally, I would like to get the multiple instances running via code, and only have to publish once when I need to update the code.
Any thoughts out there?
You can use the JobHostConfiguration.QueuesConfiguration.BatchSize and NewBatchThreshold settings to control the concurrency level of your queue processing. The latter NewBatchThreshold setting is new in the current in progress beta1 release. However, by enabling "prerelease" packages in your Nuget package manager, you'll see the new release if you'd like to try it. Raising the NewBatchThreshold setting increases the concurrency level - e.g. setting it to 100 means that once the number of currently running queue functions drops below 100, a new batch of messages will be fetched for concurrent processing.
The marker file bug was fixed in this commit a while back, and again is part of the current in progress v1.1.0 release.
I need to write some data to database every 50 seconds or so. It's similar to a Windows service that's running on background and silently doing its job. Starting and stopping is not an option in my case as I need a small amount of previously inserted data to be stored in memory. What's the best solution for this when using Windows Azure or AWS?
Thank you.
With Windows Azure, you can choose either a Web or Worker role (both basically Windows 2008 Server R2 or SP2) and have some type of timed event, as #Lucifure suggested. You could also run a scheduler, like Quartz.net, or take advantage of windows Azure queues or service bus queues to have messages show up at a certain time. However: You cannot have a "forever" task in a given role instance, in that periodically your VM instances will be rebooted (e.g. for host OS maintenance every month). With role shutdowns, you'll get notice, which you can handle these shutdown notices in Stopping() or OnStop(). If you have multiple instances, you can use a scheduler or queue to ensure your events still trigger every 50 seconds or so, and get handled across multiple instances (but only by one instance at any given time).
To preserve your in-memory information, one idea is to store that information in a cache. You have 2 choices:
Distributed (shared) cache service, which has been around for some time now. It runs independently of your role instances.
In-memory cache, just introduced in June 2012. Assuming you have more than one instance, the cache is spread across those instances. You can even run the cache inside of memory of your existing roles.
More information on caching is here.
There are a few StackOverflow answers regarding Quartz.net and Windows Azure, such as this one.
On Windows Azure, you can use a Worker Role, which can do this. It can be simple as a while loop.
Try this article for an introduction.
http://www.c-sharpcorner.com/uploadfile/40e97e/windows-azu-creating-and-deploying-worker-role/
You could setup a System.Threading.Timer to fire every 50 seconds or so, and do your work whenever the event occurs.
I want to create a Web Crawler, that takes the content of some website and saves it in a blob storage. What is the right way to do that on Azure? Should I start a Worker role, and use the Thread.Sleep method to make it run once a day?
I also wonder, if I use this Worker Role, how would it work if I create two instances of it? I noticed using "Compute Emulator UI" that the command "Trace.WriteLine" works on both instances at the same time, can someone clarify this point.
I created the same crawler using php and set the cron job to start the script once a day, but it took 6 hours to grab the whole content, thats why I want to use Azure.
This is the right way to do it, as of Jan 2014 Microsoft introduced Azure WebJobs, where you can create a project (console for example), and run it as a scheduled task (occurrence once, recurrence)
https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
http://www.hanselman.com/blog/IntroducingWindowsAzureWebJobs.aspx
Considering that a worker role is basically Windows 2008 Server, you can run the same code you'd run on-premises.
Consider, though, that there are several reasons why a role instance might reboot: OS updates, crash, etc. In these cases, it's possible you'd lose the work being done. So... you can handle this in a few ways:
Queue. Place a message on a command queue. If it's a once-a-day task, you can just push the message on the queue when done processing the previous message. Note that you can put an invisibility timeout on the message, so it doesn't appear for a day. In the event of failure during processing, the message will re-appear on the queue and a different instance can pick it up. You can also modify the message as you go, to keep track of your status.
Scheduler. Just make sure there's only one instance running (by way of a mutex). An easy way to do this is to attempt to obtain a write-lock on a blob (there can only be one).
One thing to consider is breaking up your web-crawl into separate tasks (url's?) and place those individually on the queue? With this, you'd be able to scale, running either multiple instances or, potentially, multiple threads in the same instance (since web-crawling is likely to be a blocking operation, rather than a cpu- and bandwidth-intensive one).
A single worker role running once a day is probably the best approach. I would not use thread sleep though, since you may want to restart the instance and then it may, depening on your programming, start before one day or later than one day. What about putting the task command as a message on the Azure Queue and dequeuing it once it has been picked up by a worker role, then adding a new task command on the Azure Queue once.