Azure WebJob - Scale out kills existing WebJob

I'm looking to use the API to change the number of WebJob instances I have running based on the size of a processing queue. I know I can set up rules in the portal, but the minimum aggregation time is 60 minutes, and I don't want the system waiting 60 minutes before scaling up if we suddenly get a burst of work.
The issue I have is that currently, if I scale out in the portal manually from, say, 1 to 5 instances, it kills the single running instance and then starts 5 new ones.
I assume the same thing would happen if I did this through the API. Do you know if there is any way to avoid this?
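For reference, the queue-depth check behind this kind of scaling decision might look like the sketch below. This uses the newer Azure.Storage.Queues client rather than whatever API was current at the time, and the connection string, queue name, and threshold are all placeholders; the scale-out call itself would go through the management API.

    using System;
    using Azure.Storage.Queues;
    using Azure.Storage.Queues.Models;

    // Placeholder connection string and queue name.
    string connectionString = Environment.GetEnvironmentVariable("AZURE_STORAGE_CONNECTION_STRING");
    var queue = new QueueClient(connectionString, "processing-queue");

    // ApproximateMessagesCount is, as the name says, approximate, but it is
    // good enough to decide whether scaling out is worth it.
    QueueProperties props = await queue.GetPropertiesAsync();
    if (props.ApproximateMessagesCount > 100)
    {
        // Call the management API here to raise the instance count.
    }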
Thanks
Si
UPDATE:
See below: I submitted 4 jobs, and as the first was processing I scaled out from 1 to 3 instances. This is what happened: the job marked "Never Finished" then reran after the next 3 had finished, as its message would have popped back onto the queue when its processing initially failed.

if I scale out in the portal manually from say 1 to 5 instances it kills the single running instance and then starts 5 new ones.
From my test, scaling the web app will not kill the single running instance. I created a WebJob from the template and wrote a timer trigger in it.
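The function itself was trivial; a minimal sketch in the WebJobs SDK 3.x style (an assumption on my part, since the exact code from the test isn't shown) looks like this:

    using System;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Hosting;

    class Program
    {
        static void Main()
        {
            var host = new HostBuilder()
                .ConfigureWebJobs(b =>
                {
                    b.AddAzureStorageCoreServices(); // storage account used for timer locks
                    b.AddTimers();                   // from Microsoft.Azure.WebJobs.Extensions
                })
                .Build();
            host.Run();
        }
    }

    public class Functions
    {
        // Fires every 30 seconds and logs which instance it ran on,
        // which makes it easy to watch what happens across a scale-out.
        public static void Heartbeat([TimerTrigger("*/30 * * * * *")] TimerInfo timer)
        {
            Console.WriteLine($"{DateTime.UtcNow:O} heartbeat from {Environment.MachineName}");
        }
    }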
Here is the time at which I scaled my web app: [screenshot]
Here is the trigger log in Azure Storage ('azure-jobs-host-output'): [screenshot]
If you see your WebJob in an 'inactive instance' state in the Azure WebJobs dashboard, please do not worry about it. Your WebJob is still running. Have a look at David's reply in this thread. Here is a snippet:
This is actually a bug in what the Portal displays. The Portal ends up asking an arbitrary instance about the WebJob status, and if it happens to hit any instance other than the one that's actually running it, it will be reported as inactive.

Related

Azure Container Apps - Service stopping despite minimum replica being 1

I've got a .NET worker service based on a cron schedule running in a Docker container and pushed up to Azure Container Apps. The schedule is managed within the application itself.
The scaling is set to have a minimum of 1 replica running at all times.
However, we've found that for some reason the application starts up, idles waiting for the schedule trigger for ~20-30 seconds, stops for 2 seconds, starts and idles for ~20-30 seconds again and then doesn't run again for ~5-6 minutes. During the idling time, the job might start if the cron schedule lines up while the process is running.
Is there any way to diagnose why it might be auto-killing the application?
I can't seem to find any logs that show fatal exceptions or anything along those lines, and running in other environments (locally, Azure Container Instances, etc.) doesn't replicate the behavior. My suspicion is that it's the auto-scaling behavior: Azure is noticing that the process is idle for 20-30 seconds at a time and killing that replica, only for it to spin up again 5 minutes later. However, I can't find anything to prove that theory.
I'm aware that other resource types might be better suited (Container Instances, App Service, Functions) though for now I'm stuck with Container Apps.
Found the cause of the issue based on this SO question:
Azure Container Apps Restarts every 30 seconds
It turns out Azure was trying to run health checks against the container despite no HTTP ports being exposed. Azure, thinking the container was unhealthy, was killing and restarting it. Turning off HTTP ingress (and therefore the health checks) solved the issue.
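If you do need ingress enabled, an alternative workaround (my own sketch, not part of the original fix) is to give the container a trivial HTTP endpoint so the default probes have something healthy to hit. A minimal ASP.NET Core program, with the /healthz path and port 8080 as illustrative choices:

    // Assumes the ASP.NET Core web SDK with implicit usings.
    var builder = WebApplication.CreateBuilder(args);
    var app = builder.Build();

    // Liveness endpoint for the platform's health probes.
    app.MapGet("/healthz", () => Results.Ok("healthy"));

    app.Run("http://0.0.0.0:8080");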

How cloud services are provisioned (and billed) once a new deployment is requested to Azure REST API?

I'm using the Azure REST API to create, deploy and start a Cloud Service (classic) (a cspkg hosted in Azure Storage) with hundreds of instances. I'm noticing that the time Azure takes to provision and start the requested instances is really heterogeneous. The first instances might start in 6-7 minutes, but the last ones might take up to 15-20 minutes, about 10 minutes longer than the first ones. So my questions are:
Is this the expected behaviour? If so, what's the logic behind this? Could I do anything to speed things up?
How is Azure billing this? Is it counting the total number of instances from the moment the Cloud Service is deployed, or is it taking into account the specific start time of each individual instance?
UPDATE: I've been testing more scenarios and I've found a puzzling surprise. If I replace all the processes that my Cloud Service instances should run with a simple wait of a few minutes (a .bat file running the timeout command), then all the instances start almost at the same time (about 15 seconds between the fastest and slowest instances). It was not just luck or random behaviour; I've proved that this behaviour is repeatable, and I can't even begin to explain the root cause.
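For what it's worth, the stand-in startup task was nothing more exotic than a one-line batch file along these lines (600 seconds is an example duration):

    REM Do nothing for ten minutes, then exit.
    timeout /t 600 /nobreak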
I also looked into this a few weeks ago. The startup time depends on the size of the machine: a larger size has more resources, so it boots faster. Also, if there is any error or exception on startup, the VM will recycle until it starts successfully. I searched around but did not find any way to speed this up, so I don't think there is anything you can do about the startup time. In the background, every deployment creates a Windows Server, boots it up, deploys your package onto it, and puts your web roles behind the load balancer; it takes this long because a lot of things are happening.
The billing side is also not great for classic cloud services: you pay for instances even during startup and recycling, and even when they are turned off. So when you are done with your update, delete the VMs from your staging slot or scale them down, because you will pay for them even while they are stopped.

Execute time-consuming jobs on Azure

I have used the Azure Scheduler before for quick jobs. It targets a URL, either an ASPX page or a Web API, and it did the job.
Now I have a job that takes up to 15-20 minutes, so of course I am getting a timeout error after 30 seconds.
I'm trying to avoid creating a Windows service or a console application running on an Azure VM; I would rather have a non-UI application that runs in the background.
Do you have any suggestions on what I should do?
You should use an Azure WebJob for this. WebJobs support simple scheduling via a cron expression (details here). Basically, you upload a simple script file or exe that performs the work you want done to your Web App, along with a cron schedule expression, and Azure WebJobs will make sure it runs on schedule.
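For reference, the cron expression for a scheduled WebJob lives in a settings.job file deployed alongside your script or exe. A minimal sketch, with an example six-field expression that fires at the top of every hour:

    {
      "schedule": "0 0 * * * *"
    }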
For your scenario, you'll want to create a "Continuous" WebJob and make sure you've enabled "Always On", which keeps the background job running (i.e. it isn't request triggered).
WebJobs certainly are a good solution, but they share resources with the Web App they are attached to.
You could consider using an Azure Cloud Service instead. I do that myself for longer-running tasks that are more CPU intensive.
For long-running triggered WebJobs, you have to tinker with the timeout value (2 minutes by default) or make sure your WebJob writes something to the console from time to time.
To achieve that, go to the Web App Settings > Application Settings and add the following configurations:
WEBJOBS_IDLE_TIMEOUT - Time in seconds after which a running triggered job's process is aborted if it is idle, consuming no CPU time and producing no output.
SCM_COMMAND_IDLE_TIMEOUT - Time in seconds. By default, when your build process launches some command, it is allowed to run for up to 60 seconds without producing any output. If that is not long enough, you can raise it, e.g. to 600 to allow 10 minutes.
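Put together, the resulting Application Settings might look like this (both values in seconds; the numbers are only examples):

    WEBJOBS_IDLE_TIMEOUT = 3600
    SCM_COMMAND_IDLE_TIMEOUT = 600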

Is Azure Scheduled WebJob started if previous one is still running?

I have a scheduled Azure WebJob that runs every 5 minutes. It's not clear what happens if a run takes 10 minutes. Is a new one started in parallel with the one still running, or is it not started until the previous one has finished?
From this answer, What happens when a scheduled WebJob runs for a long time:
As I understand it, scheduled WebJobs are just triggered WebJobs that are run using the Azure Scheduler. If you open the Azure Scheduler in the management portal, you can see the WebJobs and even configure them in more detail. (You can see the log too, which would give you the simple answer to your question.)
If you'd like to see what's going on: your scheduled WebJob is run as a triggered WebJob by Kudu, and if you look in the Kudu source you will see that a lock file is created when a job is started; if you try to start another job while a lock file already exists, a ConflictException is thrown.
The Azure Scheduler calls your job using a webhook that catches the ConflictException and gives you the "Error_WebJobAlreadyRunning" warning, which tells you: "Cannot start a new run since job is already running."
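To illustrate the mechanism (this is a sketch of the lock-file pattern, not Kudu's actual code): creating a file with FileMode.CreateNew fails atomically if the file already exists, which is exactly the "second run is rejected" behaviour described above.

    using System;
    using System.IO;

    class SingleRunLock : IDisposable
    {
        private readonly FileStream _stream;

        public SingleRunLock(string path)
        {
            try
            {
                // CreateNew throws if the lock file already exists.
                _stream = new FileStream(path, FileMode.CreateNew, FileAccess.Write, FileShare.None);
            }
            catch (IOException)
            {
                // Kudu surfaces this condition as a ConflictException.
                throw new InvalidOperationException("Cannot start a new run since job is already running.");
            }
        }

        public void Dispose()
        {
            _stream.Dispose();
            File.Delete(_stream.Name);
        }
    }

Wrap the job body in using (new SingleRunLock(path)) { ... } and a second, concurrent run will fail fast instead of running in parallel.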

WebJob doesn't Trigger

I've created a simple Azure WebJob that uses a QueueInput trigger. It deployed without any problems, and I've scheduled it via the management portal so that it 'Runs continuously'.
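For context, a queue-triggered function of that shape looks roughly like this (a sketch only: the alpha-era attribute was [QueueInput], later renamed [QueueTrigger], and the queue name here is made up):

    using System.IO;
    using Microsoft.Azure.WebJobs;

    public class Functions
    {
        // Invoked whenever a new message appears on the "workitems" queue,
        // but only while the JobHost is actually awake and listening.
        public static void ProcessQueueMessage([QueueTrigger("workitems")] string message, TextWriter log)
        {
            log.WriteLine("Processing: " + message);
        }
    }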
Initial testing seemed fine, with the job triggering shortly after placing anything in the queue.
By chance I then left it about a day before placing anything else in the queue. This time the job hadn't triggered within a few minutes so I logged in to the portal to view the invocation logs - which showed that the job had just that moment been triggered.
That seemed too much of a coincidence so I left it another day before placing something in the queue. Again, the job didn't trigger. I left it overnight and by morning it still hadn't triggered.
When I logged in to the management portal this time I noticed that the job was marked as 'Aborted' on the WebJobs page. It was like that only for about 10 seconds before the status changed to 'Running'. And then the job immediately triggered from what was placed in the queue the night before, as expected.
As it's an alpha release I'm expecting glitches. Just wondering whether anyone else has had a similar experience.
With the WebJobs SDK, your job must be running in order to listen for triggers (new queue messages, new blobs, etc.). The Azure Websites free tier has quotas and will put your job to sleep, which means it is no longer listening for triggers. Using the site may cause it to come back to life and start listening for triggers again.
The SDK dashboard will show a warning icon next to functions if the hosting job is not running (it detects this via heartbeats).
Make sure that your website is configured with the "Always On" setting Enabled.
If your site contains continuously running jobs, they may not run reliably with this setting disabled.
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
By default, web sites are unloaded if they have been idle for some period of time. This lets the system conserve resources. You can enable the Always On setting for a site in Standard mode if the site needs to be loaded all the time. Because continuous web jobs may not run reliably if Always On is disabled, you should enable Always On when you have continuous web jobs running on the site.
