I have an Azure WebJob running continuously, but the logs indicate that over the weekend its status changed to Aborted, then Stopped. I did not use the website over the weekend, but I am not sure why this would happen, as there are still a lot of messages on the queue that need to be processed.
What can cause a continuous web job to stop or abort?
Does it have a timeout period?
Can the occurrence of multiple errors also cause it to stop or abort?
The job itself doesn't have a timeout period but the website does. Unless you enable the Always On option, the website (and the jobs) will unload after some idle time.
Another reason a continuous job could stop is if you are running on the free tier and the job uses too much CPU time (I think you get 2.5 minutes of CPU time for every 5 minutes).
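If you are on a tier that supports it, Always On can be enabled from the site's configuration page in the portal or, for example, with today's Azure CLI (the resource names here are placeholders):

    az webapp config set --name my-app --resource-group my-rg --always-on true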
I am running a triggered WebJob, every 15 minutes; it is a console application that moves data from SQL Server to a MongoDB database. I have configured the WEBJOBS_IDLE_TIMEOUT and SCM_COMMAND_IDLE_TIMEOUT values to 3600. It was in an idle state for the last 5 days and it didn't abort or time out. I had to restart the web app to abort the WebJob and restart it.
Is there a way I can configure the WebJob to abort after a specific time period?
The WEBJOBS_IDLE_TIMEOUT configuration only applies if your webjob has no CPU and no IO activity (see WebJobs in the Kudu wiki).
CPU activity is measured via Process.TotalProcessorTime, while IO activity is measured by inspecting the standard and error output streams of your webjob for new characters. Source: IdleManager.WaitForExit and Executable.ExecuteInternal.
If possible, let your webjob exit gracefully after it's done its job. If you want to forcefully abort a long running operation that is still consuming CPU or IO, you'll have to code this in your webjob directly.
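A minimal sketch of that kind of self-imposed time limit in a console WebJob, using a CancellationTokenSource; the work methods here are hypothetical placeholders:

    // Give the whole run a fixed time budget, then cancel cooperatively.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static int Main()
        {
            using (var cts = new CancellationTokenSource(TimeSpan.FromMinutes(30)))
            {
                try
                {
                    MoveDataAsync(cts.Token).GetAwaiter().GetResult();
                    return 0; // a clean exit lets the run be reported as Success
                }
                catch (OperationCanceledException)
                {
                    Console.Error.WriteLine("Aborting: time budget exceeded.");
                    return 1; // a non-zero exit code marks the run as Failed
                }
            }
        }

        static async Task MoveDataAsync(CancellationToken token)
        {
            // Hypothetical batch loop; checking the token between batches lets
            // the job stop at a safe point instead of being killed mid-write.
            while (await FetchAndWriteNextBatchAsync())
            {
                token.ThrowIfCancellationRequested();
            }
        }

        static Task<bool> FetchAndWriteNextBatchAsync()
        {
            // Placeholder for the actual SQL Server -> MongoDB transfer.
            return Task.FromResult(false);
        }
    }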
I'm looking to use the API to change the number of WebJob instances I have running based on the size of a processing queue. I know I can set up rules in the portal, but the minimum aggregation time is 60 minutes, and I don't want the system waiting 60 minutes before scaling up if we suddenly get a burst of work.
The issue I have is that currently, if I scale out in the portal manually from, say, 1 to 5 instances, it kills the single running instance and then starts 5 new ones.
I assume that if I did this through the API the same thing would happen. Do you know if there is any way to avoid this?
UPDATE:
I submitted 4 jobs, and as the first was processing I scaled out from 1 to 3 instances. The job that never finished then reran after the next 3 had finished, as its message would have popped back onto the queue because its processing failed initially.
if I scale out in the portal manually from say 1 to 5 instances it kills the single running instance and then starts 5 new ones.
In my test, if you scale your web app, it will not kill the single running instance. I created a WebJob from the template and wrote a timer trigger in it.
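The exact code wasn't posted, but a timer-triggered WebJobs SDK function along these lines would do (it assumes the Microsoft.Azure.WebJobs.Extensions package, and the JobHost must call config.UseTimers() at startup):

    using System;
    using System.IO;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Azure.WebJobs.Extensions.Timers;

    public class Functions
    {
        // Fires once a minute and logs which instance it ran on.
        public static void LogHeartbeat(
            [TimerTrigger("00:01:00")] TimerInfo timer,
            TextWriter log)
        {
            log.WriteLine("Tick at {0:u} on instance {1}",
                DateTime.UtcNow,
                Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID"));
        }
    }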
Here is the time I scaled my web app: [screenshot]
Here is the trigger log in Azure Storage ('azure-jobs-host-output'): [screenshot]
If you find your WebJob shown in an 'inactive instance' state in the Azure WebJobs dashboard, do not worry: your WebJob is still running. Please have a look at David's reply in this thread. Here is a snippet:
This is actually a bug in what the Portal displays. The Portal ends up asking an arbitrary instance about the WebJob status, and if it happens to hit any instance other than the one that's actually running it, it will be reported as inactive.
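As for triggering the scale-out from code rather than portal rules: the instance count lives on the App Service plan, so it can be changed programmatically, for example with today's Azure CLI (the names are placeholders):

    az appservice plan update --name my-plan --resource-group my-rg --number-of-workers 5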
We have an Azure Webjob running on one of our websites that is supposed to run once a day, during the night. This generally processes some data, updates a few records ... It's a process that runs for a few minutes and in the end simply prints "OK".
The Webjob is set to run at 10 PM.
Now, we notice in our logging that it does indeed run at 10 PM, and then runs again at 10:01 PM, 10:02 PM, and 10:03 PM. It runs four times.
What I suppose is happening is that the job takes some time to process (about three minutes in our production environment) and Azure keeps triggering it once every minute until it receives a response.
Can someone confirm or deny this? I've tried reading the documentation and googling for this specific problem, but couldn't find much information about it, other than that it might show this behavior if an exception is thrown (which is definitely not the case).
Can this behavior be configured in some way?
I have a Web API that performs a task, and it currently takes a couple of minutes, depending on the data. This can increase over time.
I have an Azure Scheduler job that calls this Web API every 10 minutes. I want to avoid the case where the second call, 10 minutes later, overlaps with the first call because the execution time has grown. How can I put the smarts in the Web API so that I detect the first call is still running and avoid the second?
Can I use AutoResetEvent or lock statements? Or is keeping a storage flag to indicate busy/free a better option?
Persistent state is best managed via storage. Can your long-running activity persist through a role reset (after all, a role may be reset at any time, as long as availability constraints are met)?
Ensure that you think through scenarios where your long-running job terminates halfway through.
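One storage-based way to implement that busy/free flag is a blob lease, which also covers the halfway-termination case because the lock expires on its own. A sketch assuming the Azure.Storage.Blobs package; the container and blob names are made up:

    using System;
    using Azure;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Specialized;

    public static class TaskLock
    {
        // Runs 'work' only if no other caller currently holds the lease.
        public static bool TryRunExclusive(string connectionString, Action work)
        {
            var container = new BlobContainerClient(connectionString, "locks");
            container.CreateIfNotExists();
            var blob = container.GetBlobClient("task-lock");
            if (!blob.Exists().Value)
                blob.Upload(BinaryData.FromString("lock")); // lease target must exist

            var lease = blob.GetBlobLeaseClient();
            try
            {
                // A 60-second lease expires by itself if the process dies;
                // renew it periodically for work that takes longer than that.
                lease.Acquire(TimeSpan.FromSeconds(60));
            }
            catch (RequestFailedException)
            {
                return false; // someone else holds the lease: skip this call
            }
            try { work(); }
            finally { lease.Release(); }
            return true;
        }
    }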
The Windows Azure Scheduler has a 30-second timeout, so we cannot have a long-running task called by the scheduler. The overlap of 2 subsequent calls is out of the question.
Also, it seems like running a long task from a Web API is bad design because of app pool recycling. I ended up using Azure Service Bus: when a task is requested, a message is posted to the queue. That way the time occupied by the Web API is limited.
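A sketch of that hand-off with the current Azure.Messaging.ServiceBus package (the queue name is made up):

    using System.Threading.Tasks;
    using Azure.Messaging.ServiceBus;

    public static class TaskQueue
    {
        public static async Task EnqueueAsync(string connectionString, string payload)
        {
            await using var client = new ServiceBusClient(connectionString);
            ServiceBusSender sender = client.CreateSender("work-items");

            // The Web API returns as soon as the message is accepted; a
            // separate worker (e.g. a WebJob) drains the queue at its own pace.
            await sender.SendMessageAsync(new ServiceBusMessage(payload));
        }
    }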
I've created a simple Azure WebJob that uses a QueueInput trigger. It deployed without any problems and I've scheduled it via the management portal so that it 'Runs continuously'.
Initial testing seemed fine, with the job triggering shortly after placing anything in the queue.
By chance I then left it about a day before placing anything else in the queue. This time the job hadn't triggered within a few minutes so I logged in to the portal to view the invocation logs - which showed that the job had just that moment been triggered.
That seemed too much of a coincidence so I left it another day before placing something in the queue. Again, the job didn't trigger. I left it overnight and by morning it still hadn't triggered.
When I logged in to the management portal this time I noticed that the job was marked as 'Aborted' on the WebJobs page. It was like that only for about 10 seconds before the status changed to 'Running'. And then the job immediately triggered from what was placed in the queue the night before, as expected.
As it's an alpha release I'm expecting glitches. Just wondering whether anyone else has had a similar experience.
With the WebJobs SDK, your job must be running in order to listen for triggers (new queue messages, new blobs, etc.). The Azure Websites free tier has quotas and will put your job to sleep, which means it's no longer listening for triggers. Using the site may cause it to come back to life and start listening for triggers again.
The SDK dashboard will show a warning icon next to functions if the hosting job is not running (it detects this via heartbeats).
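For reference, this is the shape of function involved; in the alpha SDK the attribute was called QueueInput, which later releases renamed to QueueTrigger:

    using System.IO;
    using Microsoft.Azure.WebJobs;

    public class Functions
    {
        public static void ProcessQueueMessage(
            [QueueTrigger("incoming-items")] string message, // example queue name
            TextWriter log)
        {
            // Runs only while the continuous WebJob host is alive and listening.
            log.WriteLine("Processed: " + message);
        }
    }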
Make sure that your website is configured with the "Always On" setting Enabled.
If your site contains continuously running jobs they may not perform reliably if this setting is disabled.
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
By default, web sites are unloaded if they have been idle for some period of time. This lets the system conserve resources. You can enable the Always On setting for a site in Standard mode if the site needs to be loaded all the time. Because continuous web jobs may not run reliably if Always On is disabled, you should enable Always On when you have continuous web jobs running on the site.