Azure WebJobs and "Always On"

I have read that WebJobs require "Always On" to be enabled:
http://blog.amitapple.com/post/73574681678/git-deploy-console-app/
I have a free web site, yet my job keeps running even after 20 minutes. As I write this, it has been running for more than two hours. Why? Perhaps because my site has traffic, so the app pool stays up and running?

While your job did continue to run for more than 20 minutes, Windows Azure Websites makes no promise that it will, so if you need to rely on that you'll need to enable "Always On". Also, free sites have a quota on the amount of CPU they can use, so even if your WebJob stays on, that quota will most likely be reached.
One reason your job kept going could be that you kept checking it in the dashboard: viewing the dashboard for continuous jobs will also start them again if they went down (due to "Always On" not being set).
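For context, a continuous WebJob is usually just a console app with a loop, roughly like the sketch below (the work inside the loop is a placeholder). Without "Always On", the site's process, and with it this loop, can be unloaded after the idle timeout.

// Minimal sketch of a continuous WebJob deployed as a console app (no SDK involved).
// The "work" here is hypothetical; the point is that the loop only runs while the
// site's process is kept loaded, which is what "Always On" guarantees.
using System;
using System.Threading;

class Program
{
    static void Main()
    {
        while (true)
        {
            Console.WriteLine($"{DateTime.UtcNow:o} - doing background work...");
            // ... actual work goes here ...
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }
}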

Related

How are cloud services provisioned (and billed) once a new deployment is requested via the Azure REST API?

I'm using the Azure REST API to create, deploy and start a Cloud Service (classic) (a cspkg hosted in Azure Storage) with hundreds of instances. I'm noticing that the time Azure takes to provision and start the requested instances is very heterogeneous. The first instances might start in 6-7 minutes, but the last ones might take up to 15-20 minutes, about 10 minutes longer than the first ones. So my questions are:
Is this the expected behaviour? If so, what's the logic behind this? Could I do anything to speed things up?
How does Azure bill this? Does it count the total number of instances from the moment the Cloud Service is deployed, or does it take into account the specific timing of each individual instance?
UPDATE: I've been testing more scenarios and found a puzzling surprise. If I replace all the processes that my Cloud Service instances should run with a simple wait of a few minutes (a .bat file running the timeout command), then all the instances start almost at the same time (about 15 seconds between the fastest and slowest instance). It was not just luck or random behaviour; I've verified that it is repeatable, and I can't explain the root cause.
I also checked this a few weeks ago. The startup time depends on the size of the machine: if it is larger it has more resources, so the boot time is faster. Also, if there is any error or exception on startup, the VM will recycle until it can start successfully. I searched but did not find any way to speed this up, so I don't think there is anything you can do about the startup time. In the background, every time you deploy something, Azure creates a Windows Server VM, boots it up, deploys your package onto it and puts your web roles behind a load balancer; this is why it takes so long, because a lot of things are happening.
The billing part is also not great for classic cloud services: you pay for the instances even during startup and recycling, and even when they are stopped. So once you are done with your update, you should delete the VMs from your staging slot or scale down, because you will be charged even while they are turned off.
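If you want to measure when each instance actually becomes ready, one option is to poll the classic Service Management "Get Deployment" operation and log the per-instance status over time. The sketch below assumes certificate authentication; the subscription ID, service name and certificate are placeholders, and the x-ms-version value and XML element names are from memory, so check them against the Get Deployment documentation.

// Sketch: poll the classic Service Management API and print per-instance status
// (e.g. ReadyRole) with a timestamp, so you can see how long each instance takes.
// Subscription ID, service name, slot and certificate path/password are placeholders.
using System;
using System.Net.Http;
using System.Security.Cryptography.X509Certificates;
using System.Threading.Tasks;
using System.Xml.Linq;

class PollDeployment
{
    static async Task Main()
    {
        var handler = new HttpClientHandler();
        handler.ClientCertificates.Add(new X509Certificate2("management-cert.pfx", "password"));
        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.Add("x-ms-version", "2014-06-01"); // assumed API version

        var url = "https://management.core.windows.net/<subscription-id>" +
                  "/services/hostedservices/<service-name>/deploymentslots/Production";

        while (true)
        {
            var xml = XDocument.Parse(await client.GetStringAsync(url));
            XNamespace ns = "http://schemas.microsoft.com/windowsazure";
            foreach (var instance in xml.Descendants(ns + "RoleInstance"))
            {
                Console.WriteLine("{0:o} {1}: {2}",
                    DateTime.UtcNow,
                    instance.Element(ns + "InstanceName")?.Value,
                    instance.Element(ns + "InstanceStatus")?.Value);
            }
            await Task.Delay(TimeSpan.FromSeconds(30));
        }
    }
}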

Azure App Service: How can I determine which process is consuming high CPU?

UPDATE: I've figured it out. See the end of this question.
I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.
The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.
However, when I look at the CPU use reported by each site, it's really low.
The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.
A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.
So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.
UPDATE: Thanks to the replies below, I was able to get more detail into what was happening. I ended up getting what I needed from SCM / Kudu tools. You can get here by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful, because it's the time in seconds that the process has run since it started, which might have been minutes or days ago.
However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
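(For anyone who wants to script this instead of refreshing the page: the same data is exposed by Kudu's REST API at /api/processes, authenticated with the site's deployment credentials. Below is a rough sketch that simply takes two raw snapshots a minute apart; the app name and credentials are placeholders, and I'm not certain of the exact JSON field names for CPU time, so I just print the raw payload and compare by eye.)

// Sketch: grab two snapshots of Kudu's process list so the change in CPU time per
// process can be compared. App name and deployment credentials are placeholders;
// for per-process detail you may need to follow /api/processes/{id}.
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class KuduCpuSnapshot
{
    static async Task Main()
    {
        var client = new HttpClient();
        var creds = Convert.ToBase64String(Encoding.UTF8.GetBytes("deployUser:deployPassword"));
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", creds);

        var url = "https://<your-app>.scm.azurewebsites.net/api/processes";

        Console.WriteLine(await client.GetStringAsync(url)); // first snapshot
        await Task.Delay(TimeSpan.FromSeconds(60));
        Console.WriteLine(await client.GetStringAsync(url)); // second snapshot: compare CPU times
    }
}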
The great thing about the Kudu dashboard is that if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.
Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app I create only one connection, as recommended. But since my WebJobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJobs process starts up and reuse the existing connection whenever an individual WebJob runs.
Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.
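For anyone wanting the concrete pattern, what I ended up with is roughly the sketch below. The connection-string lookup is a placeholder, and the Lazy<ConnectionMultiplexer> wrapper is just the commonly recommended way to share a single connection; each WebJob function then uses RedisConnection.Instance instead of calling ConnectionMultiplexer.Connect itself.

// Sketch: share one Redis connection across all WebJob invocations instead of
// creating a new one per run. The app setting name is a placeholder.
using System;
using System.Configuration;
using StackExchange.Redis;

public static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
        new Lazy<ConnectionMultiplexer>(() =>
            ConnectionMultiplexer.Connect(
                ConfigurationManager.AppSettings["RedisConnectionString"]));

    // Created lazily on first use and reused for the lifetime of the process.
    public static ConnectionMultiplexer Instance => LazyConnection.Value;
}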
Hope this helps someone out there. Best wishes!
Maybe you should go to the web app's SCM (Kudu) site?
https://%yourAppName%.scm.azurewebsites.net
There is a page there that shows all the processes currently running on your web app (Process Explorer in the top menu).
You can also go to the Support page (linked from the top-right corner of the SCM site).
You can find more info about your app's performance there and take a memory dump (not for this particular problem, but it is useful for performance issues).
Based on your description, you could leverage the Crash Diagnoser extension to capture dump files from your Web Apps and WebJobs when the CPU usage percentage is higher than a specific threshold, to isolate this issue. For more details, you could refer to the official blog post.

How does one know why an Azure Website instance (Web App) was shut down?

Looking at my Pingdom reports, I have noticed that my Website instance is getting recycled. (Pingdom is basically used to keep my site warm.) When I look deeper into the Azure logs, i.e. /LogFiles/kudu/trace, I notice a number of small XML files with "shutdown" or "startup" suffixes, e.g.:
2015-07-29T20-05-05_abc123_002_Shutdown_0s.xml
While I suspect this might have to do with MS patching VMs, I am not sure. My application is not raising any exceptions, hence my suspicion that it is happening at the OS level. Is there a way to find out why my instance is being shut down?
I should also mention that I am using one S2 instance, scalable to three depending on CPU usage. We may have to review this and use a 2-3 setup; obviously that doubles the cost.
EDIT
I have looked at my Operation Logs and all I see is "UpdateWebsite" with a status of "succeeded", but nothing for the times the above files were written. So it seems the instance is being shut down, yet the event does not appear in the Operation Log. Why would this be? I had about five yesterday, yet the last Operation Log entry was 29/7.
An example of one of yesterday's shutdown XML files:
2015-08-05T13-26-18_abc123_002_Shutdown_1s.xml
You should see entries regarding backend maintenance in the operation logs.
As for keeping your site alive, the Standard plan allows you to use the "Always On" feature, which does pretty much what Pingdom is doing to keep your website warm. Just enable it on the Configure tab of the portal.
Configure web apps in Azure App Service
https://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
Every site on Azure runs two applications: one is yours and the other is the SCM endpoint (a.k.a. Kudu). These "shutdown" traces are for the Kudu app, not for your site.
If you want similar traces for your site, you'll have to implement them yourself, just like Kudu does. If you don't have Always On enabled, Kudu gets shut down after an hour of inactivity (as far as I remember).
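If you do decide to add your own traces, a minimal sketch (assuming an ASP.NET app, and assuming you want the log under the persistent D:\home\LogFiles folder) is to record HostingEnvironment.ShutdownReason when the application domain is torn down:

// Sketch: log why ASP.NET is tearing the app down. Goes in Global.asax.cs.
// The log file path is an assumption about where you want the trace to land.
using System;
using System.IO;
using System.Web;
using System.Web.Hosting;

public class Global : HttpApplication
{
    protected void Application_End()
    {
        var line = string.Format("{0:o} Shutdown reason: {1}{2}",
            DateTime.UtcNow,
            HostingEnvironment.ShutdownReason,
            Environment.NewLine);
        File.AppendAllText(@"D:\home\LogFiles\app-shutdown.log", line);
    }
}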
Aside from that, as you mentioned, Azure will shut down your app during machine upgrades, though I don't think these shutdowns result in Operation Log events.
Are you seeing any side effects? Is this causing downtime?
When upgrades to the service are going on, your site might get moved to a different machine. We bring the site up on a new machine before shutting it down on the old one and letting connections drain, so this should not result in any perceivable downtime.

WebJob doesn't Trigger

I've created a simple Azure WebJob that uses a QueueInput trigger. It deployed without any problems and I've scheduled it via the management portal so that it 'Runs continuously'.
Initial testing seemed fine, with the job triggering shortly after placing anything in the queue.
By chance I then left it about a day before placing anything else in the queue. This time the job hadn't triggered within a few minutes, so I logged in to the portal to view the invocation logs, which showed that the job had been triggered just at that moment.
That seemed too much of a coincidence so I left it another day before placing something in the queue. Again, the job didn't trigger. I left it overnight and by morning it still hadn't triggered.
When I logged in to the management portal this time I noticed that the job was marked as 'Aborted' on the WebJobs page. It was like that only for about 10 seconds before the status changed to 'Running'. And then the job immediately triggered from what was placed in the queue the night before, as expected.
As it's an alpha release I'm expecting glitches. Just wondering whether anyone else has had a similar experience.
With the WebJobs SDK, your job must be running in order to listen for triggers (new queue messages, new blobs, etc.). The Azure Websites free tier has quotas and will put your job to sleep, which means it is no longer listening for triggers. Using the site may cause it to come back to life and start listening for triggers again.
The SDK dashboard will show a warning icon next to functions if the hosting job is not running (it detects this via heartbeats).
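For reference, a queue-listening continuous job built on the SDK looks roughly like the sketch below. The alpha bits used [QueueInput]; later SDK builds renamed it to [QueueTrigger], and the queue name here is a placeholder. The storage account is picked up from the AzureWebJobsStorage/AzureWebJobsDashboard settings.

// Sketch of a continuous WebJob that listens for queue messages via the WebJobs SDK.
using System.IO;
using Microsoft.Azure.WebJobs;

class Program
{
    static void Main()
    {
        var host = new JobHost();
        // Blocks and keeps listening for triggers; this is why the process must stay alive.
        host.RunAndBlock();
    }
}

public class Functions
{
    // Invoked automatically whenever a message lands on the "myqueue" queue (placeholder name).
    public static void ProcessQueueMessage([QueueTrigger("myqueue")] string message, TextWriter log)
    {
        log.WriteLine("Processed: " + message);
    }
}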
Make sure that your website is configured with the "Always On" setting Enabled.
If your site contains continuously running jobs they may not perform reliably if this setting is disabled.
http://azure.microsoft.com/en-us/documentation/articles/web-sites-configure/
By default, web sites are unloaded if they have been idle for some period of time. This lets the system conserve resources. You can enable the Always On setting for a site in Standard mode if the site needs to be loaded all the time. Because continuous web jobs may not run reliably if Always On is disabled, you should enable Always On when you have continuous web jobs running on the site.

DotNetNuke on Windows Azure Websites performance

I am evaluating the Windows Azure Websites Preview (WAWS, I think; I'm not sure with all these changing names and acronyms that Microsoft loves to mutate) with DotNetNuke (DNN), which I have been using for years on a "non-cloud" V-Server. Installation was a breeze. I only tried the free shared instance, and I have tested with 1 and with 3 active instances, with similar results.
First-hit performance was always a problem with my previous DNN installations: when a website was idle for a while (15 minutes or so) the process would stop, and then the next unlucky visitor would wait at least 20 seconds. With some IIS tweaking it was possible to minimize this problem, but I had the best results with a monitoring service that requests a page from DNN every five minutes and keeps the process up.
While browsing, the DNN site usually performs well on WAWS, but I immediately noticed that the "first hit" problem is an issue with DNN on WAWS, so I configured a monitoring service for the page. That did not help, and the monitoring service always reports that the site is down. It is almost as if WAWS were trying to avoid keeping the site up because it detected that only a monitoring service was requesting the page.
Also, when navigating the DNN pages and then pausing for just a minute or two, I often get an "Internet Explorer could not load this page" error with no specific error code.
Do others have experience with the DNN performance on WAWS or maybe know why the "first hit" is such a problem?
I suspect that Microsoft is actively trying to defeat the keep-alive tricks that many ASP.NET devs use. WAWS, like many shared hosting platforms, relies on having only a certain number of active websites on the server at any one time in order to achieve higher server densities and keep the cost of hosting under control. This is one of the reasons they can offer this service for free.
I think what you want to look into is "keep alive."
What you are experiencing is the ASP.NET process for your application getting killed due to inactivity. When the process isn't in memory and the site is accessed, IIS has to spin it back up, which is the 10-20 second lag you see while the process starts again and/or just-in-time compiles.
You can schedule a third-party monitoring service to check your site every 10 minutes via an HTTP request, which will keep your site up. Simply pinging the server, without a real HTTP request, will not keep it up.
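If you would rather roll your own keep-alive than use a third-party monitor, the idea is just a scheduled, real HTTP GET against the site. A minimal sketch follows; the site URL and interval are placeholders, and it has to run from somewhere that is itself always on.

// Sketch: keep a site warm by requesting an actual page on a schedule.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class KeepAlive
{
    static async Task Main()
    {
        var client = new HttpClient();
        while (true)
        {
            var response = await client.GetAsync("https://<your-site>.azurewebsites.net/");
            Console.WriteLine("{0:o} HTTP {1}", DateTime.UtcNow, (int)response.StatusCode);
            await Task.Delay(TimeSpan.FromMinutes(10));
        }
    }
}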
