Deploying Windows Azure, DNS name not working - azure

I have two free subscriptions for Windows Azure, and because I exceeded the limit on the first one, Microsoft closed it down. So I tried to deploy my application from the other subscription, changing a few settings, and it seems to take a lot longer, and the DNS name of the deployed application (in the production area) does not seem to work. (I've been waiting for about 15 minutes; in the other subscription the link started to work almost immediately.) Also, my web role seems to stay in the Busy state for a very long time.
The application always worked fine, and now I'm getting all this trouble just by switching subscriptions? I'm getting really frustrated with this, especially because it all worked perfectly before. Now I have to 'waste' my time getting everything to work again and I can't start on anything new. I don't think this is normal, but I can't seem to find the solution either.
edit:
After more than half an hour the DNS name finally started working, but this still does not fix the problem with the extremely slow deployment and the busy state of the web role.

Please study the discussion below to understand why the time to deploy an application can vary between 10 and 30 minutes:
Is there a way to reduce time between Azure deployment start and role OnStart() code being invoked?
The details there should answer your statement ".. this still does not fix the problem with the extreme slow deploying and the busy state of the webrole..".
To add to that: while your application is in the deployment phase it goes through several states, and in some cases the time spent in one state can be longer than expected. During this time you will see statuses such as "Busy", "Initializing", "Starting.." and so on, and these states tell you which stage of the deployment you have reached. I hope this helps you understand the time taken during deployment.
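To watch those states go by instead of refreshing the portal, a small polling loop is enough. This is only a sketch: `get_status` is a hypothetical callable that you would have to back with whatever status source you use (portal API, management tooling), and treating "Ready" as the final state is an assumption.

```python
import time
from typing import Callable, List


def wait_until_ready(
    get_status: Callable[[], str],              # hypothetical: returns the current role state
    timeout_s: float = 1800.0,                  # give up after 30 minutes by default
    poll_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Poll until the state is "Ready"; return the distinct states seen, in order."""
    seen: List[str] = []
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        # Record a state only when it changes, so the result reads as a timeline.
        if not seen or seen[-1] != status:
            seen.append(status)
        if status == "Ready":
            return seen
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} s")
        sleep(poll_s)
```

The returned list (for example Initializing, Busy, Starting, Ready) shows which state the deployment spent its time in, which is the first thing to know before trying to speed anything up.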

Related

How to debug Azure swapping process (sometimes bringing site down)

We have a pretty large project that is running on Azure. For some reason swap times became really slow recently, like at least 10 minutes.
Sometimes during the swap the site becomes extremely slow, to the point that it doesn't respond for minutes.
Other times the swap just doesn't work for one reason or another.
We are using initializationPage to warmup the most specific pages, but it doesn't seem to help.
Question
Is it possible to see what's going on during the swap? I'm trying to debug why it's so slow. Is there any log that I can see why it's stuck on what?
We can't deploy emergency fixes without risking an outage, and sometimes the whole site goes down.
Any help debugging swapping problems would be greatly appreciated.
Update
I found the following in 'Activity log' on the Azure Portal, but I still can't find any details or any hint what is going on exactly.
So: The resource operation completed with terminal provisioning state 'Failed'.
Where can I find details? It really annoys me that I have to buy Azure Developer support while I'm already spending hundreds of euros per month on something that seems broken, or at least very uninformative about what is going wrong.
So: The resource operation completed with terminal provisioning state 'Failed'.
Where can I find details?
Microsoft has a few things that may help you.
You can view the operations for a deployment through the Azure portal.
You may be most interested in viewing the operations when you have
received an error during deployment so this article focuses on viewing
operations that have failed. The portal provides an interface that
enables you to easily find the errors and determine potential fixes.
The quote above comes directly from Microsoft's article "View deployment operations with Azure Resource Manager", which has several steps to follow.
I hope this helps.

Azure App Service: How can I determine which process is consuming high CPU?

UPDATE: I've figured it out. See the end of this question.
I have an Azure App Service running four sites. One of the sites has two deployment slots in addition to the primary one. Recently I've been seeing really high CPU utilization for the App Service plan as a whole.
The dark orange line shows the CPU percentage. This is just after restarting all my sites, which brought it down to this level.
However, when I look at the CPU use reported by each site, it's really low.
The darker blue line shows the CPU time, which is basically nothing. I did this for all of my sites, and all the graphs look the same. Basically, it seems that none of my sites are causing the issue.
A couple of the sites have web jobs, so I took a look at the logs but everything is running fine there. The jobs run for a few seconds every few hours.
So my question is: how can I determine the source of this CPU utilization? Any pointers would be greatly appreciated.
UPDATE: Thanks to the replies below, I was able to get more detail about what was happening. I ended up getting what I needed from the SCM / Kudu tools. You can get there by going to your web app in Azure and choosing Advanced Tools from the side nav. From the Kudu dashboard, choose Process Explorer. The value in the Total CPU Time column is not directly useful, because it's the time in seconds that the process has used since it started, which might have been minutes or days ago.
However, if you make a record of the value at intervals, you can look at the change over time, and one process might jump out at you. In my case, it was my WebJobs process. Every 60 seconds, this one process was consuming about 10 seconds of processor time, just within one environment.
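The bookkeeping for that is simple enough to script. Below is a minimal sketch, assuming you copy the Total CPU Time values out of Process Explorer by hand at known times; the sample layout and the function names are my own, not anything Kudu provides.

```python
from typing import Dict, List, Tuple

# One sample: (wall-clock seconds when read, {process name: total CPU seconds so far})
Sample = Tuple[float, Dict[str, float]]


def cpu_share_per_interval(samples: List[Sample]) -> List[Dict[str, float]]:
    """For each consecutive pair of samples, return CPU seconds used per wall-clock second."""
    shares: List[Dict[str, float]] = []
    for (t0, cpu0), (t1, cpu1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("samples must be in increasing time order")
        shares.append({
            name: (cpu1[name] - cpu0[name]) / elapsed
            for name in cpu1
            if name in cpu0  # skip processes that only appeared during this interval
        })
    return shares


def top_consumer(share: Dict[str, float]) -> str:
    """Name of the process that used the most CPU in one interval."""
    return max(share, key=share.get)
```

A process that burns about 10 CPU seconds in every 60 second interval comes out at roughly 0.17 here, while an idle one stays near zero, so the culprit stands out even though the raw totals look unremarkable.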
The great thing about this Kudu dashboard is, if you can catch the problem while it is actually happening, you can hit the Start Profiling button and capture a diagnostic session. You can then open this up in Visual Studio and get some nice details about where the CPU time is being spent.
Just in case anyone else is seeing similar issues, I'll provide more details about my particular case. As I mentioned, my WebJobs exe was the culprit, and I found that all the CPU time was being spent in StackExchange.Redis.SocketManager, which manages connections to Azure Redis Cache. In my main web app, I create only one connection, as recommended. But since my web jobs only run every once in a while, I was creating a new connection to Azure Redis Cache each time one ran, which apparently can lead to issues. I changed my code to create the Redis Cache connection once when the WebJob process starts up and to reuse that existing connection whenever an individual WebJob runs.
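The shape of that change, sketched in Python rather than in my actual C# code: create the connection lazily, once per process, and hand the same object to every job run. `SharedConnection`, `FakeConnection` and `run_job` are made-up names for illustration, and `FakeConnection` merely stands in for a real cache client.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class SharedConnection(Generic[T]):
    """Create a connection on first use and return that same object to every caller."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        self._conn: Optional[T] = None

    def get(self) -> T:
        # Check again inside the lock so two threads cannot both create a connection.
        if self._conn is None:
            with self._lock:
                if self._conn is None:
                    self._conn = self._factory()
        return self._conn


class FakeConnection:
    """Stand-in for a real cache client; counts how many instances were opened."""
    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1


# Created once at process start; every job run goes through it.
shared = SharedConnection(FakeConnection)


def run_job(job_id: int) -> FakeConnection:
    """Hypothetical job body: reuse the shared connection instead of opening a new one."""
    return shared.get()
```

However many times `run_job` is called, `FakeConnection.opened` stays at 1, which is the property the fix relies on.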
Time will tell if this really fixes the issue, but I think it will. When the problem occurred, it always fit the same pattern: After a few days of running fine, my CPU would slowly ramp up over the course of about 12 hours. My thinking is that each time a WebJob ran, it created a connection object, which at first didn't produce trouble, but gradually as WebJobs ran every hour or two, cruft was building up until finally some critical threshold was met and the CPU usage would take off.
Hope this helps someone out there. Best wishes!
Maybe you should go to your web app's SCM site?
%yourAppName%.scm.azurewebsites.net
There is a page that shows all the processes currently running on your web app (something like Console > Process).
You can also go to the support page (from the right corner of the SCM site).
There you can find more information about your performance and take a memory dump (not needed for this problem, but useful for performance issues).
Based on your description, I assume you could use the Crash Diagnoser extension to capture dump files from your Web Apps and WebJobs when CPU usage rises above a specified threshold, which should help isolate this issue. For more details, you could refer to this official blog.

Azure VIP swap takes more than a minute

....and the site doesn't respond in that time at all. Some responses even fail.
This is giving a horrible user experience where the site suddenly stops reacting for the people that are online at that moment.
The site is completely warmed up and responsive through the temporary URL.
Here is the log:
Is there anything to speed this up?
Is this normal behaviour or is this an error that should be reported?
From the log you posted it seems that the operation that consumes a long time is "ChangeDeploymentConfigurationBySlot".
An azure configuration change by default triggers the restart of the role instance (which might explain why your clients are experiencing downtime).
You can get further details in this blog: https://alexandrebrisebois.wordpress.com/2013/09/29/handling-cloud-service-role-configuration-changes-in-windows-azure/

Intermittent Microsoft Azure Web Site access failure

I have a number of small MVC apps deployed as Microsoft Windows Azure websites. This has been working for several months.
Yesterday I rolled out a new one, and the deployment was unremarkable, everything worked fine. But a couple of hours later, access to the site was unavailable. The symptoms were that when the browser tried to navigate to the URL for that site, it would try to load for several minutes and then just give up with a completely blank page.
I attempted to stop and restart the site, and it worked once, but the symptoms came back several minutes later. Then I tried to stop and restart, and it didn't work.
I deployed the identical app to three additional URLs. Again, immediately on deployment, they all work fine, however, they fail at some interval in the future. They seem to not all fail at once. Sometimes restarting the site will fix the problem, and sometimes not.
IMPORTANT: If I wait for some period of time, the site may start to work again on its own.
However, deploying four versions of the app so that our users can go to a backup one if the primary one is not working is not optimal.
Any words of wisdom as to how I might go about debugging this?
ADDITIONAL INFO NOV 25, 2013:
When sites are failing, the IIS logs show either 500 or 502 Internal Service Errors. Our own MVC code is never hit, not even app_start.
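For anyone wanting to quantify those errors once the IIS logs are downloaded, here is a minimal sketch that tallies the status codes. It assumes the logs are in the W3C extended format, where a `#Fields:` directive names the columns and `sc-status` holds the HTTP status; the function name is my own.

```python
from collections import Counter
from typing import Iterable, Optional


def count_statuses(lines: Iterable[str]) -> Counter:
    """Count sc-status values in W3C extended log lines."""
    counts: Counter = Counter()
    status_idx: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The directive lists column names in order; remember where sc-status sits.
            fields = line[len("#Fields:"):].split()
            status_idx = fields.index("sc-status") if "sc-status" in fields else None
            continue
        if not line or line.startswith("#") or status_idx is None:
            continue  # other directives, blank lines, or rows before any #Fields line
        parts = line.split()
        if len(parts) > status_idx:
            counts[parts[status_idx]] += 1
    return counts
```

Calling it as `count_statuses(open(path, encoding="utf-8", errors="replace"))` on each log file and comparing the 500 and 502 counts per hour would show whether the failures cluster around particular times.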
You can start by checking the logs and remote debugging
http://www.drdobbs.com/windows/azure-sdk-22-supports-visual-studio-2013/240163499
Are the apps working locally?
Might not be the same problem, but from time to time our Azure instances will get the blue question mark of death as a status.
The reason we found out was that Microsoft will do upgrades on instances from time to time. If you have just one instance in a cloud service/role, then from time to time they will do maintenance and during that time it will be dead.
I have confirmed this with their support.
The only way to get around this that I know of is to create two instances. Then Microsoft guarantees ~99% availability.
Of course I also confirmed with them that this means twice the cost. =/
If that's not the issue I would enable RDP and get onto the machine to see what the problem is. Microsoft has these tools to help debug problems: http://blogs.msdn.com/b/kwill/archive/2013/08/26/azuretools-the-diagnostic-utility-used-by-the-windows-azure-developer-support-team.aspx
First, you should always run multiple instances of your web role with more than 1 upgrade domain. This is configurable in the service definition (CSDEF). Without this, you don't get an SLA from Microsoft, so you can't really complain that the VMs go down.
Second, to figure out what might be going on with these boxes, you should have both logs (my preference is to roll my own with page blobs or table storage), AND you should always have RDP access to a pre-production environment (production as well if you're not too fussed about security). Once on the box, look through the event viewer for errors.
Third, when an outage occurs check out the azure service dashboard (http://www.windowsazure.com/en-us/support/service-dashboard/) for outages.
Lastly, contact Microsoft support. It may take a few hours, but they are pretty good.
Given that it is happening repeatedly and for extended periods of time (more than 5 minutes), I would bet there's something wrong with your hosted service. Again, RDP in and poke around. Good luck.
To debug your sites try to enable diagnostic logs:
http://www.windowsazure.com/en-us/develop/net/common-tasks/diagnostics-logging-and-instrumentation/
Another nice way to look around your site is using the debug console:
https://github.com/projectkudu/kudu/wiki/Kudu-console

Worker Role goes Cycling... after some time, how can I get an alert?

I have a worker role deployed that works fine for a period of time (days...) but at some point it stops or crashes, then it can't restart at all and stays "Cycling...". The only solution is to Reimage the Role.
How can I set an automatic alert so I get an email when the Role becomes unresponsive (and Cycling...) ?
Thanks
Alerts or notifications like this are not available today, but they are being worked on. If this is causing service interruptions you could always sign up for an external monitoring service which will send you alerts whenever your site is down.
However, I would recommend solving the root cause of the problem rather than just Reimaging it to fix the symptom. Here is how I would start:
You are most likely hitting the issue described in http://blogs.msdn.com/b/kwill/archive/2012/09/19/role-instance-restarts-due-to-os-upgrades.aspx. In particular, see #1 under Common Issues where it talks about common causes for a role to not restart properly after being rebooted due to OS updates. Notice that #1 also talks about how to simulate these types of Azure environment issues (ie. manually do a Reboot from the portal) so you can reproduce the failure and debug it.
To troubleshoot the issue I would recommend reading through the troubleshooting series at http://blogs.msdn.com/b/kwill/archive/2013/08/09/windows-azure-paas-compute-diagnostics-data.aspx. Of particular interest to you is probably the "Troubleshooting Scenario 2 – Role Recycling After Running Fine For 2 Weeks"
Azure cannot notify you of such conditions. Consider placing a try/catch around your loop in the WorkerRole with a catch that can email you in case of an issue.
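The try/catch idea looks roughly like this, sketched in Python rather than in a real WorkerRole. `work` and `notify` are hypothetical callables; `notify` could be backed by the standard library's `smtplib`, or by any other channel you already have.

```python
import logging
import traceback
from typing import Callable


def run_with_alert(
    work: Callable[[], None],                    # hypothetical: one iteration of the role's loop
    notify: Callable[[str, str], None],          # hypothetical: sends (subject, body) somewhere
    should_continue: Callable[[], bool] = lambda: True,
) -> None:
    """Run work() repeatedly; on an unhandled exception, send an alert and re-raise."""
    while should_continue():
        try:
            work()
        except Exception:
            details = traceback.format_exc()
            logging.error("worker loop failed:\n%s", details)
            try:
                notify("Worker role loop failed", details)
            except Exception:
                # A broken alert channel must not hide the original failure.
                logging.exception("alert could not be sent")
            raise
```

Re-raising after the alert is a design choice: it lets the host restart the process as it normally would. Note that this only covers failures inside the loop; it cannot report a role that never gets as far as running your code.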
Alternatively, if you're open to using third-party services, consider AzureWatch (I'm affiliated with the product). It can alert you if your instance becomes Unresponsive or Busy, or goes through another non-Ready status.
