Very suddenly, without any changes or recent access, my Azure virtual server is no longer available for RDP or web access. I have logged into the Azure control panel and everything appears to be running without issue, but it is not working.
I have checked the endpoints and they are present for both RDP and Web. Totally weird.
I have 2 virtual servers and the other one is working fine and responding.
Anyone ever experience this? Just when my client wants to view his website as well...
http://cn-web-02.cloudapp.net is the URL
TIA
As I just answered on a similar question, Virtual Machines are in Preview and not in Production yet. There are several reasons why your Virtual Machine might become unavailable (see that other answer). Given that this is the second incident reported here today, it's a good guess that it's related to the underlying Host OS being updated, which would take your Virtual Machine offline for a short period of time.
I tried your URL and it's available again. Just remember that this is in Preview, especially since you mention having a client who wants to view his website. If you put a production website in Virtual Machines, you'll have to absorb the risk of not having an SLA.
Having said that: You can mitigate downtime risk by running two Virtual Machines, listening on a load-balanced input endpoint. Be sure to have both Virtual Machines in the same Availability Set. Doing that ensures that the Windows Azure fabric controller will not take both Virtual Machines offline at the same time when doing things like Host OS updates. If this were in Production, you'd then have a very high availability scenario. Even in Preview, you'll improve availability by taking advantage of Availability Sets. Note: You'll need to use some type of shared session cache, since visitors will now be sent to either one of your Virtual Machines.
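For what it's worth, here's a minimal sketch of that setup using the classic (service management) PowerShell cmdlets. Every name here (contoso-svc, web1/web2, webAvSet, webLB) is hypothetical, and you'd substitute your own image and credentials:

```powershell
# Assumes Add-AzureAccount / Select-AzureSubscription have already been run.
$adminPwd  = "REPLACE-WITH-ADMIN-PASSWORD"
$imageName = "REPLACE-WITH-IMAGE-NAME"   # e.g. pick one from Get-AzureVMImage

# Two VMs in the same Availability Set, each behind one load-balanced port-80 endpoint.
$vms = "web1", "web2" | ForEach-Object {
    New-AzureVMConfig -Name $_ -InstanceSize Small -ImageName $imageName `
            -AvailabilitySetName "webAvSet" |
        Add-AzureProvisioningConfig -Windows -AdminUsername "azureadmin" -Password $adminPwd |
        Add-AzureEndpoint -Name "http" -Protocol tcp -LocalPort 80 -PublicPort 80 `
            -LBSetName "webLB" -DefaultProbe
}

New-AzureVM -ServiceName "contoso-svc" -Location "West US" -VMs $vms
```

Because both VMs share the Availability Set, the fabric controller spreads them across fault and update domains, and the load-balanced endpoint sends traffic to whichever instance is up.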
I had the same experience! We had 2 instances and both were re-imaged without any notification. I only knew because we had made some local changes via RDP.
A reboot or reimage may help. It's worth a try!
Turns out it was an outage from Microsoft... for over 22 hours, but everything is back up and running. This is the 2nd time in 6 months this has happened for a long stretch... makes me a little nervous, to say the least.
Thanks for the input, everyone. For anyone who's interested, MS has a good site that tracks the service levels on Azure: Windows Azure Service Dashboard
This morning I found 5 of my Azure Virtual machines to be stuck in Starting mode.
All other VMs are working ok.
I managed to stop the VMs using the Azure command shell and then start them again but they are still stuck in starting mode with no end in sight.
It has now been over 5 1/2 hours and they are still stuck in Starting mode with no end in sight.
I have contacted Microsoft support but they are taking hours to respond :(((
The Azure Status page doesn't show anything is wrong in my region.
Is anybody else experiencing this problem?
We've had the same issue, and it's linked to a big problem Azure is having this morning.
The trick we used to get the instance running again (a PowerShell sketch follows the list):
1. Stop the VMs via PowerShell
2. Change the size of the VM and then change it back (preferably from an A-series to a D-series size, as these run on different hardware)
3. Start the VM
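If you'd rather script that dance, here's a minimal sketch using the classic Azure PowerShell cmdlets; "mysvc" and "myvm" are hypothetical names for your cloud service and VM:

```powershell
# 1. Stop the VM (-Force skips the prompt when it's the last VM in the deployment).
Stop-AzureVM -ServiceName "mysvc" -Name "myvm" -Force

# 2. Bump it to a D-series size (different hardware); change it back later if you like.
Get-AzureVM -ServiceName "mysvc" -Name "myvm" |
    Set-AzureVMSize -InstanceSize "Standard_D1" |
    Update-AzureVM

# 3. Start it again.
Start-AzureVM -ServiceName "mysvc" -Name "myvm"
```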
We also have people complaining about RDP not working where reboots fixed the problem.
There are currently some problems with Azure, including the VM service. Also, the status page does not reflect all of the problems. Keep in mind that this page only shows impacts affecting a large share of a service's customers; it does not reflect minor outages hitting single customers. You should keep an eye on the Azure blog, which may post a statement about the current problems.
What works for me is a redeploy of the Virtual Machine within the Azure Portal whenever it gets stuck at "Starting...". Although it takes half an hour to redeploy, it solves the issue. More details here.
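If you'd rather not click through the portal each time, the Resource Manager PowerShell module exposes the same operation; the resource group and VM names below are hypothetical:

```powershell
# Redeploys the VM onto a different host node, which often clears a stuck "Starting..." state.
Set-AzureRmVM -ResourceGroupName "myResourceGroup" -Name "myVM" -Redeploy
```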
I experienced the same problem, and what I did was resize the Virtual Machine's disk. You can also try increasing the whole VM size/power, but for me resizing the disk fixed it; the VM was probably updating and ran out of disk space.
I have a website (orders.cpidealers.com) running on an Azure Virtual Machine currently configured to Basic, A2 (2 cores, 3.5 GB memory) monitoring 3 endpoints.
Every morning since Tuesday, June 24:
1. The website has been unavailable (the browser just spins; I don't even get a 401 or any other error),
2. I can't RDP into the virtual machine,
3. The endpoint status shows a warning triangle (although when I click the link next to it, some endpoints say Not Available while others give a time; I'm not sure how to interpret the endpoint status box).
To resolve the problem, I log in to Azure and restart the Virtual Machine. After that, everything seems to work fine for the remainder of the day, until I arrive at work in the morning at 7:30 (Mountain Time).
Any suggestions on how to troubleshoot this?
Well, it seems to me like your app somehow manages to hang IIS by exhausting resources. Can't tell you more without any data. You should enable performance counter monitoring and see what is going on.
http://azure.microsoft.com/en-us/documentation/articles/cloud-services-dotnet-use-performance-counters/
http://www.codeproject.com/Articles/303686/Windows-Azure-Diagnostics-Performance-Counters-In
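Before wiring up full diagnostics, a quick way to watch the usual suspects from inside the VM is something like this; the counter paths are standard Windows/ASP.NET counters, sampled here every 15 seconds:

```powershell
# Streams CPU, free memory, and the ASP.NET request queue until you press Ctrl+C.
Get-Counter -Counter @(
    '\Processor(_Total)\% Processor Time',
    '\Memory\Available MBytes',
    '\ASP.NET\Requests Queued'
) -SampleInterval 15 -Continuous
```

If one of these spikes right before the morning hang, you'll know whether to chase CPU, memory, or a clogged request queue.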
It looks like the system was hanging, as Rouen mentioned. From that, we found this article, which seems to have resolved the problem: IIS: Web Application hangs periodically needs system reboot
Here is everything my developer did:
I changed a few other things on the server. I set SQL Server to never auto-close (which should help performance in the morning), set the gupdate service to manual (we did that together), and then I found this article, which seems like an exact match for our problem, so I set the Credential Manager service to automatic and restarted.
IIS: Web Application hangs periodically needs system reboot
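For anyone who wants to script the same three changes, here's a rough sketch (run inside the VM; "OrdersDb" is a hypothetical database name, and Invoke-Sqlcmd requires the SQL Server PowerShell tools to be installed):

```powershell
# Keep SQL Server from closing the database when idle (avoids a slow first hit in the morning).
Invoke-Sqlcmd -ServerInstance "localhost" -Query "ALTER DATABASE [OrdersDb] SET AUTO_CLOSE OFF;"

# Google Update service: switch from automatic to manual.
Set-Service -Name "gupdate" -StartupType Manual

# Credential Manager service (service name VaultSvc): start automatically, per the article.
Set-Service -Name "VaultSvc" -StartupType Automatic

Restart-Computer
```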
I am on a Windows Azure trial to evaluate migrating a number of commercial ASP.NET sites to Azure from dedicated hosting. All was going OK ... until just now!
Some background - the sites are set up under Web Roles (as opposed to Web Sites) using SQL Azure and SQL Reporting. The site content was under the X: drive (there was also a B: drive that seemed to be mapped to the same location). There are several days left of the trial.
Without any apparent warning, my test sites suddenly stopped working. Examining the server (through RDP), I saw that the B: and X: drives had disappeared (just C:, D: & E:, I think, were left), and in IIS the application pools and sites had disappeared. In the Portal, however, nothing seemed to have changed; the same services & config appeared to be there.
Then about 20 minutes later the missing drives, app pools and sites reappeared and my test sites started working again! However, the B: drive was gone and now there was an F: drive (showing the same content as X:); also, the MS ReportViewer 2008 control that I had installed earlier in the day was gone. It is almost as if the server had been replaced with another (but with the IIS config restored from the original).
As you can imagine, this makes me worried! If this is something that could happen in production there is no way I would consider hosting commercial sites for clients on Azure (unless there is some redundancy system available to keep a site up when such a failure occurs).
Can anyone explain what may have happened, if this is possible/predictable under a live subscription, and if so how to work around it?
One other thing to keep in mind is that an Azure Web Role is not persistent. I'm not sure how you installed the MS Report Viewer 2008 control but anything you add or install outside of a deployment package when you push your solution to Azure is not guaranteed to be available at some future point.
I admit that I don't fully understand the full picture when it comes to the overall architecture of Azure, but I do know that Web Roles can and do re-create themselves from time to time. When a role recycles, it returns to the state it was in when it was deployed. This is why Microsoft suggests using at least 2 instances of your role: while one or the other may recycle, they will never both recycle at the same time, which is part of what guarantees the 99.9% uptime.
You might also want to consider an Azure VM. They are persistent but require you to maintain the server in terms of updates and software much in the way I suspect you are already doing with your dedicated hosting.
I've been hosting my solution in a large (4 core) web role, also using SQL Azure, for about two years and have had great success with it. I have roughly 3,000 users and rarely see the utilization of my web role go over 2% (meaning I've got a lot of room to grow). Overall it is a great hosting solution in my opinion.
According to the Azure SLA, Microsoft guarantees uptime of 99.9% or higher on all its products per billing month. (20 minutes in a month is roughly 0.05% downtime, well within the ~43 minutes per month that 99.9% allows; not being critical, just suggesting that they are still within their SLA.)
Current status shows that SQL databases were having issues in North US last night, but all services appear to be up currently.
Personally, I have seen the dashboard go down and report very weird problems while the services I had programmed worked just fine all the way through it. When I experienced this, it was reported on the Azure Status page, the platform status, and the Twitter feed.
While I have seen bumps, they are few and far between, and I find reliability to be perceptibly higher than other providers that I have worked with.
As for workarounds, I would suggest standard mode for your websites and increasing the number of instances of the site. You might also try looking into the new add-ins that are available with the latest Azure release; Active Cloud Monitoring by Metrichub might be what you require.
It sounds like you're expecting the web role to act as a Virtual Machine instance.
Web Roles aren't persistent (the machine can be destroyed and recreated at any time), so you should do any additional required setup as a 'startup task' in your Azure project (never install software manually); an example is sketched below.
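For reference, a startup task is declared in the role's ServiceDefinition.csdef. A minimal sketch looks like this, where install.cmd is a hypothetical script you include in your deployment package:

```xml
<WebRole name="MyWebRole">
  <Startup>
    <!-- Runs elevated; the role won't start handling traffic until it exits (taskType="simple"). -->
    <Task commandLine="install.cmd" executionContext="elevated" taskType="simple" />
  </Startup>
</WebRole>
```

Anything install.cmd lays down (controls like ReportViewer, registry settings, and so on) then gets re-applied automatically every time the role instance is recreated.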
Because of this you need at least 2 instances so that rolling upgrades (i.e. Windows security patches, hotfixes and so on) can be performed automatically without having your entire deployment taken offline.
If this doesn't suit your use case then you should look at Azure Virtual Machines, but you'll need to manage updates and so on yourself. It's usually better to use Web Roles properly as you can then do scaling and so on a lot more easily.
I have 2 VM machines on Azure.
One suddenly stopped responding.
It was down for around 30 minutes, until I browsed to the Azure portal and saw it was in the Starting state; then it was up & running again.
How can I tell why my VM was shut down?
EDIT: I'm assuming you're talking about Virtual Machines (IaaS), and not Cloud Services (PaaS).
Virtual Machines can, and will, restart, for several reasons. For example:
Hardware failure, where your Virtual Machine will then be restarted on another server.
Host OS refresh. This is the operating system running on the physical server.
Some type of OS crash
Also keep in mind: Virtual Machines are in Preview with no SLA today. So there wouldn't be any information readily available to you for determining why your Virtual Machine became unavailable.
If it was unavailable for 30 minutes, then this hints at something akin to a host OS update or your virtual machine being moved. If it was down for, say, 5 minutes, then I'd guess it was an OS crash.
UPDATE: I just looked at the Azure Dashboard, which is showing degraded Compute for Virtual Machines (see the RSS feed with the problem description). Perhaps this is the root cause of your particular outage...
There are several things that might cause this to happen. Your VM may have crashed due to bad code or bad development practices. The second possible reason, I think, is that the number of VMs you created is not enough for the incoming traffic; a VM could restart if it receives more traffic than it can handle.
I've been working with Windows Azure and Amazon Web Services EC2 for a good many months now (almost getting to the years range) and I've seen something over and over that seems troubling.
When I deploy a .NET build to a Windows Azure web role (or worker role), it usually takes 6-15 minutes to start up. In AWS's EC2 it takes about the same to start up the image and then a minute or two to deploy the app to IIS (pending, of course, its setup).
However when I boot up an AWS instance with SUSE Linux & Mono to run .NET, I get one of these booted and deploy code to it in about 2-3 minutes (again, pending it is setup).
What is going on with Windows OS images that causes them to take soooo long to boot up in the cloud? I don't want FUD; I'm curious about the specific details of what goes on that causes this. Any specific technical information regarding this would be greatly appreciated! Thanks.
As announced at PDC, Azure will soon start to offer full IIS on Azure web roles. Somewhere in the keynote demo by Don Box, he showed that this allows you to use the standard "publish" options in Visual Studio to deploy to the cloud very quickly.
If I recall correctly, part of what happens when starting a new Azure role is configuring the network components, and I remember a speaker at a conference once mentioning that this was very time-consuming. This might explain why adding additional instances to an already-running role is usually faster (but not always: I have seen this take much more than 15 minutes as well, on occasion).
Edit: also see this PDC session.
I don't think the EC2 behavior is specific to the cloud. Just compare boot times of Windows and Linux on a local system - in my experience, Linux just boots faster. Typically, this is because the number of services/demons launched is smaller, as is the number of disk accesses that each of them needs to make during startup.
As for Azure launch times: it's difficult to tell, and not comparable to machine boots (IMO). Nobody knows exactly what Azure does when launching an application. It might be that they need to assemble the VM image first, or that a lot of logging/reporting happens that slows things down.
Don't forget, there is a fabric controller that needs to check fault zones and deploy your VMs across multiple fault zones (to give you high availability, at least when there are two or more instances). I can't say for sure, but that logic itself might take some extra time. This might also explain why network setup could be a little complicated.
This would of course explain the difference (if any) between boot times in the cloud and boot times for Windows locally or in Amazon. Any difference between operating systems is completely dependent on the way the OS is built!