I've been working with Windows Azure and Amazon Web Services EC2 for a good many months now (almost getting to the years range) and I've seen something over and over that seems troubling.
When I deploy a .NET build to a Windows Azure web role (or service role), it usually takes 6-15 minutes to start up. In AWS's EC2 it takes about the same time to start up the image, and then a minute or two to deploy the app to IIS (depending, of course, on how it is set up).
However, when I boot up an AWS instance with SUSE Linux and Mono to run .NET, I get one of these booted and code deployed to it in about 2-3 minutes (again, depending on how it is set up).
What is going on with Windows OS images that causes them to take soooo long to boot up in the cloud? I don't want FUD, I'm curious about the specific details of what goes on that causes this. Any specific technical information regarding this would be greatly appreciated! Thanks.
As announced at PDC, Azure will soon start to offer full IIS on Azure web roles. Somewhere in the keynote demo by Don Box, he showed that this allows you to use the standard "publish" options in Visual Studio to deploy to the cloud very quickly.
If I recall correctly, part of what happens when starting a new Azure role is configuring the network components, and I remember a speaker at a conference once mentioning that this was very time consuming. This might explain why adding additional instances to an already running role is usually faster (but not always: I have seen this take much more than 15 minutes as well on occasion).
Edit: also see this PDC session.
I don't think the EC2 behavior is specific to the cloud. Just compare boot times of Windows and Linux on a local system - in my experience, Linux just boots faster. Typically, this is because the number of services/daemons launched is smaller, as is the number of disk accesses that each of them needs to make during startup.
As for Azure launch times: it's difficult to tell, and not comparable to machine boots (IMO). Nobody knows what Azure does when launching an application. It might be that they need to assemble the VM image first, or that a lot of logging/reporting happens that slows down things.
Don't forget, there is a Fabric controller that needs to check for fault zones and deploy your VMs across multiple fault zones (to give you high availability, at least when there are more than two instances). I can't say for sure, but that logic itself might take some extra time. This might also explain why network setup could be a little complicated.
This would of course explain the difference (if any) between boot times in the cloud and boot times for Windows locally or in Amazon. Any difference between operating systems is completely dependent on the way the OS is built!
Related
I'm trying to move some computations to Azure cloud services. One of the steps of the workflow I'm trying to implement includes running a Win32 desktop application that generates a file. Obviously, we cannot have user interaction for cloud calculations, so the application is launched with command line arguments. The process starts, generates a file, and then exits. At the moment I cannot refactor the code and move this functionality to a windowless command-line utility.
First, I chose Azure Functions because they are intended for event-driven short calculations, and that's exactly what I need. Also they are cheap. But I encountered a problem that processes in Azure Functions are being executed inside a sandbox blocking User32/GDI32 system calls and thus preventing me from launching desktop applications.
Another solution I came up with is mounting a virtual machine drive with all needed Visual C++ redistributables installed and then using Azure Batch with nodes based on the pre-configured drive. But this solution has other drawbacks, since it takes minutes to mount a new node. Of course, I could have some nodes that are always active, but further scaling is still slow and having active nodes is not so cheap. Also I have a feeling that Azure Batch is a bit overkill, because there is no need for HPC in my case. Azure Functions' computation capabilities are enough for me.
Is there some kind of compromise solution? So that I would have a solution with fast scaling and quick responses, but with no need to establish Azure Batch based on Azure Virtual Machines?
Many GDI32 calls are now available, but only in containerized form.
So you can deploy a function together with the desktop application, inside a Docker container.
Refer to the following article for more explanation.
Refer to the following documentation on how to deploy a containerized function.
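Whichever host ends up running the desktop application, the launch, wait and collect step described in the question can be sketched roughly as below. This is only an illustration in Python: the function name, the timeout and the command passed in are hypothetical, and it assumes the application exits on its own with a zero exit code after writing its output file.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_and_collect(command: Sequence[str], output_file: Path, timeout_s: float = 120.0) -> bytes:
    """Run a command that is expected to write `output_file`, then return its bytes."""
    if output_file.exists():
        output_file.unlink()  # avoid mistaking a stale file for fresh output
    # Blocks until the process exits; raises subprocess.TimeoutExpired if it hangs.
    completed = subprocess.run(list(command), timeout=timeout_s, capture_output=True)
    if completed.returncode != 0:
        raise RuntimeError(f"exit code {completed.returncode}: {completed.stderr!r}")
    if not output_file.exists():
        raise FileNotFoundError(f"{output_file} was not produced")
    return output_file.read_bytes()
```

The timeout matters here because a desktop application that pops up a dialog in a headless environment will otherwise block the caller indefinitely.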
I am on a Windows Azure trial to evaluate migrating a number of commercial ASP.NET sites to Azure from dedicated hosting. All was going OK ... until just now!
Some background - the sites are set up under Web Roles (i.e. as opposed to Web Sites) using SQL Azure and SQL Reporting. The site content was under the X: drive (there was also a B: drive that seemed to be mapped to the same location). There are several days left of the trial.
Without any apparent warning my test sites suddenly stopped working. Examining the server (through RDP) I saw that the B: and X: drives had disappeared (just C:, D: and E: were left, I think), and in IIS the application pools and sites had disappeared. In the Portal, however, nothing seemed to have changed - the same services and config seemed to be there.
Then about 20 minutes later the missing drives, app pools and sites reappeared and my test sites started working again! However, the B: drive was gone and now there was an F: drive (showing the same as X:); also the MS ReportViewer 2008 control that I had installed earlier in the day was gone. It is almost as if the server had been replaced with another (but the IIS config was restored from the original).
As you can imagine, this makes me worried! If this is something that could happen in production there is no way I would consider hosting commercial sites for clients on Azure (unless there is some redundancy system available to keep a site up when such a failure occurs).
Can anyone explain what may have happened, if this is possible/predictable under a live subscription, and if so how to work around it?
One other thing to keep in mind is that an Azure Web Role is not persistent. I'm not sure how you installed the MS Report Viewer 2008 control but anything you add or install outside of a deployment package when you push your solution to Azure is not guaranteed to be available at some future point.
I admit that I don't fully understand the full picture when it comes to the overall architecture of Azure but I do know that Web Roles can and do re-create themselves from time to time. When the role recycles, it returns to the state as it was when it was installed. This is why Microsoft suggests using at least 2 instances of your role because while one or the other may recycle they will never recycle both at the same time, part of what guarantees the 99.9% uptime.
You might also want to consider an Azure VM. They are persistent but require you to maintain the server in terms of updates and software much in the way I suspect you are already doing with your dedicated hosting.
I've been hosting my solution in a large (4 core) web role, also using SQL Azure, for about two years and have had great success with it. I have roughly 3,000 users and rarely see the utilization of my web role go over 2% (meaning I've got a lot of room to grow). Overall it is a great hosting solution in my opinion.
According to the Azure SLA, Microsoft guarantees uptime of 99.9% or higher on all its products per billing month. (20 minutes in a month is roughly a 0.05% loss, about half of the 0.1% allowance. I'm not being critical, just suggesting that they are still within their SLA.)
Current status shows that SQL databases were having issues in the US North region last night, but all services appear to be up currently.
Personally, I have seen the dashboard go down and report very weird problems, but the services that I programmed against worked just fine all the way through it. When I experienced this problem it was reported on the Azure Status page, the platform status and the Twitter feed.
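As a rough check of that arithmetic, here is a small Python sketch. It assumes a 30-day billing month (the actual SLA terms may define the measurement period differently), under which a 99.9% guarantee leaves about 43 minutes of permitted downtime.

```python
# Downtime budget implied by an availability percentage over a billing month.
# Assumption: a 30-day month; the real SLA may define the period differently.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Minutes of downtime permitted per month at the given SLA percentage."""
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)


def downtime_percent(outage_minutes: float) -> float:
    """Share of the month, in percent, that an outage of this length represents."""
    return outage_minutes / MINUTES_PER_MONTH * 100


if __name__ == "__main__":
    print(f"Budget at 99.9%: {allowed_downtime_minutes(99.9):.1f} min")
    print(f"20 min outage:   {downtime_percent(20):.3f}% of the month")
```

So a single 20-minute interruption uses a little under half of the monthly allowance.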
While I have seen bumps, they are few and far between, and I find reliability to be perceptibly higher than other providers that I have worked with.
As for workarounds, I would suggest standard mode for your websites and increasing the number of instances of the site. You might try looking into the new add-ins that are available with the latest Azure release. Active Cloud Monitoring by Metrichub might be what you require.
It sounds like you're expecting the web role to act as a Virtual Machine instance.
Web Roles aren't persistent (the machine can be destroyed and recreated at any time), so you should do any additional required set up as a 'startup task' in your Azure project (never install software manually).
Because of this you need at least 2 instances so that rolling upgrades (i.e. Windows security patches, hotfixes and so on) can be performed automatically without having your entire deployment taken offline.
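To see why a second instance matters, here is a toy Python model of a rolling upgrade. The round-robin assignment of instances to upgrade groups and the default of two groups are assumptions made for illustration, not a description of how the fabric controller actually schedules updates.

```python
from typing import Iterator, List, Tuple


def rolling_upgrade(instance_count: int, group_count: int = 2) -> Iterator[Tuple[int, List[int]]]:
    """Yield (group, instances_still_serving) for each step of the upgrade.

    Hypothetical model: instance i belongs to group i % group_count, and one
    group at a time is taken offline to be patched.
    """
    instances = list(range(instance_count))
    for group in range(min(group_count, instance_count)):
        offline = {i for i in instances if i % group_count == group}
        yield group, [i for i in instances if i not in offline]


if __name__ == "__main__":
    for count in (1, 2):
        for group, serving in rolling_upgrade(count):
            print(f"{count} instance(s), patching group {group}: serving = {serving}")
```

With one instance, the only step leaves nothing serving; with two, every step leaves one instance up.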
If this doesn't suit your use case then you should look at Azure Virtual Machines, but you'll need to manage updates and so on yourself. It's usually better to use Web Roles properly as you can then do scaling and so on a lot more easily.
Very suddenly, without any changes or recent access, my Azure virtual server is no longer available for RDP or web. I have logged into the Azure control panel and everything appears to be running without issue, but it is not working.
I have checked the end points and they are present for both RDP and Web, totally weird.
I have 2 virtual servers and the other one is working fine and responding.
Anyone ever experience this? Just when my client wants to view his website as well...
http://cn-web-02.cloudapp.net is the URL
TIA
As I just answered for this question, Virtual Machines are in Preview and not in Production yet. There are several reasons why your Virtual Machines became unavailable (see other answer). Given that this is the second reported incident here today, it's a good guess it's related to the underlying Host OS being updated, which would take your Virtual Machine offline for a short period of time.
I tried your URL and it's available again. Just remember about this being in Preview, especially since you mention having a client that wants to view his website. If you put a production website in Virtual Machines, then you'll have to absorb the risk of not having an SLA.
Having said that: You can mitigate downtime risk by running two Virtual Machines, listening on a load-balanced input endpoint. Be sure to have both Virtual Machines in the same Availability Set. Doing that ensures that the Windows Azure fabric controller will not take both Virtual Machines offline at the same time when doing things like Host OS updates. If this were in Production, you'd then have a very high availability scenario. Even in Preview, you'll improve availability by taking advantage of Availability Sets. Note: You'll need to use some type of shared session cache, since visitors will now be sent to either one of your Virtual Machines.
I had the same experience! We had 2 instances and all of them were re-imaged without any notification. I know this because we had made some local changes via RDP, and they were gone.
A reboot or reimage may help, so you may want to try that.
Turns out it was an outage from Microsoft...for over 22 hours but everything is back up and running. This is the 2nd time in 6 months this has happened for long stretches...makes me a little nervous to say the least.
Thanks for the input everyone and for anyone that's interested MS have a good site that tracks the service levels on Azure. Windows Azure Service Dashboard
S
Is it possible to create one or several Azure VMs on my local machine? I want to create a web app and load test it locally, without the need to put it in the cloud. I'm thinking of the following scenario: I have a local VM running an IIS server with my web app; I use a tool to generate a lot of load; I need to deploy a second VM containing the same things as the first VM. The downtime of the web app should be equal to 0 (hopefully).
Clarification (update):
I want to achieve the following: create a web app and a monitoring app (CPU, memory) and deploy them on one VM. During a load test, if the VM cannot handle the load (e.g. CPU goes above 80%), I want to programmatically deploy a new VM (with the same configuration, having both the web app and the monitoring app), such that no downtime occurs.
Azure has several ways for you to host sites.
Virtual Machines is just that, normal VMs. You can create them locally and upload them, but everything is up to you, including how to handle upgrades. If that is what you need to do then I don't know how you would handle upgrades with no down time; though, you can add multiple VMs to a load balancer and then upgrade them one at a time.
It sounds like what you really want to explore is Cloud Services. You can run one or more VMs locally in the emulator, upgrade with no down time once in the cloud, implement auto scaling (you will have to use a tool or write some code).
Alternatively you may want to look at Azure Web sites, but that is a completely different concept and you can't really test load and load balancing locally the same way.
Based on your statement that you essentially want to auto-scale your application you want to look at Cloud Services with Auto Scaling. However, you can't fully test this in the cloud emulator - but you can test your logic.
Background
Azure Cloud Services is designed for this kind of thing. You don't really work with VMs in the way you may be used to; instead you create a package that Azure then deploys to as many servers as you like. Once up and running, you can manually go into the management console and increase or decrease the number of active servers simply by moving a slider. Of course, you want to do this automatically, so you have a few options.
There is a management API you can use to change the number of servers. So, it would be quite simple to write a bit of code that you spin up in another thread from WebRole.OnStart and that simply sits and monitors the CPU on the machine and then calls the management API to spin up a new server instance if your CPU goes over a certain threshold. Okay, locally you can only test that the call to the management API is made; you won't actually see the new server coming up. But if you grab your free trial of Azure and just try it, you will see that you really don't need to test that part - it just works.
However, in practice there is an awful lot more to auto scaling. Here are some of the things you need to consider:
Even relatively idle web servers will often spike briefly to 100%, so just having a simple threshold is unlikely to be good enough. You need to decide how long the server needs to be over a certain threshold before you spin up another server instance.
What happens when you have more than one server? And, on Azure, you should always have at least two servers to ensure you have resilience. Note that the idea with Cloud Services really is to have many small servers rather than a few big servers. You pay per core, not per number of servers.
Imagine you currently have three servers and one is really busy for some reason and the other two are idle. Do you want to spin up a fourth server?
Imagine you currently have two servers and they are both quite busy. Do you really want them both to start a new server so you end up with four servers running?
There are several ways to handle these challenges. For starters, rather than having monitor programs running locally on each server, you are better off moving that monitoring outside; Azure comes with the ability to dump performance metrics to table storage at whatever interval you choose. You can then run an external program that retrieves the performance data over time from all your current servers and then reasons about the overall workload before deciding to spin up or shut down additional servers. Now, you can of course host that external monitor program in a separate thread on each of your web roles to give your monitoring resilience - but the key point is that the monitoring program doesn't monitor the server it runs on, it monitors all the servers. You will, of course, still have to deal with stopping multiple monitoring program instances from all starting and stopping servers. One way to do this is to place stop/start commands onto an Azure "message queue" (there are a few different types) and use the built-in "de-duper", which will automatically delete identical commands that are put on the queue within a certain time window (I am oversimplifying, but you get the idea).
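To make the "reason about all servers over time" idea concrete, here is a minimal Python sketch of such a decision rule. The class name, the 80%/20% thresholds, the window length and the two-server minimum are all illustrative assumptions; reading the metrics from table storage and calling the management API are left out entirely.

```python
from collections import deque
from statistics import mean
from typing import Mapping


class ScaleDecider:
    """Decide on scaling from fleet-wide CPU samples rather than a single server."""

    def __init__(self, high: float = 80.0, low: float = 20.0,
                 sustained_samples: int = 5, min_servers: int = 2) -> None:
        self.high = high
        self.low = low
        self.min_servers = min_servers
        # Only the most recent `sustained_samples` fleet averages are kept.
        self.history = deque(maxlen=sustained_samples)

    def observe(self, cpu_by_server: Mapping[str, float]) -> str:
        """Record one sampling interval; return 'scale_out', 'scale_in' or 'hold'."""
        self.history.append(mean(cpu_by_server.values()))
        if len(self.history) < self.history.maxlen:
            return "hold"  # not enough data yet to call the load sustained
        if all(avg > self.high for avg in self.history):
            self.history.clear()  # start a fresh window after acting
            return "scale_out"
        if (all(avg < self.low for avg in self.history)
                and len(cpu_by_server) > self.min_servers):
            self.history.clear()
            return "scale_in"
        return "hold"
```

Because the rule looks at the average across servers and requires every sample in the window to exceed the threshold, one busy server among idle ones, or a brief spike to 100%, does not add capacity. De-duplicating commands issued by several monitor instances is a separate concern that this sketch does not cover.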
The actual answer
Really, though, you want to look at the Auto Scaling Application Block which will do most of this for you. I guess that is the real answer to your question, but I wanted to provide a bit of context first.
Again, I recognise you asked for how to test this locally - but I believe that that question doesn't really make sense in the context of Azure and I hope the above information helps.
I'm pretty sure you can't do that, and it wouldn't make sense anyway. If you want load testing, you need to run it in an environment as similar to production as possible, and that means you have to run your application in the Azure cloud. How else would you know that the load will actually be handled properly in the real cloud?
I'm looking into Windows Azure now and wondering if one can implement a TCP/IP server using Worker Roles - i.e, when a request comes in on a socket - a worker role (and not a web role) will accept it, treat it well and then return an answer on that same socket request.
Another question is - should I do it, or maybe just implement my own non-blocking server using .NET and put it in one worker role or a VM?
Thanks!
There's a full worked example of a telnet server on Maarten Balliauw's blog - see http://blog.maartenballiauw.be/post/2010/01/17/Creating-an-external-facing-Azure-Worker-Role-endpoint.aspx
On your second question, most answers seem to recommend using worker roles for code instead of using VMs - worker roles in general are "architecturally preferred" for Azure, and VMs are there mainly for when you need to support existing (legacy) code.
Adding to Stuart's answer: A Worker Role will give you nearly everything a VM role is going to offer you, without you having to worry about maintaining the OS. VM roles are needed for a few specific scenarios. I enumerated them in this other StackOverflow answer, but just for completeness, here are those scenarios which require a VM role:
Startup / setup takes a really long time. This is a bit subjective, but a good rule-of-thumb is around 5 minutes. Remember that, every time your role instances boot up, they need to re-run any tasks in your startup, including software installs, so role instance availability is delayed until all startup tasks are run.
Startup / setup tasks are unreliable and don't always work the first time you run them. Software setups need to run in unattended mode, and must reliably succeed.
Human interaction is required. If the software install can't be completely automated, there's no way to script it.
When it comes to hosting a TCP service, you can choose to host something either publicly available or only internal to your other role instances. For public hosts, you have up to 25 endpoints to work with across your deployment, and for internal hosts, you have up to 5 endpoints per role. See my blog post here for more details around this.
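The pattern asked about in the question (accept a connection, handle the request, answer on the same socket without blocking other clients) looks roughly like the sketch below. It is written in Python as a language-neutral stand-in, not as worker role code; the "ACK" reply format and port 10100 are made up for illustration, and in a real role the port would come from the endpoint defined for that role.

```python
import asyncio


async def handle_client(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Read one line from the client and answer on the same connection."""
    request = await reader.readline()
    writer.write(b"ACK " + request)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def serve(host: str = "127.0.0.1", port: int = 10100) -> None:
    """Accept connections until cancelled; each client is handled as its own task."""
    server = await asyncio.start_server(handle_client, host, port)
    async with server:
        await server.serve_forever()


# To run it standalone (blocks until interrupted):
# asyncio.run(serve())
```

The same structure carries over to a .NET worker role: a long-running accept loop started from the role's run method, with each accepted connection handled asynchronously.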