I would like to create an application that holds a large amount of volatile data in memory. Only a small part of this data needs to be persisted when the host machine shuts down or goes down for maintenance. Outages should be rare; the in-memory data needs to be accessible most of the time, but rare restarts of the service are bearable.
If I were developing this for a server, I would create a Windows Service, which runs reliably while the machine is up, and I would persist a fraction of the data in the OnStop() method.
I'm thinking of moving this whole thing to the cloud. The question is: is a Worker Role similar to a Windows Service from this point of view? Does it run most of the time with rare outages, or is it recycled / restarted from time to time or when it is idle?
Like a Windows Service, a Worker Role is meant for processing background tasks. However, one thing to keep in mind is that your worker role can go down at any time, whether because of hardware failure or software updates, so you can't assume it will always be highly available. That's why Windows Azure recommends deploying multiple instances of your application.
What you could do is have multiple instances of your worker role running, all of them sharing a common cache where you put the volatile data. Do take a look at Windows Azure Caching (http://msdn.microsoft.com/en-us/library/windowsazure/gg278356.aspx), where you can either dedicate some memory of a VM (i.e. an instance) for caching purposes or dedicate a full VM to caching. That way your volatile data lives somewhere outside of your worker roles and is available to all instances.
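For illustration, here is a minimal sketch of what that could look like with the Azure Caching client API (Microsoft.ApplicationServer.Caching); the class name and keys are just placeholders, not a definitive implementation:

    using Microsoft.ApplicationServer.Caching;

    public class VolatileDataStore
    {
        // DataCacheFactory picks up the dataCacheClient settings from the role's configuration.
        private static readonly DataCacheFactory Factory = new DataCacheFactory();
        private readonly DataCache _cache = Factory.GetDefaultCache();

        public void Save(string key, object value)
        {
            // The item becomes visible to every role instance talking to the same cache.
            _cache.Put(key, value);
        }

        public T Load<T>(string key) where T : class
        {
            // Returns null if the item has expired or was never stored.
            return _cache.Get(key) as T;
        }
    }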
I am new to the Azure cloud and I have deployed my first Worker Role.
A process that takes 30 minutes to complete on my local system is taking more than 1 hour on the Azure worker role.
To investigate, I connected to the worker role via Remote Desktop. I checked Task Manager and found that the worker role process (WaWorkerHost.exe) was using just 12% of the CPU even though no other process was running. If I run the same code on my local system it uses 24-25% of the CPU.
I think that is why the worker role is taking twice as long as my local system.
The VM size of my worker role is Extra Large (8 Cores, 14336MB).
Since there is no other process running on the worker role, I would expect my process to get more of the CPU. But I cannot find a way to increase the CPU usage of the worker role process (WaWorkerHost.exe) from the Azure portal.
Please help with this.
Thanks.
CPU consumption alone is not enough to determine whether a machine is working hard or hardly working. Your assumption "CPU percentage is low, so the machine isn't busy (enough)" is too simplistic. You should take more resources into consideration, like disk access, memory usage and network access.
As you can imagine it's relatively simple to create an application that doesn't use up all your CPU, while it completely freezes your machine. Just have it read and write files from all over the disk, for instance.
EDIT:
Expanding on the first paragraph: what does the Worker Role actually do? Does it connect to some type of storage? Or maybe an internally hosted database or files? Is it putting messages on a queue or calling external services?
All of these things might be the reason the Worker Role takes longer to complete the task than your local machine does, for instance because of network latency. And while it may seem like the role isn't too busy if you only look at the CPU, it might be very busy waiting for an answer from an external resource.
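If you want to check this from inside the role (e.g. over that Remote Desktop session), here's a simple sketch that samples a few standard Windows performance counters; it's a quick snapshot, not a full diagnosis:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class ResourceSnapshot
    {
        static void Main()
        {
            var cpu  = new PerformanceCounter("Processor", "% Processor Time", "_Total");
            var disk = new PerformanceCounter("PhysicalDisk", "% Disk Time", "_Total");
            var mem  = new PerformanceCounter("Memory", "Available MBytes");

            // The first read of a rate counter returns 0, so sample twice with a pause.
            cpu.NextValue();
            disk.NextValue();
            Thread.Sleep(1000);

            Console.WriteLine("CPU:         {0:0.0} %", cpu.NextValue());
            Console.WriteLine("Disk:        {0:0.0} %", disk.NextValue());
            Console.WriteLine("Free memory: {0} MB", mem.NextValue());
        }
    }

If the CPU is low but "% Disk Time" is high, or the process spends most of its time waiting on the network, that would explain the longer run time.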
I have a product which uses the CPU ID, network MAC address, and disk volume serial numbers for validation. Basically, when my product is first installed these values are recorded, and then when the app is loaded up, the current values are compared against the old ones.
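For reference, the values are gathered roughly along these lines; this is a simplified sketch using the standard WMI classes, not the exact production code:

    using System;
    using System.Linq;
    using System.Management;   // reference System.Management.dll

    class HardwareFingerprint
    {
        static string QueryFirst(string wmiClass, string property)
        {
            using (var searcher = new ManagementObjectSearcher(
                "SELECT " + property + " FROM " + wmiClass))
            {
                // Take the first non-empty value reported by WMI.
                return searcher.Get().Cast<ManagementObject>()
                               .Select(mo => Convert.ToString(mo[property]))
                               .FirstOrDefault(v => !string.IsNullOrEmpty(v));
            }
        }

        static void Main()
        {
            Console.WriteLine("CPU ID:        " + QueryFirst("Win32_Processor", "ProcessorId"));
            Console.WriteLine("MAC address:   " + QueryFirst("Win32_NetworkAdapterConfiguration", "MACAddress"));
            Console.WriteLine("Volume serial: " + QueryFirst("Win32_LogicalDisk", "VolumeSerialNumber"));
        }
    }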
Something very mysterious happened recently. Inside of an Azure VM that had not been restarted in weeks, my app failed to load because some of these values were different. Unfortunately the person who caught the error deleted the VM before it was brought to my attention.
My question is, when an Azure VM is running, what hardware resources may change? Is that even possible?
Thanks!
Answering this requires a short rundown of how Azure works.
In each data centre there are thousands of individual machines. Each machine runs a hypervisor, which allows a number of operating systems to share the same underlying hardware.
When you start a role, Azure looks for available resources - disk space, CPU, RAM, etc. - and boots up a copy of the appropriate OS VM on those available resources. I understand from your question that this is a VM role, so this VM is the one you uploaded or created.
As long as your VM is running, the underlying virtual resources provided by the hypervisor are not likely to change. (The caveat is that Windows Server 2012's hypervisor can move virtual machines around over the network even while they are running. Whether Azure takes advantage of this, I don't know.)
Now, Azure keeps charging you even when your role has stopped, because it considers your role "deployed". So in theory, those underlying resources still "belong" to your role.
This is not guaranteed, however. Azure could decide to boot up your VM on a different set of virtualized hardware for any number of reasons - hardware failure being at the top of the list, with insufficient capacity being second.
It is even possible (though unlikely) for your resources to be provided by different hardware nodes.
An additional point of consideration is that, under Azure policy, disaster recovery (or another major event) may include transferring your roles to run in a separate data centre entirely.
My point is that the underlying hardware is virtual and treating it otherwise is most unwise. Roles are at the mercy of the Azure Management Routines, and we can't predict in advance what decisions they may make.
So the answer to your question is that ALL of the underlying resources may change. And it is very, very possible.
A large e-commerce site is looking to switch its session cache from Shared cache to dedicated cache.
It usually runs on 5-6 medium-size servers. During busy times it runs on 20 medium servers, and during the very busy times it is not unreasonable to see 2000+ requests per second to the site.
Is a co-located cache good enough here, or must the cache be in a dedicated worker role?
Also, must high availability be enabled for session data? The site relies on session data being present for a good user experience. But the cache is persisted to Azure blob storage, so I'm not sure I totally get the high-availability option.
Whether to use dedicated roles depends on how many roles you want to run, and whether memory usage is what drives the scaling of your web roles. For example, if your web roles are always pushing memory usage, and it is memory and not CPU that is the trigger for scaling out, then consider using dedicated roles for the cache, as your web roles can then handle the load for longer. If your web roles are CPU intensive, then dedicating memory on each role to the cache may be preferred. You also need to consider that if you run dedicated roles, you need more than one role to handle the load and availability, so even during non-busy times you will have at least 3 roles running the cache (but possibly fewer web roles). You may also want to use a dedicated cache if you do lots of deployments or scaling down - where roles are shut down intentionally and frequently.
One consideration with co-located role caching is that if you had sticky sessions the latency would be lower, as the item would be on the same machine. Unfortunately, the Azure load balancer is round robin, and not sticky at all, so the chance that a session gets back to the same machine is low (1/5 of the time for 5 roles). This means that most of the time the cached item will be fetched from another role in the cluster, so the co-located latency benefits are lost.
The cache is distributed and in-memory - there is no blob storage that I am aware of (except for the "cluster's runtime state", whatever that is). An item loaded into the cache is made available to other machines in the cluster from the machine it is stored (in memory) on; a read by machine B of an item stored on machine A does not also cache it on machine B - see comment below. Cached items are always in memory only, and the cache size is limited by available memory.
The high availability option copies the item to a separate machine (not to storage), so if one machine fails, there is still a copy somewhere. High availability also uses more memory, as an item takes up memory in two different places. The chance of failure may be low enough for your e-commerce app - if an item is not cached (either through failure or expiry), it can be reconstructed from persisted data. If you are, for example, keeping the basket in the cache and not persisting it to storage, you don't want it lost if a role recycles - in which case high availability may be the best option.
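To illustrate the reconstruct-from-persisted-data point, here is a rough cache-aside sketch; the database helper and the Basket type are hypothetical placeholders:

    using System;
    using Microsoft.ApplicationServer.Caching;

    public class BasketRepository
    {
        private readonly DataCache _cache = new DataCacheFactory().GetDefaultCache();

        public Basket GetBasket(string basketId)
        {
            // Try the (possibly high-availability) cache first.
            var basket = _cache.Get(basketId) as Basket;
            if (basket == null)
            {
                // Cache miss: the item expired, or the machine holding it failed
                // without an HA copy. Rebuild it from the persisted store.
                basket = LoadBasketFromDatabase(basketId);   // hypothetical helper
                _cache.Put(basketId, basket);
            }
            return basket;
        }

        private Basket LoadBasketFromDatabase(string basketId)
        {
            // Placeholder for whatever persistence the site actually uses.
            return new Basket { Id = basketId };
        }
    }

    [Serializable]
    public class Basket
    {
        public string Id { get; set; }
    }

If the basket is only ever kept in the cache and never persisted, there is no fallback path like this, which is exactly when high availability earns its extra memory cost.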
Great answer @SimonMunro, however in my experience the Azure co-located cache is not fit for production. Our load testing has shown that when a server is recycled it takes an exceptionally long time for the cache to recover. We have coded against this by fetching the data from our database, but then our site grinds to a halt due to the stress on the database. This happens not only when a node is recycled, but also when you scale your cloud services up and down, and even when you perform a VIP swap.
We have performed the same tests using the Azure dedicated cache and have found that it handles a cache worker role recycling with little to no effect on the performance of the site. My recommendation is to use the Azure dedicated cache in all cases if you want your site to perform.
I need to write some data to a database every 50 seconds or so. It's similar to a Windows service running in the background and silently doing its job. Starting and stopping is not an option in my case, as I need a small amount of previously inserted data to stay in memory. What's the best solution for this when using Windows Azure or AWS?
Thank you.
With Windows Azure, you can choose either a Web or Worker Role (both basically Windows Server 2008 R2 or SP2) and have some type of timed event, as @Lucifure suggested. You could also run a scheduler, like Quartz.net, or take advantage of Windows Azure queues or Service Bus queues to have messages show up at a certain time.

However: you cannot have a "forever" task in a given role instance, in that periodically your VM instances will be rebooted (e.g. for host OS maintenance every month). With role shutdowns, you'll get notice, which you can handle in Stopping() or OnStop(). If you have multiple instances, you can use a scheduler or queue to ensure your events still trigger every 50 seconds or so and get handled across multiple instances (but only by one instance at any given time).
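As a rough sketch of handling that shutdown notice in a classic RoleEntryPoint-based worker role (the persistence call is a hypothetical placeholder):

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        private volatile bool _running = true;

        public override void Run()
        {
            while (_running)
            {
                // ... normal background work ...
                Thread.Sleep(TimeSpan.FromSeconds(50));
            }
        }

        public override void OnStop()
        {
            // Called when the instance is being shut down (e.g. host OS update).
            // There is a limited window (around 5 minutes) to finish up.
            _running = false;
            PersistCriticalState();   // hypothetical: flush the small persistent subset
            base.OnStop();
        }

        private void PersistCriticalState()
        {
            // Placeholder: write to SQL Azure, table storage, blobs, etc.
        }
    }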
To preserve your in-memory information, one idea is to store that information in a cache. You have 2 choices:
Distributed (shared) cache service, which has been around for some time now. It runs independently of your role instances.
In-memory cache, just introduced in June 2012. Assuming you have more than one instance, the cache is spread across those instances. You can even run the cache inside the memory of your existing roles.
More information on caching is here.
There are a few StackOverflow answers regarding Quartz.net and Windows Azure, such as this one.
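For illustration, a minimal Quartz.net sketch (assuming the 2.x-era synchronous API) that fires a job every 50 seconds; the job body is a placeholder:

    using Quartz;
    using Quartz.Impl;

    public class WriteToDatabaseJob : IJob
    {
        public void Execute(IJobExecutionContext context)
        {
            // Placeholder: write the pending data to the database here.
        }
    }

    public static class SchedulerSetup
    {
        public static void Start()
        {
            IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
            scheduler.Start();

            IJobDetail job = JobBuilder.Create<WriteToDatabaseJob>().Build();
            ITrigger trigger = TriggerBuilder.Create()
                .StartNow()
                .WithSimpleSchedule(x => x.WithIntervalInSeconds(50).RepeatForever())
                .Build();

            scheduler.ScheduleJob(job, trigger);
        }
    }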
On Windows Azure, you can use a Worker Role, which can do this. It can be as simple as a while loop.
Try this article for an introduction.
http://www.c-sharpcorner.com/uploadfile/40e97e/windows-azu-creating-and-deploying-worker-role/
You could set up a System.Threading.Timer to fire every 50 seconds or so, and do your work whenever the event occurs.
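A minimal sketch of that approach inside a worker role (the database call is a placeholder):

    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.ServiceRuntime;

    public class WorkerRole : RoleEntryPoint
    {
        private Timer _timer;

        public override void Run()
        {
            // Fire roughly every 50 seconds, starting immediately.
            _timer = new Timer(_ => WriteDataToDatabase(),
                               null,
                               TimeSpan.Zero,
                               TimeSpan.FromSeconds(50));

            Thread.Sleep(Timeout.Infinite);   // keep the role instance alive
        }

        private void WriteDataToDatabase()
        {
            // Placeholder for the actual database write.
        }
    }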
Can someone please summarize the advantages of creating an Azure Worker Role vs. simply starting a new thread?
By starting a new worker role instance you have all of the memory and CPU available for that instance size, whereas when creating threads you'd be sharing the resources of a single role instance.
I would say that it also depends on what you're processing. Also, I think that threading or any parallel processing only makes sense when you're using a Medium instance or larger, where you have 2 or more cores.
The primary advantages, IMHO, are that you get a separation of concerns as well as the ability to independently scale the capacity of the background process and the front end.
I assume you mean starting a new thread from an IIS-hosted service/app in a WebRole. My main concern would be recycling of IIS app pools and memory consumption.
Depending on the type of application, the load on your application and the IIS settings, you don't have a lot of control over the lifecycle and resources of the process your thread will be living in.