How to prevent Azure from scaling out additional instances until they are ready? - azure

We are having issues with an Azure App Service. One of our web services (MVC) caches data from the database at startup (Application_Start), which takes approximately 3 minutes. Until the cache is ready, we can't handle requests.
This is known so we set it 'always on' and will aim to only restart it during off-peak times if necessary.
However, we expect heavy load on the server next month, and in our testing of auto-scaling we found that each additional instance goes through the same startup delay. Traffic is nevertheless split between the running instance and the new one that is still warming up, so, for example, half of the requests fail during that 3-minute period.
How can we configure Azure to delay using the new instance until it is ready? (or should we be using e.g. AWS instead?).
Some of the documentation points to using a custom Load Balancer Probe; however, it mainly talks about VMs, whereas we are using PaaS.

Do try to reduce the data you need to load on Application_Start, and try to lazy-load data into the cache on first request. Sometimes, even after doing all of this, we still end up with large sets of data that are necessary on start.
There are two ways we can approach this.
One, assuming you are using in-memory caching and every instance of the app needs to hydrate its in-memory cache on Application_Start, try to use an external cache provider such as Azure Cache for Redis. Your new instance can then just point to this external cache without having to reload the data.
Two, you can rely on the Application Initialization module, which was introduced in IIS 7.5 (and is installed on Azure App Service's IIS). To use this feature, add an applicationInitialization section under the system.webServer section of web.config. This keeps the instance from being made available until the warm-up process has completed. More info on how to use applicationInitialization is available in this blog post.
The best case would be to combine both: applicationInitialization points at a page in your application that checks whether the external cache is available and hydrated. If it is, warm-up is complete; if not, the page hydrates the external cache.
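The combined approach above can be sketched as a small warm-up routine. This is only an illustration under stated assumptions: a plain dict stands in for the external cache (a Redis client would take its place), and `HYDRATED_FLAG`, `warm_up` and `load_from_database` are hypothetical names, not part of any Azure or IIS API. The warm-up page that applicationInitialization requests would call something like this before returning.

```python
from typing import Callable, Dict, MutableMapping, Tuple

# Hypothetical marker key recording that the shared cache has been filled.
HYDRATED_FLAG = "reference-data:hydrated"


def warm_up(cache: MutableMapping[str, object],
            load_from_database: Callable[[], Dict[str, object]]) -> Tuple[int, str]:
    """Return (status_code, body) for a warm-up request.

    `cache` stands in for the external cache shared by all instances.
    """
    if cache.get(HYDRATED_FLAG):
        # Another instance already paid the loading cost; nothing to do.
        return 200, "cache already hydrated"
    # The first instance to start hydrates the shared cache.
    cache.update(load_from_database())
    cache[HYDRATED_FLAG] = True
    return 200, "cache hydrated"
```

Note that this sketch does not guard against two instances hydrating at the same time; with a real external cache, some form of lock or atomic set-if-absent would be needed.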

You can do this in Azure with resource types other than a classic VM, such as an App Service. App Services scale up and down with instances that share the same memory pool and thread pool.

There is a lot of good information in the link https://www.jan-v.nl/post/warming-up-your-app-service that was included in one of the comments.
Based on that information the functionality that you require is not available in the free tier.
I would approach the problem differently. Why does it take 3 mins to load the data from the database? Since it is only loaded on start it should be data that does not change often.
Could you:
Optimise the reading of data from the database?
Reduce the amount of data you read from the database?
Export the data to a file, and read it from a file?
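The last suggestion, reading from an exported file, could look roughly like the sketch below. It assumes the reference data is JSON-serialisable and that a day-old snapshot is acceptable; `load_reference_data`, `snapshot_path` and `load_from_database` are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile
import time
from typing import Callable, Dict


def load_reference_data(snapshot_path: str,
                        load_from_database: Callable[[], Dict[str, object]],
                        max_age_seconds: float = 24 * 3600) -> Dict[str, object]:
    """Read the snapshot if it is fresh; otherwise rebuild it from the database."""
    try:
        age = time.time() - os.path.getmtime(snapshot_path)
        if age <= max_age_seconds:
            with open(snapshot_path, "r", encoding="utf-8") as handle:
                return json.load(handle)
    except (OSError, json.JSONDecodeError):
        pass  # missing or unreadable snapshot: fall through to the database

    data = load_from_database()
    # Write to a temporary file and rename it, so readers never see a partial file.
    directory = os.path.dirname(os.path.abspath(snapshot_path))
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle)
    os.replace(temp_path, snapshot_path)
    return data
```

Whether this is faster than the database depends on where the 3 minutes are actually spent, so it is worth measuring before committing to it.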

My recommendation would be to use an Azure Load Balancer with a health probe.

Related

Recommended Azure service to replace Azure functions

We have a service running as an Azure Function (Event and Service Bus triggers) that we feel would be better served by a different model. It takes a few minutes to run and loads a lot of objects into memory, and it feels like it loads them every time it is called instead of keeping them in memory, which would perform better.
What is the best Azure service to move to with the following goals in mind.
Easy to move and doesn't need too many code changes.
We have long term goals of being able to run this on-prem (kubernetes might help us here)
Appreciate your help.
To achieve first goal:
Move your Azure Function code into a continuously running WebJob. It has no maximum execution time, and it can run continuously while caching objects in its context.
To achieve second goal (On-premise):
You need to explain this better, but a WebJob can be run as a console program on-premises. You can also wrap it in a Docker container to move it from on-premises to any cloud. However, if you need to consume messages from an Azure Service Bus, you will need a hybrid approach that connects your local server to the cloud with a VPN or ExpressRoute.
Regards.
There are a couple of ways to solve this issue, each requiring slightly more change from where you are.
If you are just trying to separate out the heavy initial load, then you can do it once in a Redis Cache instance and then reference it from there.
If you are concerned about how long your worker can run, then WebJobs (as explained above) can work; however, I'd suggest avoiding them, since that is not where Microsoft is putting its resources. Rather, look at Durable Functions, where an orchestrator function can drive a worker function. (Even here, be careful: because Durable Functions retain history, the history tables might grow too large after running for a very long time, so consider programming in something like restarting the orchestrator after, say, 50,000 runs; the right number will vary with your case.) Also see this.
If you want to add the constraint of portability, you can run this function in a Docker image on an AKS cluster in Azure. This might not work well for Durable Functions (try it out, who knows :) ), but it will surely work for the worker functions (which would cost you the most compute anyway).
If you want to bring the workloads completely on-prem then Azure functions might not be a good choice. You can create an HTTP server using the platform of your choice (Node, Python, C#...) and have that invoke the worker routine. Then you can run this whole setup inside an image on an AKS cluster on prem and to the user it looks just like a load balanced web-server :) - You can decide if you want to keep the data on Azure or bring it down on prem as well, but beware of egress costs if you decide to move it out once you've moved it up.
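The "HTTP server that invokes the worker routine" idea can be sketched with nothing but the Python standard library. Everything here is an assumption made for illustration: `load_objects` stands in for the expensive in-memory load, `run_worker` for the existing worker routine, and the port and payload shape are arbitrary. The point is only that the objects are loaded once per process and then reused across requests.

```python
import json
from functools import lru_cache
from http.server import BaseHTTPRequestHandler, HTTPServer


@lru_cache(maxsize=1)
def load_objects() -> dict:
    """Hypothetical expensive load; runs once per process, then stays in memory."""
    return {"rates": {"EUR": 1.0, "USD": 1.25}}


def run_worker(message: dict) -> dict:
    """Hypothetical worker routine that reuses the already-loaded objects."""
    rate = load_objects()["rates"][message["currency"]]
    return {"converted": message["amount"] * rate}


class WorkerHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read the JSON message, run the worker, and return its result as JSON.
        length = int(self.headers.get("Content-Length", 0))
        message = json.loads(self.rfile.read(length))
        body = json.dumps(run_worker(message)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Warm the in-memory objects, then accept requests until interrupted."""
    load_objects()
    HTTPServer(("0.0.0.0", port), WorkerHandler).serve_forever()
```

Calling `serve()` from a container entry point would give the long-lived process described above; a production setup would more likely use a proper web framework and server.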
It appears that the functions are affected by cold starts:
Serverless cold starts within Azure
Upgrading to the Premium plan would move your functions to pre-warmed instances, which should counter the problem you are experiencing:
Pre-warmed instances for Azure Functions
However, if you potentially want to deploy your function/triggers to on-prem, you should spin them out as microservices and deploy them with containers.
Currently, the fastest way would probably be to deploy the containerized triggers via Azure Container Instances if you don't already have a Kubernetes Cluster running. With some tweaking, you can deploy them on-prem later on.
There are a few options:
Move your function app to the Premium plan. However, it will not help you much under heavy load and scale-out.
Issue: In that case you will start facing cold-start issues, and the problem will persist under heavy load.
Redis Cache: it will resolve most of your issues, as the main concern is the heavy loading.
Issue: If your system is multi-tenant, your cache will become heavy over time.
Create small, micro Durable Functions. This does not answer your question, since you don't want a lot of changes, but it will resolve most of your issues.

Auto-suggest, Azure Web App and .NET Core Web API IMemoryCache

Tech Stack
Azure WebApp
.NET Core 2.1 Web API
We have around 4k reference data items that are used during auto-suggest lookup, so I was wondering whether I should cache this data on the Web App or always get it from the database / 3rd-party API.
I know I can use Redis Cache to solve this issue, but I would like to know how an Azure Web App behaves when it comes to caching. Will it come under memory pressure, and if so, when? If yes, is scaling up the only solution?
We are using IMemoryCache in .NET Core to store the reference data. It expires daily or when the Azure Web App is restarted (so the first user experiences a delay until all the data is in the cache).
The data size is in the range of 500KB - 1MB and sometimes goes up to 3MB+.
What is the best approach?
IMemoryCache is not recommended for Web Apps because it is tightly bound to your application instance. If you scale out your app (for example, during load surges in the day), each instance ends up with its own separate cache and your caching mechanism breaks.
RedisCache is pretty much a dictionary, key-value pairs.
It is very fast on look-ups but it could be very slow in some other operations like a GetAllKeys when it has to run through the whole cache. That will bring your cache server to its knees, so it needs to be handled carefully.
It will not put any significant pressure on the memory consumption of your app; you only need to have a static client. The rest is handled by the Redis server.
If you plan to scale up your application (give more RAM and CPU resources to your one running instance), IMemoryCache is probably fine.
If you plan to scale out (create multiple instances of your application), which is strongly suggested for all stateless applications, then Redis Cache (or any other distributed cache) is the way to go if you need a caching mechanism.
Value and key max size is 512MB so you are on the safe side regarding value data size.
Attention
Be sure to use the ConnectionMultiplexer as suggested in the official documentation, because it automatically re-establishes the connection if it is lost. That used to be a bug: when the Redis cache server went into maintenance, your calls were redirected to the failover instance but the connection kept failing, so you needed to restart your application.
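For the single-instance case discussed above, where an in-process cache is considered fine, the daily-expiry pattern from the question can be sketched as follows. This is a plain Python illustration of the get-or-create idea, not the .NET IMemoryCache API; `ExpiringCache` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpiringCache:
    """In-process cache whose entries expire after a fixed time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_create(self, key: str, factory: Callable[[], Any],
                      ttl_seconds: float) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database or API call
        # Missing or expired: the caller that hits this path pays the delay.
        value = factory()
        self._entries[key] = (now + ttl_seconds, value)
        return value
```

With a few megabytes of reference data, as described in the question, a structure like this adds little to the process's memory footprint; the trade-off is that every scaled-out instance holds and refreshes its own copy.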

What does the Azure Web Apps architecture look like?

I've had a few outages of 10 to 15 minutes, because apparently Microsoft had a 'blip' on their storages. They told me that it is because of a shared file system between the instances (making it a single point of failure?)
I didn't understand it and asked how file share is involved, because I would assume a really dumb stateless IIS app that communicates with SQL Azure for its data.
I would assume the situation below:
This is their reply to my question (I didn't include the drawing)
The file shares are not necessarily for your web app to communicate to
another resources but they are on our end where the app content
resides on. That is what we meant when we suggested that about storage
being unavailable on our file servers. The reason the restarts would
be triggered for your app that is on both the instances is because the
resources are shared, the underlying storage would be the same for
both the instances. That’s the reason if it goes down on one, the
other would also follow eventually. If you really want the
availability of the app to be improved, you can always use a traffic
manager. However, there is no guarantee that even with traffic manager
in place, the app doesn’t go down but it improves overall availability
of your app. Also we have recently rolled out an update to production
that should take care of restarts caused by storage blips ideally, but
for this feature to be kicked it you need to make sure that there is
ample amount of memory needs to be available in the cases where this
feature needs to kick in. We have couple of options that you can have
set up in order to avoid any unexpected restarts of the app because of
a storage blip on our end:
You can evaluate if you want to move to a bigger instance so that
we might have enough memory for the overlap recycling feature to be
kicked in.
If you don’t want to move to a bigger instance, you can always use
local cache feature as outlined by us in our earlier email.
Because of the time differences the communication takes ages. Can anyone tell me what is wrong in my thinking?
The only thing that I think of is that when you've enabled two instances, they run on the same physical server. But that makes really little sense to me.
I have two instances one core, 1.75 GB memory.
My presumption for App Service Plans was that they were automatically split into availability sets (see below for a brief description) Largely based on Web Apps sales spiel which states
App Service provides availability and automatic scale on a global data centre infrastructure. Easily scale applications up or down on demand, and get high availability within and across different geographical regions.
Following on from David Ebbo's answer and comments, the underlying architecture of Web Apps appears to be that the VMs themselves are separated into availability sets. However, all of the instances use the same file server to share the underlying disk space, which makes this file server a significant single point of failure.
To mitigate this Azure have created the WEBSITE_LOCAL_CACHE_OPTION which will cache the contents of the file server onto the individual Web App instances. Using caching in lieu of solid, high availability engineering principles.
The problem here is that as a customer we have no visibility into this issue, we've no idea if there is a plan to fix it, or if or when it will ever be fixed since it seems unlikely that Azure is going to issue a document that admits to how badly this has been engineered, even if it is to say that it is fixed.
I also can't imagine that this issue would be any different between ASM and ARM. It seems exceptionally unlikely that there was originally a high availability solution at the backend that they scrapped when ARM came along. So it is very likely that cloud services would suffer the exact same issue.
The small upside is that now that we know this is an issue, one possible solution would be to deploy multiple web apps and have a traffic manager between them. Even if they are in the same region, different apps should have different backend file servers.
My first action would be to reply to that email, with a link to the Web Apps page, (and this question) with a copy of the quote and ask how to enable high availability within a geographic region.
After that you'll likely need to rearchitect your solution!
Availability sets
For virtual machines Azure will let you specify an availability set. An availability set will automatically split VMs into separate update and fault domains. Meaning that servers will end up in different server racks, and those server racks won't get updates at the same time. (it is a little more complex than that, but that's the basics!)
Azure Web Apps do use shared file storage. The best way to think about it is that all the instances of your app map to the same network share that holds your files. So if you modify the files by any means (e.g. FTP, msdeploy, git, ...), all the instances instantly get the new files (since there is only one set of files).
And to answer your final question, each instance does run on a separate VM.

Send Message to Azure Web Site Instance

We are evaluating Azure right now and I really like Azure Web Sites, especially because of the very easy and fast deployment, which is helpful in our current situation where we run a lot of tests.
We have some in-memory caches for information that is accessed very often per request, such as text strings for multi-language support and configuration settings edited by the site administrator. I would like to have a system where each instance of the web site has a copy of this cached data, but I need to send flush events for cache invalidation to all instances when the administrator changes some settings. I guess that the Azure Service Bus is perfect for this with its publish-subscribe model, but I don't want to pay 3€ per instance just for sending some messages around.
Is there an option to open an individual endpoint per instance, where I can host a WCF service, for example?
There is no good way that I'm aware of to direct a request at a specific instance of a Windows Azure Web Site. The load balancer for Web Sites uses sticky sessions by default (which you can turn off), but there isn't a way to force an incoming request to be directed to one instance of a web site over another.
You could look at the Service Bus as you mentioned with a Topic and several subscriptions, which is indeed an option, but as you point out it does cost something. I'm interested in where you got the calculation for the amount though. Brokered messaging is charged per message (with "empty requests also being included"). If you had an instance checking once a minute for a month it's only about 43,000 calls. You can get 1 million calls for a US Dollar. With the long polling that Service Bus has in the managed client you end up with fewer "empty" calls than standard polling.
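The arithmetic behind the "about 43,000 calls" figure can be checked with a few lines. The price of 1 USD per million calls is simply the figure quoted in the answer, not a current price list, and the 30-day month is an assumption.

```python
MINUTES_PER_MONTH = 60 * 24 * 30          # assumes a 30-day month
PRICE_PER_MILLION_CALLS_USD = 1.0         # figure quoted in the answer


def monthly_polling_cost(instances: int, polls_per_minute: int = 1):
    """Return (calls per month, cost in USD) for instances that poll on a timer."""
    calls = MINUTES_PER_MONTH * polls_per_minute * instances
    cost = calls / 1_000_000 * PRICE_PER_MILLION_CALLS_USD
    return calls, cost


calls, cost = monthly_polling_cost(instances=1)
print(calls, round(cost, 4))  # 43200 calls, roughly 0.04 USD per instance
```

At that rate, even a handful of instances polling once a minute stays well below the 3€ per instance mentioned in the question.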
Another option is to simply use a different polling mechanism. In this case you are just wanting an indicator that you should, or should not flush the cache. You could put a text file in BLOB storage that contains a cache current version value. This could be whatever you want, a number, a guid, doesn't matter. Each instance would then from time to time check this BLOB file. If the value in the file is different than what they last saw they refresh their cache. Then they hold on to the new cache version value for their next call. You can either set this up as a WebJob on a schedule or do your own background polling.
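The version-file approach can be sketched as below. The blob read is abstracted behind a `read_version` callable so that no storage SDK call is assumed; `VersionedCache`, `read_version` and `load_data` are hypothetical names for illustration. A background timer or scheduled job would call `refresh_if_stale()` from time to time.

```python
from typing import Any, Callable, Optional


class VersionedCache:
    """Per-instance cache that reloads when a shared version value changes."""

    def __init__(self, read_version: Callable[[], str],
                 load_data: Callable[[], Any]) -> None:
        self._read_version = read_version   # e.g. reads the text blob
        self._load_data = load_data         # rebuilds the cached data
        self._seen_version: Optional[str] = None
        self._loaded = False
        self.data: Any = None

    def refresh_if_stale(self) -> bool:
        """Reload if the shared version differs from the last one seen."""
        current = self._read_version()
        if self._loaded and current == self._seen_version:
            return False
        self.data = self._load_data()
        # Remember the version so the next check can compare against it.
        self._seen_version = current
        self._loaded = True
        return True
```

When the administrator changes a setting, the application writes a new value (a number or a GUID, as the answer says) to the shared file, and every instance picks it up on its next check.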
Finally, there is the Windows Azure Cache Service (preview) which is usable by Web Sites, but that would cost additional and, if you really are caching the exact data on all instances, wouldn't be as efficient. It would give you the ability to deal with the cache service directly though, independent of the instances that are using it, allowing you to reset and such as you needed, on demand, in one fell swoop.
Personally I'd suggest taking a look at the Service Bus again.

Create azure VM on my local machine

Is it possible to create one or several Azure VMs on my local machine? I want to create a web app and load test it locally, without needing to put it in the cloud. I'm thinking of the following scenario: I have a local VM running an IIS server with my web app; I use a tool to generate a lot of load; I need to deploy a second VM containing the same things as the first VM. The downtime of the web app should be equal to 0 (hopefully).
Clarification(update):
I want to achieve the following: create a web app and a monitoring app (CPU, memory) and deploy them on one VM. During a load test, if the VM cannot handle the load (e.g. CPU goes above 80%), I want to programmatically deploy a new VM (with the same configuration, containing both the web app and the monitoring app), such that no downtime occurs.
Azure has several ways for you to host sites.
Virtual Machines is just that, normal VMs. You can create them locally and upload them, but everything is up to you, including how to handle upgrades. If that is what you need to do then I don't know how you would handle upgrades with no down time; though, you can add multiple VMs to a load balancer and then upgrade them one at a time.
It sounds like what you really want to explore is Cloud Services. You can run one or more VMs locally in the emulator, upgrade with no down time once in the cloud, implement auto scaling (you will have to use a tool or write some code).
Alternatively you may want to look at Azure Web sites, but that is a completely different concept and you can't really test load and load balancing locally the same way.
Based on your statement that you essentially want to auto-scale your application you want to look at Cloud Services with Auto Scaling. However, you can't fully test this in the cloud emulator - but you can test your logic.
Background
Azure Cloud Services is designed for this kind of thing; You don't really work with VMs in the way you may be used to, instead you create a package that Azure then deploys to as many servers as you like. Once up and running, you can manually go into the management console and increase or decrease the number of active servers simply by moving a slider. Of course, you want to do this automatically, so you have a few options.
There is a management API you can use to change the number of servers. So, it would be quite simple to write a bit of code that you spin up in another thread from WebRole.Start and that simply sits and monitors the CPU on the machine and then calls the management API to spin up a new server instance if your CPU goes over a certain threshold. Okay, locally you can only test that the call to the management API is made; you won't actually see the new server coming up. But if you grab your free trial of Azure and just try it, you will see that you really don't need to test that part - it just works.
However, in practice there is an awful lot more to auto scaling. Here are some of the things you need to consider;
Even relatively idle web servers will often spike briefly to 100%, so just having a simple threshold is unlikely to be good enough; you need to decide how long the server needs to be over a certain threshold before you spin up another server instance.
What happens when you have more than one server? And, on Azure, you should always have at least two servers to ensure you have resilience. Note that the idea with Cloud Services really is to have many small servers rather than a few big servers. You pay per core, not per number of servers.
Imagine you currently have three servers and one is really busy for some reason and the other two are idle. Do you want to spin up a fourth server?
Imagine you currently have two servers and they are both quite busy. Do you really want them both to start a new server so you end up with four servers running?
There are several ways to handle these challenges. For starters, rather than having monitor programs running locally on each server, you are better off moving that monitoring outside; Azure comes with the ability to dump performance metrics to table storage at whatever interval you choose. You can then run an external program that retrieves the performance data over time from all your current servers and then reason about the overall workload before deciding to spin up or shut down additional servers. Now, you can of course host that external monitor program in a separate thread on each of your web roles to give your monitoring resilience - but the key point is that the monitoring program doesn't monitor the server it runs on, it monitors all the servers. You will, of course, still have to deal with stopping multiple monitoring program instances from all starting and stopping servers. One way to do this is to place stop/start commands onto an Azure "message queue" (there are a few different types) and use the built-in "de-duper", which will automatically delete identical commands that are put on the queue within a certain time window (I am oversimplifying, but you get the idea).
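The fleet-wide reasoning described above (sustained load rather than spikes, and looking at all servers rather than one) can be sketched as a small decision function. The thresholds, the window length and the function name are illustrative assumptions; retrieving the metrics from table storage and calling the management API are deliberately left out.

```python
from statistics import mean
from typing import Dict, List


def scaling_decision(cpu_history: Dict[str, List[float]],
                     window: int,
                     scale_out_above: float = 80.0,
                     scale_in_below: float = 30.0,
                     min_servers: int = 2) -> str:
    """Decide 'scale_out', 'scale_in' or 'hold' from per-server CPU samples.

    cpu_history maps a server name to its CPU % samples, oldest first.
    """
    recent = [samples[-window:] for samples in cpu_history.values()]
    if not recent or any(len(samples) < window for samples in recent):
        return "hold"  # not enough data to reason about the whole fleet
    # Average across all servers for each sampling interval in the window.
    fleet_average = [mean(interval) for interval in zip(*recent)]
    if all(value > scale_out_above for value in fleet_average):
        return "scale_out"  # the fleet as a whole is busy for the full window
    if len(cpu_history) > min_servers and all(
            value < scale_in_below for value in fleet_average):
        return "scale_in"
    return "hold"
```

With this rule, one busy server among three idle ones does not trigger a fourth, and a brief spike to 100% does not count because the whole window has to stay above the threshold.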
The actual answer
Really, though, you want to look at the Auto Scaling Application Block which will do most of this for you. I guess that is the real answer to your question, but I wanted to provide a bit of context first.
Again, I recognise you asked for how to test this locally - but I believe that that question doesn't really make sense in the context of Azure and I hope the above information helps.
I'm pretty sure you can't do that, and it wouldn't make sense anyway. If you want load testing, you need to run it in an environment as similar to production as possible, and that means you have to run your application in the Azure cloud. How else would you know that the load will actually be processed fine on the real cloud?

Resources