I've got a Node.js application that does some moderately intense logic work when a user requests it. For example, a user on the frontend can click Analyze and the server will perform the work, which could take 30 seconds to 1 minute (non-blocking).
My app is not aimed at the general public but at an audience of a few thousand, so there is a chance that several people might run an analysis at the same time.
I'm currently planning to deploy the app via Elastic Beanstalk, but I'm not sure exactly how it will deal with an instance that is busy, or whether I have to implement some kind of custom signal to tell the load balancer to send requests to another instance while the current one is busy performing analysis.
I understand that Lambdas are often held up as an option in this case, but I would much prefer to keep it simple and keep the code in my Node app.
How should I design this to ensure the app can perform an analysis and still handle other requests normally?
Elastic Beanstalk uses an Auto Scaling group to launch and maintain the EC2 instances required to run the application. With Auto Scaling groups you can increase or decrease the EC2 instance count dynamically via scaling policies. By default, Auto Scaling supports scaling based on metrics such as CPU, network in, network out, request count, and latency. You can use any of these metrics to scale your infrastructure dynamically.
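For example, with Elastic Beanstalk you can set the scaling trigger in an .ebextensions config file; a minimal sketch (the thresholds and instance counts are illustrative, not recommendations):

```yaml
# .ebextensions/autoscaling.config
option_settings:
  aws:autoscaling:asg:
    MinSize: 2
    MaxSize: 6
  aws:autoscaling:trigger:
    # Scale on average CPU: add an instance above 70%, remove one below 25%.
    MeasureName: CPUUtilization
    Statistic: Average
    Unit: Percent
    UpperThreshold: 70
    LowerThreshold: 25
```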
You can refer to the AWS documentation for more information.
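On the application side, the key to "still handling other requests normally" is to keep the heavy analysis off the main event loop. A minimal sketch using Node's built-in worker_threads module (the analysis loop and iteration count stand in for the real logic):

```javascript
// server.js -- offload CPU-heavy analysis to a worker thread so the
// HTTP event loop stays free to answer other requests.
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');
const http = require('http');

if (isMainThread) {
  http.createServer((req, res) => {
    if (req.url === '/analyze') {
      // Re-run this same file as a worker for the heavy job.
      const worker = new Worker(__filename, { workerData: { iterations: 1e9 } });
      worker.on('message', (result) => res.end(JSON.stringify({ result })));
      worker.on('error', (err) => { res.statusCode = 500; res.end(err.message); });
    } else {
      res.end('ok'); // other routes keep responding while analyses run
    }
  }).listen(process.env.PORT || 3000);
} else {
  // Placeholder for the real 30-60 second analysis.
  let sum = 0;
  for (let i = 0; i < workerData.iterations; i++) sum += Math.sqrt(i);
  parentPort.postMessage(sum);
}
```

With the event loop free, the CPU-based scaling policies above can still add instances when many analyses run concurrently.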
Is there a way to exclude an App Service instance from the load balancer:
Via the portal?
Via the SDK?
Via the SDK would be ideal; then we could set the MakeVisibleToLoadBalance flag (if such a thing existed) once all initialization has completed.
If it's only available via the portal, it would be good to be able to set a delay of n seconds after an instance loads before it becomes visible to the load balancer.
Reason:
When we restart an instance (e.g. via advanced restart), the metrics show a significant increase in response times, every time.
I believe the cause is that the load balancer thinks the machine is available when it really hasn't completed initialization, so requests that the load balancer sends to that instance are significantly delayed.
Another reason is that we may observe an instance performing poorly; it would be great if we could exclude that instance until it either recovered or was restarted.
(As per the discussion with wallismark in the comments; copied the helpful comments into an answer.)
To address the reasons/scenarios you have mentioned above, you could leverage the ApplicationInitialization module. Every time your application starts, whether because a new worker comes online (horizontal scaling) or simply a cold start caused by a new deployment, a config change, etc., ApplicationInitialization is executed to warm up the site before that worker accepts requests.
The Application Initialization module is a handy feature that lets you warm up your app before it receives requests, which helps avoid the cold-start or slow initial load times when the app is restarted. See https://ruslany.net/2015/09/how-to-warm-up-azure-web-app-during-deployment-slots-swap/ for a walkthrough.
It also runs for all other operations in which a new worker is provisioned (such as auto scale, manual scale, or Azure fabric maintenance). However, you cannot exclude an instance from the load balancer.
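For reference, on IIS-based hosting this is configured in web.config; a minimal sketch, assuming your app exposes a /warmup endpoint (the path is illustrative):

```xml
<configuration>
  <system.webServer>
    <applicationInitialization doAppInitAfterRestart="true">
      <!-- IIS requests this URL on a new worker before it takes live traffic -->
      <add initializationPage="/warmup" />
    </applicationInitialization>
  </system.webServer>
</configuration>
```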
If it fits your requirements, you could also leverage ARR affinity: in a multi-instance deployment, it ensures that a client is routed to the same instance for the life of the session. You can set this option to Off for stateless applications.
Typically, scaling out runs multiple copies of your web app and handles the load-balancing configuration necessary to distribute incoming requests across all instances. When you have more than one instance, a request made to your web app can go to any of them; a load balancer decides which instance to route the request to based on how busy each instance is at the time.
To share more information on this feature: once your browser makes a request to the site, the load balancer adds an ARRAffinity cookie to the response containing the specific instance ID, which makes subsequent requests from that browser go to the same instance. You can use this cookie to send a request to a specific instance of your site. You can find the setting in the App Service's Application Settings.
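If the app is stateless and you'd rather opt out in code than in the portal, the front end also honors a response header that suppresses the affinity cookie; a minimal Express sketch (the route is illustrative):

```javascript
const express = require('express');
const app = express();

// Ask the ARR front end not to pin this client to a single instance.
app.use((req, res, next) => {
  res.set('Arr-Disable-Session-Affinity', 'true');
  next();
});

app.get('/', (req, res) => res.send('stateless response'));
app.listen(process.env.PORT || 3000);
```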
When multiple apps are run in the same App Service plan, each scaled-out instance runs all the apps in the plan.
I have my website (abc.azurewebsites.net) hosted on Azure Web Apps, deployed from Visual Studio.
Now, after a month, I am facing problems with traffic management. My CPU is always at 90-95% because the number of requests is so high.
Does anyone know how to add traffic management to this web app without changing the domain abc.azurewebsites.net? Is it hard-coded in my application?
I thought of moving the web app to a virtual machine, but now that it's already deployed I am scared of losing the domain.
When you scale your Web App, you add instances of your current pricing tier, and Azure deploys your Web App package to each of them.
There's a load balancer over all your instances, so traffic is automatically load balanced between them. You shouldn't need a virtual machine for this, and you don't need to configure any extra Traffic Manager.
I can vouch that my company uses Azure Web Apps to serve more than 1000 concurrent users making thousands of requests with just 2-3 instances. It all depends on what your application does and what other resources it accesses, whether or not you have implemented a caching strategy, and what kind of data storage you are using.
High CPU does not always mean high traffic; it's the mix of CPU and HTTP queue length that gives you an idea of how well your instances are handling traffic.
Your solution might involve a combination of things:
Performance-tune your application
Add caching strategies (a distributed cache like Azure Redis Cache is a good option; see the sketch after this list)
Increase Web App instances by configuring auto-scaling based on HTTP queue length / CPU
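For the caching point, a minimal cache-aside sketch using the Node redis client (v4 API); REDIS_URL is assumed to point at your cache, and loadFromDatabase stands in for the real expensive query:

```javascript
const { createClient } = require('redis');

const client = createClient({ url: process.env.REDIS_URL });

// Stand-in for the real expensive database query.
async function loadFromDatabase(id) {
  return { id, computedAt: Date.now() };
}

// Cache-aside: try the cache first, fall back to the source, then populate.
async function getReport(id) {
  const cached = await client.get(`report:${id}`);
  if (cached) return JSON.parse(cached);

  const fresh = await loadFromDatabase(id);
  await client.set(`report:${id}`, JSON.stringify(fresh), { EX: 300 }); // 5-minute TTL
  return fresh;
}

client.connect().then(async () => {
  console.log(await getReport(42));
  await client.quit();
});
```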
You should not have to change your domain to autoscale a Web App, but you may have to change your pricing tier. Scaling to multiple instances is available at the Basic pricing tier, and autoscaling starts at the Standard tier. Custom domains are allowed at these levels, but you don't have to change your domain if you don't want to.
Here is an overview of scaling a web app: https://azure.microsoft.com/en-us/documentation/articles/web-sites-scale/
Adding a virtual machine (VM) is very costly compared to adding an instance. On top of that, redundancy (recommended) for the VMs, adding NICs, etc., will blow up the cost, and maintenance is another challenge. PaaS (a Web App, etc.) is usually a better option than IaaS.
Serverless offerings like Azure Functions are also worth considering; they support HTTP triggers and scale up really well.
I have hosted a Node.js app on an Azure VM connected to a cloud service. Now I am trying to figure out how many sessions have hit that endpoint, along with usage metrics such as server response time, latency, memory usage, disk usage, and unique users. Is there a way to get this?
If you're using VMs, you need to implement logging mostly by yourself.
Azure, using the Portal, allows you to view metrics for the VM, like CPU and memory usage. However, to get application-level metrics, such as response times, number of requests, etc, you need to design your own solution.
If your Node.js application communicates with the Internet through a reverse proxy (e.g. Nginx, IIS), then you could extract those metrics from your web server's logs.
Otherwise, you'll have to implement logging inside your JS code. The right approach depends on the framework you're using (if any): Express, Koa, Hapi, etc.
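For example, with Express a small middleware can capture per-request latency and status codes; a minimal sketch (where you ship the log line, console, file, or a log service, is up to you):

```javascript
const express = require('express');
const app = express();

// Record method, path, status, and latency for every request.
app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`${req.method} ${req.originalUrl} ${res.statusCode} ${ms.toFixed(1)}ms`);
  });
  next();
});

app.get('/', (req, res) => res.send('ok'));
app.listen(process.env.PORT || 3000);
```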
On the other hand, were you using PaaS (Azure Web Apps), you'd get most of these metrics automatically.
If Azure App Service plans are virtual machines dedicated to the Web, API, Logic, and Mobile apps defined within them, does that mean that a web app in an App Service plan is an instance of a virtual web server in IIS on that virtual machine?
Assuming this is the case, and that each virtual web site gets its own application pool, is there an Azure scaling strategy or scenario where more than one worker process runs in that app pool, creating a web garden? My understanding of web app scale-out is that it results in additional VMs being allocated, not additional worker processes.
The scaling strategy will depend upon the pricing tier you have opted for.
Basically, each Service Plan contains a collection of Web, API, Logic, and Mobile apps. These form a web garden within the Service Plan server you choose.
If you initially choose a single B1 Basic Service Plan, you get a single virtual machine with all of your applications running on it. As the load on that server increases, you can scale it up to larger servers, but it will still be running on a single server.
If you then choose to create a second instance (and a 3rd, 4th, 5th...), that second server will be a replica of the first, with the load balanced between the two (or three, four...).
While I've not seen documentation for this, I would imagine that each Web, API, etc. app runs under its own application pool / worker process, and that scale-out simply duplicates instances.
I'm not sure what a Virtual Server is, but each app runs in its own dedicated application pool and w3wp.exe process. There is only a single w3wp.exe process per application pool, so no web gardens.
Is there a specific reason you think you need these to scale your apps? In most cases, using web gardens is the wrong way to scale, as adding more processes can cause unnecessary overhead (amongst other problems - you can find some useful resources on the web). You almost always want to prefer threads over processes for improving concurrency. If you're running out of physical resources (CPU, memory, etc), then the correct way to scale is to add additional VMs.
I am trying to scale a web app on Azure from a single web instance to multiple instances. The web app does a fair amount of processing of per-user state, and it's also fairly interactive, so latency is important. We currently have a single database; testing has shown it is not the bottleneck, so for this question let's assume we don't have to worry about scaling it and that all instances will hit the same database. In this case, I think per-user load balancing is the best option, as per-request load balancing will result in per-user state being duplicated across many web instances. Apart from the issue of maintaining consistency, I am concerned this would result in unacceptable latency for end users.
This link says that ARR does per-user load balancing by default on Azure. However, the Traffic Manager, which from what I can gather is automatically enabled when you spin up multiple web instances on Azure, does per-request load balancing.
So my question is, which of these two load balancing schemes will I be using if I add a few more instances to my Web Hosting Plan? If I need to manually disable the Traffic Manager, what is the best way to do this?
Calum - you can leverage the standard SQL Session State Provider in Azure, or you could look at the Azure Redis Cache provider, as backing stores for user session state.
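If the web tier were Node, the same idea would look like this with express-session backed by Redis, so that any instance can serve any request; a minimal sketch (connect-redis v7 API; older versions are wired differently):

```javascript
const express = require('express');
const session = require('express-session');
const { createClient } = require('redis');
const RedisStore = require('connect-redis').default;

const redisClient = createClient({ url: process.env.REDIS_URL });
redisClient.connect();

const app = express();

// Session state lives in Redis rather than instance memory, so per-request
// load balancing no longer duplicates or loses per-user state.
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: process.env.SESSION_SECRET || 'dev-only-secret',
  resave: false,
  saveUninitialized: false,
}));

app.get('/', (req, res) => {
  req.session.views = (req.session.views || 0) + 1;
  res.send(`views: ${req.session.views}`);
});

app.listen(process.env.PORT || 3000);
```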
When deploying to Cloud Service web roles, you automatically get a load balancer instance in front of your hosts. It's relatively transparent, apart from the configuration of endpoints. Each auto-scaled instance that is added or removed is automatically added to or removed from the Cloud Service's load balancer.
As others have said, Azure Traffic Manager provides a higher-level service that can direct traffic to multiple Azure regions (data centers) and even on-premises endpoints.
A good overview of Load Balancing can be found here: http://azure.microsoft.com/blog/2014/04/08/microsoft-azure-load-balancing-services/