Number of instances needed for windows azure application - azure

I'm fairly new to Windows Azure and want to host a survey application that will be filled out by approximately 30,000 users simultaneously.
The application consists of one .aspx page that is sent to the client once, asks 25 questions, and gives a wrap-up of the given answers at the end. When the user has answered and hits the 'next question' button, the answer is sent to the server via an .ashx handler. The response contains the next question and its possible answers. The wrap-up is sent to the client after a full postback.
The answers are saved in an Azure Table that is partitioned so that each partition holds a maximum of 450 users.
Can someone give an educated guess of how many web role instances we need to start to keep this application running? (If that is too hard to say, is it more likely to be 5, 50 or 500 instances?)
What is a better way to go: 20 small instances or 5 large instances?
Thanks for your help!

The most obvious answer: you would be best served by testing this yourself and seeing how your application holds up. You can easily get performance counters and other diagnostics out of Windows Azure; for instance, you can connect Microsoft SCOM (System Center Operations Manager) to monitor your environment during the test. Site Hammer is a simple load testing tool for Windows Azure (on MSDN code gallery).
Apart from this very obvious answer, I will share some guesstimates: given the type of load, you are probably better off with more small instances than with fewer large ones, especially since you already have your storage partitioned. If you really are going to have 30K simultaneous visitors, each taking ~15 seconds between reading a question and posting the answer, you are looking at 2,000 requests per second. 10 nodes should be more than enough to handle that load. Remember that this is just a simple estimate, made without any insight into your architecture. For these types of loads, caching is a very good idea; it will dramatically increase the load each node can handle.
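The guesstimate above is plain arithmetic, so it is easy to redo with your own numbers. A rough sketch in Python; the function names are made up for illustration, and the 15-second interval and 10 nodes are the guesses from this answer, not measurements:

```python
def requests_per_second(concurrent_users: int, seconds_between_requests: float) -> float:
    """Average request rate if every user sends one request per interval."""
    return concurrent_users / seconds_between_requests


def per_node_load(total_rps: float, nodes: int) -> float:
    """Request rate each node must sustain if the load is spread evenly."""
    return total_rps / nodes


total = requests_per_second(30_000, 15)  # 2000.0 requests per second
per_node = per_node_load(total, 10)      # 200.0 requests per second per node
print(total, per_node)
```

Whether a single node can really sustain its share is exactly what the load test has to confirm.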
However, the best advice I can give you is to make sure that you are actively monitoring. It takes less than 30 minutes to spin up additional instances, so if you monitor your environment and/or make sure that you are notified whenever it starts to choke, you can easily upgrade your setup. Keep in mind that you do need to contact customer support to be able to go over 20 instances (this is a default limit, in place to protect you from over-spending).

Aside from the sage advice tijmenvdk gave you, let me add my opinion on instance size. In general, go with the smallest size that will support your app, and then scale out to handle increased traffic. This way, when you scale back down, your minimum compute cost is kept low. If you ran, say, a pair of extra-large instances as your baseline (since you always want minimum two instances to get the uptime SLA), your cost footprint starts at 0.12 x 8 x 2 = $1.92 per hour, even during low-traffic times. If you go with small instances, you'd be at 0.12 x 1 x 2 = $0.24 per hour.
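The baseline cost comparison can be written out as a small sketch. The $0.12 per core-hour rate and the core counts (8 for extra-large, 1 for small) are the figures quoted in this answer; the function name is only illustrative:

```python
HOURLY_RATE_PER_CORE = 0.12  # USD per core-hour, the figure quoted above


def baseline_cost_per_hour(cores_per_instance: int, instances: int = 2) -> float:
    """Hourly cost of the minimum deployment (two instances for the SLA)."""
    return round(HOURLY_RATE_PER_CORE * cores_per_instance * instances, 2)


print(baseline_cost_per_hour(8))  # extra-large baseline: 1.92
print(baseline_cost_per_hour(1))  # small baseline: 0.24
```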
Each VM size has associated CPU, memory, and local (non-durable) disk storage, so pick the smallest size that your app runs efficiently on.
For load/performance-testing, you might also want to consider a hosted solution such as Loadstorm.

How simultaneous are the requests in reality?
Will they all type the address in at exactly the same time?
That said, profile your app locally; this will enable you to estimate CPU, network and memory usage on Azure. Then, rather than looking at how many instances you need, look at how you can reduce the requirement! Apply these tips, and profile locally again.
Most performance tips trade off CPU, memory or bandwidth usage against each other; the idea is to ensure that they scale equally. If your application runs out of memory, but you have loads of CPU and network, don't
For a single-page survey, ensure your HTML, CSS & JS are minified and cacheable.
Combine them if possible, and to get really scalable, push static files (CSS, JS & images) to a CDN. This all reduces the number of requests the web server has to deal with, and therefore reduces the number of web roles you will need = less network.
How does the ashx return the response? i.e. is it sending html, xml or json?
Personally, I'd get it to return JSON, as this will require less network bandwidth and most likely less server-side processing = less memory and network.
Use asynchronous APIs to access Azure storage (this uses IO completion ports to free up the IIS thread to handle more requests until Azure storage comes back = enabling CPU to scale).
tijmenvdk has already mentioned using queues to write. Does the list of questions change? If not, cache it, so that the app only has to read from table storage once on start-up and once for each client for the final wrap-up = saves network and CPU at the expense of memory.
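To illustrate the caching tip, here is a minimal sketch of the "read once, serve from memory" idea. It is written in Python rather than ASP.NET, and `fetch_questions_from_storage` is a hypothetical stand-in for the real table storage query, not an Azure API:

```python
from functools import lru_cache

STORAGE_READS = 0  # counts simulated storage round trips


def fetch_questions_from_storage() -> tuple:
    """Hypothetical stand-in for the table storage query."""
    global STORAGE_READS
    STORAGE_READS += 1
    return tuple(f"Question {n}" for n in range(1, 26))


@lru_cache(maxsize=1)
def get_questions() -> tuple:
    """Return the question list, hitting storage only on the first call."""
    return fetch_questions_from_storage()


# Simulate 1,000 requests for the question list.
for _ in range(1000):
    get_questions()

print(STORAGE_READS)  # 1
```

The trade-off is the one named above: the list stays in memory on every instance, and each instance still pays for one read when it starts.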
All of these tips are equally applicable to a normal web application, on a single server or web-farm environment.
The point I'm trying to make is that what you can't measure, you can't improve, and measurement, improvement and cost all go hand in hand. Dynamic scaling will reduce costs, but fundamentally, if your application hasn't been measured and its resource usage optimised, asking how many instances you need is pointless.

Related

Choosing the right EC2 instance for three NodeJS Applications

I'm running three MEAN stack programmes. Each application receives over 10,000 monthly users. Could you please assist me in finding an EC2 instance for my apps?
I've been using a "t3.large" instance with two vCPUs and eight gigabytes of RAM, but it costs $62 to $64 per month.
I need help deciding which EC2 instance to use for three Nodejs applications.
First check CloudWatch metrics for the current instances. Is CPU and memory usage consistent over time? Analysing the metrics could help you to decide whether you should select a smaller/bigger instance or not.
One way to avoid unnecessary costs is to use auto scaling groups and load balancers. With the proper settings in place, you can always have the right amount of computing power for your applications.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-groups.html
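As a rough illustration of the first step (checking whether CPU usage is consistently low or high), the sketch below classifies a list of CPU utilisation samples that you would export from CloudWatch yourself. The function name and the 20% / 70% thresholds are arbitrary assumptions, not AWS recommendations:

```python
from statistics import mean


def sizing_hint(cpu_samples: list, low: float = 20.0, high: float = 70.0) -> str:
    """Very rough sizing hint from CPU utilisation percentages."""
    avg = mean(cpu_samples)
    peak = max(cpu_samples)
    if peak < low:
        # Even the busiest sample is low: the instance is probably oversized.
        return "consider a smaller instance"
    if avg > high:
        # Sustained high usage: the instance is probably undersized.
        return "consider a larger instance or scaling out"
    return "current size looks reasonable"


print(sizing_hint([5, 8, 12, 9]))
print(sizing_hint([75, 90, 85, 80]))
```

The same check should be repeated for memory, which on EC2 needs the CloudWatch agent to be reported at all.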
It depends on your applications: do they need more compute power, more memory, or more storage? Choosing a server is similar to installing an app on a system: check what its basic requirements are and then choose the server accordingly.
If you have 10k+ monthly customers, think about using an ALB so that traffic gets distributed evenly. Try caching to serve some content if possible. Use the unlimited burst mode of T3 instances if CPU keeps hitting 100%. Also, try to optimize the code so that fewer resources are consumed. Once you are comfortable with your EC2 choice, consider purchasing Savings Plans or Reserved Instances (RIs) to lower the cost.
Also, monitor the servers and traffic using features such as the CloudWatch agent and Internet Monitor.

Run many low traffic webapps on a single machine so that a webapp only starts when it is needed?

I'm working on several different webapps written in node. Each webapp has very little traffic (maybe a few HTTP requests per day) so I run them all on a single machine with haproxy as a reverse proxy. It seems each webapp is consuming almost 100MB RAM memory which adds up to a lot when you have many webapps. Because each webapp receives so little traffic I was wondering if there is a way to have all the webapps turned off by default but setup so that they automatically start if there is an incoming HTTP request (and then turn off again if there hasn't been any HTTP requests within some fixed time period).
Yes. There are a dozen different ways to handle this. Without more details, I'm not sure which is best. One option is using the Node vm module: https://nodejs.org/api/vm.html Another would be some kind of serverless setup. See: https://www.serverless.com/ Honestly, 100MB is a drop in the bucket with RAM prices these days. A quick Google search shows 16GB of RAM for $32 or, to put that differently, enough for 160 Node apps. I'm guessing you could find better prices on eBay or something like that.
Outside of learning, this would be a total waste of time. Your time is worth more than the effort it would take to set this up. If you only make minimum wage in the US it'd take you less than 4 hours to make back the cost of the RAM. Better yet, go learn Docker/k8s and containerize each of those apps. That said, learning serverless would be a good use of time.

Jmeter web application performance testing doubts?

I am new to JMeter and have a couple of doubts about web application performance testing.
1. Is it necessary to load all embedded resources in JMeter for performance testing?
2. I have written a JMeter script that exercises all REST APIs. Is this enough to find the application performance on the server side?
3. How does ramp-up time affect the performance test?
4. For how long does the test need to run to get an accurate performance report?
5. Load generation configuration: should the load be generated from machines attached to the application cluster, or from a different LAN?
Kindly find my view on the questions below:
1. I believe that a load test needs to be as realistic as possible, so representing real browser behavior is a must. Real browsers download embedded resources like scripts, images and styles; moreover, they use a concurrent thread pool of 2 - 8 threads to do this in parallel. So you need to configure JMeter similarly. However, real browsers download these assets only once; on subsequent requests they serve embedded resources from cache. So make sure that you configure JMeter to:
   - download embedded resources
   - use a concurrent pool for it
   - add HTTP Cache Manager to your test plan
2. It should be enough from a functional point of view, as static content is usually served separately. However, see point 1: if you have the possibility to simulate real user behavior, go for it.
3. It is better to have reasonable ramp-up and ramp-down periods so that the load increases gradually and neither the server nor the load generator experiences peak stress loads (unless that is your test case). See the bit on ramp-up from the JMeter documentation:
   "Ramp-up needs to be long enough to avoid too large a work-load at the start of a test, and short enough that the last threads start running before the first ones finish (unless one wants that to happen).
   Start with Ramp-up = number of threads and adjust up or down as needed.
   By default, the thread group is configured to loop once through its elements."
4. Usually peak load follows the general Pareto principle: during "peak" periods the application serves 80% of requests within a 1-2 hour time frame, and the remaining 20% of requests are more or less equally distributed over the remaining 20 hours of the day. So it should be enough to test your application under the anticipated peak load for a couple of hours. Again, if time allows, I would recommend soak testing to see if there are any memory leaks, and stress testing to determine the application's load boundaries and whether it recovers from stress load or not.
5. Theoretically the application shouldn't care about the source of requests (unless it uses different logic to handle requests from, e.g., different geo regions). One thing is obvious: don't run the load generator and the application under test on the same machine. If one JMeter instance cannot create enough load to implement the test scenario, go for distributed testing.
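To make the ramp-up guidance concrete: JMeter spreads thread starts evenly over the ramp-up period, so "ramp-up = number of threads" means one new thread per second. A small sketch of that schedule in Python (the function name is illustrative; this models the documented behaviour and is not JMeter code):

```python
def thread_start_times(threads: int, ramp_up_seconds: float) -> list:
    """Start offset in seconds of each thread when starts are spread evenly."""
    delay = ramp_up_seconds / threads
    return [i * delay for i in range(threads)]


print(thread_start_times(5, 5))         # [0.0, 1.0, 2.0, 3.0, 4.0]
print(thread_start_times(10, 100)[:3])  # [0.0, 10.0, 20.0]
```

If a single pass through the test plan takes less time than the ramp-up, the first threads finish before the last ones start, which is the situation the documentation warns about.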
I'd like to add some more perspective:
Question 1 & 2:
The Pareto principle can be applied here also - meaning, that it takes a lot of effort to properly simulate reality, downloading all resources used by a browser to render a page and to give the proper 'weights' to different URLs, simulating user behaviour accurately. This is where many load tests fail, because simulating reality accurately is very, very hard. As the previous response mentions, most static content is often served via CDNs or similar anyway, and what you really want to test is usually your own system's capability to handle traffic.
Considering the above, I would say that if you spend 20% of effort setting up a load test that tests your REST API, you will get 80% of the results you want. If you on the other hand go for a completely realistic test, you will spend another 80% of effort for only 20% more results. The effect of this is that in many cases it is better to go for the simpler test, that does not simulate reality accurately. It gives you the most return on your invested time.
Question 3: Agree fully with the previous response here. Ramp up slowly, unless your specific use case sees very sudden traffic peaks (like if you're an online auction service or ticket sales or similar). It can also be a good idea to configure your test so that it spends some time on a "plateau" after ramping up to peak load, rather than stopping the load test as soon as you reach the peak.
Question 4: I would say you need to run the load test long enough to produce stable, statistically significant results. This can be 5 minutes or 5 hours depending on your scenario, but half an hour is probably a good minimum to aim for in almost all cases. The test duration should not depend on how long your site tends to experience peak load in real life though, unless you're doing some kind of soak test.
Question 5: Traffic origin is sometimes worth thinking about, as different source locations lead to different network delays between (simulated) clients and server, which affects transaction rates. If you run a load test with 1,000 VUs on a system located in New York, and generate the traffic from Australia, you will not get a lot of transactions per second due to the high network delay. If you run the same test using a load generator in New York instead, your transaction rate will be a lot higher because the network delay is so much lower. Of course, you can always add more concurrent clients/VUs/connections and get the same transaction rate on a high-delay network link that you would on a low-delay link, but at the cost of forcing the server to keep a lot more (TCP) connection state, using more file descriptors and buffer memory. That is, it might not be a very realistic scenario.
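The effect of network delay on transaction rate can be approximated with a simple closed-loop model, where each VU waits for a response before sending its next request. The latency and server-time figures below are assumptions chosen for illustration, and the model ignores think time and connection setup:

```python
def transactions_per_second(virtual_users: int, network_rtt_s: float, server_time_s: float) -> float:
    """Closed-loop throughput: each VU sends one request per round trip."""
    return virtual_users / (network_rtt_s + server_time_s)


def users_for_rate(target_tps: float, network_rtt_s: float, server_time_s: float) -> float:
    """Concurrent VUs needed to sustain target_tps at the given latency."""
    return target_tps * (network_rtt_s + server_time_s)


SERVER_TIME_S = 0.05  # assumed server processing time per request
LOCAL_RTT_S = 0.005   # assumed round trip within New York
REMOTE_RTT_S = 0.25   # assumed round trip between Australia and New York

local_tps = transactions_per_second(1000, LOCAL_RTT_S, SERVER_TIME_S)
remote_tps = transactions_per_second(1000, REMOTE_RTT_S, SERVER_TIME_S)
needed = users_for_rate(local_tps, REMOTE_RTT_S, SERVER_TIME_S)

print(f"local:  {local_tps:.0f} tps")
print(f"remote: {remote_tps:.0f} tps")
print(f"VUs needed remotely to match the local rate: {needed:.0f}")
```

Under these assumptions the remote generator needs several times as many VUs to reach the same rate, which is the extra connection state mentioned above.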

Should each website be its own `node.js` process

We host about 150 websites (possibly scaling to 300+) that we are considering migrating to node.js. Most of the sites are fairly low traffic <1mil pageviews per month.
Should each website be its own node.js process, or should we serve all websites using the same node.js process (or a small set of load-balanced processes)? Is there a technical limit or a reasonable limit to the number of node processes per server?
Process per site: Feels inefficient, but I don't know if it actually is inefficient. Would ensure one buggy site doesn't affect other sites.
Process per core/small set of processes: Likely higher performance, but what happens when I need to update a site's codebase? Won't it take down other sites? Also, code failures in one site would affect other sites.
Ideally, I would prefer one process per site so that we could host all sites from each worker server. That way when load increases we can just spin up another identical worker server and load balance between the two without having to arbitrarily say SiteA goes to ServerA and SiteB goes to ServerB. Any node.js gurus available to offer some wisdom?
All static file requests will likely be handled by Nginx or something like Varnish.
There are a lot of issues at play here. The big picture answer is, it depends... as it always does when you bring in the whole "performance" discussion. That being said, the simplest way to get a solid Node set up is to note the following basic facts about NodeJS, and I will also comment on their implications as they pertain to your questions.
The concurrency you get with Node works really well in certain situations, namely IO-heavy operations. What we're really talking about here is minimizing the amount of downtime spent waiting for the next request. Because of this, Node works really well in an environment where there is one process per core on a machine. Node does really well at maximizing the amount of CPU available to serve requests under heavy load. That being said, if you have literally ZERO other work going on in your event loop, you can see minor performance increases (in terms of max requests/second/processor core) by having multiple node processes per core. But I've never seen any benefit from increasing this number past 3, even under circumstances where the entire event loop was literally just a file server.
On the process per site comment. This is a bad idea for many reasons. For one, a well put together node server can process thousands of requests per second. Our (company name omitted) servers, hosted through Amazon EC2 on medium clusters (lots of RAM, mid CPU clock, 4 cores), typically fail around 3000 requests per second per cluster. Our servers do a fair bit of CPU work; for simple file servers I'm sure you can do much better. Strictly speaking, sure, per site, you will be able to serve more requests by launching each site in its own process/core/escalating quickly here! But it's not worth it in terms of cost and the added complexity of your architecture. What I WOULD recommend is investing in a setup with a lot of RAM. The ability of your server to cache often-requested files will affect your performance infinitely more than launching an abundance of processes on a given machine.
On the whole RAM thing. The number of processes you want to launch for a given core is dependent on two things. One is how much synchronous work is done in your event loop. The more synchronous work, the more time between a given request coming in and the event loop being ready to address the next one. If you have a busy event loop, you will be in a situation where you require more processes per CPU core. The other thing that can affect this, particularly relevant for file servers, is the amount of RAM. Node runs much better in a high-RAM environment, but you can say this about ANY file server really... What this has to do with is the number of active asynchronous operations. One downside of the way node works is that under heavy loads, you can get a large number of event handlers active at once. This is great for concurrency/simplicity; however, if your server is busy waiting around for a lot of async disk/IO to happen, it will slow down and crash much sooner than if you had plenty of RAM. If you don't have enough RAM to handle all of these event handlers, you will want to keep to the 1 process/core arrangement. Otherwise, it is easier for Node to spin up many event handlers simultaneously, and again cause you to crash sooner than you would otherwise.
I don't really have enough information to tell you what you SHOULD do. This depends entirely too much on the architecture of your specific server, sites, size of your sites, amount of data... etc. But these three pieces of knowledge are the basic things that help you get the most out of your Node server. To be honest, your idea about load balancing, mixed with the considerations above, should do nicely for you. Surely, micro-optimizations are possible, but if you do these things, you should easily see requests/second in the thousands before you start experiencing crashes because of DDoS-type conditions.
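For a sense of scale, the traffic figures from the question can be turned into an average request rate and compared with the ~3000 requests per second quoted above. This is a back-of-the-envelope sketch: it assumes every site sits at the 1 million pageviews per month upper bound, that one pageview equals one request reaching Node (static files being served elsewhere), and a purely hypothetical peak factor of 10:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month


def average_rps(sites: int, pageviews_per_site_per_month: int) -> float:
    """Average requests per second across all sites combined."""
    return sites * pageviews_per_site_per_month / SECONDS_PER_MONTH


avg_150 = average_rps(150, 1_000_000)
avg_300 = average_rps(300, 1_000_000)
PEAK_FACTOR = 10  # hypothetical ratio of peak to average traffic

print(round(avg_150, 1), round(avg_300, 1))  # 57.9 115.7
print(round(avg_300 * PEAK_FACTOR))          # 1157
```

Even with 300 sites and that peak factor, the combined load stays below the failure point mentioned above, so under these assumptions raw throughput is unlikely to be what decides the process layout.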
No, don't do it. Keep it simple! And check out http://12factor.net/.
A few hundred processes is nothing compared to the simplicity you otherwise lose. It would be a terrible decision, on so many levels, to have more than one site (or, "logical application unit") served by a single Node process.
If you're asking this question, you may want to explore Node more before you "migrate" to Node. Error handling and separation of concerns are more complicated in Node than in other situations. Specifically, neither the domain nor cluster APIs are mature. But really it's the philosophy of clean and simple application deployment that you'd be violating. I could go on and on.

What are the most important statistics to look at when deploying a Node.js web-application?

First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered nodeJS, which made web-app creation feel a lot more like traditional programming. I now have a node.js app that I've been developing for some time and that is running in production on the web. My main confusion stems from the fact that I am very new to the world of web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production for nearly a year, starting with the 0.4 series and currently on 0.8. The web app is based on Express 2 and 3, with mongo, redis and memcached.
A few facts:
- node cannot handle a large v8 heap; when it grows over 200MB you will start seeing increased CPU usage
- node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, as v8 profiling and valgrind show no leaks in JS space or the resident heap. Early 0.8 was awful in this respect; RSS could be 1GB with a 50MB heap.
- hanging requests are hard to track. We wrote our own middleware to monitor these, especially as our app is long-poll based
My suggestions:
- use multiple instances per machine, at least 1 per CPU. Balance with haproxy, nginx or similar, with session affinity
- write middleware to report hung connections, i.e. ones where the code never responded or the latency was over a threshold
- restart instances often, at least weekly
- write a poller that prints out memory stats from the process module once per minute
- use supervisord and fabric for easy process management
- monitor CPU and the reported memory stats, and restart on threshold
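The "restart on threshold" suggestion boils down to a small decision function fed by the once-per-minute poller. The sketch below is in Python for brevity; in Node the memory numbers would come from `process.memoryUsage()`. The 200MB heap limit echoes the first fact above, while the RSS and CPU limits and all names are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poller reading for a single process."""
    rss_mb: float
    heap_used_mb: float
    cpu_percent: float


def should_restart(
    sample: Sample,
    max_rss_mb: float = 1024.0,  # assumed limit
    max_heap_mb: float = 200.0,  # heap size where CPU usage reportedly climbs
    max_cpu: float = 90.0,       # assumed limit
) -> bool:
    """Return True when any reading crosses its threshold."""
    return (
        sample.rss_mb > max_rss_mb
        or sample.heap_used_mb > max_heap_mb
        or sample.cpu_percent > max_cpu
    )


print(should_restart(Sample(rss_mb=300, heap_used_mb=80, cpu_percent=35)))   # False
print(should_restart(Sample(rss_mb=1100, heap_used_mb=50, cpu_percent=35)))  # True
```

A process manager such as supervisord would then perform the actual restart.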
Whichever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increases? A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
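Once you have load test results, finding the tipping point is a matter of scanning for the first user count at which response time crosses what you consider acceptable. A minimal sketch in Python; the function name, the sample results and the 2-second threshold are all made up for illustration:

```python
from typing import Optional


def tipping_point(results: list, max_response_s: float) -> Optional[int]:
    """Lowest concurrent-user count whose response time exceeds the threshold."""
    for users, response_s in sorted(results):
        if response_s > max_response_s:
            return users
    return None  # never exceeded within the tested range


# Hypothetical (concurrent users, average response time in seconds) pairs.
results = [(50, 0.3), (100, 0.4), (200, 0.9), (400, 2.5), (800, 7.0)]
print(tipping_point(results, max_response_s=2.0))  # 400
```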
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 Developer Tools available in all browsers. Open up your website in any browser and push F12 (or, for Opera, Ctrl+Shift+I), which should open up the browser's developer tools. On Firefox make sure you have Firebug installed; on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests and the bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X average number of pages loaded per user per day X expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X average number of pages loaded per user per day X expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
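The two formulas translate directly into code. A small sketch in Python; the input numbers are invented for illustration and would come from your browser's Network tab and your analytics package:

```python
def daily_requests(requests_per_page: int, pages_per_user_per_day: float, expected_users: int) -> float:
    """Total HTTP requests hitting the server per day."""
    return requests_per_page * pages_per_user_per_day * expected_users


def daily_bandwidth_mb(mb_per_page: float, pages_per_user_per_day: float, expected_users: int) -> float:
    """Total bandwidth required per day, in MB."""
    return mb_per_page * pages_per_user_per_day * expected_users


# Hypothetical inputs: 40 requests and 1.5 MB per page load,
# 5 page loads per user per day, 2,000 expected users.
print(daily_requests(40, 5, 2000))       # 400000
print(daily_bandwidth_mb(1.5, 5, 2000))  # 15000.0
```

Multiplying the daily figures by 7 or 30 gives the weekly and monthly estimates.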
Node.js is single-threaded, so you should definitely start a process for every CPU your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart dead workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring node processes.

Resources