Azure Scale Out on Memory keeps flapping (not scaling in)

I'm trying to set up an auto-scale on Azure to scale out when 5-min average memory exceeds 90%.
Here's a 24-hour chart of 1-min memory usage:
I have a scale-out rule on 5-min average memory that increases the count from 1 to 2, and a scale-in rule for when the 5-min average memory is under 80%. Admittedly this is quite tight. However, the scale-in NEVER seems to fire; it's always prevented by 'flapping'. Surely, given the chart above, there would be several places where it could scale in? (I don't even see where a 5-min average would trigger the scale-out, given the chart is a 1-min average.)

I've also discovered that scaling out based on memory percentage gets tricky with smaller instance sizes, because Azure doesn't account for the fact that the greater part of the memory in use is actually dedicated to the OS and other infrastructure...
You can find quite an exhaustive explanation here: https://medium.com/@jonfinerty/flapping-and-anti-flapping-dcba5ba92a05.
There seems to be no workaround, and I'm considering letting it go (and scaling out and in based on CPU only).
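
For what it's worth, my reading of the linked article (and of how Azure autoscale's flapping check is usually described) is that, before removing an instance, autoscale estimates what the averaged metric would become on the fewer remaining instances and refuses the scale-in if that estimate would immediately re-trigger the scale-out rule. A rough sketch using this question's thresholds, where the projection formula is an assumption on my part:

```typescript
// Hypothetical sketch of the anti-flap estimate (the exact Azure formula is an
// assumption here): the averaged metric is projected onto the remaining
// instances before a scale-in is allowed.
const scaleOutThreshold = 90; // % memory, from the scale-out rule above
const currentInstances = 2;
const targetInstances = 1;

function scaleInAllowed(avgMemoryPct: number): boolean {
  // Projected value if the same total load is spread over fewer instances.
  const projected = avgMemoryPct * (currentInstances / targetInstances);
  return projected < scaleOutThreshold;
}

console.log(scaleInAllowed(40)); // true  -> scale-in can proceed
console.log(scaleInAllowed(60)); // false -> blocked, reported as "flapping"
```

If that is how the check works, then with a 90% scale-out threshold anything above roughly 45% average memory would block the 2 → 1 scale-in, which would explain why it never fires.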

Related

Frequency scaling and CPU utilization

There are quite a few questions here on StackOverflow explaining how to calculate process CPU utilization (e.g. this). What I don't understand is how frequency scaling affects CPU utilization calculations. It seems to me that, if I follow the recommended formula (and I also checked top's source code, which does the same), a process running on a CPU at the lowest frequency and a process running at the highest CPU frequency for the same duration will yield identical utilization rates. But this doesn't feel right to me, especially when CPU utilization is used as a stand-in to compare power consumption.
What am I missing?
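
To make the formula concrete, here's a minimal sketch (Linux-only, TypeScript on Node, reading /proc directly; the PID argument and one-second sampling interval are just placeholders) of the usual calculation: the change in the process's utime + stime divided by the change in total CPU jiffies between two samples. Nothing in it looks at the CPU's current frequency, which is exactly the point of the question.

```typescript
// Minimal utilization sketch: delta(process jiffies) / delta(total jiffies).
// Linux only; run with a real PID as the first argument.
import { readFileSync } from "node:fs";

function totalJiffies(): number {
  // First line of /proc/stat: "cpu  user nice system idle iowait irq softirq ..."
  const fields = readFileSync("/proc/stat", "utf8")
    .split("\n")[0]
    .trim()
    .split(/\s+/)
    .slice(1);
  return fields.reduce((sum, f) => sum + Number(f), 0);
}

function processJiffies(pid: number): number {
  // Skip past "pid (comm)" since comm may contain spaces; in the remainder,
  // utime and stime are the 12th and 13th fields (fields 14-15 overall).
  const stat = readFileSync(`/proc/${pid}/stat`, "utf8");
  const rest = stat.slice(stat.lastIndexOf(")") + 1).trim().split(/\s+/);
  return Number(rest[11]) + Number(rest[12]);
}

async function utilization(pid: number, intervalMs = 1000): Promise<number> {
  const p0 = processJiffies(pid);
  const t0 = totalJiffies();
  await new Promise((r) => setTimeout(r, intervalMs));
  return (processJiffies(pid) - p0) / (totalJiffies() - t0);
}

utilization(Number(process.argv[2])).then((u) =>
  console.log(`utilization: ${(u * 100).toFixed(1)}%`)
);
```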

Node app eating memory incrementally over time

I've just launched two Express servers on DigitalOcean along with an instance of mongodb. I'm using PM2 to keep both of them running.
When I use htop to see the memory usage, the total usage is usually around 220-235 MB (out of a total of 488 MB). The only thing I can see changing is the blue bars, which I assume are buffer memory; the green 'in use' memory always seems to stay around the same level.
Looking at DigitalOcean's graph, however, memory usage has been climbing slowly over the past 24 hours, say 0.5% of the total per hour. It sometimes drops, but overall it's on the up, and it has been hovering around 60-65% of total memory for a few hours now.
There has been almost no traffic on these Node web servers, yet the memory keeps increasing slowly. So my question is: could this be a memory leak within one of my servers, or is it the nature of the V8 engine to incrementally expand its memory?
If you suspect a memory leak, why not check your theory by writing 2-3 heap dumps, spaced 2-3 hours apart? Then you can answer your question with certainty.
You can use this module to write heap dumps to disk and then simply compare them using Chrome Developer Tools. Moreover, you will see exactly what's placed inside the heap.
FYI: snapshot comparison from the official documentation
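
As a sketch of that workflow, here is roughly what the periodic dumps could look like, using Node's built-in v8.writeHeapSnapshot rather than the third-party module the answer refers to; the two-hour interval and file names are just placeholders:

```typescript
// Write a heap snapshot every 2 hours so successive dumps can be diffed in
// Chrome DevTools. Note: writeHeapSnapshot blocks the event loop while it runs.
import { writeHeapSnapshot } from "node:v8";

const TWO_HOURS_MS = 2 * 60 * 60 * 1000;

setInterval(() => {
  // writeHeapSnapshot returns the path it wrote to.
  const file = writeHeapSnapshot(`heap-${Date.now()}.heapsnapshot`);
  console.log(`heap snapshot written: ${file}`);
}, TWO_HOURS_MS);
```

Load two of the resulting .heapsnapshot files in Chrome DevTools' Memory tab and use the Comparison view to see which object types keep growing between dumps.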

Azure auto scale 'decrease count by' doesn't work

I've set up Azure auto scaling with some rules. Yesterday my website reached 65% memory usage, so a new instance was added. But from last night until now the memory usage has been under 40%, so I'm wondering why I still have 2 instances instead of 1 (the rule 'memory < 40%: decrease count by 1' should have removed one instance).
Here are the screen captures:
Auto-scaling rules
Even in the service management view, only the 'increase count by 1' rule was reached, as shown in the capture above.
Service management
Your rules might be overlapping. Did you check whether both the CPU and the memory are under 40%? To make sure they are not, you can try removing the CPU rules to test. For testing purposes you can set the cool-down to a low value (1-5 minutes).
I hope it helps,

Azure - Extra small instance web role - ready for production?

I'm planning for a website running in Azure. I'm estimating a maximum of 2,000 users a day creating about 20,000 hits.
I know I'm kinda vague here, but is the extra small instance ready for this kind of site? I'm using MVC 3 to create the site. Thanks for any answers.
You'd have to do some load-testing to best judge that question. Remember that, to enjoy the benefits of the Windows Azure Compute SLA, you'll need a minimum of 2 instances (so you have instances in different fault domains, and your site remains running even if one of the instances recycles due to an OS upgrade, hardware failure, etc.). The question then becomes: can two Extra Small instances handle 20,000 hits daily? That equates to approx. 10K hits per VM instance per day, or 416 hits per hour, or about 7 per minute. And even with one instance, a hit rate of 14 per minute is fairly low.
More than CPU, you might find yourself bottlenecked by bandwidth, since you'll only see about 5Mbps per instance, vs. around 100Mbps per Small instance.
You might want to run a quick test with something like LoadStorm, which provides Load Testing as a Service. This should give you a good idea of how well the XS will perform under load.
EDIT (March 2012): Extra Small instances are now $0.02 / hour vs $0.04, so you could run up to 6 XS instances for the same cost as a single Small. This makes the XS option even more compelling. See this blog post for the official announcement of the price drop (including Storage cuts as well).
I agree with David that this is very dependent on the load per request you generate (both in CPU and bandwidth resources)
I just wanted to share our own experience with the XS instances. We've found that these instances suffer from severe clock drift: http://blog.codingoutloud.com/2011/08/25/azure-faq-how-frequently-is-the-clock-on-my-windows-azure-vm-synchronized/
This could be as much as a minute of difference over the week between NTP syncs. For most applications this isn't necessarily a problem, but we used Oauth1.0a authentication with an allowed timestamp difference of 30 seconds which resulted in major headaches when using XS. The S and larger don't have shared cores and consequently suffer much less clock drift.
You get a better SLA with 2 Small instances rather than 1 larger one.
You should also look at your peak load. For example with 20,000 hits per day, do 50% come between 9 and 10 in the morning?
Instance storage is 20 GB; if this is just your application code, it should not be a problem.
IO performance is low; if this is just reading your app code the first time it compiles, it should not be a problem.
The CPU is a single 1 GHz core; if this is just web pages with little calculation, it should not be a problem. The time it will be really slow is during a JIT compile.
The memory is 768 MB; this could be a problem, especially if you are caching data.
You save under 2 USD a day by using the Extra Small instance rather than a Small. But that is a latte every 2 days, so maybe it is worth taking the risk and having to do an extra deploy.

Scaling in Windows Azure for IO Performance

Windows Azure advertises three types of IO performance levels:
Extra Small : Low
Small: Moderate
Medium and above: High
So, if I have an IO bound application (rather than CPU or Memory bound) and need at least 6 CPUs to process my work load - will I get better IO performance with 12-15 Extra Smalls, 6 Smalls, or 3 Mediums?
I'm sure this varies based on applications - is there an easy way to go about testing this? Are there any numbers that give a better picture of how much of an IO performance increase you get as you move to large instance roles?
It seems like the IO performance for smaller roles could be equivalent to the larger ones, they are just the ones that get throttled down first if the overall load becomes too great. Does that sound right?
Windows Azure compute sizes offer approx. 100Mbps per core. Extra Small instances are much lower, at 5Mbps. See this blog post for more details. If you're IO-bound, the 6-Small setup is going to offer far greater bandwidth than 12 Extra-Smalls.
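
Spelling that comparison out with the per-instance numbers above (the Medium figure is my assumption of 2 cores × 100 Mbps):

```typescript
// Rough aggregate bandwidth for each layout, using the per-instance numbers above.
const layouts = [
  { name: "12 x Extra Small", instances: 12, mbpsEach: 5 },
  { name: "6 x Small", instances: 6, mbpsEach: 100 },
  { name: "3 x Medium", instances: 3, mbpsEach: 200 }, // assumed: 2 cores x 100 Mbps
];

for (const l of layouts) {
  console.log(`${l.name}: ~${l.instances * l.mbpsEach} Mbps aggregate`);
}
// 12 x Extra Small: ~60 Mbps, 6 x Small: ~600 Mbps, 3 x Medium: ~600 Mbps
```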
When you talk about processing your workload, are you working off a queue? If so, multiple worker roles, each being Small instance, could then each work with a 100Mbps pipe. You'd have to do some benchmarking to determine if 3 Mediums gives you enough of a performance boost to justify the larger VM size, knowing that when workload is down, your "idle" cost footprint per hour is now 2 cores (medium, $0.24) vs 1 (small, $0.12).
As I understand it, the amount of IO allowed per core is constant and supposed to be dedicated, but I haven't been able to get formal confirmation of this. This is likely different for Extra Small instances, which operate in a shared mode and are not dedicated like the other Windows Azure VM instances.
I'd imagine what you suspect is in fact true: even being IO-bound varies by application. I think you could accomplish your timing goal by using timers and writing the output to a file on storage that you could then retrieve. Do some math to figure out how many work units per hour you can process by cramming as many as possible through a Small and then a Medium instance. If your work-unit size fluctuates drastically, you might have to do some averaging too. I would always prefer smaller instances if possible and just spin up more copies as you need more firepower.
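
A minimal sketch of that timing idea, where processWorkUnit and the unit count are placeholders for your actual workload:

```typescript
// Time how many work units a given instance size can chew through, then
// extrapolate to units/hour. Swap in your real work for processWorkUnit.
async function processWorkUnit(): Promise<void> {
  // Placeholder: simulate an IO-bound unit of work.
  await new Promise((r) => setTimeout(r, 50));
}

async function benchmark(units = 200): Promise<void> {
  const start = Date.now();
  for (let i = 0; i < units; i++) {
    await processWorkUnit();
  }
  const elapsedSec = (Date.now() - start) / 1000;
  const perHour = (units / elapsedSec) * 3600;
  // Write or log this result so it can be compared across instance sizes.
  console.log(
    `${units} units in ${elapsedSec.toFixed(1)}s -> ~${Math.round(perHour)} units/hour`
  );
}

benchmark();
```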
