I am using SQL Azure (SQL Server) for my app. My app was working perfectly until recently; the MAX DTU usage has been 100% but the AVG DTU usage is around 50%.
Which value should I monitor to scale the services, MAX or AVG?
I found this on the net after lots of searching:
CPU max/min and average within that 1 minute. As 1 minute (60 seconds) is the finest granularity, if you choose, for example, Max, then if the CPU touched 100% even for 1 second, it will be shown as 100% for that entire minute. Perhaps the best is to use the Average; in that case the average CPU utilization over those 60 seconds will be shown under that 1-minute metric.
which sort of helped me understand what it all meant, but thanks to bradbury9 too for your input.
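To make that explanation concrete, here is a minimal sketch in Python (the per-second utilisation samples are made up, not real DTU data) of how one minute of readings collapses into Min/Avg/Max: a single one-second spike is enough to pin the Max at 100% while the Average stays near 50%.

```
import random

random.seed(1)

# Hypothetical per-second utilisation samples for one minute:
# mostly moderate load, plus one brief spike to 100%.
samples = [random.uniform(40, 60) for _ in range(60)]
samples[17] = 100.0  # a single one-second spike

print(f"Min for the minute: {min(samples):.1f}%")
print(f"Avg for the minute: {sum(samples) / len(samples):.1f}%")  # stays close to 50%
print(f"Max for the minute: {max(samples):.1f}%")                 # reads 100% because of one second
```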
Related
I currently have an alert that triggers when the average CPU is over 80% for the last 15 minutes. However, we have instances that stay over 100% for hours or days while others are very low, leaving the average under 80%. I want to find those instances that are higher than 80% for an extended period of time. What I don't want is a spike, and using the max seems to do just that: send out alerts for spikes rather than sustained load. Is there any way I can get such an alert configured?
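One way to express "above 80% for an extended period, not just a spike" is to require every datapoint in the evaluation window to exceed the threshold, rather than the window's average or max. The sketch below is generic Python with made-up numbers, not tied to any particular monitoring service's alert syntax.

```
def sustained_breach(datapoints, threshold=80.0, window=15):
    """True only if the last `window` one-minute datapoints are ALL above the threshold."""
    recent = datapoints[-window:]
    return len(recent) == window and all(v > threshold for v in recent)

spiky     = [30, 35, 99, 28, 31, 30, 29, 33, 95, 30, 31, 29, 30, 32, 31]
sustained = [85, 90, 92, 88, 95, 91, 87, 93, 96, 89, 90, 94, 92, 91, 95]

print(sustained_breach(spiky))      # False: max-based alerts would fire here, but it's just spikes
print(sustained_breach(sustained))  # True: consistently above 80% for the whole window
```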
I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment. While installing Node.js dependencies, the EC2 instance uses a lot of CPU resources.
However, I found that the "average" CPU usage didn't exceed this threshold, while the "maximum" CPU utilization did. Why does the Elastic Beanstalk alarm fire even though the average CPU utilization doesn't exceed the threshold?
[screenshots of the maximum and average CPU utilization graphs]
Why is it happening? I'm tired of false positive CPU alarms :(
How do I solve this problem?
I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment.
It means that the CloudWatch alarm will take the average over 1 minute and trigger the alarm if it crosses 65.
In the first screenshot, it seems that CPU utilization was high from roughly 9:57 until 10:07, about 10 minutes.
In the second screenshot, the average peaked at about 30 during the same period. Let's do some math to understand it:
CPU utilization was not consistently high. The maximum graph shows the peak recorded, so if the CPU runs at 90% for 15 seconds and at 10% for the remaining 45 seconds, the average for that minute is only 30%.
That is roughly what happened in your case, which is why the maximum and average graphs look so different.
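A quick check of that arithmetic (the 15-second / 45-second split is only an illustration of the burst pattern described above):

```
high_pct, high_secs = 90.0, 15  # burst: 90% CPU for 15 seconds
low_pct,  low_secs  = 10.0, 45  # near idle: 10% CPU for the remaining 45 seconds

average = (high_pct * high_secs + low_pct * low_secs) / (high_secs + low_secs)
maximum = max(high_pct, low_pct)

print(f"Average statistic for the minute: {average:.0f}%")  # 30%
print(f"Maximum statistic for the minute: {maximum:.0f}%")  # 90%
```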
The graph shows the CPU's max > 96%, but the CPU's avg < 10%.
How can this be the case? (I mean, shouldn't the CPU's avg be > 40, or at least > 30?)
Not really. I estimated some of the values from the graph, put them in a spreadsheet, and calculated a 5-minute average, as well as the max CPU and the average of the 5-minute averages. Below is what it looks like. When you average over a period of time, it smooths out all the peaks and lows.
Max    5 Min Avg
 85
 40
 20
  5
 25    35
 40    26
  5    19
 10    17
 99    35.8

Overall:  Max = 99,  Average of 5 Min Avg = 26.56
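The same numbers can be reproduced without a spreadsheet; here is a small Python sketch using the values estimated from the graph:

```
# Per-minute max CPU values estimated from the graph (same data as the table above).
samples = [85, 40, 20, 5, 25, 40, 5, 10, 99]

# Rolling 5-minute averages (only defined once five samples are available).
rolling = [sum(samples[i - 4:i + 1]) / 5 for i in range(4, len(samples))]

print("5 Min Avg:", rolling)                                        # [35.0, 26.0, 19.0, 17.0, 35.8]
print("Max:", max(samples))                                         # 99
print(f"Average of 5 Min Avg: {sum(rolling) / len(rolling):.2f}")   # 26.56
```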
If it is continually at high CPU, then your overall average will start growing.
However, that average does look rather low on your graph. You aren't showing the Min CPU either, so it may be that there are short bursts where it is high but the CPU usage is more often low; you should graph that as well.
Are you trying to configure alerts or scaling? Then you should be looking at the average over a small period, e.g. 5 minutes, and if that exceeds a threshold (usually 75-80%) then you send the alert and/or scale out.
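For completeness, that rule is just a threshold check on the 5-minute average; a minimal sketch (the 80% figure is an example threshold, not a recommendation for any specific service):

```
def should_alert_or_scale(five_min_avg_cpu, threshold=80.0):
    """Fire an alert and/or scale out once the 5-minute average CPU exceeds the threshold."""
    return five_min_avg_cpu > threshold

print(should_alert_or_scale(35.8))  # False: the averages in the table above never get close
print(should_alert_or_scale(86.0))  # True: sustained load, send the alert and/or add an instance
```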
I asked Microsoft Azure support about this. The answer I received was not great and essentially amounts to "Yes, it does that." They suggested only using the average statistic since (as we've noticed) "max" doesn't work. This is due to the way data gets aggregated internally. The Microsoft Product engineering team has a request (ID: 9900425) in their large list to get this fixed, so it may happen someday.
I did not find any documentation on how that aggregation works, nor would Microsoft provide any.
Existing somewhat useful docs:
Data sources: https://learn.microsoft.com/en-us/azure/azure-monitor/agents/data-sources#azure-resources
Metrics and data collection: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics#data-collection
If there are 10k buses and the peak period is 9:30 AM to 10:30 AM, and there is a 2% increase in Vusers every year, what is the throughput after 10 years?
Please help me understand how to solve this type of question without using a tool.
Thanks in Advance.
The formula would be:
10000 * (1.02)^10 = 12190
With regards to implementation, 10,000 buses (or whatever the unit is) per hour is about 166 per minute, or 2.7 per second, which doesn't seem to be a very high load to me. Depending on your load testing tool there are different options for how to simulate it: for Apache JMeter it would be the Constant Throughput Timer, for LoadRunner there are Pages per minute / Hits per second goals, etc.
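A quick sanity check of both calculations (assuming, as above, that the 10,000 figure is per hour):

```
initial_per_hour = 10_000
growth_rate = 0.02   # 2% more Vusers every year
years = 10

# Compound growth: 10000 * (1.02)^10
future_per_hour = initial_per_hour * (1 + growth_rate) ** years
print(f"Peak-hour throughput after {years} years: {future_per_hour:.0f}")  # ~12190

# Converting the current hourly figure into pacing targets for a load test.
print(f"Per minute: {initial_per_hour / 60:.1f}")    # ~166.7
print(f"Per second: {initial_per_hour / 3600:.2f}")  # ~2.78
```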
Below is the 'CPU Total' as displayed on the Azure Live Metrics page for our Web App, which is scaled out to 4 x S3 instances.
It's not clear to me (despite much research) whether this CPU Total is a percentage of the max CPU available for the instance or something else. I have noticed that the CPU Total has crept above 100% from time to time, which makes me question whether it is a percentage of the total.
If this metric is not a % of the total: is there anywhere in the Azure portal that will show the CPU usage of your servers as a percentage of the total, and not a percentage multiplied by core count or anything else?