Why does the Elastic Beanstalk CPU alarm check maximum, not average? - node.js

I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment. While installing Node.js dependencies, the EC2 instance uses a lot of CPU.
However, I found that the "average" CPU utilization didn't exceed this threshold, while the "maximum" CPU utilization did. Why does the Elastic Beanstalk alarm fire even though the average CPU utilization doesn't exceed the threshold?
Why is it happening? I'm tired of false positive CPU alarms :(
How do I solve this problem?

I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js
Elastic Beanstalk environment.
This means the CloudWatch alarm takes a 1-minute average of CPU utilization and triggers if that average crosses 65.
In the first screenshot, CPU utilization was high from roughly 9:57 until 10:07, about 10 minutes.
In the second screenshot, the average peaked at around 30 during that period. Let's do some math to understand why:
CPU utilization was not consistently high; the maximum graph shows the recorded peaks. If the CPU runs at 90% for 3 seconds and at 10% for the remaining 57 seconds, the one-minute average is only (90 × 3 + 10 × 57) / 60 = 14%, even though the maximum for that minute is 90%.
Your case is very similar: short spikes show up in the maximum graph but barely move the average. That's why the maximum and average graphs look so different. The sketch below runs the same aggregation in code.
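To make the aggregation concrete, here is a minimal Node.js sketch. The per-second samples are made up to mirror the 90%-for-3-seconds example above; this is not real CloudWatch data or the CloudWatch aggregation code itself.

// 3 seconds of a 90% spike (e.g. during npm install), then 57 mostly idle seconds.
const perSecondCpu = [
  ...Array(3).fill(90),
  ...Array(57).fill(10),
];

const average = perSecondCpu.reduce((sum, v) => sum + v, 0) / perSecondCpu.length;
const maximum = Math.max(...perSecondCpu);

console.log(`1-minute AVG: ${average.toFixed(1)}%`); // 14.0%
console.log(`1-minute MAX: ${maximum}%`);            // 90%

An alarm on "Average CPU > 65" never fires for this minute, while an alarm on "Maximum CPU > 65" fires immediately, which matches the difference between the two graphs.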

Related

High latency over time

I am running a Node.js application that uses Redis and the Sequelize library (to connect to MySQL). The application runs on Cloud Run. In the morning, when transactions start, responses are fast. But as time passes, the 50th percentile response time stays under 1 second, whereas my 95th and 99th percentile response times climb to around 15 seconds, resulting in very high latency. Memory stays at about 20% of 512 MB. Also, my 95th and 99th percentile CPU is above 80%, but my 50th percentile is below 30%. What could be the issue? Is it due to memory paging or some other reason?

How to configure an Azure alert for the max cpu held over x time period

I currently have an alert that triggers on average CPU over 80% over the last 15 minutes. However, we have instances that stay over 100% for hours or days while others sit very low, leaving the average under 80%. I want to find those instances that are higher than 80% for an extended period of time. But what I don't want is a spike, and using the max seems to do just that: send out alerts for spikes rather than for sustained load. Is there any way I can get such an alert configured?
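One way to capture "high the whole time" rather than "high at some point" is to compare the window's minimum against the threshold instead of its maximum. The following is a small Node.js sketch of that logic with made-up per-minute samples; it illustrates the idea and is not an Azure Monitor configuration.

// Hypothetical per-minute CPU samples for a 15-minute window.
const spikyInstance     = [20, 25, 95, 30, 22, 18, 24, 21, 26, 23, 19, 25, 28, 22, 20];
const sustainedInstance = [88, 91, 93, 90, 87, 92, 95, 89, 94, 91, 90, 93, 96, 92, 90];

const THRESHOLD = 80;

function classify(samples) {
  return {
    maxAboveThreshold: Math.max(...samples) > THRESHOLD, // true for any single spike
    minAboveThreshold: Math.min(...samples) > THRESHOLD, // true only if CPU never dipped below 80%
  };
}

console.log('spiky:', classify(spikyInstance));         // { maxAboveThreshold: true,  minAboveThreshold: false }
console.log('sustained:', classify(sustainedInstance)); // { maxAboveThreshold: true,  minAboveThreshold: true }

The same idea maps onto an alert that uses the minimum aggregation over the lookback window: it stays quiet for brief spikes but fires for instances pinned above the threshold.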

Azure percent-cpu avg and max are too different

The graph shows the CPU's max is > 96%, but the CPU's average is < 10%.
How can this be the case? (I mean, shouldn't the CPU's average be > 40%, or at least > 30%?)
Not really. I estimated some of the values from the graph, put them in a spreadsheet, and calculated a rolling 5-minute average, as well as the max CPU and the average of those 5-minute averages. Below is what it looks like. When you average over a time window, it smooths out all the peaks and lows.
Max    5 Min Avg
85     -
40     -
20     -
5      -
25     35
40     26
5      19
10     17
99     35.8

Overall Max: 99    Average of the 5 Min Avgs: 26.56
If the CPU is continually high, your overall average will start growing.
However, that average does look rather low on your graph. You aren't showing the Min CPU either, so it may be short bursts of high usage with mostly low CPU in between; you should graph that as well.
Are you trying to configure alerts or scaling? Then you should be looking at the average over a small period, e.g. 5 minutes, and if that exceeds a threshold (usually 75-80%) you send the alert and/or scale out.
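For reference, here is a minimal Node.js sketch that reproduces the rolling 5-minute average and the summary numbers from the table above, using the same estimated values (not real Azure Monitor data):

// Per-minute Max CPU values estimated from the graph (same as the table above).
const maxCpuPerMinute = [85, 40, 20, 5, 25, 40, 5, 10, 99];

// Rolling 5-minute average: the mean of each window of 5 consecutive samples.
const fiveMinAverages = [];
for (let i = 4; i < maxCpuPerMinute.length; i++) {
  const window = maxCpuPerMinute.slice(i - 4, i + 1);
  fiveMinAverages.push(window.reduce((sum, v) => sum + v, 0) / window.length);
}

console.log(fiveMinAverages);               // [ 35, 26, 19, 17, 35.8 ]
console.log(Math.max(...maxCpuPerMinute));  // 99    -> the "Overall Max"
const avgOfAverages =
  fiveMinAverages.reduce((sum, v) => sum + v, 0) / fiveMinAverages.length;
console.log(avgOfAverages.toFixed(2));      // 26.56 -> the "Average of the 5 Min Avgs"

A single 99% sample dominates the Max series, but once it is averaged with four low samples it contributes barely 20 points to the 5-minute average.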
I asked Microsoft Azure support about this. The answer I received was not great and essentially amounts to "Yes, it does that." They suggested only using the average statistic since (as we've noticed) "max" doesn't work. This is due to the way data gets aggregated internally. The Microsoft Product engineering team has a request (ID: 9900425) in their large list to get this fixed, so it may happen someday.
I did not find any documentation on how that aggregation works, nor would Microsoft provide any.
Existing somewhat useful docs:
Data sources: https://learn.microsoft.com/en-us/azure/azure-monitor/agents/data-sources#azure-resources
Metrics and data collection: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics#data-collection

Azure: what is Time Aggregation Max or AVG

I am using Azure SQL (SQL Server) for my app. My app was working perfectly until recently; the MAX DTU usage has been 100%, but the AVG DTU usage is around 50%.
Which value should I monitor to scale the service, MAX or AVG?
I found this on the net after lots of searching:
CPU max/min and average are computed within that 1 minute. As 1 minute (60 seconds) is the finest granularity, if you choose, for example, max, then if the CPU touched 100% even for 1 second, it will be shown as 100% for that entire minute. Perhaps the best is to use the average: in that case the average CPU utilization over those 60 seconds is shown under that 1-minute metric.
which sort of helped me understand what it all meant, but thanks to bradbury9 too for your input.
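To see how little a one-second spike moves the average while completely determining the max, here is a tiny Node.js sketch with made-up per-second DTU values (not real Azure metrics):

// 60 made-up per-second readings: steady 50%, with a single second at 100%.
const perSecondDtu = Array.from({ length: 60 }, (_, i) => (i === 30 ? 100 : 50));

const avg = perSecondDtu.reduce((sum, v) => sum + v, 0) / perSecondDtu.length;
const max = Math.max(...perSecondDtu);

console.log(`1-minute AVG: ${avg.toFixed(1)}%`); // 50.8%
console.log(`1-minute MAX: ${max}%`);            // 100%

That is why a MAX graph pinned at 100% can coexist with an AVG graph around 50%, and why the quoted advice leans toward averaging for scaling decisions.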

Azure CPU metric (Live Metrics Stream)

Below is the 'CPU Total' as displayed on the Azure Live Metrics page for our Web App, which is scaled out to 4 x S3 instances.
It's not clear to me (despite much research) whether this CPU Total is a percentage of the max CPU available for the instance or something else. I have noticed that the CPU Total has crept above 100% from time to time, which makes me question whether it is a percentage of the total.
If this metric is not a % of the total: is there anywhere in the Azure portal that will show the CPU usage of your servers as a plain %, not a % multiplied by core count or anything else?
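A CPU figure can legitimately exceed 100% when a process's CPU time is summed across cores and reported against wall-clock time. The sketch below demonstrates that effect in plain Node.js using process.cpuUsage() and os.cpus(); it says nothing about how Live Metrics actually computes 'CPU Total', which would need confirmation from Azure documentation or support.

const os = require('os');

const cores = os.cpus().length;
const startUsage = process.cpuUsage();        // { user, system } in microseconds
const startTime = process.hrtime.bigint();    // wall clock in nanoseconds

// Busy-loop briefly so there is something to measure.
const stopAt = Date.now() + 500;
while (Date.now() < stopAt) { Math.sqrt(Math.random()); }

const usage = process.cpuUsage(startUsage);   // CPU time spent since startUsage
const elapsedMicros = Number(process.hrtime.bigint() - startTime) / 1000;

const summedPercent = ((usage.user + usage.system) / elapsedMicros) * 100;
console.log(`CPU summed across cores: ${summedPercent.toFixed(1)}%`);                // can exceed 100% for multi-threaded work
console.log(`Normalized by ${cores} cores: ${(summedPercent / cores).toFixed(1)}%`); // stays within 0-100%

If 'CPU Total' behaves like the un-normalized number, dividing it by the instance's core count would give the plain per-instance percentage the question is asking for.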
