From the qstat (Sun Grid Engine) manpage:
mem: The current accumulated memory usage of the job in Gbytes seconds.
What does that mean?
I couldn't find better documentation than the man page where that description can be found. I think 1 Gbyte second is 1 Gbyte of memory used for one second. So if your code uses 1 GB for 1 minute then 2 GB for two minutes, the accumulated memory usage is 1*60 + 2*120 = 300 GByte seconds.
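As a rough illustration of that reading (memory multiplied by time, summed over the job's life), here is a minimal Python sketch. The function name `gbyte_seconds` and the list-of-phases input are made up for this example; they are not anything qstat exposes.

```python
def gbyte_seconds(phases):
    """Sum memory (GB) * duration (s) over consecutive phases of a job.

    `phases` is a hypothetical list of (gigabytes, seconds) pairs.
    """
    return sum(gb * seconds for gb, seconds in phases)


# 1 GB for one minute, then 2 GB for two minutes
usage = gbyte_seconds([(1, 60), (2, 120)])
print(usage)  # 300
```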
The gigabyte-second unit measures the memory allocated to a task multiplied by the number of seconds that task runs.
Example:
Task 1: 1 GB * 10 seconds = 10 gb-sec
Task 2: 128 MB * 10 seconds ~ 0.128 GB * 10 seconds = 1.28 gb-sec
Task 3: 8 GB * 10 seconds = 80 gb-sec
In practice it's often employed as a devops metric or as a way of billing out services designed to run for short time periods and then terminate such as serverless function services on PaaS clouds.
Related
Is there a way to set a limit on total (not simultaneous) used resources (core hours/minutes) for specific User or Account in SLURM?
My total spent resources in seconds are for example 109 seconds usage of threads. I want to limit that just for my user not minding the sizes of submitted jobs until that limit is reached.
[root#cluster slurm]# sreport job SizesByAccount User=HPCUser -t Seconds start=2022-01-01 end=2022-07-01 Grouping=99999999 Partition=cpu
--------------------------------------------------------------------------------
Job Sizes 2022-01-01T00:00:00 - 2022-06-16T14:59:59 (14392800 secs)
Time reported in Seconds
--------------------------------------------------------------------------------
Cluster Account 0-99999998 CP >= 99999999 C % of cluster
--------- --------- ------------- ------------- ------------
cluster root 109 0 100.00%
I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js Elastic Beanstalk environment. While installing Node.js dependencies, the EC2 instance uses a lot of CPU resources.
However, I found that the "average" CPU usage didn't exceed this threshold, while the "maximum" CPU utilization did. Why does the Elastic Beanstalk alarm fire even though the average CPU utilization doesn't exceed the threshold?
Why is it happening? I'm tired of false positive CPU alarms :(
How do I solve this problem?
I set an alarm on average CPU utilization (1 minute) > 65 on my Node.js
Elastic Beanstalk environment.
It means that the CloudWatch alarm averages CPU utilization over each 1-minute period and triggers if that average exceeds 65.
In the first screenshot, it seems that the CPU utilization was high from almost 9:57 until 10:07 for 10 minutes.
In the second screenshot, it shows the average peaked at about 30 during this period. Let's do some math to understand it:
CPU utilization was not consistently high. The maximum graph shows the peak recorded in each period, so if the CPU is at 90% for 3 seconds and at 10% for the other 57 seconds, the 1-minute average is only (90*3 + 10*57)/60 = 14%, while the maximum is 90%.
Your case is similar. That's why the maximum and average graphs look so different.
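To make the arithmetic concrete, here is a small Python sketch of one minute of per-second CPU samples. The one-sample-per-second resolution is an assumption for illustration only; it is not how CloudWatch actually collects data. With 90% for 3 seconds and 10% for 57 seconds, the mean works out to 14% while the maximum is 90%.

```python
# Hypothetical per-second CPU readings for a single minute:
# a 3-second spike at 90%, then 57 seconds at 10%.
samples = [90] * 3 + [10] * 57

average = sum(samples) / len(samples)
maximum = max(samples)
threshold = 65

print(average, maximum)     # 14.0 90
print(average > threshold)  # False: an alarm on the average stays quiet
print(maximum > threshold)  # True: an alarm on the maximum would fire
```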
I have a Spark batch job that runs every minute and processes ~200k records per batch. The usual processing delay of the app is ~30 seconds. In the app, for each request, we make a write request to DynamoDB. At times, the server-side DDB write latency is ~5 ms instead of the usual 3.5 ms (a ~43% increase). This causes the overall delay of the app to increase sixfold (to ~3 minutes).
How does sub-second latency of DDB call impact the overall latency of the app by 6 times?
PS: I have verified the root cause through overlapping the cloud-watch graphs of DDB put latency and the spark app processing delay.
Thanks,
Vinod.
Just a ballpark estimate:
If the average is 3.5 ms latency and about half of your 200k records are processed in 5ms instead of 3.5ms, this would leave us with:
200,000 * 0.5 * (5 - 3.5) = 150,000 (ms)
of total delay, which is 150 seconds or 2.5 minutes. I don't know how well the process is parallelized, but this seems to be within the expected delay.
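The same ballpark can be written as a short Python sketch. It assumes the writes happen strictly one after another (no parallelism), which is the worst case. The function name and parameters are invented for this example.

```python
def added_delay_seconds(records, slow_fraction, slow_ms, normal_ms):
    """Extra wall-clock time if the slow writes run serially (an assumption)."""
    return records * slow_fraction * (slow_ms - normal_ms) / 1000


delay = added_delay_seconds(200_000, 0.5, 5.0, 3.5)
print(delay)       # 150.0 seconds
print(delay / 60)  # 2.5 minutes
```

If the writes were spread evenly over N parallel tasks, the added delay would shrink by roughly a factor of N, so the real impact depends on how the job is partitioned.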
I am using Azure SQL Database for my app. My app was working perfectly until recently; now the MAX DTU usage has been 100%, but the AVG DTU usage is around 50%.
Which value should I monitor to scale the services, MAX or AVG?
I found on the net after lots of searching:
CPU max/min and average within that 1 minute. As 1 minute (60 seconds) is the finest granularity, if you chose for example max, if the CPU has touched 100% even for 1 second, it will be shown 100% for that entire minute. Perhaps the best is to use the Average. In this case the average CPU utilization from 60 seconds will be shown under that 1 minute metric.
which sorta helped me with what it all meant, but thanks to bradbury9 too for your input.
I want to know how to calculate the number of users, think time, pacing time, and number of iterations for load testing.
Requirement is:
I need to achieve 10000 transaction per hour.
Need to do 1 hour execution.
Need to specify think time and pacing time
Note:
My script "aircraft" contains 7 transactions.
Overall Response time is 16 sec without think time.
How do I calculate how many users I need to reach 10000 transactions per hour, and how much think time, pacing time, and how many iterations I need to specify?
If your only goal is to simulate a certain number of transactions in a certain time period, you can do that with quite few virtual users in the test.
If your average transaction time for 7 transactions is 16 seconds it means you can do 7/16 transactions per second, using a single virtual user.
To get 10,000 transactions in an hour you would have to use multiple concurrent virtual users.
VU = Number of virtual users
time = test time in seconds
TPS = transactions per second
VU * time * TPS = total_transactions
In this case we know total_transactions but not VU, so we rewrite it to:
total_transactions / (time * TPS) = VU
Using the numbers we have, we get:
10000 / (3600 * 7/16) = 6.3
I.e. you need more than 6 VUs to get 10k transactions in one hour. Maybe go for 10 VUs and insert some sleep time as necessary to hit the exact 10,000 transactions.
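For anyone who wants to plug in their own numbers, the rearranged formula can be sketched in Python like this. The function and parameter names are made up for the example, and it assumes each iteration always takes the same time.

```python
import math


def min_virtual_users(total_transactions, test_seconds,
                      tx_per_iteration, iteration_seconds):
    """VU = total_transactions / (time * TPS), with TPS per VU = tx / seconds."""
    tps_per_vu = tx_per_iteration / iteration_seconds
    return total_transactions / (test_seconds * tps_per_vu)


exact = min_virtual_users(10_000, 3600, 7, 16)
print(round(exact, 2))   # 6.35
print(math.ceil(exact))  # 7, the smallest whole number of VUs that is enough
```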
How much sleep time and how many iterations would you get then?
10 users executing at 7 transactions per 16 seconds for one hour would execute a total of 10 * 7/16 * 3600 = 15,750 transactions. We need to slow the users down a bit. We need to make sure they don't do the full 7/16 transactions per second. We can use the formula again:
VU * time * TPS = total_transactions
TPS = total_transactions / (VU * time)
TPS = 10000 / (10 * 3600) => TPS = 0.2777...
We need to make sure the VUs only do 0.28 TPS, rather than 7/16 (0.44) TPS.
TPS = transactions / time
Your script does 7 transactions in 16 seconds, to get 7/16 (0.44) TPS.
To find out how much time the script needs to take, we then change it to:
time = transactions / TPS
time = 7 / 0.277778 => time = 25.2 seconds
Currently, your script takes 16 seconds, but we need it to take 25 seconds, so you need to add 9 seconds of sleep time.
So:
10 VUs, executing 7 transactions in 25 seconds, over the course of an hour, would produce 10,000 transactions:
10 * 7/25 * 3600 = 10080
The number of script iterations each VU would perform would be:
3600 / 25 = 144 iterations
To sum up:
Number of VUs: 10
Total sleep time during one iteration: 9 seconds
Iterations/VU: 144
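The pacing steps can be sketched in Python as below, again under the assumption that the script itself always takes 16 seconds. The names are invented for the example. Note that without rounding the iteration time to 25 seconds, the figures come out slightly different from the summary above.

```python
def pacing(total_transactions, test_seconds, vus,
           tx_per_iteration, script_seconds):
    """Return (iteration time, sleep per iteration, iterations per VU)."""
    tps_per_vu = total_transactions / (vus * test_seconds)
    iteration_seconds = tx_per_iteration / tps_per_vu
    sleep_seconds = iteration_seconds - script_seconds
    iterations_per_vu = test_seconds / iteration_seconds
    return iteration_seconds, sleep_seconds, iterations_per_vu


iteration, sleep, iterations = pacing(10_000, 3600, 10, 7, 16)
print(round(iteration, 1), round(sleep, 1), round(iterations, 1))
# 25.2 9.2 142.9
```

With the unrounded 25.2-second iteration, each VU runs about 143 iterations and the total lands on 10,000 transactions; rounding down to 25 seconds gives the 144 iterations and 10,080 transactions shown above.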
Note that this all assumes that transaction time is constant and does not increase as a result of generating the traffic. This setup will generate close to 3 transactions per second on the target system, and if you have not tested at that frequency before, you don't know if that will slow down the target system or not.
I have one question: you mentioned TPS:7/16 - on what basis it is 7/16? It's 16/7.
Otherwise, take this calculation: 10000 transactions per hour, then per sec 10000/3600 = 2.77; this and 7/16 are the same. I think your calculation is wrong. Please correct me.