Understanding sacct's MaxRSS -- Why are there two rows for a job? - slurm

I have the output from sacct --format="jobID,CPUTime,MaxRSS" -j 66930332_195. I know MaxRSS reports a value roughly equivalent to peak memory usage. However, what do the two different rows in the MaxRSS column refer to?
JobID CPUTime MaxRSS
------------ ---------- ----------
66930332_195 00:05:15
66930332_19+ 00:05:15 4688356K
66930332_19+ 00:05:15 2376K
Thanks in advance! I haven't been able to find this documented anywhere.

If you use %20 like this to show the JobID in its entirety,
sacct --format="jobID%20,CPUTime,MaxRSS"
you would probably see something like this:
JobID CPUTime MaxRSS
------------ ---------- ----------
66930332_195 00:05:15
66930332_195.0 00:05:15 4688356K
66930332_195.1 00:05:15 2376K
The first row corresponds to the job itself, and the other rows correspond to the job steps. The number of steps should match the number of srun calls in your submission script.
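For illustration, a submission script with two srun calls would produce exactly the two steps shown above (the script and program names here are hypothetical, not from the question):

```shell
#!/bin/bash
#SBATCH --job-name=example_array
# Hypothetical submission script: each srun invocation becomes one job step.
srun ./heavy_step       # recorded as step .0 (the 4688356K MaxRSS row)
srun ./postprocess      # recorded as step .1 (the 2376K MaxRSS row)
```

The parent row (66930332_195 with no suffix) is the allocation itself, which is consistent with it showing no MaxRSS of its own in the output above.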

How to find the average CPU of a process over X amount of time in Linux

I'm trying to find out how I can get / calculate the CPU utilization of a specific process over X amount of time (I write my code in Python on a Linux-based system).
What I want, for example, is the average CPU of a process over the last hour / day / 10 minutes...
Is there a command or a calculation I can run?
*I can't run a command like "top" in the background for X time and calculate the CPU; I need it to be one set of commands or a calculation.
I tried researching the top command but didn't find useful info for my case.
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu gives the average consumption over the process lifetime.
Could there be a way to use uptime or /proc/[pid]/stat to calculate this?
Thanks,
What about using pidstat?
$ pidstat -p 12345 10
This will output the stats for PID 12345 every 10 seconds, including CPU%.
From there you can run it on background, and redirect output to a file:
$ pidstat -p 12345 10 > my_pid_stats.txt &
Here is a link with several examples. There is a lot of flexibility in the output, so you can probably customize it to better suit your needs:
https://www.thegeekstuff.com/2014/11/pidstat-examples/
pidstat is part of the sysstat package on Ubuntu, in case you decide to install it.
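To turn that log file into a single average figure, the %CPU column can be averaged with awk. A sketch on made-up sample data; note that the field position of %CPU (8 here) varies between sysstat versions, so check the header line of your own log first:

```shell
# Sample lines in the shape pidstat -p <pid> prints (values are made up):
cat > my_pid_stats.txt <<'EOF'
12:00:01      1000     12345    1.00    0.50    0.00    0.00    1.50     2  python
12:00:11      1000     12345    2.00    0.50    0.00    0.00    2.50     2  python
12:00:21      1000     12345    3.00    0.50    0.00    0.00    3.50     2  python
EOF
# Average the %CPU column (field 8) for the PID of interest.
awk '$3 == 12345 { sum += $8; n++ } END { if (n) printf "avg %%CPU: %.2f\n", sum / n }' my_pid_stats.txt
# prints: avg %CPU: 2.50
```

Run pidstat in the background as shown above, then apply the awk one-liner to the collected file whenever you want the average over the window it covers.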

Linux function profiler output

I would like to profile a code flow in the kernel in order to understand where the bottleneck is. I found that the function profiler does just that:
https://lwn.net/Articles/370423/
Unfortunately the output I see doesn't make sense to me.
From the link above, the output of the function profiler is:
Function Hit Time Avg
-------- --- ---- ---
schedule 22943 1994458706 us 86931.03 us
where "Time" is the total time spent inside this function during the run. So if I have function_A that calls function_B, then, if I understood the output correctly, the "Time" measured for function_A includes the duration of function_B as well.
When I actually run this on my PC, I see an additional column in the output:
Function Hit Time Avg s^2
-------- --- ---- --- ---
__do_page_fault 3077 477270.5us 155.109 us 148746.9us
(more functions..)
What does s^2 stand for? It can't be the standard deviation, because it's higher than the average...
I measured the total duration of this code flow from user space and got 400 ms. When summing up the s^2 column, it came close to 400 ms, which makes me think it is perhaps the "pure" time spent in __do_page_fault, excluding the duration of the nested functions.
Is this correct? I didn't find any documentation of the s^2 column, so I'm hesitant about my conclusions.
Thank you!
You can see the code that calculates the s^2 column here. It appears to be the variance (the standard deviation squared). If you take the square root of the number in your example, you get about 385 us, which is much closer to the average in the example.
The standard deviation is still greater than the mean, but that is fine.
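The arithmetic can be checked directly by taking the square root of the s^2 value from the question's output:

```shell
# If s^2 is the variance, its square root is the standard deviation.
awk 'BEGIN { printf "%.1f\n", sqrt(148746.9) }'   # prints 385.7
```

385.7 us is large relative to the 155 us average, but a standard deviation above the mean is plausible for a heavy-tailed distribution of page-fault durations.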

In the sacct command in the Slurm workload manager, is there a way to find the maximum of the "Elapsed" column or sort it?

Currently, I am running a job array of 1000 tasks using Slurm. When it is done, I use sacct to see how much time was actually taken. I would like to see which was the longest-running job; it would be the largest value in the "Elapsed" column. Is there a way to sort it?
sacct -o reqmem,maxrss,averss,elapsed -j 44523498
ReqMem MaxRSS AveRSS Elapsed
---------- ---------- ---------- ----------
800Mn 02:24:15
800Mn 655756K 655756K 02:24:15
800Mn 844K 344K 02:24:17
800Mn 02:10:08
800Mn 631912K 631912K 02:10:08
800Mn 1032K 344K 02:10:08
800Mn 01:38:14
800Mn 635304K 635304K 01:38:14
800Mn 848K 348K 01:38:14
800Mn 02:28:04
This is the output I have above. Thanks!
The easiest way would be to display elapsed as the first column and use the sort command. sort sorts alphanumerically, which works with the Slurm time format thanks to its 2-digit zero-padding (note that jobs running a day or longer are shown as D-HH:MM:SS, which would break a plain alphanumeric sort).
sacct -o elapsed,reqmem,maxrss,averss -j 44523498 | sort
You can additionally use sacct's -n flag to suppress the header lines, if you do not want them to clutter the sorted output.
Note that sort can be told to sort on a specific column with the -k flag, but here the number of columns differs between the job-step lines and the job summary lines. You can remove the job-step information (which is redundant w.r.t. Elapsed) with -X.
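On the sample Elapsed values pasted in the question, the sort approach picks out the longest job; a sketch:

```shell
# Sort zero-padded HH:MM:SS strings; the last line is the maximum.
printf '%s\n' 02:24:15 02:10:08 01:38:14 02:28:04 | sort | tail -n 1
# prints: 02:28:04
```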

sacct: How could I convert {CPUTimeRAW and CPUTime} into real time as seconds?

My goal is to charge users based on the time(in seconds) they allocated the CPU. What is best parameter to measure it?
The way I run:
Example 1:
sbatch -N1 run.sh
Submitted batch job 20
scontrol update jobid=20 TimeLimit=0-00:01
sacct -o totalcpu,cputime,cputimeraw,Elapsed,SystemCPU,time -j 20
TotalCPU CPUTime CPUTimeRAW Elapsed SystemCPU Timelimit
---------- ---------- ---------- ---------- ---------- ----------
00:00:00 00:11:52 712 00:01:29 00:01:00
00:00:00 00:11:52 712 00:01:29
I had set a time limit of 1 minute, but it seems the job exceeded the limit by 29 seconds. Is that normal?
Example 2:
sbatch -N1 run.sh
Submitted batch job 21
scontrol update jobid=21 TimeLimit=0-00:02
sacct -o totalcpu,cputime,cputimeraw,Elapsed,SystemCPU,time -j 21
TotalCPU CPUTime CPUTimeRAW Elapsed SystemCPU Timelimit
---------- ---------- ---------- ---------- ---------- ----------
00:00:00 00:18:56 1136 00:02:22 00:02:00
I had set a time limit of 2 minutes, but it seems the job exceeded the limit by 22 seconds. Is that normal?
How can I convert {CPUTimeRAW and CPUTime} into real time as seconds? Based on the examples shown, I wasn't able to find the relationship between them.
CPUTimeRaw is expressed in CPU-seconds.
The small overrun of the time limit is normal, this is determined by the KillWait flag in slurm.conf:
The interval, in seconds, given to a job's processes between the
SIGTERM and SIGKILL signals upon reaching its time limit. If the job
fails to terminate gracefully in the interval specified, it will be
forcibly terminated. The default value is 30 seconds.
For charging users:
CPUTime = (Elapsed time) x (number of CPUs allocated)
so CPUTime (or CPUTimeRaw, the same usage expressed in seconds) is what users actually consumed and what they can be charged for.
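The numbers in Example 1 illustrate the relationship: CPUTime 00:11:52 is exactly 712 seconds (the CPUTimeRaw value), and Elapsed 00:01:29 is 89 seconds, so the job must have been allocated 712 / 89 = 8 CPUs (presumably the whole node from -N1). A quick sketch of the conversion:

```shell
# Convert sacct's HH:MM:SS Elapsed to seconds, then divide CPUTimeRaw by it
# to recover the number of allocated CPUs (values from Example 1).
elapsed=00:01:29
secs=$(echo "$elapsed" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
echo "elapsed seconds: $secs"            # 89
echo "allocated CPUs: $((712 / secs))"   # 712 cpu-seconds / 89 s = 8
```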

How to get GPU (GRES) Allocation Reports using SLURM

I read in slurm docs that we could use (after setting up the accounting)
sacct --format="JobID,AllocCPUS,ReqGRES" to get the statistics of requests for GRES. I have also configured my GPUs (there are 2) with gres.conf, but this command always returns 0 for ReqGRES or AllocGRES. Any ideas?
Thanks in advance
There could be several reasons: if you are not the root user, sacct displays only the jobs of the logged-in user unless you add the -a option; otherwise, there may be a problem with your slurm.conf configuration file, and the Slurm log files are worth checking:
sacct -a -X --format=JobID,AllocCPUS,Reqgres
It works.
I always find the reports from sreport very helpful. Just specify the TRES as configured in gres.conf / slurm.conf.
$ sreport -tminper cluster utilization --tres="gres/gpu" start=2019-05-01T00:00:00
--------------------------------------------------------------------------------
Cluster Utilization 2019-05-01T00:00:00 - 2019-05-14T23:59:59
Usage reported in TRES Minutes/Percentage of Total
--------------------------------------------------------------------------------
Cluster TRES Name Allocated Down PLND Down Idle Reserved Reported
--------- -------------- ----------------- ----------------- ----------------- ----------------- ----------------- ------------------
gpugrid+ gres/gpu 8186500(70.06%) 17889(0.96%) 0(0.00%) 1289051(22.97%) 0(0.00%) 9693440(100.00%)
You can also break the report down per user or per GRES type, e.g. --tres="gres/gpu:v100" (as configured in slurm.conf), etc.
Try using AllocTRES
sacct -X --format="JobID, State%-10, JobName%-30, Elapsed, AllocTRES%-42"
You can also use -e to look at the list of available fields that can be specified in the format option. You can also see the list here: https://slurm.schedmd.com/sacct.html#OPT_helpformat
sacct -e