CPU cores are underutilised while running n_jobs=-1 jobs - Linux

I have read that the green bar represents normal user processes and the red bar represents time spent in the kernel (system calls, iowait, etc.).
I'm running a Jupyter notebook that launches n_jobs=-1 jobs, so by definition all my cores should be busy with normal user processes and htop should show high per-core usage. The normal behaviour while running this notebook was that the average per-core usage would hit 98-99%.
But now, running the same notebook, per-core usage is limited to a maximum of about 18%, with an abrupt increase in kernel time.
I want to understand why this is happening.
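For reference, this is roughly the shape of the workload, plus a way to watch per-core usage from Python instead of htop. It is a minimal sketch assuming joblib-style n_jobs=-1 parallelism and the psutil package; the busy-loop function is just a stand-in for the notebook's real work.
# Hypothetical stand-in for the notebook's workload: with n_jobs=-1,
# joblib should keep every core busy in user (green) time.
import psutil
from joblib import Parallel, delayed

def burn(n):
    total = 0
    for i in range(n):      # CPU-bound busy loop
        total += i * i
    return total

# n_jobs=-1 asks joblib to use all available cores
results = Parallel(n_jobs=-1)(delayed(burn)(5_000_000) for _ in range(32))

# Per-core utilisation sampled over one second, like htop's bars
print(psutil.cpu_percent(interval=1.0, percpu=True))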

Related

htop shows per-core CPU usage over 100%?

I'm using htop to monitor the CPU usage of my task. However, the CPU% value sometimes exceeds 100%, which really confuses me.
Some blogs explain that this is because I'm using a multi-core machine (this is true). If there are 8 (logical) cores, the maximum value of CPU% is 800%. CPU% over 100% means that my task is occupying more than one core.
But my question is: there is a column named CPU in the htop window which shows the ID of the core my task is running on. So how can the usage of this single core exceed 100%?
This is the screenshot. You can see that core 84's usage is 375%!
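One way to see the distinction is to compare per-core utilisation with per-process utilisation: htop's CPU% column sums the process's threads, so it can exceed 100% even though the CPU column only names a single core. A minimal sketch with psutil (the PID is a placeholder for your task):
import psutil

# Per-core utilisation: each entry is capped at 100%
print(psutil.cpu_percent(interval=1.0, percpu=True))

# Per-process utilisation: summed over all of the process's threads,
# so a multi-threaded task can report well over 100%
p = psutil.Process(12345)           # placeholder PID for your task
p.cpu_percent(interval=None)        # prime the counter
print(p.cpu_percent(interval=1.0))  # e.g. 375.0 for a task busy on ~4 cores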

Estimate Core capacity required based on load?

I have a quad-core Ubuntu system. Say I see a load average of 60 over the last 15 minutes during peak time; the load average sometimes reaches 150 as well.
These loads generally happen only during peak time. Basically, I want to know if there is any standard formula to derive the number of cores ideally required to handle a given load.
Objective:
If the load is 60, does that mean 60 tasks were queued on average at any point in time during the last 15 minutes? Would adding CPUs help me serve requests faster, or save the system from hanging or crashing?
Linux load average (as printed by uptime or top) includes tasks in I/O wait, so it can have very little to do with CPU time that could potentially be used in parallel.
If all the tasks were purely CPU bound, then yes 150 sustained load average would mean that potentially 150 cores could be useful. (But if it's not sustained, then it might just be a temporary long queue that wouldn't get that long if you had better CPU throughput.)
If you're getting crashes, that's a huge problem that isn't explainable by high loads. (Unless it's from the out-of-memory killer kicking in.)
It might help to use vmstat or dstat to see how much CPU time is spent in user/kernel space when your load avg. is building up, or if it's probably mostly I/O.
Or of course you probably know what tasks are running on your machine, and whether a single task is I/O bound or CPU bound on an otherwise idle machine. I/O throughput usually scales somewhat positively with queue depth, except on magnetic hard drives, where deep queues can turn sequential reads/writes into seek-heavy workloads.
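As a rough first check along those lines, you can simply compare the load averages against the core count; a sustained value above 1.0 per core means tasks are queueing. A minimal, purely illustrative sketch:
import os

load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()

# Load normalised per core: > 1.0 means more runnable (or I/O-waiting)
# tasks than cores, on average, over that window
for label, load in (("1 min", load1), ("5 min", load5), ("15 min", load15)):
    print(f"{label}: load {load:.2f}, {load / cores:.2f} per core")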

Linux command that tracks statistics of CPU usage while running an application on HPC/HTC

In my PBS script, I am running MATLAB and would like to know how many cores were actually used during the run. In particular, I would like to know the maximum number of cores used at any one time.
If I only allocate x cores but at any point MATLAB uses more than x cores, then my job will be stopped and cancelled by the HPC/HTC system.
Ideally the command and output would be as simple as
cpustats matlab -nojvm -r "someExperiment(params);exit()"
Max CPU usage: 12.5 cores
Average CPU usage: 6 cores
Min CPU usage: 0.5 cores
I can't monitor the progress manually because it is a batch script, so I am planning on running it once with plenty of cores and then adjusting later runs so I don't have to wait so long.
I have searched and searched for a command like this, but the following don't seem to be what I am looking for:
top shows the current CPU usage, which I can't watch interactively
ps shows the CPU allotted to a process, not the actual usage
watch might be useful to sample CPU usage at points in time and output it, but I would like a continuous stream if possible
time is really close to what I want but doesn't keep track of peak CPU usage
The most similar question I could find was this one about peak memory usage
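As far as I know there is no standard cpustats command, but a small wrapper can approximate the output above by periodically sampling the job (and its child processes) with psutil. This is a minimal sketch; the script name, sampling interval and the MATLAB command line are placeholders.
import subprocess, sys, time
import psutil

def cpu_stats(cmd, interval=1.0):
    proc = subprocess.Popen(cmd)
    main = psutil.Process(proc.pid)
    main.cpu_percent(interval=None)        # prime the counter
    samples = []
    while proc.poll() is None:
        time.sleep(interval)
        try:
            procs = [main] + main.children(recursive=True)
            # cpu_percent() per process: 100.0 means one fully busy core.
            # The first reading for a newly seen child is 0.0 (known limitation).
            usage = sum(p.cpu_percent(interval=None) for p in procs)
        except psutil.NoSuchProcess:
            continue
        samples.append(usage / 100.0)      # convert percent to "cores"
    if samples:
        print(f"Max CPU usage: {max(samples):.1f} cores")
        print(f"Average CPU usage: {sum(samples) / len(samples):.1f} cores")
        print(f"Min CPU usage: {min(samples):.1f} cores")
    return proc.returncode

if __name__ == "__main__":
    # e.g. python cpustats.py matlab -nojvm -r "someExperiment(params);exit()"
    sys.exit(cpu_stats(sys.argv[1:]))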

Weird EC2 CPU usage

I'm really confused. Why do the load average and %CPU not match the process CPU usage below? It seems like the process is eating up a lot of CPU, while the AWS EC2 meter says only 25% of the CPU is used.
%CPU -- CPU Usage: The percentage of your CPU that is being used by the process. By default, top displays this as a percentage of a single CPU. On multi-core systems, you can have percentages that are greater than 100%. For example, if 3 cores are at 60% use, top will show a CPU use of 180%.
You can toggle this behavior by hitting Shift+i while top is running to show the overall percentage of available CPUs in use.
load average: 22.56, 24.99, 26.51
From left to right, these numbers show you the average load over the last 1 minute, the last 5 minutes, and the last 15 minutes.
us -- User CPU time
The time the CPU has spent running users' processes that are not niced.
sy -- System CPU time
The time the CPU has spent running the kernel and its processes.
ni -- Nice CPU time
The time the CPU has spent running users' processes that have been niced.
wa -- iowait
Amount of time the CPU has been waiting for I/O to complete.
hi -- Hardware IRQ
The amount of time the CPU has been servicing hardware interrupts.
si -- Software Interrupts
The amount of time the CPU has been servicing software interrupts.
st -- Steal Time
The amount of CPU 'stolen' from this virtual machine by the hypervisor for other tasks (such as running another virtual machine).
For more details, see In Linux "top" command what are us, sy, ni, id, wa, hi, si and st (for CPU usage).
After you run the top command, you can press "1" on your keyboard to see per-CPU utilization; run man top for more details.
Note that the "msqld" process can use CPU from several cores, so its utilization % can easily go beyond 100% in the top display.
Maybe your app is using a single core and the other cores are free. I think your instance has 4 CPU cores and one is at 100% utilization. Can you please check the utilization of each core?
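If you want the same us/sy/ni/wa breakdown and the per-core view without sitting in an interactive top session, psutil exposes both. A minimal sketch (assuming a Linux host and the psutil package); the field names mirror the top columns described above:
import psutil

# System-wide CPU time split, matching top's us/sy/ni/wa/hi/si/st fields (Linux)
t = psutil.cpu_times_percent(interval=1.0)
print(f"us={t.user} sy={t.system} ni={t.nice} wa={t.iowait} "
      f"hi={t.irq} si={t.softirq} st={t.steal}")

# Per-core utilisation, equivalent to pressing "1" inside top
for core, pct in enumerate(psutil.cpu_percent(interval=1.0, percpu=True)):
    print(f"core {core}: {pct}%")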

Linux acceptable load average

I have a dedicated Linux server (8 cores, 8 GB RAM) where I run some crawler PHP scripts. The load on the system ends up being around 200, which sounds like a lot. Since I am not using the machine to host content, what could be the side effects of such a high load for the purposes stated above?
Machines were made to work, so there is no issue with a high load average per se. But a high load average can often be an indicator of a performance issue. Such an investigation is usually application specific, but here is a very general guideline:
Since load average is a combined metric (CPU, I/O, etc.), you want to examine each component separately. I would start by making sure the machine is not thrashing, by checking swap usage (vmstat comes in handy) and disk performance (using iostat). You may also check whether your operations are CPU intensive.
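A quick programmatic version of that first check (swap usage and disk throughput) can be done with psutil; a minimal sketch, not a replacement for vmstat or iostat:
import time
import psutil

# Swap usage: heavy swap use combined with high load often points to thrashing
swap = psutil.swap_memory()
print(f"swap used: {swap.percent}%")

# Disk throughput over one second, a rough stand-in for iostat
before = psutil.disk_io_counters()
time.sleep(1.0)
after = psutil.disk_io_counters()
print(f"read:  {(after.read_bytes - before.read_bytes) / 1e6:.1f} MB/s")
print(f"write: {(after.write_bytes - before.write_bytes) / 1e6:.1f} MB/s")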
You should read your load average as three values (the 1-minute, 5-minute and 15-minute load, respectively).
Take a look at the example taken from Wiki:
For example, one can interpret a load average of "1.73 0.60 7.98" on a single-CPU system as:
during the last minute, the system was overloaded by 73% on average (1.73 runnable processes, so that 0.73 processes had to wait for a turn for a single CPU system on average).
during the last 5 minutes, the CPU was idling 40% of the time on average.
during the last 15 minutes, the system was overloaded 698% on average (7.98 runnable processes, so that 6.98 processes had to wait for a turn for a single CPU system on average).
Full article
Please note that how you read these values depends on the resources of your machine.
Cheers!
