I submit jobs using headless NetLogo to a HPC server by the following code:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
--model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
--experiment test \
--table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is the snapshot of a cluster queue using:
qstat -g c
I wish to know can I increase the CQLOAD for my simulations and what does it signify too. I couldn't find an elucidate explanation online.
CPU USAGE CHECK:
qhost -u abhishekb
When I run the behaviour space on my PC through gui assigning high priority to the task makes it use nearly 99% of the CPU which makes it run faster. It uses a greater percentage of CPU processor. I wish to accomplish the same here.
EDIT:
EDIT 2;
A typical HPC environment, is designed to run only one MPI process (or OpenMP thread) per CPU core, which has therefore access to 100% of CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase performance of one of them by setting the appropriate priorities with nice.
It appears that CQLOAD, is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs just translates the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core, would mean that the code spends 70% of time doing calculations, while the remaining 30% are probably spent waiting to communicate with the other computing nodes (which is also necessary).
Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.
1. CPU percentage load
I agree with #rth answer regards trying to use linux job priority / renice to increase CPU percentage - it's
almost certain not to work
and, (as you've found)
you're unlikely to be able to do it as you won't have super user priveliges on the nodes (It's pretty unlikely you can even log into the worker nodes - probably only the head node)
The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally it will probably run like that on the node during the time its running.
Here are some answers to the more specific parts of your question:
2. CQLOAD
You ask
CQLOAD (what does it mean too?)
The docs for this are hard to find, but you link to the spec of your cluster, which tells us that the scheduling engine for it is Sun's *Grid Engine". Man pages are here (you can access them locally too - in particular typing man qstat)
If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:
OUTPUT FORMATS
...
an average of the normalized load average of all queue
hosts. In order to reflect each hosts different signifi-
cance the number of configured slots is used as a weight-
ing factor when determining cluster queue load. Please
note that only hosts with a np_load_value are considered
for this value. When queue selection is applied only data
about selected queues is considered in this formula. If
the load value is not available at any of the hosts '-
NA-' is printed instead of the value from the complex
attribute definition.
This means that CQLOAD gives an indication of how utilized the processors are in the queue. Your output screenshot above shows 0.84, so this indicator average load on (in-use) processors in all.q is 84%. This doesn't seem too low.
3. Number of nodes reserved
In a related question, you state colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.
You might want to experiment with using fewer nodes (unless your results are very slow) - that is achieved by altering the line #$ -pe mpi 24 - maybe take the number 24 down. You can work out how many nodes you need (roughly) by timing how long 1 model run takes on your computer and then use
N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
So you want to make to make your program run faster on linux by giving it a higher priority than all other processes?
In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive) where lower values give the process a higher priority. Due to security reasons, you can only decrease a processes' niceness if you are the super user (root).
So if you want to make a process run with higher priority then from within bash do
[abhishekb#hpc ~]$ start_process &
[abhishekb#hpc ~]$ jobs -x sudo renice -n -20 -p %+
Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
Related
I have a Red hat server where I can see the load average on the system is 23 24 23 (1min 5min 15min) using the top command. And i can see in /proc/cpuinfo there are 24 processors entry (0-23). But in each processor entry the cpu cores value is 6 and in each processor entry the physical id is either 1 or 0.
I want to know if my system is overloaded. Can anyone please tell me.
It seems you have a system with two processors, each with 6 cores. Each of the cores can likely run hyperthread => 2 x 6 x 2 = 24. In /proc/cpuinfo, top etc, you will see each hyperthread: that's the number of parallel processes or threads that your hardware can run.
The quick answer is that your system is probably not overloaded and that it processes a relatively stable amount of work over time (as the 1, 5 and 15 minute values are about the same). A rule of thumb is that the load average value should stay below the number of hyperthreads -- this is not, however, exactly true.
You'll find a more in-depth discussion here:
https://unix.stackexchange.com/questions/303699/how-is-the-load-average-interpreted-in-top-output-is-it-the-same-for-all-di
and here:
https://superuser.com/questions/23498/what-does-load-average-mean-on-unix-linux
and perhaps here:
https://linuxhint.com/load_average_linux/
However, please keep in mind that load average does not tell you everything about your system -- though it is usually a pretty good indicator in my experience. You'd have to check many other factors to determine overloadedness, like memory pressure, I/O wait times, I/O bandwidth utilization. It also depends on what kind of processing the system is performing.
In my PBS script, I am running matlab and would like to know how many many cores were actually used during the time. Especially I would like to know the max number of cores used at a time.
If I only allocate x number of cores but at any time matlab uses more than x number of cores then my job will be stopped and cancelled by the HPC/HTC system.
Ideally the command and output would be as simple as
cpustats matlab -nojvm -r "someExperiment(params);exit()"
Max CPU usage: 12.5 cores
Average CPU usage: 6 cores
Min CPU usage: 0.5 cores
I can't monitor the progress manually because it is a batch script so I am planning on running once with plenty of cores and then modifying the rest so I don't have to wait so long.
I have searched and searched for a command like this but the following don't seem to be what I am looking for
top finds the current cpu usage which I don't have access to
ps finds cpu allotted to a process and not actual usage
watch might be useful to query random cpu times and output them but would like a continuous stream if possible
time is really close to what I want but doesn't keep track of peak CPU usage
The most similar question I could find was this one about peak memory usage
I want to know if my program was run in parallel over multiple cores. I can get the perf tool to report how many cores were used in the computation, but not if they were used at the same time (in parallel).
How can this be done?
You can try using the command
top
in another terminal while the program is running. It will show the usage of all the cores on your machine.
A few possible solutions:
Use htop on another terminal as your program is being executed. htop shows the load on each CPU separately, so on an otherwise idle system you'd be able to tell if more than one core is involved in executing your program.
It is also able to show each thread separately, and the overall CPU usage of a program is aggregated, which means that parallel programs will often show CPU usage percentages over 100%.
Execute your program using the time command or shell builtin. For example, under bash on my system:
$ dd if=/dev/zero bs=1M count=100 2>/dev/null | time -p xz -T0 > dev/null
real 0.85
user 2.74
sys 0.14
It is obvious that the total CPU time (user+sys) is significantly higher than the elapsed wall-clock time (real). That indicates the parallel use of multiple cores. Keep in mind, however, that a program that is either inefficient or I/O-bound could have a low overall CPU usage despite using multiple cores at the same time.
Use top and monitor the CPU usage percentage. This method is even less specific than time and has the same weakness regarding parallel programs that do not make full use of the available processing power.
I need to maximize speed while converting videos using FFmpeg to h264
Any input format of source videos
User's machine can have any number of cores
Power and memory consumption are non-issues
Of course, there are a whole bunch of options that can be tweaked but this question is particularly about choosing the best -thread <count> option. I am trying to find an ideal thread count as a function of
no. of cores
input video format
h264-friendly values maybe?
anything else missed above?
I am aware the default -thread 0 follows one-thread-per-core approach which is supposed to be optimal. But I am not sure if this is time or space-optimized. Also, on certain testcases, I've seen more threads (say 4 threads on my dual core test machine) finishes quicker than the default.
Any other direction, say configure options w.r.t. threads, worth pursuing?
I have found that threads do not do a good job of utilizing all the cores, the hyper-threads do not get used at all. One solution I could come up with is to run a 3 to 4 ffmpeg processes in parallel, See: https://superuser.com/questions/538164/how-many-instances-of-ffmpeg-commands-can-i-run-in-parallel/547340#547340 This approach ends up using all the cores fully and is faster than the single input, multiple outputs in a single command option.
If your 'dual-core' has hyperthreading, then 2x cores would probably be correct. There's unlikely to be gain going beyond the number of virtual cores (inc. hyperthreading), but perhaps due to internal issues in FFmpeg it might be true.
I have experimented thoroughly with threads 0, 6, 12, 24 and it doesn't make a difference in frame rate, overall processing time or CPU utilization. Note my system has 12 physical cores too. Generally it seems to do a good job of using your processing power without specifying threads where my 12 cores are basically 98-99% utilized for the duration while watching top/system monitor.
I wish there was a magic bullet but for now there is no other way to speed things up as ffmpeg is currently optimized very well in my opinion. The only alternative is simply to get more computing power or to do distributed processing.
*Note all my tests were using ffmpeg version 3.3.1
My hosting provider (pairNetworks) has certain rules for scripts run on the server. I'm trying to compress a file for backup purposes, and would ideally like to use bzip2 to take advantage of its AWESOME compression rate. However, when trying to compress this 90 MB file, the process sometimes runs upwards of 1.5 minutes. One of the resource rules is that a script may only execute for 30 CPU seconds.
If I use the nice command to 'nicefy' the process, does that break up the amount of total CPU processing time? Is there a different command I could use in place of nice? Or will I have to use a different compression utility that doesn't take as long?
Thanks!
EDIT: This is what their support page says:
Run any process that requires more
than 16MB of memory space.
Run any
program that requires more than 30
CPU seconds to complete.
EDIT: I run this in a bash script from the command line
nice will change the process' priority, and thus will get its CPU seconds sooner (or later), so if the rule is really about CPU seconds as you state in your question, nice will not serve you at all, it'll just be killed at a different time.
As for a solution, you may try splitting the file in three 30 MB pieces (see split(1)) which you can compress in the allotted time. Then you uncompress and use cat to put the pieces together. Depending on if it's a binary or text you can use the -l or -b arguments to split.
nice won't help you - the amount of CPU seconds will still be the same, no matter how many actual seconds it takes.
You have to find the compromiss between compression ratio and CPU consumption. There are -1 ... -9 options to bzip2 - try to "tune" it (-1 is a fastest). Another way is to consult with your provider - may be it is possible to grant a special permissions to your script to run longer.
No, nice will only affect how your process is scheduled. Put simply, a process that takes 30 CPU seconds will always take 30 CPU seconds even if it's preempted for hours.
I always get a thrill when I load up all the cores of my machine with some hefty processing but have them all niced. I love seeing the CPU monitor maxed out while I surf the web without any noticeable lag.