My hosting provider (pairNetworks) has certain rules for scripts run on the server. I'm trying to compress a file for backup purposes, and would ideally like to use bzip2 to take advantage of its AWESOME compression ratio. However, when trying to compress this 90 MB file, the process sometimes runs upwards of 1.5 minutes. One of the resource rules is that a script may only execute for 30 CPU seconds.
If I use the nice command to 'nicefy' the process, does that break up the amount of total CPU processing time? Is there a different command I could use in place of nice? Or will I have to use a different compression utility that doesn't take as long?
Thanks!
EDIT: This is what their support page says:
Run any process that requires more than 16MB of memory space.
Run any program that requires more than 30 CPU seconds to complete.
EDIT: I run this in a bash script from the command line
nice will change the process's priority, and thus it will get its CPU seconds sooner (or later). So if the rule is really about CPU seconds, as you state in your question, nice will not serve you at all; the process will just be killed at a different time.
As for a solution, you may try splitting the file into three 30 MB pieces (see split(1)), which you can each compress in the allotted time. Then you uncompress the pieces and use cat to put them back together. Depending on whether it's binary or text, you can use the -b (bytes) or -l (lines) arguments to split.
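For instance, a minimal sketch (backup.dat is a made-up name for the 90 MB file):

split -b 30M backup.dat piece_                     # produces piece_aa, piece_ab, piece_ac
for p in piece_a?; do bzip2 "$p"; done             # each ~30 MB piece should fit the CPU budget
# later, to restore the original:
bzip2 -dc piece_aa.bz2 piece_ab.bz2 piece_ac.bz2 > backup.dat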
nice won't help you - the amount of CPU seconds will still be the same, no matter how many actual seconds it takes.
You have to find the compromise between compression ratio and CPU consumption. There are -1 ... -9 options to bzip2 - try to "tune" it (-1 is the fastest). Another way is to consult your provider - maybe it is possible to grant your script special permission to run longer.
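For example, a rough way to compare settings (backup.dat is a placeholder name; time reports the CPU seconds, user + sys, that count against the limit):

time bzip2 -1 -k backup.dat    # -1 = smallest block size, fastest; -k keeps the input file
time gzip -1 -k backup.dat     # gzip gives a worse ratio but uses far less CPU time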
No, nice will only affect how your process is scheduled. Put simply, a process that takes 30 CPU seconds will always take 30 CPU seconds even if it's preempted for hours.
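You can see this for yourself with time: under nice, the wall-clock figure (real) may change, but the CPU seconds (user + sys) stay essentially the same. A sketch, with backup.dat again a placeholder:

time bzip2 -c backup.dat > /dev/null
time nice -n 19 bzip2 -c backup.dat > /dev/null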
I always get a thrill when I load up all the cores of my machine with some hefty processing but have them all niced. I love seeing the CPU monitor maxed out while I surf the web without any noticeable lag.
Related
I have a quad-core Ubuntu system. Say I see a load average of 60 over the last 15 minutes during peak time; the load average sometimes goes up to 150 as well.
This load generally happens only during peak time. Basically, I want to know if there is any standard formula to derive the number of cores ideally required to handle a given load.
Objective: if the load is 60, does that mean 60 tasks were in the queue on average at any point in time during the last 15 minutes? Would adding CPUs help me serve requests faster, or save the system from hanging or crashing?
Linux load average (as printed by uptime or top) includes tasks in I/O wait, so it can have very little to do with CPU time that could potentially be used in parallel.
If all the tasks were purely CPU bound, then yes 150 sustained load average would mean that potentially 150 cores could be useful. (But if it's not sustained, then it might just be a temporary long queue that wouldn't get that long if you had better CPU throughput.)
If you're getting crashes, that's a huge problem that isn't explainable by high loads. (Unless it's from the out-of-memory killer kicking in.)
It might help to use vmstat or dstat to see how much CPU time is spent in user/kernel space when your load avg. is building up, or if it's probably mostly I/O.
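For example (a sketch):

vmstat 5    # sample system counters every 5 seconds
# high 'us'/'sy' means real CPU demand; a high 'wa' column means the
# load average is mostly tasks stuck waiting on I/O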
Or of course you probably know what tasks are running on your machine, and whether one single task is I/O bound or CPU bound on an otherwise-idle machine. I/O throughput usually scales a bit positively with queue depth, except on magnetic hard drives, where deep queues turn sequential reads/writes into seek-heavy workloads.
I am using pigz to compress a large directory, which is nearly 50 GB. I have an EC2 instance running RedHat; the instance type is m4.xlarge, which has 4 CPUs. I expected the compression to eat up all my CPUs and give better performance, but it didn't meet my expectation.
The command I am using:
tar -cf - lager-dir | pigz > dest.tar.gz
But while the compression is running, mpstat -P ALL shows a lot of %idle on the other 3 CPUs; only about 2% of each CPU is used by user-space processes.
I also tried top, which shows pigz using less than 10% of the CPU.
I tried -p 10 to increase the process count; usage was then high for a few minutes, but dropped back down when the output file reached 2.7 GB.
I want the compression to fully utilize all of my CPUs to get the best performance. How can I get there?
If file compression apps aren't CPU bound, they are most likely sequential I/O bound.
You can investigate this further by looking at the % of time the system spends in iowait ('wa'), using top or mpstat (check the man page for options if it isn't part of the default output).
If I'm right, most of the time the system isn't executing pigz is spent waiting on I/O.
You can also investigate this further using iostat, which can show disk I/O. The ratio between reads and writes will vary over time depending on how compressible the input is at that moment, but combined I/O should be fairly consistent. This assumes that Amazon's storage provisioning provides consistent I/O now, something that wasn't always the case.
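For example (a sketch; 5-second samples):

iostat -x 5    # extended per-device statistics
# a '%util' near 100% while the CPUs sit mostly idle means the tar | pigz
# pipeline is disk bound, and adding pigz processes won't help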
I submit jobs using headless NetLogo to an HPC server with the following script:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
--model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
--experiment test \
--table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is a snapshot of the cluster queue, obtained with:
qstat -g c
I wish to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't find a clear explanation online.
CPU USAGE CHECK:
qhost -u abhishekb
When I run BehaviorSpace on my PC through the GUI, assigning high priority to the task makes it use nearly 99% of the CPU, which makes it run faster. I wish to accomplish the same here.
A typical HPC environment is designed to run only one MPI process (or OpenMP thread) per CPU core, which therefore has access to 100% of CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase the performance of one of them by setting the appropriate priority with nice.
It appears that CQLOAD is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs just reflects the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core would mean that the code spends 70% of its time doing calculations, while the remaining 30% is probably spent waiting to communicate with the other computing nodes (which is also necessary).
Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.
1. CPU percentage load
I agree with @rth's answer regarding trying to use Linux job priority / renice to increase the CPU percentage: it's almost certain not to work, and (as you've found) you're unlikely to be able to do it anyway, as you won't have superuser privileges on the nodes (it's pretty unlikely you can even log into the worker nodes - probably only the head node).
The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally, it will probably run like that on the node while it's running.
Here are some answers to the more specific parts of your question:
2. CQLOAD
You ask
CQLOAD (what does it mean too?)
The docs for this are hard to find, but you link to the spec of your cluster, which tells us that its scheduling engine is Sun's Grid Engine. The man pages are available online (you can also access them locally, in particular by typing man qstat).
If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:
OUTPUT FORMATS
...
an average of the normalized load average of all queue hosts. In order to reflect each host's different significance the number of configured slots is used as a weighting factor when determining cluster queue load. Please note that only hosts with a np_load_value are considered for this value. When queue selection is applied only data about selected queues is considered in this formula. If the load value is not available at any of the hosts '-NA-' is printed instead of the value from the complex attribute definition.
This means that CQLOAD gives an indication of how utilized the processors in the queue are. Your output screenshot above shows 0.84, so the average load on the (in-use) processors in all.q is 84%. That doesn't seem too low.
3. Number of nodes reserved
In a related question, you state that colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder if the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.
You might want to experiment with using fewer nodes (unless your results are very slow) - that is achieved by altering the line #$ -pe mpi 24 - maybe take the number 24 down. You can work out how many nodes you need (roughly) by timing how long 1 model run takes on your computer and then use
N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
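For example (all numbers made up, and assuming the runs parallelize perfectly): if one run takes 10 minutes on your machine, the experiment has 100 runs, and you want the results within 2 hours, then N = (10 * 100) / 120 ≈ 8.3, so you would request 9 slots.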
So you want to make your program run faster on Linux by giving it a higher priority than all other processes?
In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program, or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive), where lower values give the process a higher priority. For security reasons, you can only decrease a process's niceness if you are the superuser (root).
So if you want to make a process run with higher priority then from within bash do
[abhishekb@hpc ~]$ start_process &
[abhishekb@hpc ~]$ jobs -x sudo renice -n -20 -p %+
Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
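For example (12345 is a placeholder process id):

sudo renice -n -20 -p 12345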
I have a dedicated Linux server (8 cores, 8 GB RAM) where I run some crawler PHP scripts. The load on the system ends up being around 200, which sounds like a lot. Since I am not using the machine to host content, what could be the side effects of such a high load for the purposes stated above?
Machines were made to work, so there are no issues with a high load average per se. But a high load average can often be an indicator of a performance issue. Such investigation is usually application specific, but here is a very general guideline:
Since load average is a combined metric of (CPU, IO .. etc) you want to examine all separately. I would start with making sure the machine is not thrashing, by checking swap usage (vmstat comes in handy), and disk performance (using iostat). You may also check if your operations are CPU intensive.
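A quick sketch of those checks:

vmstat 5 3     # sustained non-zero 'si'/'so' (swap in/out) means the box is thrashing
iostat -x 5 3  # high 'await' or '%util' points at a disk bottleneck instead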
You should read your load average as three values (the 1-minute, 5-minute and 15-minute load averages, respectively).
Take a look at the example taken from Wiki:
For example, one can interpret a load average of "1.73 0.60 7.98" on a single-CPU system as:
during the last minute, the system was overloaded by 73% on average (1.73 runnable processes, so that 0.73 processes had to wait for a turn for a single CPU system on average).
during the last 5 minutes, the CPU was idling 40% of the time on average.
during the last 15 minutes, the system was overloaded 698% on average (7.98 runnable processes, so that 6.98 processes had to wait for a turn for a single CPU system on average).
(See the full Wikipedia article on load average for more.)
Please note that this value depends on the resources of your machine.
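For reference, uptime prints those three values at the end of its output; a made-up sample carrying the Wiki numbers would look like:

$ uptime
 15:04:01 up 12 days, 3:42, 2 users, load average: 1.73, 0.60, 7.98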
Cheers!
Does anybody know how to achieve the following task?
My application sometimes eats lots of CPU; Process Explorer (procexp.exe) shows periodic high kernel CPU load (~60-80%). I can see in procexp that some threads are doing something that consumes lots of kernel time. At that moment I would like to print the execution stacks of those busy threads.
Is there any monitoring tool that can show that kind of information or some WinDbg script, etc?
I would suggest using ProcDump.
A command like:
procdump -c 60 -s 3 -ma -n 5 -x Your.exe your.dmp
This will take a full memory dump whenever the process exceeds 60% CPU utilization for 3 consecutive seconds, and do that up to 5 times. This way you can compare the different dumps and see where the process is spending its time.
Another option is to use ProcDump from Sysinternals to take dumps when the CPU load exceeds a limit you specify.
http://technet.microsoft.com/en-us/sysinternals/dd996900
Or you can look in the WinDbg help for “Tracking Down a Processor Hog”.
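If you open one of those dumps in WinDbg, a minimal command sequence for finding the busy threads might look like this (thread index 2 is just an example; $$ starts a comment):

!runaway 7    $$ per-thread user and kernel CPU time consumed
~2s           $$ switch to the thread with the most kernel time
k             $$ print that thread's call stack

You can also use ~*k to dump every thread's stack at once and scan for the hot ones.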