I have a command, checkReplace test.txt, and I would like to assign more CPU and memory to it (I run it from a shell script for automation). Since test.txt can be huge (10 GB or more), could anyone kindly suggest how to increase the CPU and memory allocation for this command, so as to increase the speed at which test.txt is written? Thanks in advance.
You can start the command with higher CPU priority by simply starting it with the nice command, like so:
# nice -n -20 checkReplace test.txt
The highest possible priority is -20 (negative values require root), with the default being 0. However, unless you're running other heavily CPU-intensive processes, I can't imagine that increasing the CPU priority of a process will make it write to disk much faster. That depends largely on hard drive write speed, not CPU time.
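Since disk speed is the likely bottleneck here, the I/O scheduling priority may matter more than CPU priority; on Linux, ionice (from util-linux) adjusts it. A sketch, with checkReplace standing in for your command:

```shell
# Positive niceness (lowering priority) needs no privileges;
# negative values such as -20 require root:
nice -n 10 sh -c 'echo "child runs at niceness $(nice)"'

# Raising both CPU and I/O priority for the real command (needs root):
#   sudo ionice -c2 -n0 nice -n -20 checkReplace test.txt
```

Note that ionice only changes how competing I/O is scheduled; on an otherwise idle disk it will not make writes any faster.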
If checkReplace is a shell script, then rewriting it in another scripting language (Python, Perl, Ruby, ...) or in a compiled language would significantly improve its speed. Shell scripts are mainly for simple tasks; for heavy computation or I/O they are usually far too inefficient.
Related
New member here but long time Perl programmer.
I have a process that I run on a Windows machine that iterates through combinations of records from arrays/lists to identify a maximum combination, following a set of criteria.
On an old Intel i3 machine, an example would take about 45 mins to run. I purchased a new AMD Ryzen 7 machine that on benchmarks is about 7 or 8 times faster than the old machine. But the execution time was only reduced from 45 to 22 minutes.
This new machine has crazy processor capabilities, but it does not appear that Perl takes advantage of these.
Are there Perl settings or ways of coding to take advantage of all of the processor speed that I have on my new machine? Threads, etc?
thanks
Perl by default uses only a single thread, and thus only a single CPU core, so it exploits only a small part of what current multi-core systems offer. It can make use of multiple threads, and thus multiple CPU cores, but this must be done explicitly: the implementation needs to be adapted for parallel execution. That can involve major changes to the algorithm used to solve your problem, and not all problems can be easily parallelized.
Apart from that, Perl is not the preferred language if performance is the goal. There is a lot of overhead from being a dynamically typed language with no explicit control over memory allocation. Languages like C, C++, or Rust, which are closer to the hardware, start with significantly less overhead and then allow even more low-level control to reduce it further. But they don't magically parallelize either.
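If the combination search can be split into independent chunks, one can also parallelize at the process level without touching Perl's threads: run several single-threaded workers at once. A sketch using xargs -P, where squaring a number stands in for one hypothetical Perl worker processing one chunk:

```shell
# Run up to 4 processes concurrently; each invocation handles one
# work item (here: squaring a number, in place of one Perl worker).
seq 1 8 | xargs -P 4 -n 1 sh -c 'echo $(( $0 * $0 ))'
```

Output order may vary from run to run, since workers finish independently; the work items must not depend on each other for this to be correct.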
I have been trying to develop an application that would model a system using graph theory (see [1]). Graph theory can basically be used to model runnables, to figure out their partitions (grouped runnables), and to map them to cores.
To achieve this, we need a lot of information. Since I don't know in detail how the Linux OS (Raspbian, in our case) schedules everything, and I'm interested in finding out how our algorithm will improve core utilization, I thought I could obtain information about the processes and try to model them myself.
For that purpose, we need:
Instruction count: how many instructions the CPU executes to complete the task (very important)
Memory needed for the process, physical memory and virtual memory
Core load for debugging the processes.
Read/write accesses: which process it communicates with, whether it is a read or a write access, what kind of interface it uses, and the instruction count and memory needed to read and/or write.
I think I can extract some of this information using the 'top' command in Linux, which gives core load, memory usage, and virtual and physical memory. I should also mention that I intend to use 'taskset' to pin processes to cores so I can observe them (see [2]).
Now, my first question is: how do I effectively obtain the instruction counts, r/w accesses, and the other things listed above?
My second question: is there any way to see the runnables of a process, i.e. the individual functions it runs, along with their information and their r/w accesses to each other? This question is about finding a way to model a process itself, rather than the interactions between processes.
Any help is greatly appreciated, as it will help our open-source multi-core platform research.
Thank you very much in advance.
[1] http://math.tut.fi/~ruohonen/GT_English.pdf
[2] To place a process to a core, I use:
pid=$(pgrep -u root -f "$process_name" -n)
sudo taskset -pc "$core" "$pid" &&
echo "Process $process_name with PID=$pid has been placed on core $core"
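To confirm that the pinning took effect, taskset can also read an affinity back; a quick check (assuming util-linux taskset, using the current shell as the target):

```shell
# Run a command pinned to core 0:
taskset -c 0 echo "pinned run ok"

# Set, then read back, the current shell's affinity:
taskset -pc 0 $$ && taskset -p $$
```

The second taskset -p call prints the affinity mask of the process, which should correspond to core 0 only after the pinning.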
I want to know if my program was run in parallel over multiple cores. I can get the perf tool to report how many cores were used in the computation, but not if they were used at the same time (in parallel).
How can this be done?
You can try using the command
top
in another terminal while the program is running. It will show the usage of all the cores on your machine.
A few possible solutions:
Use htop on another terminal as your program is being executed. htop shows the load on each CPU separately, so on an otherwise idle system you'd be able to tell if more than one core is involved in executing your program.
It is also able to show each thread separately, and the overall CPU usage of a program is aggregated, which means that parallel programs will often show CPU usage percentages over 100%.
Execute your program using the time command or shell builtin. For example, under bash on my system:
$ dd if=/dev/zero bs=1M count=100 2>/dev/null | time -p xz -T0 > /dev/null
real 0.85
user 2.74
sys 0.14
It is obvious that the total CPU time (user+sys) is significantly higher than the elapsed wall-clock time (real). That indicates the parallel use of multiple cores. Keep in mind, however, that a program that is either inefficient or I/O-bound could have a low overall CPU usage despite using multiple cores at the same time.
Use top and monitor the CPU usage percentage. This method is even less specific than time and has the same weakness regarding parallel programs that do not make full use of the available processing power.
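The time-based check can be reproduced on any multi-core machine with a toy workload; a sketch in bash, using two CPU-bound background loops:

```shell
#!/bin/bash
# Two busy loops run as background subshells; if they execute in
# parallel, user CPU time approaches twice the real (wall) time.
time -p {
  (i=0; while (( i < 200000 )); do (( i++ )); done) &
  (i=0; while (( i < 200000 )); do (( i++ )); done) &
  wait
}
```

If user + sys is close to real instead, the loops were serialized (e.g. the machine has only one available core).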
I submit jobs using headless NetLogo to an HPC server with the following code:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
--model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
--experiment test \
--table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is a snapshot of the cluster queue, obtained using:
qstat -g c
I wish to know whether I can increase the CQLOAD for my simulations, and what it signifies. I couldn't find a clear explanation online.
CPU USAGE CHECK:
qhost -u abhishekb
When I run BehaviorSpace on my PC through the GUI, assigning high priority to the task makes it use nearly 99% of the CPU, which makes it run faster. I wish to accomplish the same here.
A typical HPC environment is designed to run only one MPI process (or OpenMP thread) per CPU core, which therefore has access to 100% of the CPU time; this cannot be increased further. In contrast, on a classical desktop/server machine a number of processes compete for CPU time, and it is indeed possible to increase the performance of one of them by setting appropriate priorities with nice.
It appears that CQLOAD is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, the load average per core for your runs simply reflects the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core would mean that the code spends 70% of its time doing calculations, while the remaining 30% is probably spent waiting to communicate with the other compute nodes (which is also necessary).
Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.
1. CPU percentage load
I agree with @rth's answer regarding trying to use Linux job priority / renice to increase the CPU percentage - it's
almost certain not to work
and, as you've found,
you're unlikely to be able to do it anyway, as you won't have superuser privileges on the nodes (it's pretty unlikely you can even log into the worker nodes - probably only the head node)
The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally, it will probably run like that on the node too.
Here are some answers to the more specific parts of your question:
2. CQLOAD
You ask
CQLOAD (what does it mean too?)
The docs for this are hard to find, but you link to the spec of your cluster, which tells us that its scheduling engine is Sun Grid Engine. The man pages are here (you can access them locally too - in particular by typing man qstat).
If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:
OUTPUT FORMATS
...
an average of the normalized load average of all queue hosts. In order to reflect each host's different significance the number of configured slots is used as a weighting factor when determining cluster queue load. Please note that only hosts with a np_load_value are considered for this value. When queue selection is applied only data about selected queues is considered in this formula. If the load value is not available at any of the hosts '-NA-' is printed instead of the value from the complex attribute definition.
This means that CQLOAD gives an indication of how utilized the processors in the queue are. Your output screenshot above shows 0.84, so the average load on the (in-use) processors in all.q is 84%. This doesn't seem too low.
3. Number of nodes reserved
In a related question, you state that colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder whether the real problem here is that you're reserving a lot of slots (even if just for a short time) for a job that they can see could work with fewer.
You might want to experiment with using fewer slots (unless your runs become very slow) - that is achieved by altering the line #$ -pe mpi 24 - perhaps reduce the number 24. You can work out roughly how many you need by timing how long one model run takes on your computer and then using
N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
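A worked instance of that formula, with illustrative numbers: if one run takes 10 minutes, the experiment has 48 runs, and you want the whole thing done in 120 minutes:

```shell
# N = (10 min/run * 48 runs) / 120 min target = 4 slots
echo $(( (10 * 48) / 120 ))   # -> 4
```

So in that scenario, #$ -pe mpi 4 would be enough, assuming runs parallelize cleanly across slots.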
So you want to make your program run faster on Linux by giving it a higher priority than all other processes?
In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program, or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive), where lower values give the process a higher priority. For security reasons, you can only decrease a process's niceness if you are the superuser (root).
So if you want to make a process run with higher priority then from within bash do
[abhishekb@hpc ~]$ start_process &
[abhishekb@hpc ~]$ jobs -x sudo renice -n -20 -p %+
Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
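Lowering priority (a positive niceness) needs no root at all, which makes it easy to try; a sketch on a throwaway sleep process:

```shell
# Start a background process, then lower its priority; only
# raising priority (negative niceness) requires root.
sleep 10 &
renice -n 5 -p $!
```

renice reports the old and new priority, so you can see immediately whether the change was accepted.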
How does a tool like Net-SNMP capture CPU usage?
And what would be the least intrusive way to do it under Linux?
Least intrusive in the sense that doing so should consume the least amount of machine resources (both CPU and RAM). Eventually the data will be saved to a file.
Apart from the kernel itself, there is no way to calculate the current CPU utilization other than reading /proc. All common tools like ps, top, etc. also just read /proc: /proc/stat for overall CPU usage, or /proc/<pid>/stat for per-process CPU usage. However, since /proc is a virtual file system provided directly by the kernel, the overhead of reading files in it is much smaller than for regular files.
If you don't want to read /proc yourself, try to use a tool that does only a little extra computation, like ps as mentioned by @deep.
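Reading /proc/stat directly is cheap; a minimal sketch (Linux only) that samples the aggregate cpu line twice and prints the overall busy percentage over one second:

```shell
#!/bin/bash
# Fields after "cpu" are user, nice, system, idle (in clock ticks).
read -r _ u1 n1 s1 i1 _ < /proc/stat
sleep 1
read -r _ u2 n2 s2 i2 _ < /proc/stat
busy=$(( (u2 + n2 + s2) - (u1 + n1 + s1) ))
total=$(( busy + (i2 - i1) ))
echo "CPU busy: $(( 100 * busy / total ))%"
```

This ignores the iowait/irq/softirq fields for brevity; a production version would fold those into the totals as well.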
Have you tried using the top command?
In fact, here is a list of methods, including top - try these :)
http://www.cyberciti.biz/tips/how-do-i-find-out-linux-cpu-utilization.html
try this:
ps -eo pcpu,pid | less
This will show the CPU usage along with the PIDs.
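To see the biggest consumers first, GNU procps ps can sort by CPU usage directly (the --sort option is GNU-specific, not portable BSD syntax):

```shell
# Top 5 CPU consumers, highest first:
ps -eo pcpu,pid,comm --sort=-pcpu | head -n 6
```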