Prerequisites
I have a (physical) server running multiple (virtual) servers. There are 11 servers in total, number 0 through 9 are invoked by
servinit XXXXn
Where XXXXn is the port number and n is the server number. The other server is invoked by
apiinit
And runs on port 8080. In conclusion, there are 11 processes, 10 with the binary name servinit and one with apiinit.
Goal
The servinit processes must always be responsive, in other words, the apiinit process must never consume all CPU time. I want to limit the total CPU time for apiinit to a percentage number, lets say 90, so that the servinit processes always have 10 percent CPU headroom to operate flawlessly.
What is the most efficient way of handling this?
Software
The physical server runs
Ubuntu Desktop
Release 12.04 (precise) 64-bit
Kernel: 3.14.32-xxxx-std-ipv6-64
Since you run a 3.14+ Linux kernel you can easily constrain the CPU share of a running application through the SCHED_DEADLINE policy. This policy allows you to set the CPU share of an application by setting a budget and a period (the emaining is that the application is not allowed to consume more than its budget on a period of time). For example, if budget is 3msec and period is 10msec, the application can at most consume at most 30% of the CPU. In particular, the system will guarantee 3msec every 10msec.
Solved using the optional package cpulimit.
Example, limiting apiinit to 50 percent CPU time (on a dual-core CPU):
sudo cpulimit -e apiinit -l 100 &
Related
I am running Windriver Linux on a MIPS (octeon) based hardware.
Linux runs on 16 cores and we have koftirqd/0 to ksoftirq/15 running.
I observe the following behavior of load balancing on high incoming traffic ( like ping flood):
First, kostfirqd/0 takes all the load until it reaches some where around 96-97% of cpu.
Once cpu0 reaches 96-97% of usage, koftirqd/1 starts taking load and % of CPU for cpu1 starts increasing.
On more traffic being pumped in, cpu 1 reaches 96 -97% and cpu2 starts taking load. And it goes on till ksoftirqd/15 takes 96-97% as the incoming traffic increases.
Is this an expected behaviour?
Could you please let me know whether it is the default linux behavior or a possible improvement done by Windriver.
Thanks a lot,
Vasudev
Cavium Mips ethernet driver has the logic to send inter processor interrupt to other cores to take the load given the conditions.
When ever backlog crosses certain limit, then IPI is sent to other cores. And the handler for the IPI in turn is nothing but the NAPI poll logic.
Hence the behavior.
Here is my situation:
my company need to run tests on tons of test samples. But if we start a single process on a windows PC machine, this test could last for hours, even days. so we try to split the test set and start a process to test each one of the slices on a multi-core linux server.
we expect a linear performance improvement for the server solution, but the truth is we could only observe a 2~3 times improvement when the test task finished by 10~20 processes.
I tried several means to locate the problem:
disable hyper-threading;
use max-performance power policy
use taskset to pin each process on different core
but no luck, the problem remains.
Why does this happen? which is the root cause, our code, OS or hardware?
here is the info of my pc and server:
PC: os: win10; cpu: i5-4570, 2 physical core; mem : 16gb
server: os: redhat 6.5 cpu: E5-2630 v3, 2 physical core; mem : 32gb
Edit:
About CPU: the server has 2 processors, and each of them has 8 physical cores. check this link for more information.
About My Test: it's handwriting recognition related(that's why it's a cpu-sensitive task).
About IO: the performance check points do not involve much IO if logging doesn't count.
we expect a linear performance improvement for the server solution,
but the truth is we could only observe a 2~3 times improvement when
the test task finished by 10~20 processes.
This seems very logical considering there are only 2 cores on the system. Starting 10-20 processes will only add some overhead due to task switching.
Also, I/O could be a bottleneck here too, if multiple processes are reading from disk at the same time.
Ideally, the number of running threads should not exceed 2 x the number of cores.
A program I'm working on needs to process certain objects upon arrival from network in real-time. The throughput is good, but I have occasional drops in the input queue due to unexpected delays.
My analysis shows that most probably the source of the delay is outside my program; something like another process being scheduled on my process's CPU core (I set the affinity of the process to a certain core) or a hardware interrupt arriving (perhaps a network interrupt).
My problem is I don't know the source of the delay for sure. Is there a tool or a method to find how a CPU core was used exactly during a certain period of time? (Like for example telling me that core 0 was used by process 19494 99.1 percent of the time, process 20001 0.8 percent of the time and process 8110 0.1 percent of the time.)
I use Ubuntu 14.04 Server Edition on an HP server with a Xeon CPU.
could be CPU, diskspeed, networkspeed or memory.
Memory usage and CPU is easy to spot using htop . (use the sort option, F6)
HD speed could be an issue. for example if you use low-energy disks (they slow down when not in use). Do you have a database running on the same system?
use iotop , it might give a clue.
I submit jobs using headless NetLogo to a HPC server by the following code:
#!/bin/bash
#$ -N r20p
#$ -q all.q
#$ -pe mpi 24
/home/abhishekb/netlogo/netlogo-5.1.0/netlogo-headless.sh \
--model /home/abhishekb/models/corrected-rk4-20presults.nlogo \
--experiment test \
--table /home/abhishekb/csvresults/corrected-rk4-20presults.csv
Below is the snapshot of a cluster queue using:
qstat -g c
I wish to know can I increase the CQLOAD for my simulations and what does it signify too. I couldn't find an elucidate explanation online.
CPU USAGE CHECK:
qhost -u abhishekb
When I run the behaviour space on my PC through gui assigning high priority to the task makes it use nearly 99% of the CPU which makes it run faster. It uses a greater percentage of CPU processor. I wish to accomplish the same here.
EDIT:
EDIT 2;
A typical HPC environment, is designed to run only one MPI process (or OpenMP thread) per CPU core, which has therefore access to 100% of CPU time, and this cannot be increased further. In contrast, on a classical desktop/server machine, a number of processes compete for CPU time, and it is indeed possible to increase performance of one of them by setting the appropriate priorities with nice.
It appears that CQLOAD, is the mean load average for that computing queue. If you are not using all the CPU cores in it, it is not a useful indicator. Besides, even the load average per core for your runs just translates the efficiency of the code on this HPC cluster. For instance, a value of 0.7 per core, would mean that the code spends 70% of time doing calculations, while the remaining 30% are probably spent waiting to communicate with the other computing nodes (which is also necessary).
Bottom line, the only way you can increase the CPU percentage use on an HPC cluster is by optimising your code. Normally though, people are more concerned about the parallel scaling (i.e. how the time to solution decreases with the number of CPU cores) than with the CPU percentage use.
1. CPU percentage load
I agree with #rth answer regards trying to use linux job priority / renice to increase CPU percentage - it's
almost certain not to work
and, (as you've found)
you're unlikely to be able to do it as you won't have super user priveliges on the nodes (It's pretty unlikely you can even log into the worker nodes - probably only the head node)
The CPU usage of your model as it runs is mainly a function of your code structure - if it runs at 100% CPU locally it will probably run like that on the node during the time its running.
Here are some answers to the more specific parts of your question:
2. CQLOAD
You ask
CQLOAD (what does it mean too?)
The docs for this are hard to find, but you link to the spec of your cluster, which tells us that the scheduling engine for it is Sun's *Grid Engine". Man pages are here (you can access them locally too - in particular typing man qstat)
If you search through for qstat -g c, you will see the outputs described. In particular, the second column (CQLOAD) is described as:
OUTPUT FORMATS
...
an average of the normalized load average of all queue
hosts. In order to reflect each hosts different signifi-
cance the number of configured slots is used as a weight-
ing factor when determining cluster queue load. Please
note that only hosts with a np_load_value are considered
for this value. When queue selection is applied only data
about selected queues is considered in this formula. If
the load value is not available at any of the hosts '-
NA-' is printed instead of the value from the complex
attribute definition.
This means that CQLOAD gives an indication of how utilized the processors are in the queue. Your output screenshot above shows 0.84, so this indicator average load on (in-use) processors in all.q is 84%. This doesn't seem too low.
3. Number of nodes reserved
In a related question, you state colleagues are complaining that your processes are not using enough CPU. I'm not sure what that's based on, but I wonder the real problem here is that you're reserving a lot of nodes (even if just for a short time) for a job that they can see could work with fewer.
You might want to experiment with using fewer nodes (unless your results are very slow) - that is achieved by altering the line #$ -pe mpi 24 - maybe take the number 24 down. You can work out how many nodes you need (roughly) by timing how long 1 model run takes on your computer and then use
N = ((time to run 1 job) * number of runs in experiment) / (time you want the run to take)
So you want to make to make your program run faster on linux by giving it a higher priority than all other processes?
In that case you have to modify something called the program's niceness. This is normally done by invoking the command nice when you first start the program or the command renice while the program is already running. A process can have a niceness from -20 to 19 (inclusive) where lower values give the process a higher priority. Due to security reasons, you can only decrease a processes' niceness if you are the super user (root).
So if you want to make a process run with higher priority then from within bash do
[abhishekb#hpc ~]$ start_process &
[abhishekb#hpc ~]$ jobs -x sudo renice -n -20 -p %+
Or just use the last command and replace the %+ with the process id of the process you want to increase the priority for.
I have a linux dedicated server machine(8cores 8gbRAM) where i run some crawler php scripts. The load on the system ends up being arround 200, which sounds a lot. Since i am not using the machine to host content, what could be the sideeffects of such high level of load for the purposes stated above.
Machines were made to work so there are no issues with high load average, per se. But, a high load average can be an indicator of a performance issue, often. Such investigation is usually application specific, but here is a very general guideline:
Since load average is a combined metric of (CPU, IO .. etc) you want to examine all separately. I would start with making sure the machine is not thrashing, by checking swap usage (vmstat comes in handy), and disk performance (using iostat). You may also check if your operations are CPU intensive.
You should read your load average value as a 3 component value (1 minute load, 5 minutes load and 15 minutes load respectively).
Take a look at the example taken from Wiki:
For example, one can interpret a load average of "1.73 0.60 7.98" on a single-CPU system as:
during the last minute, the system was overloaded by 73% on average (1.73 runnable processes, so that 0.73 processes had to wait for a turn for a single CPU system on average).
during the last 5 minutes, the CPU was idling 40% of the time on average.
during the last 15 minutes, the system was overloaded 698% on average (7.98 runnable processes, so that 6.98 processes had to wait for a turn for a single CPU system on average).
Full article
Please note that this value depends on the resources of your machine.
Cheers!