Assumptions:
RT Linux patch is applied
POSIX RT threads are available and can have core affinity applied
Using a multi-core processor, for simplicity a dual core with cores labelled A and B
No interrupts
In a system designed to be real-time (at least soft real-time rather than hard real-time), I was curious whether you could apply core affinity to the background Linux tasks that can "disrupt" the real-time tasks, confining them to a single core (core A) and leaving the real-time tasks to run on the other core (core B).
This way the Linux kernel can execute its low-priority tasks on one core while real-time task execution proceeds undisturbed on the other.
This is quite a simplistic explanation, but I'm simply curious whether it is possible to do so. Granted, there are other factors at play that can reduce determinism within a system, such as interrupts, but the assumptions above remove those from this scenario.
Any pointers to material or corrections on this question are welcome.
CPUSETS
A very easy approach to isolate cores inside Linux is to use CPUSETS.
See the kernel's cpuset documentation for further information.
The following example script assumes a CPU with four cores. It uses cores 0-2 for all existing tasks and reserves core 3 for tasks that are specifically attached to that core.
This is done by creating two cpusets, "rt0" and "system".
Global load balancing is disabled and re-enabled only on the "system" set, which makes the remaining core invisible to the load balancer.
To attach a task to the reserved core, write its PID to that cpuset's tasks file.
$ mkdir /dev/cpuset
$ mount -t cgroup -o cpuset cpuset /dev/cpuset
$ cd /dev/cpuset
# make the root set exclusive and switch off global load balancing
$ /bin/echo 1 > cpuset.cpu_exclusive
$ /bin/echo 0 > cpuset.sched_load_balance
# reserved set "rt0": core 3 only, with load balancing off
$ mkdir rt0
$ /bin/echo 3 > rt0/cpuset.cpus
$ /bin/echo 0 > rt0/cpuset.mems
$ /bin/echo 1 > rt0/cpuset.cpu_exclusive
$ /bin/echo 0 > rt0/cpuset.sched_load_balance
$ /bin/echo $RT_PROC_PID > rt0/tasks
# "system" set: cores 0-2, load-balanced; move all existing tasks into it
$ mkdir system
$ /bin/echo 0-2 > system/cpuset.cpus
$ /bin/echo 0 > system/cpuset.mems
$ /bin/echo 1 > system/cpuset.cpu_exclusive
$ /bin/echo 1 > system/cpuset.sched_load_balance
$ for pid in $(cat tasks); do /bin/echo $pid > system/tasks; done
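To verify that the attachment worked, you can read the task's allowed-CPU list back from /proc (the Cpus_allowed fields in /proc/<pid>/status are standard); for the setup above it should report core 3 only:
$ grep Cpus_allowed_list /proc/$RT_PROC_PID/status
Cpus_allowed_list:	3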
Systemd slices
A newer approach to resource partitioning is systemd slices. As far as I know this only works with Control Groups v2 and, obviously, systemd. Since I have never really worked with it I cannot provide further information here, but I thought it worth mentioning.
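As a rough, untested sketch (assuming a recent systemd with the cgroup v2 cpuset controller; rt.slice and my_rt_task are placeholder names), you could confine a command to core 3 via a slice like this:
$ systemd-run --slice=rt.slice -p AllowedCPUs=3 ./my_rt_task
The AllowedCPUs= property is documented in systemd.resource-control(5).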
I am trying to parallelize a Python program (program_to_parallelize.py) into 16 subprocesses on my 16-core machine. I use this code, which is part of a Python script:
import subprocess
subprocess.call("mpiexec -n 16 python program_to_parallelize.py", shell=True)
This runs without any error, but when I look at CPU usage (pressing "1" inside top to show per-core usage), I see that all subprocesses are running on one single CPU. I would prefer that the 16 processes each take 100% of one CPU rather than all sharing the first one.
I am working on a 16-core machine running Ubuntu 16.04.6 LTS.
I use version 3.0.3 of mpi4py
I use version 3.3.2 of mpiexec
I figured it out, actually. One solution is to bind each process to a CPU after starting the execution. To do this, you can use this command:
taskset -pc [CPU number] [process ID]
for example :
taskset -pc 2 3039
You can find more details about how to assign a process to a CPU on this website : https://www.hecticgeek.com/2012/03/assign-process-cpu-ubuntu-linux/
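Alternatively, most MPI launchers can do the binding themselves at start-up, so you don't have to taskset each PID by hand. If your mpiexec is MPICH's Hydra launcher (version 3.3.2 suggests it is), the -bind-to option should work; Open MPI spells it --bind-to core. As a sketch, the command passed to subprocess.call would become:
mpiexec -n 16 -bind-to core python program_to_parallelize.py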
I have the following code in a bash script:
echo "bash pid => $$";
echo "processor affinity before => $(taskset -p $$)"
taskset -cp ${AN_INTEGER} $$
echo "processor affinity after => $(taskset -p $$)"
I get this output:
processor affinity before => pid 5047's current affinity mask: ff
pid 5047's current affinity list: 0-7
pid 5047's new affinity list: 1
processor affinity after => pid 5047's current affinity mask: 2
Does anyone know what this means?
The reason I started messing with processor affinity is that I would launch multiple bash child processes, and all the child processes' affinities had the value "ff", so it seemed like they were all targeting the same CPU.
The affinity mask controls the set of processors that a process may run on, not a single specific processor. Bits that are 1 in this mask represent processors that the process can run on. Since you specified that you want this process to run only on CPU 1, the affinity mask is now 0b00000010, or 2.
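For example, a mask of 0b00000101 (0x5) would allow the process to run on CPUs 0 and 2. With taskset you can pass either the mask or, using -c, an explicit CPU list; these two commands are equivalent (reusing PID 5047 from your output):
taskset -p 5 5047
taskset -cp 0,2 5047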
I am trying to run several shell commands in a parallel way without making background processes using "&".
Also, I want to assign one job to one CPU (in a fair way)
For example, if I have four cores, I want to assign four commands (cmd1 to cmd4) as follows:
CPU #1: cmd1
CPU #2: cmd2
CPU #3: cmd3
CPU #4: cmd4
Could you please let me know ways of doing that?
I've found "parallel" command, but I could not figure out how to use it.
Also, I've tried the following command: ./cmd1 | ./cmd2 | ./cmd3 | ./cmd4
It seems like the four commands (cmd1 to cmd4) are running in parallel, but I am not sure the jobs are assigned to the cores as described above.
Thank you!
Sorry. I am running the commands on Linux.
First, if you want processes to be executed in parallel, they have to be background jobs. What do you have against using &?
Second, you can use taskset to bind a process to a CPU core, or a set of cores. For example:
taskset -c 0 cmd1 &
taskset -c 1 cmd2 &
taskset -c 2 cmd3 &
taskset -c 3 cmd4 &
This might not be a good idea though; if one process is idle for long periods of time the other 3 cannot use the core it's assigned to.
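Since GNU parallel was mentioned: it manages the jobs itself, so no explicit & is needed. One possible invocation (a sketch; {%} is parallel's job-slot number, which runs from 1 to 4 here, so each slot gets its own core):
parallel -j4 'taskset -c $(({%} - 1)) {}' ::: ./cmd1 ./cmd2 ./cmd3 ./cmd4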
It is possible to use sched_setaffinity to pin a thread to a CPU, increasing performance (in some situations).
From the Linux man page:
Restricting a process to run on a single CPU also avoids the performance cost caused by the cache invalidation that occurs when a process ceases to execute on one CPU and then recommences execution on a different CPU.
Further, if I desire a more real-time response, I can change the scheduler policy for that thread to SCHED_FIFO and raise its priority to some high value (up to sched_get_priority_max), meaning the thread in question should always preempt any other thread running on its CPU when it becomes ready.
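As an aside, the same policy/priority change can be made from a shell with chrt from util-linux (a sketch; the priority value 80, ./my_rt_app, and TID 1234 are placeholders):
taskset -c 3 chrt -f 80 ./my_rt_app   # launch under SCHED_FIFO at priority 80, pinned to CPU 3
chrt -f -p 80 1234                    # or change an already-running thread by its TID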
However, at this point, the thread running on the cpu which the real-time thread just pre-empted will possibly have evicted much of the real-time thread's level-1 cache entries.
My questions are as follows:
Is it possible to prevent the scheduler from scheduling any threads onto a given cpu? (eg: either hide the cpu completely from the scheduler, or some other way)
Are there some threads which absolutely have to be able to run on that cpu? (eg: kernel threads / interrupt threads)
If I need to have kernel threads running on that cpu, what is a reasonable maximum priority value to use such that I don't starve out the kernel threads?
The answer is to use cpusets. The python cpuset utility makes it easy to configure them.
Basic concepts
3 cpusets
root: present in all configurations and contains all cpus (unshielded)
system: contains cpus used for system tasks - the ones which need to run but aren't "important" (unshielded)
user: contains cpus used for "important" tasks - the ones we want to run in "realtime" mode (shielded)
The shield command manages these 3 cpusets.
During setup it moves all movable tasks into the unshielded cpuset (system) and during teardown it moves all movable tasks into the root cpuset.
After setup, the subcommand lets you move tasks into the shield (user) cpuset, and additionally, to move special tasks (kernel threads) from root to system (and therefore out of the user cpuset).
Commands:
First we create a shield. Naturally the layout of the shield will be machine- and task-dependent. For example, say we have a 4-core non-NUMA machine: we want to dedicate 3 cores to the shield and leave 1 core for unimportant tasks. Since the machine is non-NUMA we don't need to specify any memory node parameters, and we leave the kernel threads running in the root cpuset (i.e. across all CPUs).
$ cset shield --cpu 1-3
Some kernel threads (those which aren't bound to specific cpus) can be moved into the system cpuset. (In general it is not a good idea to move kernel threads which have been bound to a specific cpu)
$ cset shield --kthread on
Now let's list what's running in the shielded (user) or unshielded (system) cpusets (-v for verbose, which lists the process names; add a second -v to display more than 80 characters):
$ cset shield --shield -v
$ cset shield --unshield -v -v
If we want to stop the shield (teardown)
$ cset shield --reset
Now let's execute a process in the shield (arguments following '--' are passed to the command being executed, not to cset):
$ cset shield --exec mycommand -- -arg1 -arg2
If we already have a running process which we want to move into the shield (note: we can move multiple processes by passing a comma-separated list, or ranges - any process in the range will be moved, even if there are gaps):
$ cset shield --shield --pid 1234
$ cset shield --shield --pid 1234,1236
$ cset shield --shield --pid 1234,1237,1238-1240
Advanced concepts
cset set/proc - these give you finer control of cpusets
Set
Create, adjust, rename, move and destroy cpusets
Commands
Create a cpuset, using cpus 1-3, use NUMA node 1 and call it "my_cpuset1"
$ cset set --cpu=1-3 --mem=1 --set=my_cpuset1
Change "my_cpuset1" to only use cpus 1 and 3
$ cset set --cpu=1,3 --mem=1 --set=my_cpuset1
Destroy a cpuset
$ cset set --destroy --set=my_cpuset1
Rename an existing cpuset
$ cset set --set=my_cpuset1 --newname=your_cpuset1
Create a hierarchical cpuset
$ cset set --cpu=3 --mem=1 --set=my_cpuset1/my_subset1
List existing cpusets (depth of level 1)
$ cset set --list
List existing cpuset and its children
$ cset set --list --set=my_cpuset1
List all existing cpusets
$ cset set --list --recurse
Proc
Manage threads and processes
Commands
List tasks running in a cpuset
$ cset proc --list --set=my_cpuset1 --verbose
Execute a task in a cpuset
$ cset proc --set=my_cpuset1 --exec myApp -- --arg1 --arg2
Moving a task
$ cset proc --toset=my_cpuset1 --move --pid 1234
$ cset proc --toset=my_cpuset1 --move --pid 1234,1236
$ cset proc --toset=my_cpuset1 --move --pid 1238-1340
Moving a task and all its siblings
$ cset proc --move --toset=my_cpuset1 --pid 1234 --threads
Move all tasks from one cpuset to another
$ cset proc --move --fromset=my_cpuset1 --toset=system
Move unpinned kernel threads into a cpuset
$ cset proc --kthread --fromset=root --toset=system
Forcibly move kernel threads (including those that are pinned to a specific cpu) into a cpuset (note: this may have dire consequences for the system - make sure you know what you're doing)
$ cset proc --kthread --fromset=root --toset=system --force
Hierarchy example
We can use hierarchical cpusets to create prioritised groupings
Create a system cpuset with 1 cpu (0)
Create a prio_low cpuset with 1 cpu (1)
Create a prio_met cpuset with 2 cpus (1-2)
Create a prio_high cpuset with 3 cpus (1-3)
Create a prio_all cpuset with all 4 cpus (0-3) (note this is the same as root; it is considered good practice to keep a separation from root)
To achieve the above, you create prio_all, then create the subset prio_high under prio_all, and so on:
$ cset set --cpu=0 --set=system
$ cset set --cpu=0-3 --set=prio_all
$ cset set --cpu=1-3 --set=/prio_all/prio_high
$ cset set --cpu=1-2 --set=/prio_all/prio_high/prio_med
$ cset set --cpu=1 --set=/prio_all/prio_high/prio_med/prio_low
There are two other ways I can think of to do this (though neither is as elegant as cset, which doesn't seem to have a fantastic level of support from Red Hat):
1) Taskset everything including PID 1 - nice and easy (though, allegedly - I've never seen any issues myself - it may cause scheduler inefficiencies). The script below (which must be run as root) runs taskset on all running processes, including init (PID 1); this pins all running processes to one or more 'junk cores', and by also pinning init it ensures that any future processes are also started on the list of 'junk cores':
#!/bin/bash
if [[ -z $1 ]]; then
    printf "Usage: %s '<csv list of cores to set as junk in double quotes>'\n" "$0"
    exit 1
fi
# pin every running thread (listed by thread ID), including init (PID 1)
for i in $(ps -eLo lwp=); do
    taskset -pc "$1" "$i"
done
2) use the isolcpus kernel parameter (here's the documentation from https://www.kernel.org/doc/Documentation/kernel-parameters.txt):
isolcpus= [KNL,SMP] Isolate CPUs from the general scheduler.
Format:
<cpu number>,...,<cpu number>
or
<cpu number>-<cpu number>
(must be a positive range in ascending order)
or a mixture
<cpu number>,...,<cpu number>-<cpu number>
This option can be used to specify one or more CPUs
to isolate from the general SMP balancing and scheduling
algorithms. You can move a process onto or off an
"isolated" CPU via the CPU affinity syscalls or cpuset.
<cpu number> begins at 0 and the maximum value is
"number of CPUs in system - 1".
This option is the preferred way to isolate CPUs. The
alternative -- manually setting the CPU mask of all
tasks in the system -- can cause problems and
suboptimal load balancer performance.
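For example, to isolate CPU 3 at boot on a GRUB-based system (a sketch; the exact file and regeneration command vary by distro):
# in /etc/default/grub, append to the kernel command line:
GRUB_CMDLINE_LINUX="isolcpus=3"
# then regenerate the config and reboot:
update-grub    # Debian/Ubuntu; on RHEL/Fedora: grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
# after reboot nothing is scheduled on CPU 3; place work there explicitly:
taskset -c 3 ./cmd1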
I've used these two approaches plus the cset mechanism on several projects. Incidentally, please pardon the blatant self-promotion :-) - I've filed a patent for a tool called Pontus Vision ThreadManager that comes up with optimal pinning strategies for any given x86 platform and any given software workload; after testing it at a customer site I got really good results (270% reduction in peak latencies), so pinning and CPU isolation are well worth doing.
Here's how to do it the old-fashioned way, using cgroups. I have a Fedora 28 machine and RedHat/Fedora want you to use systemd-run, but I wasn't able to find this functionality in there. I would love to know how to do it using systemd-run, if anyone would care to enlighten me.
Let's say I want to exclude my fourth CPU (of CPUs 0-3) from scheduling, and move all existing processes to CPUs 0-2. Then I want to put a process on CPU 3 all by itself.
sudo su -
cgcreate -g cpuset:not_cpu_3
echo 0-2 > /sys/fs/cgroup/cpuset/not_cpu_3/cpuset.cpus
# This "0" is the memory node. See https://utcc.utoronto.ca/~cks/space/blog/linux/NUMAMemoryInfo
# for more information *
echo 0 > /sys/fs/cgroup/cpuset/not_cpu_3/cpuset.mems
Specifically, on your machine you'll want to review /proc/zoneinfo and the /sys/devices/system/node hierarchy. Getting the proper node information is left as an exercise for the reader.
Now that we have our cgroup, we need to create our isolated CPU 3 cgroup:
cgcreate -g cpuset:cpu_3
echo 3 > /sys/fs/cgroup/cpuset/cpu_3/cpuset.cpus
# Again, the memory node(s) you want to specify.
echo 0 > /sys/fs/cgroup/cpuset/cpu_3/cpuset.mems
Put all processes/threads on the not_cpu_3 cgroup:
for pid in $(ps -eLo pid) ; do cgclassify -g cpuset:not_cpu_3 $pid; done
Review:
ps -eL k psr o psr,pid,tid,args | sort | cut -c -80
NOTE! Processes that are currently sleeping will not move. They must be awakened so that the scheduler will put them on a different CPU. To see this, choose your favorite sleeping process in the above list - a process, say a web browser, that you thought should be on CPUs 0-2 but is still on 3. Using its thread ID from the above list, perform:
kill -CONT <thread_id>
For example:
kill -CONT 9812
Rerun the ps command, and note that it's moved to another CPU.
DOUBLE NOTE! Some kernel threads cannot and will not move! For example, you may note that each CPU has kernel threads pinned to it (such as [migration/N] and [ksoftirqd/N]). Assigning processes to cgroups works for userspace processes, not for kernel threads. This is life in the multitasking world.
Now to move a process and all its children to control group cpu_3:
pid=12566 # for example
cgclassify -g cpuset:cpu_3 $pid
taskset -c -p 3 $pid
Again, if $pid is sleeping, you'll need to wake it up for the CPU move to actually take place.
To undo all of this, simply delete the cgroups you've created. Everybody will be stuck back into the root cgroup:
cgdelete -r cpuset:cpu_3
cgdelete -r cpuset:not_cpu_3
No need to reboot.
(Sorry, I don't understand the 3rd question from the original poster. I can't comment on that.)
If you are using a RHEL instance you can use Tuna for this (it may be available for other Linux distros too, but I am not sure about that). It can easily be installed with yum. Tuna can be used to isolate a CPU core, and it dynamically moves processes running on that particular CPU to neighbouring CPUs. The command to isolate a CPU core is as follows:
# tuna --cpus=CPU-LIST --isolate
You can use htop to see how tuna isolates the CPU cores in real time.
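To undo the isolation, tuna has an inverse operation (check tuna --help on your version; --include should allow all threads to run on the listed CPUs again):
# tuna --cpus=CPU-LIST --include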