Run TBB code on a specific number of CPUs - Linux

I'm running TBB code on Linux and I want it to use only a portion of my CPU (2 cores out of 8). Is there a way to do this other than disabling cores?

taskset(1) allows you to run a command on a specific subset of cores on the system.
taskset -c 0,1 ./a.out

TBB respects the process affinity mask (on Linux, the affinity mask of the [main] thread in which TBB was first initialized). By default, it sizes its pool of worker threads from the number of bits set to 1 in that mask. You can set the mask with taskset or numactl, for example:
numactl --physcpubind=1,2 path/application arg1 arg2
It is like disabling the cores but for a specific process only.
You can also control the number of threads in code, using the older tbb::task_scheduler_init or the newer tbb::global_control API. This only changes the number of threads; it does not assign an affinity mask to them.
If you want to assign an affinity mask to the worker threads that TBB creates yourself, derive your own class from tbb::task_scheduler_observer and define custom actions to run when a worker thread is created, as described in this blog.
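To see the effect of the mask at the OS level, here is a minimal sketch in Python rather than TBB's C++ (an assumption made for brevity; os.sched_setaffinity wraps the same Linux call that taskset relies on). The helper name run_pinned is hypothetical. It starts a child with a restricted mask and has the child report how many CPUs it can see, which is the count that a runtime sizing its pool from the affinity mask would observe.

```python
import os
import subprocess
import sys


def run_pinned(cpus, argv):
    """Run argv with its affinity mask restricted to cpus, like `taskset -c`."""
    return subprocess.run(
        argv,
        # Runs in the child after fork and before exec, so the new
        # program starts with the restricted mask already in place.
        preexec_fn=lambda: os.sched_setaffinity(0, cpus),
        capture_output=True,
        text=True,
        check=True,
    )


# Pick at most two CPUs from those this process may already use.
available = sorted(os.sched_getaffinity(0))
subset = set(available[:2])

# The child reports how many CPUs its own mask contains.
child = [sys.executable, "-c", "import os; print(len(os.sched_getaffinity(0)))"]
visible = int(run_pinned(subset, child).stdout)
print(f"parent may use {len(available)} CPUs, pinned child sees {visible}")
```

The parent's own mask is left untouched, which matches the "disabling the cores for a specific process only" behaviour described above.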

Related

linux taskset: Does a thread of a multi-thread process always run on a particular core?

I use taskset to run a multi-threaded process on a Linux host as below:
taskset -c 1,2 ./myprocess
Will a particular thread always run on a particular CPU (for example, thread 1 always on c1), or can it run on c1 or c2 at different times?
No, the filter is applied to the whole process, and threads can move between the cores in the restricted list. If you want threads not to move, you need to set the affinity of each thread separately (e.g. using pthread_setaffinity_np). Note that you can check the affinity of the threads of a given process with the great hwloc tool (hwloc-ps -t).
Note that some libraries/frameworks have ways to do that more easily. This is the case for OpenMP programs where you can use environment variables like OMP_PLACES to set the affinity of each thread.
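As a rough illustration of per-thread pinning, here is a sketch in Python instead of C's pthread_setaffinity_np (an assumption made to keep it short). It relies on the fact that, on Linux, setting affinity with pid 0 applies to the calling thread only. The names pinned_worker and results are hypothetical.

```python
import os
import threading


def pinned_worker(cpu, results, index):
    # With pid 0 the call applies to the calling thread only,
    # not to the whole process.
    os.sched_setaffinity(0, {cpu})
    results[index] = os.sched_getaffinity(0)


main_mask = os.sched_getaffinity(0)

# Use at most two of the CPUs this process is allowed to run on.
cpus = sorted(main_mask)[:2]
results = [None] * len(cpus)

threads = [
    threading.Thread(target=pinned_worker, args=(cpu, results, i))
    for i, cpu in enumerate(cpus)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

for cpu, mask in zip(cpus, results):
    print(f"thread pinned to CPU {cpu} reports mask {sorted(mask)}")
print(f"main thread mask is still {sorted(os.sched_getaffinity(0))}")
```

Each worker ends up with a single-CPU mask, while the main thread keeps the mask it started with.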

How to bind a process to a set of cpu in golang?

I use the os/exec package to run a process. I want to check its CPU affinity and modify it to bind the process to a specific CPU set. I found
func SchedSetaffinity(pid int, set *CPUSet) error
This function is in the golang.org/x/sys/unix package. However, its documentation says it binds a thread to a specific CPU set, and I don't know whether it works on a whole process. I also wonder how to get the CPUSet. Is it a value I need to define?
taskset: To make a process run on specific CPUs on Linux, use the taskset command. You can build your logic on "taskset -p [mask] [pid]", where the mask represents the cores on which the process may run, provided the whole program runs with GOMAXPROCS=1.
pthread_setaffinity_np: Since Go uses pthreads in cgo mode, you can use cgo to call pthread_setaffinity_np, which sets the CPU affinity mask of a given thread to the specified CPU set. (The related pthread_attr_setaffinity_np() function sets the CPU affinity mask attribute of the thread attributes object referred to by attr to the value specified in cpuset.)
Go supports affinity control through "SchedSetaffinity", which confines a thread to specific cores. "SchedSetaffinity(pid int, set *CPUSet)" sets the CPU affinity mask of the thread specified by pid. If pid is 0, the calling thread is used.
Note that the GOMAXPROCS variable limits the number of operating system threads that can execute user-level Go code simultaneously. If it is greater than 1, you can use Go's runtime.LockOSThread to pin the current goroutine to the thread it is running on. The calling goroutine will always execute in that thread, and no other goroutine will execute in it, until the calling goroutine has made as many calls to UnlockOSThread as to LockOSThread.
cgroups: Another option is cgroups, which organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner. The cpuset subsystem assigns individual CPUs (on a multicore system) and memory nodes to the processes in a cgroup. Its cpuset.cpus file lists the CPUs that tasks in the cgroup may use, as comma-separated numbers or ranges. For example:
# cat cpuset.cpus
0-4,6,8-10
A process is confined to run only on the CPUs in the cpuset it belongs to, and to allocate memory only on the memory nodes in that cpuset. Note that every process is placed, at creation time, in the cgroup that its parent belongs to, and can later be migrated to another cgroup. Migrating a process doesn't affect its already existing descendant processes.
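The overall flow (start a child, then bind it by pid) can be sketched as follows. This is Python rather than Go, chosen only for illustration; os.sched_setaffinity plays the role that SchedSetaffinity plays in golang.org/x/sys/unix. The helper parse_cpu_list is hypothetical and simply reads the comma-and-range format shown for cpuset.cpus above.

```python
import os
import subprocess
import sys


def parse_cpu_list(text):
    """Parse a cpuset-style list such as '0-4,6,8-10' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


wanted = parse_cpu_list("0-4,6,8-10")

# Keep only CPUs this process may use, so the call does not fail on a
# machine with fewer cores; fall back to the full usable set if the
# intersection is empty.
usable = os.sched_getaffinity(0)
allowed = (wanted & usable) or usable

# Start a child process, then bind it by pid, similar to `taskset -p`.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
try:
    os.sched_setaffinity(child.pid, allowed)
    child_mask = os.sched_getaffinity(child.pid)
    print(f"child {child.pid} is bound to CPUs {sorted(child_mask)}")
finally:
    child.terminate()
    child.wait()
```

On Linux the pid passed here identifies a single thread, so only the child's main thread is rebound. Threads the child creates afterwards inherit the mask, but threads that already exist would each need their own call.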

How to bind certain kernel threads to a given core?

I have a number of kernel threads that I want to get off a given core for performance reasons. Some of these I am able to move using taskset, but there are others I cannot.
In particular I see processes like migration, watchdog, rcuc, etc. that do not respond to my attempt to rebind them.
For example, if I try to rebind the watchdog process, I get the following:
# taskset -pc 0 207
pid 207's current affinity list: 0
sched_setaffinity: Invalid argument
failed to set pid 207's affinity.
How can I get these off of the cores so I can properly isolate them for performance reasons?
I suspect these processes are interfering with my full dynticks mode.
Several kernel threads are tied to a specific core in order to provide capabilities needed by the SMP infrastructure, such as synchronization and interrupt handling. The kworker, migration and ksoftirqd threads, for example, usually have one instance per virtual processor (e.g. 8 threads on a 4-core, 8-thread CPU).
You cannot (and should not be able to) move those threads - without them that processor would not be fully usable by the system any more.
Why exactly do you want to move those threads anyway?
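If you want to probe which threads refuse to move without parsing taskset output, a small sketch like the one below may help. It is Python (an assumption; the same sched_setaffinity call is available from C), and the helper name try_rebind is hypothetical. It attempts the rebind and reports the error name instead of raising.

```python
import errno
import os


def try_rebind(pid, cpus):
    """Try to set the affinity of pid.

    Returns None on success, or the errno name (such as 'EINVAL')
    if the kernel rejects the request.
    """
    try:
        os.sched_setaffinity(pid, cpus)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


# Rebinding the current process to the mask it already has should succeed.
own_mask = os.sched_getaffinity(0)
print("self:", try_rebind(0, own_mask))
```

Judging by the taskset output in the question ("sched_setaffinity: Invalid argument"), calling this on a per-CPU kernel thread such as pid 207 would be expected to return 'EINVAL'.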

force scheduler to allocate thread to specific processor

Consider a case where we have multiple processors/cores and two threads. Is it possible to force the Linux scheduler to always schedule a specific thread (or both) on a specific processor every time it executes? Is setting the processor affinity of the threads at creation sufficient for this purpose?
If you look at the man page for taskset you can see the following statement:
The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs.
This means that setting the CPU affinity for a particular process will make sure that it always runs on that CPU.
There are APIs that allow you to set the affinity of particular threads, and I would imagine that this too will be honored by the OS scheduler.
If you look at sched_setaffinity you'll see a line that says:
These restrictions on the actual set of CPUs on which the process will run are silently imposed by the kernel.
which means this will make sure your threads only run on the CPUs set by this function.

Linux process and threads scheduling

I'm playing around with Linux scheduling using sched.h and have bumped into some questions.
From what I know, the Linux scheduler does not treat threads and processes differently when scheduling. Threads are just like processes that share common resources.
Say I have Process-A, which is bound to CPU core 0 and has the SCHED_FIFO policy set, so it runs until a higher-priority task kicks in. If Process-A creates a new thread, will the thread inherit the same properties (i.e. be bound to CPU 0 with SCHED_FIFO), or will it get the defaults?
Thanks!
You can actually test this with a simple program, but from various man pages:
sched_setaffinity:
A child created via fork(2) inherits its parent's CPU affinity mask.
The affinity mask is preserved across an execve(2).
pthread_create:
The new thread inherits copies of the calling thread's capability sets (see capabilities(7)) and CPU affinity mask (see sched_setaffinity(2)).
sched_setscheduler:
Child processes inherit the scheduling policy and parameters across a fork(2).
The scheduling policy and parameters are preserved across execve(2).
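The "simple program" test for the affinity part could look like this sketch. It is Python (an assumption made for brevity; Python threads are created from the calling thread, so the pthread_create rule quoted above applies). It covers only the affinity mask, not SCHED_FIFO, since setting a real-time policy normally needs elevated privileges. The names seen and record are hypothetical.

```python
import os
import subprocess
import sys
import threading

original = os.sched_getaffinity(0)
one_cpu = {min(original)}

# Restrict the calling (main) thread before creating anything.
os.sched_setaffinity(0, one_cpu)

seen = {}


def record():
    # A new thread starts with a copy of its creator's affinity mask.
    seen["thread"] = os.sched_getaffinity(0)


t = threading.Thread(target=record)
t.start()
t.join()

# A child process inherits the mask across fork and keeps it across exec.
out = subprocess.run(
    [sys.executable, "-c", "import os; print(*sorted(os.sched_getaffinity(0)))"],
    capture_output=True,
    text=True,
    check=True,
).stdout
seen["child"] = {int(x) for x in out.split()}

# Put the main thread back on its original CPUs.
os.sched_setaffinity(0, original)

print(f"restricted to {sorted(one_cpu)}: "
      f"thread saw {sorted(seen['thread'])}, child saw {sorted(seen['child'])}")
```

Both the new thread and the child process report the single-CPU mask that was in effect when they were created.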

Resources