I boot my kernel with isolcpus=3-7 and I want to run a thread on one of those isolated CPU cores. To that end, I do:
ctx->workq = create_singlethread_workqueue("my_work");
struct workqueue_attrs *attr = alloc_workqueue_attrs(GFP_KERNEL);
alloc_cpumask_var(&attr->cpumask, GFP_KERNEL);
cpumask_clear(attr->cpumask);
cpumask_set_cpu(5, attr->cpumask);
apply_workqueue_attrs(ctx->workq, attr);
INIT_WORK(&ctx->work, my_work);
But it's not working. The work handler below reports CPU 0:
static void my_work(struct work_struct *w)
{
    printk("CPU is: %d\n", get_cpu());
    put_cpu();
}
How can I run this workqueue thread on a specific core (if possible an isolated one)?
There is already an API in the mainline kernel, schedule_work_on(), which you can use to run your workqueue thread on a specific core. A few years ago I used the same API for the same purpose. Have a look at the sample code below.
static void
myworkmod_work_handler(struct work_struct *w)
{
    printk(KERN_ERR "CPU is: %d\n", get_cpu());
    pr_info("work %u jiffies\n", (unsigned)onesec);
    put_cpu();
}
static int myworkmod_init(void)
{
    onesec = msecs_to_jiffies(1000);
    pr_info("module loaded: %u jiffies\n", (unsigned)onesec);

    if (!wq)
        wq = create_singlethread_workqueue("myworkmod");
    if (wq)
        queue_delayed_work_on(2, wq, &myworkmod_work, onesec); /* 2 = CPU number */

    return 0;
}
In your case I think you are using the schedule_work() API, which does not let you choose the CPU; that is why you are seeing CPU 0. So you have to try the one below:
schedule_work_on(cpu_nr, &ctx->work); // cpu_nr is the number of the CPU to run the work on
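For completeness, a minimal sketch of how that looks with the ctx fields from your question (CPU 5 is only an example and must be online; schedule_work_on() queues the work on the system per-CPU workqueue bound to that CPU, so the dedicated workqueue and attrs from your snippet are not needed for this):
/* Sketch only: reuses ctx->work from the question. */
INIT_WORK(&ctx->work, my_work);
schedule_work_on(5, &ctx->work); /* the handler will run on CPU 5 */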
I am writing a Linux device driver (on kernel version 5.10.11-v7l+) and I am using O'Reilly's Linux Device Drivers, second edition. In the section on blocking I/O, it explains the difference between interruptible_sleep_on and wait_event_interruptible: the former is interruptible by a signal, while the latter is interruptible by evaluating a condition. For my driver, I have an ISR that signals to a thread that the device is ready to be read from, and it seems to me that interruptible_sleep_on would be preferable, as it just sends a wake-up to the wait queue and doesn't need a condition variable to be set by the ISR and then cleared by the thread (which, it seems to me, would require another access control method for that variable).
But, interruptible_sleep_on is no longer in the kernel (at least in my version) and wait_event_interruptible is the only option. My question is: is there a way to duplicate the behavior of interruptible_sleep_on with wait_event_interruptible and essentially eschew using a condition variable and just rely on the signal being pushed to the wait queue?
Relevant code:
static int data_ready = 0;

static irqreturn_t irq_handler(int irq, void* dev_id)
{
    static unsigned long flags = 0;
    local_irq_restore(flags);
    data_ready = 1;
    wake_up_interruptible(&dataint);
    return IRQ_HANDLED;
}
static int data_thread_fn(void* param)
{
    allow_signal(SIGKILL);
    while (!kthread_should_stop())
    {
        wait_event_interruptible(dataint, data_ready == 1);
        data_lock();    // lock to share with main thread
        data_buffer();  // read data from device
        data_unlock();
        wake_up_interruptible(&databuffer); // tell poll that data is ready
        data_ready = 0;
        if (signal_pending(data_thread))
            break;
    }
    do_exit(0);
    return 0;
}
I am currently using sched_setaffinity to pin a thread to a particular CPU core.
void setCpuAffinity(int id) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(id, &cpuset);
    assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
}
Unfortunately, I cannot find a corresponding function to unpin the thread from any given core, and allow the kernel to schedule the thread on any core.
How can I reverse the effect of CPU pinning and allow free movement of a thread once again?
I am aware that one hack would be to use CPU_OR and add every CPU ID to the allowable set, but I am looking for a less hacky approach that restores the state of a thread to the state before sched_setaffinity was ever called.
Using #phs's suggestion, I wrote the following two functions: getCpuAffinity() is called before pinning to save the original mask, and setCpuAffinity() is called afterwards to restore it. This successfully unpins the thread.
void setCpuAffinity(cpu_set_t cpuset) {
    assert(sched_setaffinity(0, sizeof(cpuset), &cpuset) == 0);
}

cpu_set_t getCpuAffinity() {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    assert(sched_getaffinity(0, sizeof(cpuset), &cpuset) == 0);
    return cpuset;
}
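For reference, a minimal usage sketch (assuming C++, so this setCpuAffinity(cpu_set_t) overload can live alongside the original setCpuAffinity(int); doLatencySensitiveWork() is just a placeholder):
cpu_set_t original = getCpuAffinity(); // remember the mask the kernel started us with
setCpuAffinity(5);                     // pin to core 5 (example core id)
doLatencySensitiveWork();              // hypothetical pinned workload
setCpuAffinity(original);              // unpin: restore the saved mask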
I know I can set the thread name (the one visible in gdb and htop) on Linux using prctl(). But on other OSes this most likely won't work. Also, I could try using pthread_setname_np(), which is a bit more widely available across POSIX systems, but still lacks full compatibility.
So I'd like to have some more portable way, maybe something QThread provides which I've not found. Is there any such way?
There's nothing in the QThread API to manually manage the system name of the thread. However, since version 4.8.3, Qt will automatically set the name of your thread to the name of the thread object (QObject::objectName()).
This is handled in the implementations of QThread as described below.
You have something like this in qthread_unix.cpp:
#if (defined(Q_OS_LINUX) || defined(Q_OS_MAC) || defined(Q_OS_QNX))
static void setCurrentThreadName(pthread_t threadId, const char *name)
{
#  if defined(Q_OS_LINUX) && !defined(QT_LINUXBASE)
    Q_UNUSED(threadId);
    prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
#  elif defined(Q_OS_MAC)
    Q_UNUSED(threadId);
    pthread_setname_np(name);
#  elif defined(Q_OS_QNX)
    pthread_setname_np(threadId, name);
#  endif
}
#endif
/*
* [...]
*/
QString objectName = thr->objectName();
if (Q_LIKELY(objectName.isEmpty()))
    setCurrentThreadName(thr->d_func()->thread_id, thr->metaObject()->className());
else
    setCurrentThreadName(thr->d_func()->thread_id, objectName.toLocal8Bit());
And the equivalent in qthread_win.cpp:
typedef struct tagTHREADNAME_INFO
{
    DWORD dwType;      // must be 0x1000
    LPCSTR szName;     // pointer to name (in user addr space)
    HANDLE dwThreadID; // thread ID (-1=caller thread)
    DWORD dwFlags;     // reserved for future use, must be zero
} THREADNAME_INFO;

void qt_set_thread_name(HANDLE threadId, LPCSTR threadName)
{
    THREADNAME_INFO info;
    info.dwType = 0x1000;
    info.szName = threadName;
    info.dwThreadID = threadId;
    info.dwFlags = 0;

    __try
    {
        RaiseException(0x406D1388, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info);
    }
    __except (EXCEPTION_CONTINUE_EXECUTION)
    {
    }
}
/*
* [...]
*/
QByteArray objectName = thr->objectName().toLocal8Bit();
qt_set_thread_name((HANDLE)-1, objectName.isEmpty() ? thr->metaObject()->className() : objectName.constData());
Note that on Windows, the above code won't be executed if QT_NO_DEBUG is set, thus it won't work in Release mode.
In the Qt documentation you can find:
To choose the name that your thread will be given (as identified by the command ps -L on Linux, for example), you can call setObjectName() before starting the thread. If you don't call setObjectName(), the name given to your thread will be the class name of the runtime type of your thread object (for example, "RenderThread" in the case of the Mandelbrot Example, as that is the name of the QThread subclass). Note that this is currently not available with release builds on Windows.
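So in practice it is enough to call setObjectName() before start(). A minimal sketch (the class and object names here are just placeholders):
#include <QCoreApplication>
#include <QThread>

// Hypothetical worker; any QThread subclass behaves the same way.
class RenderThread : public QThread
{
    void run() override
    {
        // ... do the work; "ps -L" / htop will show the name set below ...
    }
};

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    RenderThread thread;
    // Must be set before start(); without it the thread would be named
    // after the runtime class, i.e. "RenderThread".
    thread.setObjectName("render-worker");
    thread.start();
    thread.wait();
    return 0;
}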
I'd like to specify the cpu-affinity of a particular pthread. All the references I've found so far deal with setting the cpu-affinity of a process (pid_t) not a thread (pthread_t). I tried some experiments passing pthread_t's around and as expected they fail. Am I trying to do something impossible? If not, can you send a pointer please? Thanks a million.
This is a wrapper I've made to make my life easier. Its effect is that the calling thread gets "stuck" to the core with id core_id:
// core_id = 0, 1, ..., n-1, where n is the system's number of cores
int stick_this_thread_to_core(int core_id) {
    int num_cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (core_id < 0 || core_id >= num_cores)
        return EINVAL;

    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core_id, &cpuset);

    pthread_t current_thread = pthread_self();
    return pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
}
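A usage sketch, assuming the wrapper above is in the same file (these are the headers it needs; the worker body is a placeholder):
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
#include <errno.h>
#include <stdio.h>

static void *worker(void *arg) {
    int rc = stick_this_thread_to_core(1); /* pin the calling thread to core 1 */
    if (rc != 0)
        fprintf(stderr, "pinning failed: %d\n", rc);
    /* ... do the core-local work here ... */
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}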
Assuming Linux:
The interface for setting the affinity is, as you've probably already discovered:
int sched_setaffinity(pid_t pid, size_t cpusetsize, cpu_set_t *mask);
Pass 0 as the pid and it will apply to the current thread only; otherwise, have other threads report their kernel tid with the Linux-specific call pid_t gettid(void); and pass that in as the pid.
Quoting the man page:
The affinity mask is actually a per-thread attribute that can be adjusted independently for each of the threads in a thread group. The value returned from a call to gettid(2) can be passed in the argument pid. Specifying pid as 0 will set the attribute for the calling thread, and passing the value returned from a call to getpid(2) will set the attribute for the main thread of the thread group. (If you are using the POSIX threads API, then use pthread_setaffinity_np(3) instead of sched_setaffinity().)
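For the gettid() route, a small sketch (glibc provides a gettid() wrapper only since 2.30; on older systems call syscall(SYS_gettid) instead):
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Pin the calling thread to CPU 2, addressing it by its kernel tid instead of 0. */
static int pin_self_to_cpu2(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(2, &set);
    return sched_setaffinity(gettid(), sizeof(set), &set);
}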
//compilation: gcc -o affinity affinity.c -lpthread
#define _GNU_SOURCE
#include <sched.h>   //cpu_set_t, CPU_SET
#include <pthread.h> //pthread_t
#include <stdio.h>

void *th_func(void *arg);

int main(void) {
    pthread_t thread; //the thread
    pthread_create(&thread, NULL, th_func, NULL);
    pthread_join(thread, NULL);
    return 0;
}

void *th_func(void *arg)
{
    //we can set one or more bits here, each one representing a single CPU
    cpu_set_t cpuset;
    //the CPU we want to use
    int cpu = 2;

    CPU_ZERO(&cpuset);     //clears the cpuset
    CPU_SET(cpu, &cpuset); //set CPU 2 on cpuset
    /*
     * cpu affinity for the calling thread
     * first parameter is the pid, 0 = calling thread
     * second parameter is the size of your cpuset
     * third param is the cpuset in which your thread will be
     * placed. Each bit represents a CPU
     */
    sched_setaffinity(0, sizeof(cpuset), &cpuset);

    while (1)
        ; //burns CPU 2
    return NULL;
}
On Linux you can use CPU sets to control which CPUs can be used by processes or pthreads. This type of control is called CPU affinity. The function sched_setaffinity() receives a kernel thread id (pid_t) and a cpuset as parameters; when you use 0 as the first parameter, the calling thread is affected. Please find below an example program that sets the CPU affinity of a particular pthread (the needed headers are included; remember to link with -lpthread and -lm).
#include <stdio.h>
#include <math.h>
#include <sched.h>
#include <pthread.h>

double waste_time(long n)
{
    double res = 0;
    long i = 0;
    while (i < n * 200000) {
        i++;
        res += sqrt(i);
    }
    return res;
}

void *thread_func(void *param)
{
    cpu_set_t mask;

    /* bind this thread to processor 0 */
    CPU_ZERO(&mask);
    CPU_SET(0, &mask);
    if (pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) != 0) {
        perror("pthread_setaffinity_np");
    }
    /* waste some time so the work is visible with "top" */
    printf("result: %f\n", waste_time(2000));

    /* thread switches to processor 1 now */
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);
    if (pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask) != 0) {
        perror("pthread_setaffinity_np");
    }
    /* waste some more time to see the processor switch */
    printf("result: %f\n", waste_time(2000));

    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t my_thread;

    if (pthread_create(&my_thread, NULL, thread_func, NULL) != 0) {
        perror("pthread_create");
    }
    pthread_exit(NULL);
}
Compile the above program with the -D_GNU_SOURCE flag.
Note that the affinity mask you set can be overridden from outside the program (for example by cgroup/cpuset changes or CPU hotplug); to partition CPUs persistently, see cpuset in the /proc file system.
http://man7.org/linux/man-pages/man7/cpuset.7.html
Or you can write a short program that re-applies the CPU affinity periodically (every few seconds) with sched_setaffinity().
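A rough sketch of that second option (the CPU number and interval are arbitrary):
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Re-apply the calling thread's affinity to one CPU every few seconds. */
static void keep_pinned_to(int cpu)
{
    cpu_set_t set;

    for (;;) {
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        sched_setaffinity(0, sizeof(set), &set); /* 0 = calling thread */
        sleep(5);
    }
}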
What's the relation between CPU registers and the CPU cache when it comes to cache coherence protocols such as MESI? If a certain value is stored in the CPU's cache and is also stored in a register, then what will happen when the cache line is marked as "dirty"? To my understanding, there is no guarantee that the register will update its value even though the cache was updated (due to MESI).
Hence this code:
static void Main()
{
    bool complete = false;
    var t = new Thread(() =>
    {
        bool toggle = false;
        while (!complete) toggle = !toggle;
    });
    t.Start();
    Thread.Sleep(1000);
    complete = true;
    t.Join(); // Blocks indefinitely
}
(let's assume the compiler didn't optimize the load of 'complete' out of the loop)
To my understanding, the update to "complete" isn't visible to the second thread since its value is held in a register (CPU 2's cache was updated, however).
Does placing a memory barrier force a "flush" of all the registers? What's the relation of registers to the cache? And what about registers and memory barriers?
There is no relationship. Use the "volatile" keyword.
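A minimal sketch of that fix, assuming the flag is hoisted to a volatile static field (a local variable captured by the lambda cannot be declared volatile in C#):
using System;
using System.Threading;

class Program
{
    static volatile bool complete = false; // volatile: every read goes back to memory

    static void Main()
    {
        var t = new Thread(() =>
        {
            bool toggle = false;
            while (!complete) toggle = !toggle; // now re-reads the up-to-date value
        });
        t.Start();
        Thread.Sleep(1000);
        complete = true; // this write becomes visible to the spinning thread
        t.Join();        // returns instead of blocking indefinitely
    }
}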
Modern C++ | C++11
You should use std::atomic
#include <thread>
#include <atomic>
#include <chrono>

std::atomic_bool g_exit{ false }, g_exited{ false };

using namespace std::chrono_literals;

void fn()
{
    while (!g_exit)
    {
        // do something (let's say it takes 5s)
        std::this_thread::sleep_for(5s);
    }
    g_exited = true;
}

int main()
{
    std::thread wt(fn);
    wt.detach();

    // do something (let's say it takes 2s)
    std::this_thread::sleep_for(2s);

    // Exit
    g_exit = true;
    for (int i = 0; i < 5; i++) {
        std::this_thread::sleep_for(1s);
        if (g_exited) {
            break;
        }
    }
}
The MESI protocol used on the x86 platform guarantees cache coherence, i.e. changes in one CPU's cache are automatically propagated to the other CPUs' caches.
Therefore, on x86 and x64 the volatile keyword is useful only to prevent reordering.