Race condition between wait_event and wake_up in Linux kernel

Race condition between wait_event and wake_up in Linux kernel - linux

I'm a kernel newbie. I just got this question when reading the source code.
In the implementation of wait_event(), the kernel does something like this:
...
prepare_to_wait(); /* enqueue current thread to the wait queue */
...
schedule(); /* invoke deactivate_task() inside, which will dequeue current thread from the runqueue */
...
in the implementation of "wake_up()", the kernel does the follows:
...
try_to_wake_up(); /* invoke activate_task() inside, which will enqueue the target thread into the runqueue */
...
in a concurrent execution, what if the above functions are invoked in the following order:
...
prepare_to_wait(); /* thread A adds itself to the wait queue */
...
try_to_wake_up(); /* thread B wakes up A and enqueues it into the runqueue */
...
schedule(); /* thread A dequeues itself from the runqueue and yields the CPU */
...
Thread A is not in either the runqueue or the wait queue. Does that mean we lost thread A? The kernel must have some mechanism to prevent this from happening. Could someone tell me what I missed here? Thanks!

I found the answer in the article, Kernel Korner - Sleeping in the Kernel in Issue 137 of the Linux Journal dated Jul 28, 2005 by Kedar Sovani.
In a nutshell, this is the lost wakeup issue. The Linux kernel solves it by setting the task state to TASK_INTERRUPTIBLE. This causes calls to schedule() to wake immediately, even if someone has called a wake up function prior to the schedule() call [as well as the normal during].

One of the mechanisms from kernel is wait_event*() macros. It works in the way as explained in Kernel Korner - Sleeping in the Kernel

Related

down_interruptible in Linux

I am having some difficulty in understanding the below statement from LDD3. "down_interruptible - it allows a user-space process that is waiting on a semaphore to be interrupted by the user".
A userspace application wont be direclty making the down_interruptible call. Lets say a device driver does and the application is put to sleep by the device driver triggered by a call to down_interruptible. Now how does a signal to the user space application invokes the application from sleep because its the device driver which called the down_interruptible not the application.
Somebody please clarify this to me.

Any device driver does not run of its own, device driver run on behalf of a process via system calls.
Suppose any device driver invokes down_interruptible();, it means if semaphore is not available the respective process will be put on the semaphore wait-queue.
And task state will be change to TASK_INTERRUPTIBLE and scheduler will invoked to run any other process. Now the sleeping process can be wake up either by the event for it waiting (semaphore) or by the signal.
Example: kill -SIGINT <pid> will cause the process to change its state to TASK_RUNNING and will add the process to run queue.
Here it is the pseudo code of wait queue, how a process wait for any event.
/* ‘q’ is the wait queue we wish to sleep on */
DEFINE_WAIT(wait);
add_wait_queue(q, &wait);
while (!condition) /* condition is the event that we are waiting for */
{
prepare_to_wait(&q, &wait, TASK_INTERRUPTIBLE);
if (!signal_pending(current))
{
schedule();
continue;
}
ret = -ERESTARTSYS;
}
finish_wait(&q, &wait);
In your example the process is added to wait queue and waiting for the condition to release it. Meanwhile it checks for any pending signal also, if there is any, it will return -ERESTARTSYS, otherwise again go for sleep.

Linux pthread mutex and kernel scheduler

With a friend of mine, we disagree on how synchronization is handled at userspace level (in the pthread library).
a. I think that during a pthread_mutex_lock, the thread actively waits. Meaning the linux scheduler rises this thread, let it execute his code, which should looks like:
while (mutex_resource->locked);
Then, another thread is scheduled which potentially free the locked field, etc.
So this means that the scheduler waits for the thread to complete its schedule time before switching to the next one, no matter what the thread is doing.
b. My friend thinks that the waiting thread somehow tells the kernel "Hey, I'm asleep, don't wait for me at all".
In this case, the kernel would schedule the next thread right away, without waiting for the current thread to complete its schedule time, being aware this thread is sleeping.
From what I see in the code of pthread, it seems there is loop handling the lock. But maybe I missed something.
In embedded systems, it could make sense to prevent the kernel from waiting. So he may be right (but I hope he does not :D).
Thanks!

a. I think that during a pthread_mutex_lock, the thread actively waits.
Yes, glibc's NPTL pthread_mutex_lock have active wait (spinning),
BUT the spinning is used only for very short amount of time and only for some types of mutexes. After this amount, pthread_mutex_lock will go to sleep, by calling linux syscall futex with WAIT argument.
Only mutexes with type PTHREAD_MUTEX_ADAPTIVE_NP will spin, and default is PTHREAD_MUTEX_TIMED_NP (normal mutex) without spinning. Check MAX_ADAPTIVE_COUNT in __pthread_mutex_lock sources).
If you want to do infinite spinning (active waiting), use pthread_spin_lock function with pthread_spinlock_t-types locks.
I'll consider the rest of your question as if you are using pthread_spin_lock:
Then, another thread is scheduled which potentially free the locked field, etc. So this means that the scheduler waits for the thread to complete its schedule time before switching to the next one, no matter what the thread is doing.
Yes, if there is contention for CPU cores, the your thread with active spinning may block other thread from execute, even if the other thread is the one who will unlock the mutex (spinlock) which is needed by your thread.
But if there is no contention (no thread oversubscribing), and threads are scheduled on different cores (by coincidence, or by manual setting of cpu affinity with sched_setaffinity or pthread_setaffinity_np), spinning will enable you to proceed faster, then using OS-based futex.
b. My friend thinks that the waiting thread somehow tells the kernel "Hey, I'm asleep, don't wait for me at all". In this case, the kernel would schedule the next thread right away, without waiting for the current thread to complete...
Yes, he is right.
futex is the modern way to say OS that this thread is waiting for some value in memory (for opening some mutex); and in current implementation futex also puts our thread to sleep. It is not needed to wake it to do spinning, if kernel knows when to wake up this thread. How it knows? The lock owner, when doing pthread_mutex_unlock, will check, is there any other threads, sleeping on this mutex. If there is any, lock owner will call futex with FUTEX_WAKE, telling OS to wake some thread, registered as sleeper on this mutex.
There is no need to spin, if thread registers itself as waiter in OS.

Some debuging with gdb for this test program:
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
pthread_mutex_t x = PTHREAD_MUTEX_INITIALIZER;
void* thr_func(void *arg)
{
pthread_mutex_lock(&x);
}
int main(int argc, char **argv)
{
pthread_t thr;
pthread_mutex_lock(&x);
pthread_create(&thr, NULL, thr_func, NULL);
pthread_join(thr,NULL);
return 0;
}
shows that a call to pthread_mutex_lock on a mutex results in a calling a system call futex with the op parameter set to FUTEX_WAIT (http://man7.org/linux/man-pages/man2/futex.2.html)
And this is description of FUTEX_WAIT:
FUTEX_WAIT
This operation atomically verifies that the futex address
uaddr still contains the value val, and sleeps awaiting FUTEX_WAKE on
this futex address. If the timeout argument is
non-NULL, its contents describe the maximum duration of the wait,
which is infinite otherwise. The arguments uaddr2 and val3 are
ignored.
So from this description I can say that if a mutex is locked then a thread will sleep and not actively wait. And it will sleep until futex with op equal to FUTEX_WAKE is called.

What is a safe and easy way to exchange data from a threaded ISR? (Raspberry Pi)

I'm trying to develop a C/C++ userspace application on the Raspberry Pi which processes data coming from an SPI device. I'm using the WiringPi Library (function wiringPiISR) which registers a function (the real interrupt handler) that will be called from a pthreaded interrupt handler on an IRQ event.
I heard that STL containers aren't thread safe, but is it enough to have a mutex lock while executing my callback function and of course a lock in the main thread while accessing the buffer/container there?
My "real interrupt handler" which is registered through wiringPiISR looks like this
std::deque<uint8_t> buffer;
static void irq_handler()
{
uint8_t data;
while (digitalRead(IRQ_PIN)==0)
{
data = spi_txrx(CMD_READBYTE);
pthread_mutex_lock(&mutex1);
callback(data);
pthread_mutex_unlock(&mutex1);
}
}
static void callback(uint8_t byte)
{
buffer.push_back(byte);
}
Or is there an easier way to achieve the data exchange between a threaded ISR and main thread?

Is that a real ISR ?
Anyway mutex are not a good fit for ISR, because they lead to priority inversion.
Let's look at normal mutex usage, with two thread :
Thread A runs and take the mmutex
for some reason, thread A is preempted, and thread B executes.
thread B try to take the mutex, but can't.
thread B is put to sleep, allowing another thread to run, for instance thread C or thread A
...
At some point, thread A wille be rescheduled, will resume it's operation, and release the mutex.
When thread B is scheduled again, takes the mutex.
Now the scenario is very different when it comes to ISR. ISR won't be put to sleep in favor of a lower priority thread, so the mutex owning thread will not run while you are in the ISR, and you will never get out of point three.
So the real question is, "When running an IRQ handler, is it possible for other code to run ?" Otherwise you are in deadlock !

How are threads terminated during a linux crash?

If you have a multithreaded program (Linux 2.26 kernel), and one thread does something that causes a segfault, will the other threads still be scheduled to run? How are the other threads terminated? Can someone explain the process shutdown procedure with regard to multithreaded programs?

When a fatal signal is delivered to a thread, either the do_coredump() or the do_group_exit() function is called. do_group_exit() sets the thread group exit code and then signals all the other threads in the thread group to exit with zap_other_threads(), before exiting the current thread. (do_coredump() calls coredump_wait() which similarly calls zap_threads()).
zap_other_threads() posts a SIGKILL for every other thread in the thread group and wakes it up with signal_wake_up(). signal_wake_up() calls kick_process(), which will boot the thread into kernel mode so that it can recieve the signal, using an IPI1 if necessary (eg. if it's executing on another CPU).
1. Inter-Processor Interrupt

Will the other thread still be scheduled to run?
No. The SEGV is a process-level issue. Unless you've handled the SEGV (which is almost always a bad idea) your whole process will exit, and all threads with it.
I suspect that the other threads aren't handled very nicely. If the handler calls exit() or _exit() thread cleanup handlers won't get called. This may be a good thing if your program is severely corrupted, it's going to be hard to trust much of anything after a seg fault.
One note from the signal man page:
According to POSIX, the behaviour of a process is undefined after it ignores a SIGFPE, SIGILL, or SIGSEGV signal that was not generated by the kill(2) or the raise(3) functions.
After a segfault you really don't want to be doing anything other than getting the heck out of that program.

How do I suspend another thread (not the current one)?

I'm trying to implement a simulation of a microcontroller. This simulation is not meant to do a clock cycle precise representation of one specific microcontroller but check the general correctness of the code.
I thought of having a "main thread" executing normal code and a second thread executing ISR code. Whenever an ISR needs to be run, the ISR thread suspends the "main thread".
Of course, I want to have a feature to block interrupts.
I thought of solving this with a mutex that the ISR thread holds whenever it executes ISR code while the main thread holds it as long as "interrupts are blocked".
A POR (power on reset) can then be implemented by not only suspending but killing the main thread (and starting a new one executing the POR function).
The windows API provides the necessary functions.
But it seems to be impossible to do the above with posix threads (on linux).
I don't want to change the actual hardware independent microcontroller code. So inserting anything to check for pending interrupts is not an option.
Receiving interrupts at non well behaved points is desirable, as this also happens on microcontrollers (unless you block interrupts).
Is there a way to suspend another thread on linux? (Debuggers must use that option somehow, I think.)
Please, don't tell me this is a bad idea. I know that is true in most circumstances. But the main code does not use standard libs or lock/mutexes/semaphores.

SIGSTOP does not work - it always stops the entire process.
Instead you can use some other signals, say SIGUSR1 for suspending and SIGUSR2 for resuming:
// at process start call init_pthread_suspending to install the handlers
// to suspend a thread use pthread_kill(thread_id, SUSPEND_SIG)
// to resume a thread use pthread_kill(thread_id, RESUME_SIG)
#include <signal.h>
#define RESUME_SIG SIGUSR2
#define SUSPEND_SIG SIGUSR1
static sigset_t wait_mask;
static __thread int suspended; // per-thread flag
void resume_handler(int sig)
{
suspended = 0;
}
void suspend_handler(int sig)
{
if (suspended) return;
suspended = 1;
do sigsuspend(&wait_mask); while (suspended);
}
void init_pthread_suspending()
{
struct sigaction sa;
sigfillset(&wait_mask);
sigdelset(&wait_mask, SUSPEND_SIG)
sigdelset(&wait_mask, RESUME_SIG);
sigfillset(&sa.sa_mask);
sa.sa_flags = 0;
sa.sa_handler = resume_handler;
sigaction(RESUME_SIG, &sa, NULL);
sa.sa_handler = suspend_handler;
sigaction(SUSPEND_SIG, &sa, NULL);
}
I am very annoyed by replies like "you should not suspend another thread, that is bad".
Guys why do you assume others are idiots and don't know what they are doing? Imagine that others, too, have heard about deadlocking and still, in full consciousness, want to suspend other threads.
If you don't have a real answer to their question why do you waste your and the readers' time.
An yes, IMO pthreads are very short-sighted api, a disgrace for POSIX.

The Hotspot JAVA VM uses SIGUSR2 to implement suspend/resume for JAVA threads on linux.
A procedure based on on a signal handler for SIGUSR2 might be:
Providing a signal handler for SIGUSR2 allows a thread to request a lock
(which has already been acquired by the signal sending thread).
This suspends the thread.
As soon as the suspending thread releases the lock, the signal handler can
(and will?) get the lock. The signal handler releases the lock immediately and
leaves the signal handler.
This resumes the thread.
It will probably be necessary to introduce a control variable to make sure that the main thread is in the signal handler before starting the actual processing of the ISR.
(The details depend on whether the signal handler is called synchronously or asynchronously.)
I don't know, if this is exactly how it is done in the Java VM, but I think the above procedure does what I need.

Somehow I think sending the other thread SIGSTOP works.
However, you are far better off writing some thread communication involving senaogires.mutexes and global variables.
You see, if you suspend the other thread in malloc() and you call malloc() -> deadlock.
Did I mention that lots of C standard library functions, let alone other libraries you use, will call malloc() behind your back?
EDIT:
Hmmm, no standard library code. Maybe use setjmp/longjump() from signal handler to simulate the POR and a signal handier to simulate interrupt.
TO THOSE WHO KEEP DOWNVOTING THIS: The answer was accepted for the contents after EDIT, which is a specific scenario that cannot be used in any other scenario.

Solaris has the thr_suspend(3C) call that would do what you want. Is switching to Solaris a possibility?
Other than that, you're probably going to have to do some gymnastics with mutexes and/or semaphores. The problem is that you'll only suspend when you check the mutex, which will probably be at a well-behaved point. Depending on what you're actually trying to accomplish, this might now be desirable.

It makes more sense to have the main thread execute the ISRs - because that's how the real controller works (presumably). Just have it check after each emulated instruction if there is both an interrupt pending, and interrupts are currently enabled - if so, emulate a call to the ISR.
The second thread is still used - but it just listens for the conditions which cause an interrupt, and mark the relevant interrupt as pending (for the other thread to later pick up).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string