Need to know how to interrupt all pthreads

In Linux, I am emulating an embedded system that has one thread that delivers messages to the outside world. If some thread detects an insurmountable problem, my goal is to stop all the other threads in their tracks (leaving useful stack traces) and allow only the message delivery thread to continue. So in my emulation environment, I want to call pthread_kill(tid, signal) for each tid. (I have a list. I'm using SIGTSTP.) Unfortunately, only one thread is getting the signal, and sigprocmask() is not able to unmask the signal. Here is my current (non-working) handler:
void
wait_until_death(int sig)
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, sig);
sigprocmask(SIG_UNBLOCK, &mask, NULL);
for (;;)
pause();
}
I get verification that all the pthread_kill() calls are made, but only one thread has the handler in its stack trace. Can this be done?

This minimal example seems to work the way you want: all the threads except the main thread end up waiting in wait_until_death():
#include <stdio.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#define NTHREADS 10
pthread_barrier_t barrier;
void
wait_until_death(int sig)
{
sigset_t mask;
sigemptyset(&mask);
sigaddset(&mask, sig);
sigprocmask(SIG_UNBLOCK, &mask, NULL);
for (;;)
pause();
}
void *thread_func(void *arg)
{
pthread_barrier_wait(&barrier);
for (;;)
pause();
}
int main(int argc, char *argv[])
{
const int thread_signal = SIGTSTP;
const struct sigaction sa = { .sa_handler = wait_until_death };
int i;
pthread_t thread[NTHREADS];
pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
sigaction(thread_signal, &sa, NULL);
for (i = 0; i < NTHREADS; i++)
pthread_create(&thread[i], NULL, thread_func, NULL);
pthread_barrier_wait(&barrier);
for (i = 0; i < NTHREADS; i++)
pthread_kill(thread[i], thread_signal);
fprintf(stderr, "All threads signalled.\n");
for (;;)
pause();
return 0;
}
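Note that unblocking the signal in wait_until_death() isn't required: the signal mask is per-thread, and the thread that is executing the signal handler isn't going to be signalled again. (In a multithreaded program, pthread_sigmask() is the call to use; POSIX leaves the behavior of sigprocmask() unspecified there.)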
Presumably the problem is in how you are installing the signal handler, or setting up thread signal masks.
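One way to check whether the thread signal masks are the culprit is to count how many threads actually enter the handler. The sketch below is a variant of the example above, not the asker's code: it uses pthread_sigmask() to make `nblocking` threads block SIGTSTP, and the counter `entered` and the driver `count_stopped_threads()` are hypothetical names. A thread that blocks the signal keeps it pending and never runs the handler, which would match the symptom in the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

#define MAX_THREADS 64              /* assumption: small fixed upper bound */

static atomic_int entered;          /* threads that have run the handler */
static pthread_barrier_t ready;     /* every thread has set up its mask */

static void wait_until_death(int sig)
{
    (void) sig;
    atomic_fetch_add(&entered, 1);  /* lock-free atomics are usable in handlers */
    for (;;)
        pause();                    /* park here so the stack trace stays useful */
}

static void *thread_func(void *arg)
{
    if (*(int *) arg) {             /* this thread blocks SIGTSTP in its own mask */
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGTSTP);
        pthread_sigmask(SIG_BLOCK, &set, NULL);
    }
    pthread_barrier_wait(&ready);
    for (;;)
        pause();
    return NULL;
}

/* Signals every thread; returns how many entered the handler (nthreads <= MAX_THREADS). */
int count_stopped_threads(int nthreads, int nblocking)
{
    static int block[MAX_THREADS];
    pthread_t tid[MAX_THREADS];
    struct sigaction sa = { .sa_handler = wait_until_death };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGTSTP, &sa, NULL);

    atomic_store(&entered, 0);
    pthread_barrier_init(&ready, NULL, nthreads + 1);
    for (int i = 0; i < nthreads; i++) {
        block[i] = i < nblocking;
        pthread_create(&tid[i], NULL, thread_func, &block[i]);
    }
    pthread_barrier_wait(&ready);   /* all masks are in place now */
    for (int i = 0; i < nthreads; i++)
        pthread_kill(tid[i], SIGTSTP);

    /* give the signals up to about one second to be delivered */
    struct timespec tick = { 0, 10 * 1000 * 1000 };
    for (int i = 0; i < 100 && atomic_load(&entered) < nthreads - nblocking; i++)
        nanosleep(&tick, NULL);
    nanosleep(&tick, NULL);         /* short grace period */
    return atomic_load(&entered);
}
```

With four threads of which one blocks the signal, `count_stopped_threads(4, 1)` should report three threads parked in the handler.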

This is impossible. The problem is that some of the threads you stop may hold locks that the thread you want to continue running requires in order to continue making forward progress. Just abandon this idea entirely. Trust me, this will only cause you great pain.
If you literally must do it, have all the other threads call a conditional yielding point at known safe places where they hold no lock that can prevent any other thread from reaching its next conditional yielding point. But this is very difficult to get right and is very prone to deadlock and I strongly advise not trying it.
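A minimal sketch of such a conditional yielding point, assuming every thread except the message-delivery thread calls the hypothetical `yield_point()` only where it holds no other lock; `stop_world()`, `resume_world()` and `demo_stop_world()` are likewise hypothetical names. It uses one mutex and two condition variables (one to report parked threads, one to release them) so that parked threads are not woken every time another thread parks.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t stop_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t parked_cond = PTHREAD_COND_INITIALIZER;  /* a thread has parked */
static pthread_cond_t resume_cond = PTHREAD_COND_INITIALIZER;  /* threads may continue */
static bool stop_requested;
static int parked;

/* Call only where the caller holds no lock another thread might need. */
void yield_point(void)
{
    pthread_mutex_lock(&stop_lock);
    if (stop_requested) {
        parked++;
        pthread_cond_signal(&parked_cond);
        while (stop_requested)
            pthread_cond_wait(&resume_cond, &stop_lock);
        parked--;
    }
    pthread_mutex_unlock(&stop_lock);
}

/* Request a stop and wait until `expected` threads are parked. */
void stop_world(int expected)
{
    pthread_mutex_lock(&stop_lock);
    stop_requested = true;
    while (parked < expected)
        pthread_cond_wait(&parked_cond, &stop_lock);
    pthread_mutex_unlock(&stop_lock);
}

void resume_world(void)
{
    pthread_mutex_lock(&stop_lock);
    stop_requested = false;
    pthread_cond_broadcast(&resume_cond);
    pthread_mutex_unlock(&stop_lock);
}

static atomic_long progress;

static void *worker(void *arg)
{
    (void) arg;
    for (;;) {
        atomic_fetch_add(&progress, 1);   /* stand-in for real work */
        yield_point();                    /* known safe place: no locks held */
    }
    return NULL;
}

/* Returns 1 if no worker made progress while the world was stopped. */
int demo_stop_world(int nworkers)
{
    for (int i = 0; i < nworkers; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_detach(t);
    }
    stop_world(nworkers);
    long before = atomic_load(&progress);
    struct timespec wait = { 0, 50 * 1000 * 1000 };
    nanosleep(&wait, NULL);
    long after = atomic_load(&progress);
    resume_world();
    return before == after;
}
```

The thread calling `stop_world()` must not itself be one of the `expected` threads, or it would wait for its own parking forever; this is exactly the kind of deadlock the answer warns about.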

Related

Pause thread execution without using condition variables or other synchronization primitives

Problem
I wish to be able to pause the execution of a thread from a different thread. Note the thread paused should not have to cooperate. The pausing of the target thread does not have to occur as soon as the pauser thread wants to pause. Delaying the pausing is allowed.
I cannot seem to find any information on this, as all my searches turned up results that use condition variables...
Ideas
use the scheduler and kernel syscalls to stop the thread from being scheduled again
use debugger syscalls to stop the target thread
OS-agnostic is preferable, but not a requirement. This likely will be very OS-dependent, as messing with scheduling and threads is a pretty low-level operation.
On a Unix-like OS, there's pthread_kill() which delivers a signal to a specified thread. You can arrange for that signal to have a handler which waits until told in some manner to resume.
Here's a simple example, where the "pause" just sleeps for a fixed time before resuming. Try on godbolt.
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h> // for nanosleep() and struct timespec
void safe_print(const char *s) {
int saved_errno = errno;
if (write(1, s, strlen(s)) < 0) {
exit(1);
}
errno = saved_errno;
}
void sleep_msec(int msec) {
struct timespec t = {
.tv_sec = msec / 1000,
.tv_nsec = (msec % 1000) * 1000 * 1000
};
nanosleep(&t, NULL);
}
void *work(void *unused) {
(void) unused;
for (;;) {
safe_print("I am running!\n");
sleep_msec(100);
}
return NULL;
}
void handler(int sig) {
(void) sig;
safe_print("I am stopped.\n");
sleep_msec(500);
}
int main(void) {
pthread_t thr;
pthread_create(&thr, NULL, work, NULL);
sigset_t empty;
sigemptyset(&empty);
struct sigaction sa = {
.sa_handler = handler,
.sa_flags = 0,
};
sigemptyset(&sa.sa_mask);
sigaction(SIGUSR1, &sa, NULL);
for (int i = 0; i < 5; i++) {
sleep_msec(1000);
pthread_kill(thr, SIGUSR1);
}
pthread_cancel(thr);
pthread_join(thr, NULL);
return 0;
}
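A variant in which the handler waits until it is told to resume, instead of sleeping for a fixed time. This is a sketch under assumptions: SIGUSR2 serves as a hypothetical "resume" signal, an atomic flag `suspended` records the state, and `demo_suspend_resume()` is a hypothetical driver. The handler parks in sigsuspend() (which is async-signal-safe), and SIGUSR2 stays blocked until that call, so a resume that arrives early remains pending instead of being lost.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int suspended;   /* 1 while the target thread is parked in the handler */
static atomic_long ticks;      /* progress made by the worker */

static void sleep_msec(long msec)
{
    struct timespec t = { msec / 1000, (msec % 1000) * 1000000L };
    nanosleep(&t, NULL);
}

static void on_resume(int sig) { (void) sig; }   /* only has to interrupt sigsuspend() */

static void on_suspend(int sig)
{
    (void) sig;
    sigset_t wait_mask;
    pthread_sigmask(SIG_BLOCK, NULL, &wait_mask);  /* current mask: SIGUSR1, SIGUSR2 blocked */
    sigdelset(&wait_mask, SIGUSR2);                /* let only the resume signal through */
    atomic_store(&suspended, 1);
    while (atomic_load(&suspended))
        sigsuspend(&wait_mask);
}

static void *work(void *unused)
{
    (void) unused;
    for (;;) {
        atomic_fetch_add(&ticks, 1);
        sleep_msec(1);
    }
    return NULL;
}

/* Returns 1 if the worker made no progress while suspended and resumed afterwards. */
int demo_suspend_resume(void)
{
    struct sigaction sa = { .sa_handler = on_suspend };
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);   /* keep an early resume pending */
    sigaction(SIGUSR1, &sa, NULL);
    sa.sa_handler = on_resume;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, NULL);

    pthread_t thr;
    pthread_create(&thr, NULL, work, NULL);
    sleep_msec(50);

    pthread_kill(thr, SIGUSR1);        /* pause the worker */
    while (!atomic_load(&suspended))
        sleep_msec(1);
    long before = atomic_load(&ticks);
    sleep_msec(100);
    long during = atomic_load(&ticks);

    atomic_store(&suspended, 0);       /* resume: clear the flag first, */
    pthread_kill(thr, SIGUSR2);        /* then wake sigsuspend() */
    sleep_msec(100);
    long after = atomic_load(&ticks);

    pthread_cancel(thr);               /* nanosleep() is a cancellation point */
    pthread_join(thr, NULL);
    return before == during && after > during;
}
```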

synchronising lock step execution of threads

I have a top level controller, which schedules n sub threads and waits for all of them to complete before scheduling them all over again. These threads go on forever, so the threads do not need to be joined.
So the pseudo-code is something like this (assuming n=2):
Top:
loop:
1. initiate T1 and T2
2. wait for completion of both T1 and T2
T1: (similarly for T2)
loop:
1. wait for lock-1
2. do something
3. send completion signal
I am thinking of the following code for this, where Top, T1 and T2 are separate threads:
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define NUM_PROCS 2
pthread_mutex_t m_1, m_2; // for scheduling T1,T2
int count;
pthread_mutex_t m_count; // for completion-signal
pthread_cond_t c_count;
pthread_attr_t attr; // for threads
pthread_t thread[NUM_PROCS+1];
void *Top(void *t) {
count=0;
while(1) {
pthread_mutex_unlock(&m_1);
pthread_mutex_unlock(&m_2);
// not sure if this the correct way to wait for T1&T2
pthread_mutex_lock(&m_count);
while(count < 2) {
pthread_cond_wait(&c_count, &m_count);
}
count=0;
pthread_mutex_unlock(&m_count);
}
}
void *T1(void *t) { // similarly for T2
while(1) {
pthread_mutex_lock(&m_1); // use m_2 for T2
sleep(1);
pthread_mutex_lock(&m_count);
count++;
pthread_mutex_unlock(&m_count);
pthread_cond_signal(&c_count);
}
}
void *T2(void *t) {
while(1) {
pthread_mutex_lock(&m_2);
sleep(1);
pthread_mutex_lock(&m_count);
count++;
pthread_mutex_unlock(&m_count);
pthread_cond_signal(&c_count);
}
}
int main() {
int rc;
int t[NUM_PROCS+1] = {0,1,2}; // thread numbers
pthread_mutex_init(&m_1, NULL); // initializations
pthread_mutex_init(&m_2, NULL);
pthread_mutex_init(&m_count, NULL);
pthread_cond_init(&c_count, NULL);
pthread_mutex_lock(&m_1); // to allow Top to start first
pthread_mutex_lock(&m_2);
pthread_attr_init(&attr); // initiate the threads
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
rc = pthread_create(&thread[0], &attr, Top, (void *)&t[0]);
rc = pthread_create(&thread[1], &attr, T1, (void *)&t[1]);
rc = pthread_create(&thread[2], &attr, T2, (void *)&t[2]);
}
My questions on the above code:
Is the above code correct?
Usually, lock and unlock are both done by the same thread. So my solution, of T1 locking m_1 and Top unlocking it, seems a bit weird. Is there a better way of doing this?
Is semaphore a more efficient way to do this synchronization?
Will the code change (except main(), of course) if I implement this as separate processes with shared memory, instead of as threads? And will that be less efficient than the threads version?
A thread that has not locked a pthread mutex may not unlock it: for a default mutex the behavior is undefined, and an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) returns an error instead. If you need a lock that one thread can acquire and another thread can release, you have to build it yourself. A standard mutex is not such a lock.
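On the semaphore question: unlike a mutex, a POSIX semaphore may be posted by a thread other than the one that waits on it, so it can express "Top starts T1 and T2, then waits for both". The sketch below assumes Linux (unnamed semaphores created with sem_init()); the names `go`, `done`, `rounds_run` and `run_lockstep()` are hypothetical. With a nonzero second argument to sem_init() and the semaphores placed in shared memory, the same pattern can be used between processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>

#define NUM_PROCS 2

static sem_t go[NUM_PROCS];       /* Top -> worker i: start one round */
static sem_t done;                /* worker -> Top: one round finished */
static int rounds_run[NUM_PROCS]; /* rounds completed by each worker */
static int ids[NUM_PROCS];

static void *worker(void *arg)
{
    int id = *(int *) arg;
    for (;;) {
        sem_wait(&go[id]);        /* 1. wait to be scheduled by Top */
        rounds_run[id]++;         /* 2. do something */
        sem_post(&done);          /* 3. send completion signal */
    }
    return NULL;
}

/* Runs `rounds` lock-step rounds; returns the total number of work items done. */
int run_lockstep(int rounds)
{
    sem_init(&done, 0, 0);
    for (int i = 0; i < NUM_PROCS; i++) {
        pthread_t t;
        ids[i] = i;
        sem_init(&go[i], 0, 0);
        pthread_create(&t, NULL, worker, &ids[i]);
        pthread_detach(t);
    }
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < NUM_PROCS; i++)
            sem_post(&go[i]);     /* initiate T1 .. Tn */
        for (int i = 0; i < NUM_PROCS; i++)
            sem_wait(&done);      /* wait for completion of all of them */
    }
    int total = 0;
    for (int i = 0; i < NUM_PROCS; i++)
        total += rounds_run[i];
    return total;
}
```

Each worker gets exactly one `go` post per round, so Top cannot collect two completions from the same worker in one round; `run_lockstep(5)` with two workers should return 10.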

Dead lock in the mutex, condition variable code?

I'm reading the book Modern Operating Systems by A. S. Tanenbaum, and it explains condition variables with the example below. It looks to me like there is a deadlock, and I'm not sure what I'm missing.
Let's assume the consumer thread starts first. Right after the_mutex is locked, the consumer thread is blocked waiting for the condition variable, condc.
If the producer is running at this time, the_mutex will still be locked, because the consumer never releases it. So the producer will also be blocked.
This looks to me like a textbook deadlock. Did I miss something here? Thx
#include <stdio.h>
#include <pthread.h>
#define MAX 1000000000 /* Numbers to produce (must fit in an int) */
pthread_mutex_t the_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condc = PTHREAD_COND_INITIALIZER, condp = PTHREAD_COND_INITIALIZER;
int buffer = 0;
void* consumer(void *ptr) {
int i;
for (i = 1; i <= MAX; i++) {
pthread_mutex_lock(&the_mutex); /* lock mutex */
/*thread is blocked waiting for condc */
while (buffer == 0) pthread_cond_wait(&condc, &the_mutex);
buffer = 0;
pthread_cond_signal(&condp);
pthread_mutex_unlock(&the_mutex);
}
pthread_exit(0);
}
void* producer(void *ptr) {
int i;
for (i = 1; i <= MAX; i++) {
pthread_mutex_lock(&the_mutex); /* Lock mutex */
while (buffer != 0) pthread_cond_wait(&condp, &the_mutex);
buffer = i;
pthread_cond_signal(&condc);
pthread_mutex_unlock(&the_mutex);
}
pthread_exit(0);
}
int main(int argc, char **argv) {
pthread_t pro, con;
// Static initializers above replace pthread_*_init; destroy is omitted for simplicity
// Create the threads
pthread_create(&con, NULL, consumer, NULL);
pthread_create(&pro, NULL, producer, NULL);
// Wait for both threads; returning from main would end the process immediately
pthread_join(pro, NULL);
pthread_join(con, NULL);
return 0;
}
When you wait on a condition variable, the associated mutex is released for the duration of the wait (that's why you pass the mutex to pthread_cond_wait).
When pthread_cond_wait returns, the mutex is always locked again.
Keeping this in mind, you can follow the logic of the example.
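To see the release and re-acquire behavior in a program that actually terminates, here is a sketch of the same producer/consumer with a small bound. The `COUNT` bound, the `consumed_sum` total and `run_producer_consumer()` are assumptions for illustration; the mutex and condition variables use static initializers.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

#define COUNT 1000   /* small bound for the demo, instead of the book's MAX */

static pthread_mutex_t the_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t condc = PTHREAD_COND_INITIALIZER;
static pthread_cond_t condp = PTHREAD_COND_INITIALIZER;
static int buffer = 0;
static long consumed_sum = 0;

static void *consumer(void *ptr)
{
    (void) ptr;
    for (int i = 1; i <= COUNT; i++) {
        pthread_mutex_lock(&the_mutex);
        /* pthread_cond_wait() releases the_mutex while blocked and
           re-acquires it before returning, so the producer can get in. */
        while (buffer == 0)
            pthread_cond_wait(&condc, &the_mutex);
        consumed_sum += buffer;
        buffer = 0;
        pthread_cond_signal(&condp);
        pthread_mutex_unlock(&the_mutex);
    }
    return NULL;
}

static void *producer(void *ptr)
{
    (void) ptr;
    for (int i = 1; i <= COUNT; i++) {
        pthread_mutex_lock(&the_mutex);
        while (buffer != 0)
            pthread_cond_wait(&condp, &the_mutex);
        buffer = i;
        pthread_cond_signal(&condc);
        pthread_mutex_unlock(&the_mutex);
    }
    return NULL;
}

long run_producer_consumer(void)
{
    pthread_t pro, con;
    pthread_create(&con, NULL, consumer, NULL);
    pthread_create(&pro, NULL, producer, NULL);
    pthread_join(pro, NULL);
    pthread_join(con, NULL);
    return consumed_sum;   /* 1 + 2 + ... + COUNT */
}
```

Every value is consumed exactly once, so with `COUNT` set to 1000 the function should return 500500. If waiting did not release the mutex, it would never return at all.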

pthread_cancel and cancellation point

I'm learning the pthread_cancel function and testing whether a thread would be cancelled when it doesn't reach a cancellation point. The thread is created with default attributes and runs an addition loop. But when the cancellation request was sent, the thread exited immediately. It doesn't reach a cancellation point, so I think it should not respond to the request immediately.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
void *thread_func(void *arg)
{
int i;
int j;
int k;
k = 1;
/* add operation */
for (i=0; i<1000; ++i) {
for (j=0; j<10000;++j) {
++k; // maybe for(z=0; z<10000; ++z) added would
// be better
}
}
return (void *)10;
}
int main(void)
{
void *retval;
pthread_t tid;
if (pthread_create(&tid, NULL, thread_func, NULL) != 0) {
printf("create error\n");
}
if (pthread_cancel(tid) != 0) { // cancel thread
printf("cancel error\n");
}
pthread_join(tid, &retval); // pthread_join needs the address of a void *
printf("main thread exit\n");
return 0;
}
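pthread_setcancelstate() lets you control when cancellation can take effect: you can disable cancellation at the start of your thread function and enable it again when you want. A new thread starts with cancellation enabled and the cancel type set to deferred (PTHREAD_CANCEL_DEFERRED), so a request is normally acted on only at a cancellation point; the thread can be cancelled at any time only if its type is switched to PTHREAD_CANCEL_ASYNCHRONOUS with pthread_setcanceltype().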
Perhaps more to the point, you probably shouldn't use pthread_cancel() at all. For more on that, see here: Cancelling a thread using pthread_cancel : good practice or bad
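A minimal sketch of the disable/re-enable pattern, assuming a hypothetical thread function `guarded()` whose "critical section" must not be interrupted: while cancellation is disabled, a request stays pending, and it is acted on at the first cancellation point after cancellation is re-enabled. The `ready` flag and the driver `cancel_after_critical_section()` are also hypothetical; the flag only makes sure cancellation is disabled before the request is sent.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

static atomic_int ready;      /* set once cancellation has been disabled */
static int critical_done;     /* read by the caller only after pthread_join() */

static void *guarded(void *arg)
{
    int old;
    (void) arg;
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
    atomic_store(&ready, 1);
    struct timespec t = { 0, 20 * 1000 * 1000 };
    nanosleep(&t, NULL);            /* normally a cancellation point; ignored while disabled */
    critical_done = 1;              /* the "critical section" always completes */
    pthread_setcancelstate(old, &old);
    for (;;)
        pthread_testcancel();       /* the pending request is acted on here */
    return NULL;
}

/* Returns 1 if the critical section finished and the thread was then cancelled. */
int cancel_after_critical_section(void)
{
    pthread_t tid;
    void *retval = NULL;
    struct timespec tick = { 0, 1000 * 1000 };
    if (pthread_create(&tid, NULL, guarded, NULL) != 0)
        return -1;
    while (!atomic_load(&ready))
        nanosleep(&tick, NULL);
    pthread_cancel(tid);            /* normally arrives while cancellation is still disabled */
    pthread_join(tid, &retval);
    return critical_done == 1 && retval == PTHREAD_CANCELED;
}
```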
Cancelling a thread does not mean that whatever it is running is stopped immediately; it just posts a request to that thread. With the default (deferred) cancel type, pthread_cancel only takes effect when the thread reaches a cancellation point. The list of cancellation points is given in the pthreads(7) man page. The thread above contains no code that is a cancellation point, so it will always run to completion and never be cancelled. You can lengthen the loop or put a print statement on the last line of your thread function, and you will see that the thread always completes.
But if you change the code as below to add usleep() (one of the cancellation points listed in the man page), you can see that the thread terminates in usleep(). However many times you run it, the thread is only terminated at that cancellation point and not at any other point.
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
void *thread_func(void *arg)
{
int i;
int j;
int k;
k = 1;
/* add operation */
for (i=0; i<1000; ++i) {
printf("Before - %d\n", i);
usleep(1);
printf("After - %d\n", i);
for (j=0; j<10000;++j) {
++k; // maybe for(z=0; z<10000; ++z) added would
// be better
}
printf("Never - %d\n", i);
}
printf("Normal Exit of thread\n");
return (void *)10;
}
int main(void)
{
void *retval;
pthread_t tid;
if (pthread_create(&tid, NULL, thread_func, NULL) != 0) {
printf("create error\n");
}
usleep(1000);
if (pthread_cancel(tid) != 0) { // cancel thread
printf("cancel error\n");
}
pthread_join(tid, &retval); // pthread_join needs the address of a void *
printf("main thread exit\n");
return 0;
}
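When a loop has no natural cancellation point, pthread_testcancel() adds one explicitly. This sketch, with a hypothetical driver `cancel_busy_thread()`, also passes pthread_join() the address of a `void *` and checks for PTHREAD_CANCELED, which is the exit status reported for a cancelled thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

static void *busy_loop(void *arg)
{
    (void) arg;
    volatile unsigned long k = 0;   /* volatile so the compiler keeps the loop */
    for (;;) {
        for (int j = 0; j < 10000; ++j)
            ++k;                    /* pure computation: not a cancellation point */
        pthread_testcancel();       /* explicit cancellation point per outer iteration */
    }
    return NULL;
}

/* Returns 1 if the busy thread was cancelled and joined with PTHREAD_CANCELED. */
int cancel_busy_thread(void)
{
    pthread_t tid;
    void *retval = NULL;
    if (pthread_create(&tid, NULL, busy_loop, NULL) != 0)
        return -1;
    struct timespec t = { 0, 1000 * 1000 };
    nanosleep(&t, NULL);
    if (pthread_cancel(tid) != 0)
        return -1;
    pthread_join(tid, &retval);
    return retval == PTHREAD_CANCELED;
}
```

Without the pthread_testcancel() call, this infinite loop would never reach a cancellation point and pthread_join() would block forever.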

Closing a file descriptor that is being polled

If I have two threads (Linux, NPTL), one polling on one or more file descriptors and another closing one of them, is that a reasonable action? Am I doing something that I shouldn't be doing in a multithreaded environment?
The main reason I consider doing that, is that I don't necessarily want to communicate with the polling thread, interrupt it, etc., I instead would like to just close the descriptor for whatever reasons, and when the polling thread wakes up, I expect the revents to contain POLLNVAL, which would be the indication that the file descriptor should just be thrown away by the thread before the next poll.
I've put together a simple test, which does show that POLLNVAL is exactly what happens. However, POLLNVAL is only set when the timeout expires; closing the socket doesn't seem to make poll() return. If that's the case, I can signal the thread with pthread_kill() to make poll() return.
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <poll.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>
static pthread_t main_thread;
void * close_some(void*a) {
printf("thread #2 (%d) is sleeping\n", getpid());
sleep(2);
close(0);
printf("socket closed\n");
// comment out the next line to not forcefully interrupt
pthread_kill(main_thread, SIGUSR1);
return 0;
}
void on_sig(int s) {
printf("signal received\n");
}
int main(int argc, char ** argv) {
pthread_t two;
struct pollfd pfd;
int rc;
struct sigaction act;
act.sa_handler = on_sig;
sigemptyset(&act.sa_mask);
act.sa_flags = 0;
sigaction(SIGUSR1, &act, 0);
main_thread = pthread_self();
pthread_create(&two, 0, close_some, 0);
pfd.fd = 0;
pfd.events = POLLIN | POLLRDHUP;
printf("thread 0 (%d) polling\n", getpid());
rc = poll(&pfd, 1, 7000);
if (rc < 0) {
printf("error : %s\n", strerror(errno));
} else if (!rc) {
printf("time out!\n");
} else {
printf("revents = %x\n", pfd.revents);
}
return 0;
}
For Linux at least, this seems risky. The manual page for close warns:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a
file descriptor may be reused, there are some obscure race conditions
that may cause unintended side effects.
Since you're on Linux, you could do the following:
Set up an eventfd and add it to the poll
Signal the eventfd (write to it) when you want to close a fd
In the poll, when you see activity on the eventfd you can immediately close a fd and remove it from poll
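The eventfd steps above might look like this as a minimal, Linux-only sketch. The pipe read end stands in for the descriptor to be dropped, and `requester()` and `demo_eventfd_close()` are hypothetical names. The point of the design is that only the polling thread ever calls close() on a polled descriptor, which avoids the race the man page warns about.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

static int wake_fd = -1;   /* eventfd whose only job is to wake the poller */

/* The "other" thread: instead of calling close() itself, it asks the poller to. */
static void *requester(void *arg)
{
    (void) arg;
    struct timespec t = { 0, 100 * 1000 * 1000 };
    nanosleep(&t, NULL);
    uint64_t one = 1;
    if (write(wake_fd, &one, sizeof one) != (ssize_t) sizeof one)
        return (void *) 1;
    return NULL;
}

/* Returns 1 if poll() woke on the eventfd and the polling thread closed the fd. */
int demo_eventfd_close(void)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;
    int victim = pipefd[0];            /* never becomes readable in this demo */
    wake_fd = eventfd(0, 0);
    if (wake_fd < 0)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, requester, NULL) != 0)
        return -1;

    struct pollfd pfd[2] = {
        { .fd = victim,  .events = POLLIN },
        { .fd = wake_fd, .events = POLLIN },
    };
    int ok = 0;
    if (poll(pfd, 2, 7000) > 0 && (pfd[1].revents & POLLIN)) {
        uint64_t count;
        if (read(wake_fd, &count, sizeof count) == (ssize_t) sizeof count)
            ok = 1;                    /* eventfd counter read and reset */
    }
    close(victim);                     /* only the polling thread closes it */
    pthread_join(t, NULL);
    close(pipefd[1]);
    close(wake_fd);
    return ok;
}
```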
Alternatively you could simply establish a signal handler and check for errno == EINTR when poll returns. The signal handler would only need to set some global variable to the value of the fd you're closing.
Since you're on Linux you might want to consider epoll as a superior albeit non-standard alternative to poll.

Resources