Closing a file descriptor that is being polled - linux

If I have two threads (Linux, NPTL), with one thread polling on one or more file descriptors and another closing one of them, is that a reasonable thing to do? Am I doing something that I shouldn't be doing in a multithreaded environment?
The main reason I'm considering this is that I don't necessarily want to communicate with the polling thread, interrupt it, and so on. Instead I would like to just close the descriptor, for whatever reason, and when the polling thread wakes up I expect revents to contain POLLNVAL, which tells the thread to throw the file descriptor away before the next poll.
I've put together a simple test, which shows that POLLNVAL is indeed what gets reported. However, POLLNVAL is only set once the timeout expires; closing the socket doesn't seem to make poll() return. If that's the case, I can send the thread a signal with pthread_kill() to interrupt poll() and make it return.
#define _GNU_SOURCE
#include <stdio.h>
#include <pthread.h>
#include <poll.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <signal.h>

static pthread_t main_thread;

void * close_some(void*a) {
    printf("thread #2 (%d) is sleeping\n", getpid());
    sleep(2);
    close(0);
    printf("socket closed\n");
    // comment out the next line to not forcefully interrupt
    pthread_kill(main_thread, SIGUSR1);
    return 0;
}

void on_sig(int s) {
    printf("signal received\n");
}

int main(int argc, char ** argv) {
    pthread_t two;
    struct pollfd pfd;
    int rc;
    struct sigaction act;

    act.sa_handler = on_sig;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    sigaction(SIGUSR1, &act, 0);

    main_thread = pthread_self();
    pthread_create(&two, 0, close_some, 0);

    pfd.fd = 0;
    pfd.events = POLLIN | POLLRDHUP;
    printf("thread 0 (%d) polling\n", getpid());
    rc = poll(&pfd, 1, 7000);
    if (rc < 0) {
        printf("error : %s\n", strerror(errno));
    } else if (!rc) {
        printf("time out!\n");
    } else {
        printf("revents = %x\n", pfd.revents);
    }
    return 0;
}

For Linux at least, this seems risky. The manual page for close warns:
It is probably unwise to close file descriptors while they may be in
use by system calls in other threads in the same process. Since a
file descriptor may be reused, there are some obscure race conditions
that may cause unintended side effects.
Since you're on Linux, you could do the following:
Set up an eventfd and add it to the poll set.
Signal the eventfd (write to it) when you want to close a fd.
In the polling thread, when you see activity on the eventfd, you can immediately close the fd and remove it from the poll set.
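The three steps above could look roughly like the sketch below. The function names and return codes (request_wakeup, poll_with_wakeup, WOKE_CONTROL and so on) are made up for illustration, a pipe stands in for the real socket, and the walk-through runs in a single thread to stay short; in the real program request_wakeup() would be called from the other thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical return codes for poll_with_wakeup(). */
enum { WOKE_TIMEOUT = 0, WOKE_CONTROL = 1, WOKE_DATA = 2 };

/* Called by the thread that wants a descriptor dropped:
 * adding 1 to the eventfd counter makes it readable. */
int request_wakeup(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Called by the polling thread: wait on the data descriptor and the
 * eventfd together. Returns -1 on error or one of the WOKE_* codes. */
int poll_with_wakeup(int data_fd, int efd, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = data_fd, .events = POLLIN },
        { .fd = efd,     .events = POLLIN },
    };
    int rc = poll(pfd, 2, timeout_ms);
    if (rc <= 0)
        return rc;                      /* -1 on error, 0 on timeout */
    if (pfd[1].revents & POLLIN) {
        uint64_t count;                 /* reading resets the counter */
        if (read(efd, &count, sizeof count) != (ssize_t)sizeof count)
            return -1;
        return WOKE_CONTROL;
    }
    return WOKE_DATA;
}

/* Single-threaded walk-through; returns what woke the second poll. */
int eventfd_demo(void)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;
    int efd = eventfd(0, EFD_CLOEXEC);
    if (efd < 0)
        return -1;

    /* Nothing written yet, so a zero-timeout poll reports a timeout. */
    assert(poll_with_wakeup(pipefd[0], efd, 0) == WOKE_TIMEOUT);

    if (request_wakeup(efd) != 0)
        return -1;
    int why = poll_with_wakeup(pipefd[0], efd, 1000);

    /* The polling side owns the descriptor, so it is the one closing it. */
    close(pipefd[0]);
    close(pipefd[1]);
    close(efd);
    return why;
}
```

The point of the design is that only the polling thread ever closes a descriptor it polls, so the descriptor number cannot be reused underneath a poll() call that is still in progress.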
Alternatively you could simply establish a signal handler and check for errno == EINTR when poll returns. The signal handler would only need to set some global variable to the value of the fd you're closing.
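A rough sketch of the signal-handler variant follows. All names are hypothetical, and it assumes the closing side stores the descriptor in fd_being_closed before it signals. To keep the example single-threaded, a one-shot SIGALRM timer stands in for the other thread's pthread_kill(). Note the window this approach leaves open: if the signal arrives just before poll() starts, poll() is not interrupted and sleeps for its full timeout; ppoll() with a signal mask is the usual way to close that window.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

/* Descriptor the polling thread should drop; -1 means "none". */
volatile sig_atomic_t fd_to_drop = -1;
/* Assumption: the closing side stores the descriptor here before signalling. */
volatile sig_atomic_t fd_being_closed = -1;

/* The handler only copies a value, which is async-signal-safe. */
void on_wakeup(int sig)
{
    (void)sig;
    fd_to_drop = fd_being_closed;
}

int install_wakeup_handler(int signo)
{
    struct sigaction act;
    memset(&act, 0, sizeof act);
    act.sa_handler = on_wakeup;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    return sigaction(signo, &act, NULL);
}

/* One pass of the polling loop. Returns 1 if poll() was interrupted and the
 * descriptor was dropped, 0 on timeout or ordinary events, -1 on error. */
int poll_once(struct pollfd *pfd, int timeout_ms)
{
    int rc = poll(pfd, 1, timeout_ms);
    if (rc < 0 && errno == EINTR && fd_to_drop == pfd->fd) {
        close(pfd->fd);
        pfd->fd = -1;                   /* poll() ignores negative fds */
        fd_to_drop = -1;
        return 1;
    }
    return rc < 0 ? -1 : 0;
}

/* Walk-through: a timer delivers the signal about 50 ms into poll(). */
int eintr_demo(void)
{
    int p[2];
    if (pipe(p) != 0 || install_wakeup_handler(SIGALRM) != 0)
        return -1;
    assert(fd_to_drop == -1);

    struct pollfd pfd = { .fd = p[0], .events = POLLIN };
    fd_being_closed = p[0];

    struct itimerval once = { .it_value = { .tv_sec = 0, .tv_usec = 50000 } };
    if (setitimer(ITIMER_REAL, &once, NULL) != 0)
        return -1;

    int rc = poll_once(&pfd, 5000);
    close(p[1]);
    return rc;
}
```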
Since you're on Linux you might want to consider epoll as a superior albeit non-standard alternative to poll.

Related

Catching SIGUSR1 with sigtimedwait()

I'm not new to programming, but pretty new to Linux. I'm trying to use signals to asynchronously catch a push on a button, like this:
Run a worker thread which raises SIGUSR1 when the button is pushed.
Run a loop (main thread) around sigtimedwait() that will rotate info every two seconds (as long as the button is not pushed) or break (when the button is pushed).
According to the notes on sigtimedwait(), one should block the signals you want to wait for, then call sigtimedwait(). But I never see sigtimedwait() catching the blocked signals. I have run the code below in a few ways to see what happens with different scenarios:
Call to pthread_sigmask() disabled, call to signal() disabled, result: program exits with message "User defined signal 1".
Call to pthread_sigmask() disabled, call to signal() enabled, result: message "Button 1 pressed sync1 hit" appears, sigtimedwait() always returns EAGAIN.
Call to pthread_sigmask() enabled, call to signal() disabled, result: message "Button 1 pressed" appears, sigtimedwait() always returns EAGAIN.
Call to pthread_sigmask() enabled, call to signal() enabled, result of course same as previous because the handler will not be called.
All as expected, except for the fact that sigtimedwait() doesn't seem to catch the signal when it's pending.
I've looked into similar code, e.g. this. But I don't understand how that particular code could work: SIGUSR1 isn't blocked, so raising that should immediately terminate the program (the default action for SIGUSR1).
It looks like I'm missing something here. What am I doing wrong? Or is the whole idea of using raise() in a worker thread wrong? I'm running this on a Raspberry Pi 3 with Raspbian Stretch (Debian 9.1), could there be a problem in that?
[I know printf() shouldn't be used in a signal handler, but for this purpose it works]
Any help appreciated, thx!
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <bcm2835.h>
#include <signal.h>
#include <pthread.h>
#include <errno.h>

#define PIN_BUTTON1 RPI_V2_GPIO_P1_22 // GPIO #24

// Thread function
void* check_button1(void* param)
{
    while (true)
    {
        if (bcm2835_gpio_lev(PIN_BUTTON1) == HIGH)
        {
            printf("Button 1 pressed ");
            raise(SIGUSR1);
        }
        delay(250);
    }
}

// Signal handler, if applied
volatile sig_atomic_t usr_interrupt = 0;

void sync1(int sig)
{
    printf("sync1 hit ... ");
    usr_interrupt = 1;
}

int main(int argc, char** argv)
{
    if (!bcm2835_init())
    {
        printf("Failed to initialize BCM2835 GPIO library.");
        return 1;
    }
    bcm2835_gpio_fsel(PIN_BUTTON1, BCM2835_GPIO_FSEL_INPT);

    sigset_t sigusr;
    sigemptyset(&sigusr);
    sigaddset(&sigusr, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &sigusr, NULL);
    signal(SIGUSR1, sync1);

    // Start the threads to read the button pin state
    pthread_t th1;
    pthread_create(&th1, NULL, check_button1, NULL);

    // Create a two second loop
    struct timespec timeout = { 0 };
    timeout.tv_sec = 2;
    usr_interrupt = 0;
    int nLoopCount = 0;
    while (true)
    {
        printf("Loop %d, waiting %ld seconds ... ", ++nLoopCount, (long)timeout.tv_sec);
        int nResult = sigtimedwait(&sigusr, NULL, &timeout);
        if (nResult < 0)
        {
            switch (errno)
            {
                case EAGAIN: printf("EAGAIN "); break;   // Time out, no signal raised, next loop
                case EINTR: printf("EINTR "); break;     // Interrupted by a handler for a signal not in the set
                case EINVAL: printf("EINVAL "); exit(1); // Invalid timeout
                default: printf("Result=%d Error=%d ", nResult, errno); break;
            }
            printf("\n");
            continue;
        }
        printf("Signal %d caught\n", nResult);
    }
    return 0;
}
ADDENDUM: In the meantime, I got this working by replacing raise(SIGUSR1) with kill(getpid(), SIGUSR1). Strange, because according to the manual raise(x) is equivalent to kill(getpid(), x) in single-threaded programs and to pthread_kill(pthread_self(), x) in multi-threaded ones. Replacing raise(SIGUSR1) with pthread_kill(pthread_self(), SIGUSR1) has no effect. If anyone could explain this to me ...
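One reading that fits the manual wording quoted in the addendum: in a multithreaded program raise() directs the signal at the calling thread, so SIGUSR1 stays pending on the worker thread, where a sigtimedwait() running in the main thread cannot pick it up. kill(getpid(), ...) sends a process-directed signal instead, which a waiting thread can accept, while pthread_kill(pthread_self(), ...) targets the worker again. The sketch below directs the signal at the waiting thread explicitly; button_worker, wait_for_button and waiter_thread are names made up for illustration, and the "button press" is simulated by signalling immediately.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>

/* Thread that sits in sigtimedwait(); set before the worker starts. */
static pthread_t waiter_thread;

/* Stand-in for the button poller: pretend the button was just pressed. */
void *button_worker(void *arg)
{
    (void)arg;
    /* Direct the signal at the waiting thread instead of at ourselves.
     * kill(getpid(), SIGUSR1) would be the process-directed alternative. */
    pthread_kill(waiter_thread, SIGUSR1);
    return NULL;
}

/* Returns the signal number that was accepted, or -1 (timeout or error). */
int wait_for_button(int timeout_sec)
{
    assert(timeout_sec >= 0);

    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block SIGUSR1 before creating the worker so that the worker inherits
     * the mask and the default action can never terminate the process. */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    waiter_thread = pthread_self();
    pthread_t th;
    if (pthread_create(&th, NULL, button_worker, NULL) != 0)
        return -1;

    struct timespec timeout = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    int sig = sigtimedwait(&set, NULL, &timeout);

    pthread_join(th, NULL);
    return sig;
}
```

Build with -pthread. Because the signal is blocked everywhere and only ever accepted through sigtimedwait(), no handler needs to be installed at all.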

need to know how to interrupt all pthreads

In Linux, I am emulating an embedded system that has one thread that gets messages delivered to the outside world. If some thread detects an insurmountable problem, my goal is to stop all the other threads in their tracks (leaving useful stack traces) and allow only the message delivery thread to continue. So in my emulation environment, I want to "pthread_kill(tid, SIGnal)" each "tid". (I have a list. I'm using SIGTSTP.) Unfortunately, only one thread is getting the signal. "sigprocmask()" is not able to unmask the signal. Here is my current (non-working) handler:
void
wait_until_death(int sig)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, sig);
    sigprocmask(SIG_UNBLOCK, &mask, NULL);
    for (;;)
        pause();
}
I get verification that all the pthread_kill()'s get invoked, but only one thread has the handler in the stack trace. Can this be done?
This minimal example seems to function in the manner you want - all the threads except the main thread end up waiting in wait_until_death():
#include <stdio.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

#define NTHREADS 10

pthread_barrier_t barrier;

void
wait_until_death(int sig)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, sig);
    sigprocmask(SIG_UNBLOCK, &mask, NULL);
    for (;;)
        pause();
}

void *thread_func(void *arg)
{
    pthread_barrier_wait(&barrier);
    for (;;)
        pause();
}

int main(int argc, char *argv[])
{
    const int thread_signal = SIGTSTP;
    const struct sigaction sa = { .sa_handler = wait_until_death };
    int i;
    pthread_t thread[NTHREADS];

    pthread_barrier_init(&barrier, NULL, NTHREADS + 1);
    sigaction(thread_signal, &sa, NULL);

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&thread[i], NULL, thread_func, NULL);

    pthread_barrier_wait(&barrier);

    for (i = 0; i < NTHREADS; i++)
        pthread_kill(thread[i], thread_signal);

    fprintf(stderr, "All threads signalled.\n");

    for (;;)
        pause();
    return 0;
}
Note that unblocking the signal in wait_until_death() isn't required: the signal mask is per-thread, and the thread that is executing the signal handler isn't going to be signalled again.
Presumably the problem is in how you are installing the signal handler, or setting up thread signal masks.
This is impossible. The problem is that some of the threads you stop may hold locks that the thread you want to continue running requires in order to continue making forward progress. Just abandon this idea entirely. Trust me, this will only cause you great pain.
If you literally must do it, have all the other threads call a conditional yielding point at known safe places where they hold no lock that can prevent any other thread from reaching its next conditional yielding point. But this is very difficult to get right and is very prone to deadlock and I strongly advise not trying it.
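For what it's worth, a "conditional yielding point" could be as small as the sketch below. The names (stop_requested, request_stop, yield_point) are made up, and the whole scheme rests on the assumption stated above: every thread calls yield_point() only at places where it holds no lock that another thread might need.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <unistd.h>

/* Set once by whichever thread detects the insurmountable problem. */
static atomic_bool stop_requested = false;

void request_stop(void)
{
    atomic_store(&stop_requested, true);
}

bool stop_is_requested(void)
{
    return atomic_load(&stop_requested);
}

/* Call only at known safe places, with no locks held. Returns immediately
 * while no stop has been requested; afterwards the caller parks here for
 * good, which leaves yield_point() visible in its stack trace. */
void yield_point(void)
{
    while (atomic_load(&stop_requested))
        pause();    /* if a signal handler wakes us, re-check and park again */
}
```

The message delivery thread simply never calls yield_point(), so it keeps running after request_stop(). Unlike the signal approach, a thread only stops once it reaches its next yield point, so long-running or blocking sections delay the stop.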

Fail to wake up from epoll_wait when other process closes fifo

I'm seeing different epoll and select behavior in two different binaries and was hoping for some debugging help. In the following, epoll_wait and select will be used interchangeably.
I have two processes, one writer and one reader, that communicate over a fifo. The reader performs an epoll_wait to be notified of writes. I would also like to know when the writer closes the fifo, and it appears that epoll_wait should notify me of this as well. The following toy program, which behaves as expected, illustrates what I'm trying to accomplish:
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

int
main(int argc, char** argv)
{
    const char* filename = "tempfile";
    char buf[1024];
    memset(buf, 0, sizeof(buf));

    struct stat statbuf;
    if (!stat(filename, &statbuf))
        unlink(filename);
    mkfifo(filename, S_IRUSR | S_IWUSR);

    pid_t pid = fork();
    if (!pid) {
        int fd = open(filename, O_WRONLY);
        printf("Opened %d for writing\n", fd);
        sleep(3);
        close(fd);
    } else {
        int fd = open(filename, O_RDONLY);
        printf("Opened %d for reading\n", fd);

        static const int MAX_LENGTH = 1;
        struct epoll_event init;
        struct epoll_event evs[MAX_LENGTH];
        int efd = epoll_create(MAX_LENGTH);
        int i;
        for (i = 0; i < MAX_LENGTH; ++i) {
            init.data.u64 = 0;
            init.data.fd = fd;
            // assign rather than OR into the uninitialized field
            init.events = EPOLLIN | EPOLLPRI | EPOLLHUP;
            epoll_ctl(efd, EPOLL_CTL_ADD, fd, &init);
        }

        while (1) {
            int nfds = epoll_wait(efd, evs, MAX_LENGTH, -1);
            printf("%d fds ready\n", nfds);
            int nread = read(fd, buf, sizeof(buf));
            if (nread < 0) {
                perror("read");
                exit(1);
            } else if (!nread) {
                printf("Child %d closed the pipe\n", pid);
                break;
            }
            printf("Reading: %s\n", buf);
        }
    }
    return 0;
}
However, when I do this with another reader (whose code I'm not privileged to post, but which makes the exact same calls--the toy program is modeled on it), the process does not wake when the writer closes the fifo. The toy reader also gives the desired semantics with select. The real reader configured to use select also fails.
What might account for the different behavior of the two? For any provided hypotheses, how can I verify them? I'm running Linux 2.6.38.8.
strace is a great tool to confirm that the system calls are invoked correctly (i.e. parameters are passed correctly and they don't return any unexpected errors).
In addition to that I would recommend using lsof to check that no other process has that FIFO still opened.
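The lsof check targets one specific hypothesis: a reader only sees the hang-up once the last write end is gone, so a write descriptor still held open somewhere (a forked child, a leaked dup) keeps epoll_wait asleep. Here is a small sketch of that effect, using an anonymous pipe in place of the FIFO and a dup() in place of the second process; the function names are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the event mask reported for the watched descriptor,
 * 0 if nothing happened within timeout_ms, or -1 on error. */
int wait_read_end(int efd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(efd, &ev, 1, timeout_ms);
    if (n < 0)
        return -1;
    return n == 0 ? 0 : (int)ev.events;
}

/* Returns 0 if the hang-up showed up only after the last writer closed. */
int hup_needs_last_writer_demo(void)
{
    int p[2];
    if (pipe(p) != 0)
        return -1;

    /* Stand-in for a write end that some other process still holds open. */
    int extra_writer = dup(p[1]);
    int efd = epoll_create1(0);
    if (extra_writer < 0 || efd < 0)
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    if (epoll_ctl(efd, EPOLL_CTL_ADD, p[0], &ev) != 0)
        return -1;

    close(p[1]);                            /* the "real" writer goes away */
    int before = wait_read_end(efd, 100);   /* a writer remains: no event */

    close(extra_writer);                    /* now the last writer is gone */
    int after = wait_read_end(efd, 100);    /* hang-up is reported */

    close(efd);
    close(p[0]);
    assert(before >= -1 && after >= -1);
    return (before == 0 && after > 0 && (after & EPOLLHUP)) ? 0 : 1;
}
```

EPOLLHUP is always reported, whether or not it is requested in the event mask, so asking for EPOLLIN alone is enough here.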

poll() can't detect event when socket is closed locally?

I'm working on a project to port a TCP/IP client program to an embedded ARM-Linux controller board. The client program was originally written using epoll(). However, the target platform is quite old; the only kernel available is 2.4.x, and epoll() is not supported. So I decided to rewrite the I/O loop using poll().
But while testing the code, I found that poll() does not behave as I expected: it won't return when a TCP/IP client socket is closed locally by another thread. I've written some very simple code to test this:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <fcntl.h>
#include <pthread.h>
#include <poll.h>

struct pollfd fdList[1];

void *thread_runner(void *arg)
{
    sleep(10);
    close(fdList[0].fd);
    printf("socket closed\n");
    pthread_exit(NULL);
}

int main(void)
{
    struct sockaddr_in hostAddr;
    int sockFD;
    char buf[32];
    pthread_t handle;

    sockFD = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(sockFD,F_SETFL,O_NONBLOCK|fcntl(sockFD,F_GETFL,0));

    inet_aton("127.0.0.1",&(hostAddr.sin_addr));
    hostAddr.sin_family = AF_INET;
    hostAddr.sin_port = htons(12345);
    connect(sockFD,(struct sockaddr *)&hostAddr,sizeof(hostAddr));

    fdList[0].fd = sockFD;
    fdList[0].events = POLLOUT;

    pthread_create(&handle,NULL,thread_runner,NULL);

    while(1) {
        if(poll(fdList,1,-1) < 1) {
            continue;
        }
        if(fdList[0].revents & POLLNVAL ) {
            printf("POLLNVAL\n");
            exit(-1);
        }
        if(fdList[0].revents & POLLOUT) {
            printf("connected\n");
            fdList[0].events = POLLIN;
        }
        if(fdList[0].revents & POLLHUP ) {
            printf("closed by peer\n");
            close(fdList[0].fd);
            exit(-1);
        }
        if(fdList[0].revents & POLLIN) {
            if( read(fdList[0].fd, buf, sizeof(buf)) < 0) {
                printf("closed by peer\n");
                close(fdList[0].fd);
                exit(-1);
            }
        }
    }
    return 0;
}
In this code I first create a TCP client socket, set it to non-blocking mode, add it to poll(), and close() the socket in another thread. The result: "POLLNVAL" is never printed when the socket is closed.
Is that the expected behavior of poll()? Will it help if I choose select() instead of poll()?
Yes, this is expected behavior. You solve this by using shutdown() on the socket instead of close().
See e.g. http://www.faqs.org/faqs/unix-faq/socket/ section 2.6
EDIT: The reason this is expected is that poll() and select() react to events happening on one of their fds. close() removes the fd, so it no longer exists at all, and thus it can't have any events associated with it.
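To illustrate the shutdown() suggestion, here is a small sketch of what poll() reports once shutdown() has been called on a descriptor that is still open. It is single-threaded to stay short, and an AF_UNIX socketpair stands in for the TCP connection; the function name is made up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the revents reported for a socket after shutdown(),
 * or -1 if something unexpected happened along the way. */
short revents_after_shutdown(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    struct pollfd pfd = { .fd = sv[0], .events = POLLIN };

    /* Nothing has happened on the socket yet: a zero timeout returns 0. */
    int idle = poll(&pfd, 1, 0);

    /* What the "closing" thread would do instead of close(): the
     * descriptor stays valid, but the socket is marked as shut down. */
    shutdown(sv[0], SHUT_RDWR);

    int rc = poll(&pfd, 1, 1000);

    /* Only the polling side closes, once it has seen the event. */
    close(sv[0]);
    close(sv[1]);

    if (idle != 0 || rc != 1)
        return -1;
    return pfd.revents;
}
```

In the program from the question, thread_runner would call shutdown(fdList[0].fd, SHUT_RDWR) and leave the close() to the poll loop, which already closes the descriptor in its POLLHUP and POLLIN branches.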

Multithreading Semaphore

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <semaphore.h>

void *thread_function(void *arg);
sem_t bin_sem;

#define WORK_SIZE 1024
char work_area[WORK_SIZE];

int main() {
    int res;
    pthread_t a_thread;
    void *thread_result;

    res = sem_init(&bin_sem, 0, 0);
    if (res != 0) {
        perror("Semaphore initialization failed");
        exit(EXIT_FAILURE);
    }
    res = pthread_create(&a_thread, NULL, thread_function, NULL);
    if (res != 0) {
        perror("Thread creation failed");
        exit(EXIT_FAILURE);
    }
    printf("Input some text. Enter 'end' to finish\n");
    while(strncmp("end", work_area, 3) != 0) {
        fgets(work_area, WORK_SIZE, stdin);
        sem_post(&bin_sem);
    }
    printf("\nWaiting for thread to finish...\n");
    res = pthread_join(a_thread, &thread_result);
    if (res != 0) {
        perror("Thread join failed");
        exit(EXIT_FAILURE);
    }
    printf("Thread joined\n");
    sem_destroy(&bin_sem);
    exit(EXIT_SUCCESS);
}

void *thread_function(void *arg) {
    sem_wait(&bin_sem);
    while(strncmp("end", work_area, 3) != 0) {
        printf("You input %zu characters\n", strlen(work_area) - 1);
        sem_wait(&bin_sem);
    }
    pthread_exit(NULL);
}
In the program above, when the semaphore is released using sem_post(), is it possible that fgets() and the counting code in thread_function execute simultaneously? I also think this program fails to guarantee that the second thread counts the characters before the main thread reads the keyboard again. Is that right?
The second thread will only read characters after sem_wait has returned, signaling that a sem_post has been called somewhere, so I think that is fine.
As for fgets and the counting function, those two could be running simultaneously.
I would recommend a mutex lock on the work_area variable in this case, because if the user is editing the variable in one thread while it is being read in another thread, problems will occur.
You can either use a mutex or you can use a semaphore and set the initial count on it to 1.
If you implement a mutex or use a semaphore like that, though, make sure to take the mutex (pthread_mutex_lock) only after sem_wait has returned, or else a deadlock may occur.
In this example you want to have a mutex around the read & writes of the shared memory.
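As an alternative to the mutex suggested above (a different technique, named plainly: a second semaphore), the two threads can hand the buffer back and forth so the reader never overwrites work_area while it is still being counted. This is only a sketch under assumptions: data_ready, data_taken and count_with_handshake are made-up names, and an array of strings replaces the fgets() input so the example needs no keyboard.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define WORK_SIZE 1024

static char work_area[WORK_SIZE];
static sem_t data_ready;    /* posted by the producer after filling work_area */
static sem_t data_taken;    /* posted by the counter once it is done reading */
static size_t total_chars;  /* written by the counter, read after the join */

void *counter_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&data_ready);
        bool done = strncmp("end", work_area, 3) == 0;
        if (!done)
            total_chars += strcspn(work_area, "\n");  /* length without newline */
        sem_post(&data_taken);      /* hand the buffer back to the producer */
        if (done)
            break;
    }
    return NULL;
}

/* Copy one line into the shared buffer and wait until it has been consumed. */
static void hand_over(const char *line)
{
    snprintf(work_area, sizeof work_area, "%s", line);
    sem_post(&data_ready);
    sem_wait(&data_taken);
}

/* Feeds the lines to the counter thread; returns the total, or -1 on error. */
long count_with_handshake(const char *const lines[], size_t n)
{
    pthread_t th;
    bool ended = false;

    assert(lines != NULL || n == 0);
    total_chars = 0;
    if (sem_init(&data_ready, 0, 0) != 0 || sem_init(&data_taken, 0, 0) != 0)
        return -1;
    if (pthread_create(&th, NULL, counter_thread, NULL) != 0)
        return -1;

    for (size_t i = 0; i < n && !ended; i++) {
        hand_over(lines[i]);
        ended = strncmp("end", lines[i], 3) == 0;
    }
    if (!ended)
        hand_over("end");           /* make sure the counter terminates */

    pthread_join(th, NULL);
    sem_destroy(&data_ready);
    sem_destroy(&data_taken);
    return (long)total_chars;
}
```

Build with -pthread. With the handshake in place the two threads strictly alternate, so no mutex is needed for work_area; the cost is that reading and counting can no longer overlap.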
I know this is an example, but the following code:
fgets(work_area, WORK_SIZE, stdin);
Should really be:
fgets(work_area, sizeof(work_area), stdin);
If you change the size of work_area in the future (to some other constant, etc), it's quite likely that changing this second WORK_SIZE could be missed.

Resources