robust_list not calling FUTEX_WAKE - linux

The Linux robust_list mechanism is used by robust mutexes to support automatic unlocking when the lock owner terminates without unlocking, for example because it died unexpectedly. According to man set_robust_list:
The purpose of the robust futex list is to ensure that if a thread accidentally fails to unlock a futex before terminating or calling execve(2), another thread that is waiting on that futex is notified that the former owner of the futex has died. This notification consists of two pieces: the FUTEX_OWNER_DIED bit is set in the futex word, and the kernel performs a futex(2) FUTEX_WAKE operation on one of the threads waiting on the futex.
This is not the behavior I'm seeing.
The futex word is replaced with FUTEX_OWNER_DIED instead of having that bit ORed in.
And no FUTEX_WAKE call happens.
#include <chrono>
#include <cstddef>   // offsetof
#include <thread>
#include <linux/futex.h>
#include <stdint.h>
#include <stdio.h>
#include <syscall.h>
#include <unistd.h>
using ftx_t = uint32_t;
struct mtx_t {
    mtx_t* next;
    mtx_t* prev;
    ftx_t ftx;
};
thread_local robust_list_head robust_head;
void robust_init() {
    robust_head.list.next = &robust_head.list;
    robust_head.futex_offset = offsetof(mtx_t, ftx);
    robust_head.list_op_pending = NULL;
    syscall(SYS_set_robust_list, &robust_head.list, sizeof(robust_head));
}
void robust_op_start(mtx_t* mtx) {
    robust_head.list_op_pending = (robust_list*)mtx;
    __sync_synchronize();
}
void robust_op_end() {
    __sync_synchronize();
    robust_head.list_op_pending = NULL;
}
void robust_op_add(mtx_t* mtx) {
    mtx_t* old_first = (mtx_t*)robust_head.list.next;
    mtx->prev = (mtx_t*)&robust_head;
    mtx->next = old_first;
    __sync_synchronize();
    robust_head.list.next = (robust_list*)mtx;
    if (old_first != (mtx_t*)&robust_head) {
        old_first->prev = mtx;
    }
}
int futex(ftx_t* uaddr,
          int futex_op,
          int val,
          uintptr_t timeout_or_val2,
          ftx_t* uaddr2,
          int val3) {
    return syscall(SYS_futex, uaddr, futex_op, val, timeout_or_val2, uaddr2, val3);
}
int ftx_wait(ftx_t* ftx, int confirm_val) {
    return futex(ftx, FUTEX_WAIT, confirm_val, 0, NULL, 0);
}
int main() {
    mtx_t mtx = {0};
    std::thread t0{[&]() {
        fprintf(stderr, "t0 start\n");
        ftx_wait(&mtx.ftx, 0);
        fprintf(stderr, "t0 done\n");
    }};
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    std::thread t1{[&]() {
        fprintf(stderr, "t1 start\n");
        robust_init();
        robust_op_start(&mtx);
        __sync_bool_compare_and_swap(&mtx.ftx, 0, syscall(SYS_gettid));
        robust_op_add(&mtx);
        robust_op_end();
        fprintf(stderr, "t1 ftx: %x\n", mtx.ftx);
        fprintf(stderr, "t1 done\n");
    }};
    t1.join();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    fprintf(stderr, "ftx: %x\n", mtx.ftx);
    t0.join();
}
Running
g++ -o ./example ~/example.cpp -lpthread && ./example
prints something like:
t0 start
t1 start
t1 ftx: 12ea65
t1 done
ftx: 40000000
and hangs.
I would expect the final value of the futex to be 4012ea65 and for thread 0 to unblock after thread 1 completes.

Related

Pause thread execution without using condition variables or other synchronization primitives

Problem
I want to be able to pause the execution of a thread from a different thread. The paused thread should not have to cooperate. The pause does not have to take effect as soon as the pausing thread requests it; a delay is allowed.
I cannot seem to find any information on this, as all my searches turned up results that use condition variables...
Ideas
use the scheduler and kernel syscalls to stop the thread from being scheduled again
use debugger syscalls to stop the target thread
OS-agnostic is preferable, but not a requirement. This likely will be very OS-dependent, as messing with scheduling and threads is a pretty low-level operation.
On a Unix-like OS, there's pthread_kill() which delivers a signal to a specified thread. You can arrange for that signal to have a handler which waits until told in some manner to resume.
Here's a simple example, where the "pause" just sleeps for a fixed time before resuming. Try on godbolt.
#include <unistd.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <errno.h>
#include <time.h>
void safe_print(const char *s) {
    int saved_errno = errno;
    if (write(1, s, strlen(s)) < 0) {
        _exit(1); /* _exit(), unlike exit(), is async-signal-safe */
    }
    errno = saved_errno;
}
void sleep_msec(int msec) {
    struct timespec t = {
        .tv_sec = msec / 1000,
        .tv_nsec = (msec % 1000) * 1000 * 1000
    };
    nanosleep(&t, NULL);
}
void *work(void *unused) {
    (void) unused;
    for (;;) {
        safe_print("I am running!\n");
        sleep_msec(100);
    }
    return NULL;
}
void handler(int sig) {
    (void) sig;
    safe_print("I am stopped.\n");
    sleep_msec(500);
}
int main(void) {
    pthread_t thr;
    pthread_create(&thr, NULL, work, NULL);
    struct sigaction sa = {
        .sa_handler = handler,
        .sa_flags = 0,
    };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);
    for (int i = 0; i < 5; i++) {
        sleep_msec(1000);
        pthread_kill(thr, SIGUSR1);
    }
    pthread_cancel(thr);
    pthread_join(thr, NULL);
    return 0;
}

How could futex_wake return 0

I implemented a semaphore using futex. The following program often fails at the assertion in sem_post(): the FUTEX_WAKE call is supposed to return 1, but it sometimes returns 0. How can this happen?
When I use the POSIX semaphore instead, the program always finishes successfully.
I'm using Linux 2.6.32-642.6.1.el6.x86_64
#include <cstdio>
#include <cstdlib>
#include <cassert>
#include <ctime>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <errno.h>
using namespace std;
#if 0
#include <semaphore.h>
#else
typedef volatile int sem_t;
void sem_init(sem_t* sem, int shared, int value)
{
    *sem = value;
}
void sem_post(sem_t* sem)
{
    while (1)
    {
        int value = *sem;
        if (__sync_bool_compare_and_swap(sem, value, value >= 0 ? value+1 : 1))
        {
            if (value < 0) // had contender
            {
                int r = syscall(SYS_futex, sem, FUTEX_WAKE, 1, NULL, 0, 0);
                if (r != 1)
                    fprintf(stderr, "post r=%d err=%d sem=%d %d\n", r,errno,value,*sem);
                assert(r == 1);
            }
            return;
        }
    }
}
int sem_wait(sem_t* sem)
{
    while (1)
    {
        int value = *sem;
        if (value > 0 // positive means no contender
            && __sync_bool_compare_and_swap(sem, value, value-1))
            return 0;
        if (value <= 0
            && __sync_bool_compare_and_swap(sem, value, -1))
        {
            int r= syscall(SYS_futex, sem, FUTEX_WAIT, -1, NULL, 0, 0);
            if (!r) {
                assert(__sync_fetch_and_sub(sem, 1) > 0);
                return 0;
            }
            printf("wait r=%d errno=%d sem=%d %d\n", r,errno, value,*sem);
        }
    }
}
void sem_getvalue(sem_t* sem, int* value)
{
    *value = *sem;
}
#endif
// return current time in ns
unsigned long GetTime()
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return ts.tv_sec*1000000000ul + ts.tv_nsec;
}
void Send(sem_t* sem, unsigned count)
{
    while (count--)
        sem_post(sem);
}
void Receive(sem_t* sem, unsigned count)
{
    while (count--)
        sem_wait(sem);
}
int main()
{
    sem_t* sem = reinterpret_cast<sem_t*>(mmap(NULL, sizeof(sem_t), PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_SHARED, -1, 0));
    assert(sem != MAP_FAILED);
    sem_init(sem, 1, 0);
    unsigned count = 10485760;
    int pid = fork();
    assert(pid != -1);
    if (!pid) // child
    {
        Send(sem, count);
        _exit(EXIT_SUCCESS);
    }
    else // parent
    {
        unsigned long t0 = GetTime();
        Receive(sem, count);
        printf("t=%g ms\n", (GetTime()-t0)*1e-6);
        wait(NULL);
        int v;
        sem_getvalue(sem, &v);
        assert(v == 0);
    }
}
The call to syscall(SYS_futex, sem, FUTEX_WAKE, 1, NULL, 0, 0) returns 0 when no thread is waiting on sem. In your code this can happen because sem_post makes that call whenever *sem is negative, and *sem can be negative without any thread actually sleeping:
If *sem is zero when sem_wait is called, it goes on to execute __sync_bool_compare_and_swap(sem, value, -1), which sets *sem to -1, but at that point the calling thread is not yet asleep. If another thread calls sem_post in that window (before the thread in sem_wait enters the futex syscall), FUTEX_WAKE finds nobody to wake, returns 0, and your assertion fails.
It seems that __sync_bool_compare_and_swap(sem, value, -1) and __sync_fetch_and_sub(sem, 1) are problematic. Keep in mind that sem_wait may be called concurrently by multiple threads (although in your test case only one thread calls it).
If we can afford the overhead of busy polling, we can remove the futex entirely, which gives the following code. It is also faster than the futex version (t=347 ms versus t=914 ms).
#include <sched.h> // sched_yield
void sem_post(sem_t* sem)
{
    __sync_fetch_and_add(sem, 1);
}
int sem_wait(sem_t* sem)
{
    while (1)
    {
        int value = *sem;
        if (value > 0) // a token is available
        {
            if (__sync_bool_compare_and_swap(sem, value, value-1)) {
                return 0; // success
            }
        }
        // yield the processor so the poster can run instead of being starved by the spin
        sched_yield();
    }
}
The code works as follows: The shared variable *sem is always non-negative. When a thread posts the semaphore from 0 to 1, all threads waiting on the semaphore may try, but exactly one thread will succeed in compare_and_swap.

Is this a bug in linux kernel concerning write to /proc/self/loginuid?

I may have found a bug in the Linux kernel. Consider an application that writes to /proc/self/loginuid from its main thread and from one auxiliary thread. The code is below:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
void write_loginuid(const char *str)
{
    int fd;
    printf("%s\n", str);
    fd = open("/proc/self/loginuid", O_RDWR);
    if (fd < 0) {
        perror(str);
        return;
    }
    if (write(fd, "0", 2) != 2) {
        int saved_errno = errno; /* printf() may overwrite errno */
        printf("write\n");
        errno = saved_errno;
        perror(str);
    }
    close(fd);
}
void *thread_function(void *arg)
{
    (void)arg;
    fprintf(stderr, "Hello from thread! my pid = %d, tid = %ld, parent pid = %d\n",
            (int)getpid(), syscall(SYS_gettid), (int)getppid());
    write_loginuid("thread");
    return NULL;
}
int main()
{
    pthread_t thread;
    pthread_create(&thread, NULL, thread_function, NULL);
    write_loginuid("main process");
    fprintf(stderr, "test my pid = %d, tid = %ld, parent pid = %d\n",
            (int)getpid(), syscall(SYS_gettid), (int)getppid());
    pthread_join(thread, NULL);
    return 0;
}
After executing this application we get:
main process
test my pid = 3487, tid = 3487, parent pid = 3283
Hello from thread! my pid = 3487, tid = 3488, parent pid = 3283
thread
write
thread: Operation not permitted
This tells us that the thread's write failed with EPERM.
In the kernel file fs/proc/base.c, the function proc_loginuid_write() begins with this check:
static ssize_t proc_loginuid_write(struct file * file, const char __user * buf,
                                   size_t count, loff_t *ppos)
{
    struct inode * inode = file_inode(file);
    uid_t loginuid;
    kuid_t kloginuid;
    int rv;
    /* this is the probably buggy check */
    rcu_read_lock();
    if (current != pid_task(proc_pid(inode), PIDTYPE_PID)) {
        rcu_read_unlock();
        return -EPERM;
    }
    rcu_read_unlock();
So only the task whose PID exactly matches the /proc entry passes this check (I verified this with printk()s). The thread does not satisfy the condition, because the two tasks being compared differ (their PIDs are different).
So my question is: is this a bug? Why not allow the threads of a process to change its loginuid? I ran into this in a login application that spawned a separate thread for PAM login.
Whether or not this is a bug, I have written a fix that extends write permission on this file to all threads of the process:
    rcu_read_lock();
    /*
     * I changed the condition so that it now compares the tgid, as returned
     * by sys_getpid(), rather than task_struct pointers
     */
    if (task_tgid_vnr(current) != task_tgid_vnr(pid_task(proc_pid(inode), PIDTYPE_PID))) {
        rcu_read_unlock();
        return -EPERM;
    }
    rcu_read_unlock();
What do you think of it? Does it affect security?

Cygwin: interrupting blocking read

I've written a program that spawns a thread which reads from stdin in a loop, in a blocking fashion. I want to make the thread return from the blocked read immediately. I've registered my signal handler in the reading thread (with sigaction and without the SA_RESTART flag), then I send the thread a signal and expect read to fail with EINTR. But that doesn't happen. Is this an issue or limitation of Cygwin, or am I doing something wrong?
Here is the code:
#include <stdio.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>
pthread_t thread;
volatile sig_atomic_t run = 0;
void root_handler(int signum)
{
    (void)signum;
    printf("%s ENTER (thread is %lx)\n", __func__, (unsigned long)pthread_self());
    run = 0;
}
void* thr_func(void* arg)
{
    int res;
    char buffer[256];
    (void)arg;
    printf("%s ENTER (thread is %lx)\n", __func__, (unsigned long)pthread_self());
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = &root_handler; /* one-argument handler: sa_handler, not sa_sigaction */
    //act.sa_flags = SA_RESTART;
    if (sigaction(SIGUSR1, &act, NULL) < 0) {
        perror("sigaction error");
        return NULL;
    }
    while (run)
    {
        res = read(0, buffer, sizeof(buffer) - 1);
        if (res == -1)
        {
            if (errno == EINTR)
            {
                puts("read was interrupted by signal");
            }
        }
        else if (res == 0)
        {
            break; /* end of input */
        }
        else
        {
            buffer[res] = '\0'; /* read() does not NUL-terminate */
            printf("got: %s", buffer);
        }
    }
    printf("%s LEAVE (thread is %lx)\n", __func__, (unsigned long)pthread_self());
    return NULL;
}
int main() {
    run = 1;
    printf("root thread: %lx\n", (unsigned long)pthread_self());
    pthread_create(&thread, NULL, &thr_func, NULL);
    printf("thread %lx started\n", (unsigned long)thread);
    sleep(4);
    pthread_kill(thread, SIGUSR1);
    //raise(SIGUSR1);
    pthread_join(thread, NULL);
    return 0;
}
I'm using Cygwin (1.7.32(0.274/5/3)).
I've just tried the same thing on Ubuntu and it works (though there I needed to include signal.h, whereas on Cygwin it compiled as is). It seems to be a peculiarity of Cygwin's implementation.

pclose() returns SIGPIPE intermittently

When the following C++ program is executed and SIGUSR1 is sent to the running process repeatedly, the pclose() call sometimes returns 13, which corresponds to SIGPIPE on my system.
Why does this happen?
I am using while true; do kill -SIGUSR1 <process-id>; done to send SIGUSR1 to the program. The program is executed on Ubuntu 14.04.
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>
void handler(int i) {}
void* task(void*)
{
    FILE *s;
    char b[BUFSIZ];
    while (1) {
        if ((s = popen("echo hello", "r")) == NULL) {
            printf("popen() failed\n");
            continue; // don't hand a NULL stream to fgets()/pclose()
        }
        while (fgets(b, BUFSIZ, s) != NULL) ;
        if (int r = pclose(s)) {
            printf("pclose() failed (%d)\n", r);
        }
    }
    return 0;
}
int main(int argc, char **argv)
{
    struct sigaction action;
    action.sa_handler = handler;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    sigaction(SIGUSR1, &action, NULL);
    pthread_t tid;
    pthread_create(&tid, 0, task, NULL);
    pthread_join(tid, NULL);
}
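This happens when fgets is interrupted by the signal: the underlying read() fails with EINTR, fgets returns NULL, and the program closes the pipe without reading it to the end. If echo has not written its output yet, it is killed by SIGPIPE when it does, and pclose() reports that.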
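A correct way to read the pipe is to retry when fgets fails with EINTR. Reset errno and the stream's error flag before each attempt, so that a stale EINTR cannot make the loop spin at end of file:
do {
    errno = 0;
    clearerr(s); // reset the error flag left by an interrupted read
    while (fgets(b, BUFSIZ, s) != NULL) ;
} while (ferror(s) && errno == EINTR);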
