How to check if pthread_mutex is based on robust futexes - Linux

I am trying to use robust-futex-based pthread mutexes in Linux because I need them to be both fast and robust (able to recover a "dead" lock). How can I check whether the pthread mutex library on a given Linux system is based on robust futexes?
Thanks!

If you have the futex(2) system call and it is used (just strace(1) a 10-line application that uses mutexes), then you almost certainly have the robust feature: robust futex support has been in the mainline kernel since 2.6.17. This does not mean that you are using robust futexes, just that the feature is available in your kernel.
Next you want to know that your libc supports it. Any version above 2.9 supports it. Just check your version.
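Both checks can be run in a couple of minutes (a minimal sketch; the file name demo.c and the commands are just illustrative):

    /* demo.c - a tiny mutex user to run under strace(1).
     * Build and check (illustrative commands):
     *   cc -pthread demo.c -o demo
     *   strace ./demo 2>&1 | grep set_robust_list   # kernel-side support
     *   getconf GNU_LIBC_VERSION                    # libc version
     * Note: an uncontended lock stays in user space, so futex() itself
     * may not appear; the set_robust_list(2) call at startup is the tell.
     */
    #include <pthread.h>

    int main(void) {
        pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
        pthread_mutex_lock(&m);
        pthread_mutex_unlock(&m);
        return 0;
    }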
If you are writing a multi-threaded application then you don't really need the robustness of the futexes, since you control the threads and can make sure that threads release the mutexes they use before they die, or register a cleanup function to do the lock releasing (there is a pthread API for that: pthread_cleanup_push(3)). If you are still worried, see my notes below about using robust mutexes anyway.
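The cleanup-handler route looks roughly like this (a sketch; the worker body is hypothetical):

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    static void unlock_on_exit(void *arg) {
        pthread_mutex_unlock(arg);
    }

    static void *worker(void *unused) {
        (void)unused;
        pthread_mutex_lock(&m);
        pthread_cleanup_push(unlock_on_exit, &m); /* runs on cancellation/pthread_exit */
        /* ... work that may hit a cancellation point or call pthread_exit() ... */
        pthread_cleanup_pop(1); /* 1 = run the handler now, i.e. unlock */
        return NULL;
    }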
I just want to make it plain and clear that you are going to pay in performance if you use robust futexes in a multi-threaded application. The main use of robust futexes is as synchronization primitives in multi-process applications, where the chance of one component dying without killing the rest is high, unlike in a multi-threaded application, where the abnormal death of a thread means the death of the entire application.
To use robust futexes in either a multi-threaded or a multi-process application you need to mark the futexes as robust by using the undocumented function pthread_mutexattr_setrobust(3). I've submitted a bug report to the manual-pages maintainers to add documentation for that function. You need to pass PTHREAD_MUTEX_ROBUST to it, as opposed to the default, PTHREAD_MUTEX_STALLED.
In a multi-threaded application marking the mutex as robust is all you have to do.
To use robust futexes in a multi-process application you also need to mark the futex as shared across processes by calling the (fortunately documented) function pthread_mutexattr_setpshared(3) with PTHREAD_PROCESS_SHARED, as opposed to the default, PTHREAD_PROCESS_PRIVATE.
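Putting the two attributes together, initialization looks roughly like this (a sketch, not a drop-in implementation: "/demo_shm" is a placeholder name, error handling is omitted, and on older glibc the robust call is spelled pthread_mutexattr_setrobust_np):

    #include <pthread.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Create one robust, process-shared mutex in a shared segment.
     * Link with -pthread (and -lrt on older glibc, for shm_open). */
    pthread_mutex_t *create_shared_robust_mutex(void) {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(pthread_mutex_t));
        pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);

        pthread_mutexattr_t a;
        pthread_mutexattr_init(&a);
        pthread_mutexattr_setrobust(&a, PTHREAD_MUTEX_ROBUST);
        pthread_mutexattr_setpshared(&a, PTHREAD_PROCESS_SHARED); /* multi-process only */
        pthread_mutex_init(m, &a);
        pthread_mutexattr_destroy(&a);
        return m;
    }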
Actually in strace(1) you will not see acquisition and release of the locks but you will see calls to set_robust_list(2) if your futex is robust.
I hope this helps.

Related

How safe is pthread robust mutex?

I'm thinking of using POSIX robust mutexes to protect a shared resource among different processes (on Linux). However, I have some doubts about safety in different scenarios. I have the following questions:
Are robust mutexes implemented in the kernel or in user code?
If the latter, what would happen if a process crashes while in a call to pthread_mutex_lock or pthread_mutex_unlock, while the shared pthread_mutex data structure is being updated?
I understand that if a process that locked the mutex dies, a thread in another process will be awakened and get EOWNERDEAD. However, what happens if the process dies (in the unlikely case) exactly while the pthread_mutex data structure (in shared memory) is being updated? Will the mutex be corrupted in that case? What happens to another process mapped to the same shared memory if it then calls a pthread_mutex function?
Can the mutex still be recovered in this case?
This question applies to any pthread object with the PTHREAD_PROCESS_SHARED attribute. Is it safe to call functions like pthread_mutex_lock, pthread_mutex_unlock, pthread_cond_signal, etc. concurrently on the same object from different processes, i.e. are they safe across different processes?
From the man-page for pthreads:
Over time, two threading implementations have been provided by the GNU C library on Linux:
LinuxThreads
This is the original Pthreads implementation. Since glibc 2.4, this implementation is no longer supported.
NPTL (Native POSIX Threads Library)
This is the modern Pthreads implementation. By comparison with LinuxThreads, NPTL provides closer conformance to the requirements of the POSIX.1 specification and better performance when creating large numbers of threads. NPTL is available since glibc 2.3.2, and requires features that are present in the Linux 2.6 kernel.
Both of these are so-called 1:1 implementations, meaning that each thread maps to a kernel scheduling entity. Both threading implementations employ the Linux clone(2) system call. In NPTL, thread synchronization primitives (mutexes, thread joining, and so on) are implemented using the Linux futex(2) system call.
And from man futex(7):
In its bare form, a futex is an aligned integer which is touched only by atomic assembler instructions. Processes can share this integer using mmap(2), via shared memory segments, or because they share memory space, in which case the application is commonly called multithreaded.
An additional remark found here:
(In case you’re wondering how they work in shared memory: Futexes are keyed upon their physical address)
Summarizing, Linux decided to implement pthreads on top of its "native" futex primitive, which indeed lives in the user process's address space. For process-shared synchronization primitives, that address space is shared memory, so the other processes can still see the futex after one process dies.
What happens in case of process termination? Ingo Molnar wrote an article called Robust Futexes about just that. The relevant quote:
Robust Futexes
There is one race possible though: since adding to and removing from the list is done after the futex is acquired by glibc, there is a few-instructions window for the thread (or process) to die there, leaving the futex hung. To protect against this possibility, userspace (glibc) also maintains a simple per-thread 'list_op_pending' field, to allow the kernel to clean up if the thread dies after acquiring the lock, but just before it could have added itself to the list. Glibc sets this list_op_pending field before it tries to acquire the futex, and clears it after the list-add (or list-remove) has finished.
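From user space, the visible result of all this machinery is the EOWNERDEAD protocol. Roughly (a sketch; assumes the mutex was initialized with PTHREAD_MUTEX_ROBUST, and note the consistent call is spelled pthread_mutex_consistent_np on older glibc):

    #include <pthread.h>
    #include <errno.h>

    int lock_with_recovery(pthread_mutex_t *m) {
        int r = pthread_mutex_lock(m);
        if (r == EOWNERDEAD) {
            /* The previous owner died holding the lock. We now own it,
             * but must repair the protected data and then mark the
             * mutex consistent before unlocking. */
            pthread_mutex_consistent(m);
            r = 0;
        }
        return r; /* ENOTRECOVERABLE if a prior owner unlocked without repair */
    }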
Summary
Where this leaves you on other platforms is open-ended. Suffice it to say that the Linux implementation, at least, has taken great care to meet our common-sense expectation of robustness.
Seeing that other operating systems usually resort to kernel-based synchronization primitives in the first place, it makes sense to assume their implementations would be even more naturally robust.
Following the documentation at http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html, on a fully POSIX-compliant OS a shared mutex with the robust flag will behave the way you'd expect.
The problem, obviously, is that not all OSes are fully POSIX-compliant, not even those claiming to be. Process-shared mutexes, and robust ones in particular, are among the finer points that are often missing from an OS's implementation of POSIX.

What is the downside of XInitThreads()?

I know XInitThreads() will allow me to do calls to the X server from threads other than the main thread, and that concurrent thread support in Xlib is necessary if I want to make OpenGL calls from secondary threads using Qt. I have such a need for my application, but in a very rare context. Unfortunately, XInitThreads() needs to be called at the very beginning of my application's execution and will therefore affect it whether I need it or not for a particular run (I have no way of knowing before I run the app if I do need multithreaded OpenGL support or not).
I'm pretty sure the overall behavior of the application will remain unchanged if I uselessly call XInitThreads(), but programming is all about tradeoffs, and I'm pretty sure there's a reason multithread support is not the default behavior for Xlib.
The man page says that it is recommended that single-threaded programs not call this function, but it doesn't say why. What's the tradeoff when calling XInitThreads()?
It creates a global lock and also one on each Display, so each call into Xlib will do a lock/unlock. If you instead do your own locking (keep your Xlib usage in a single thread at a time with your own locks), then in theory you could have less locking overhead by taking your lock, doing a lot of Xlib work at once, and then dropping your lock. In practice, most toolkits use a model where they don't lock at all and just require apps to use X only from the main UI thread.
Since many Xlib calls are blocking (they wait for a reply from the server) lock contention could be an issue.
There are also some semantic pains with locking per Xlib call on a Display: between any two Xlib calls on thread A, any other Xlib call could in theory have been made on thread B. So threads can stomp on each other's view of the display state, even though only one of them makes an Xlib request at a time. That is, XInitThreads() prevents crashes/corruption due to concurrent Display access, but it doesn't take care of the semantic considerations of having multiple threads share an X server connection.
I think the need to make your own semantic sense out of concurrent display access is one reason people don't bother with XInitThreads() per-call locking: they end up with application-level or toolkit-level locks anyway.
One approach here is to open a separate Display connection in each thread, which could make sense depending on what you are doing.
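That looks something like this (a sketch; the worker body is hypothetical):

    #include <X11/Xlib.h>
    #include <pthread.h>

    /* One private Display per thread: no cross-thread Xlib locking needed.
     * Build with: cc demo.c -lX11 -pthread */
    static void *worker(void *unused) {
        (void)unused;
        Display *dpy = XOpenDisplay(NULL); /* this thread's own connection */
        /* ... make Xlib calls on dpy from this thread only ... */
        XCloseDisplay(dpy);
        return NULL;
    }

    int main(void) {
        /* XInitThreads() would only be needed if threads shared a Display,
         * and it must precede every other Xlib call. */
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        return 0;
    }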
Another approach is to use the newer xcb API rather than Xlib; xcb is designed from scratch to be multithreaded and nonblocking.
Historically there have also been some bugs in XInitThreads() in some OS/Xlib versions; I don't remember any specifics, but I do remember seeing them go by.

Replacing system calls (syscalls) in Linux 2.6+

I'm looking into writing a userland threading library, since there seems to be no active work in this area, and I believe the C++0x promises and futures may give this model some power. Unfortunately, in order to make this model work, it is essential to ensure a context switch on blocking calls. As such, I would like to intercept every syscall in order to replace it with an asynchronous version. There are some caveats:
I know there are asynchronous syscalls for just about every regular syscall, but for backwards compatibility reasons this is not a viable solution.
I know that in Linux 2.4 or earlier it was possible to directly modify sys_call_table, but this ability has since been removed.
As I would like my library to be statically linked if desired, the LD_PRELOAD trick isn't viable.
Similarly, kernel modules are not an option because this is supposed to be a userland library.
Finally, ptrace() is also not an option for similar reasons: I can't have my library fork a new process just in order to be used.
Is this possible?
I'm looking into writing a userland threading library, since there seems to be no active work in this area
You might want to take a look at the thread libraries Marcel (and its publications) and MPC, which implement hybrid (kernel and user-level) threads, mainly for High-Performance Computing, so they had to find a solution for blocking system calls.
So as to avoid the blocking of kernel threads when the application makes blocking system calls, Marcel uses Scheduler Activations when they are available, or just intercepts such blocking calls at the dynamic symbols level.
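One symbol-level interception trick that does work with static linking is the GNU linker's --wrap option; note that it catches call sites that go through the libc symbol, not raw syscalls. A sketch for read(2), where sched_hint() is a hypothetical hook into your userland scheduler:

    /* Link the final program with: cc ... -Wl,--wrap=read
     * (repeat --wrap=SYMBOL for each call to intercept). */
    #include <unistd.h>

    extern void sched_hint(void);  /* hypothetical: let another green thread run */

    ssize_t __real_read(int fd, void *buf, size_t count);  /* resolved by ld */

    ssize_t __wrap_read(int fd, void *buf, size_t count) {
        sched_hint();  /* context-switch opportunity before possibly blocking */
        return __real_read(fd, buf, count);
    }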

Futex based locking mechanism

Can somebody give me an example of a locking mechanism based on futex? (for a multicore x86 CPU, CentOS)
Pthread mutexes are implemented using futexes on recent versions of Linux. Pthreads is the standard C threading API on Linux, and is part of the POSIX standard, so you can easily port your program to other Unix-like systems. You should avoid using futexes directly unless you have very unusual needs, because they're very hard to use correctly; use pthreads, or a higher-level, language-specific API (which will almost certainly use pthreads itself).
Have a look at https://github.com/avsm/ipc-bench. They use futexes in their shared-memory pipe implementation.
Specifically, you can check this code.
Working example: pthread mutexes use futex locks.
Code examples: these were made within months of this post in 2010, but are still up to date:
http://meta-meta.blogspot.com/2010/11/linux-threading-primitives-futex.html
https://github.com/lcapaldo/futexexamples
Use-case example: IPC and inter-process synchronization are the only reason to use a futex in userspace. pthread mutexes will work for multi-threaded code except in extreme cases, but multi-process applications lack high-performance locking mechanisms as well as lock types.
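To see why raw futexes are considered tricky, here is roughly the classic three-state mutex from Ulrich Drepper's paper "Futexes Are Tricky" (a sketch; FUTEX_PRIVATE_FLAG and error handling omitted):

    #include <stdatomic.h>
    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    static long sys_futex(atomic_int *addr, int op, int val) {
        return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
    }

    /* state: 0 = unlocked, 1 = locked, 2 = locked with waiters */
    static void futex_lock(atomic_int *f) {
        int c = 0;
        if (atomic_compare_exchange_strong(f, &c, 1))
            return;                        /* fast path: no syscall at all */
        if (c != 2)
            c = atomic_exchange(f, 2);
        while (c != 0) {                   /* slow path: sleep in the kernel */
            sys_futex(f, FUTEX_WAIT, 2);
            c = atomic_exchange(f, 2);
        }
    }

    static void futex_unlock(atomic_int *f) {
        if (atomic_fetch_sub(f, 1) != 1) { /* there may have been waiters */
            atomic_store(f, 0);
            sys_futex(f, FUTEX_WAKE, 1);
        }
    }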

What interprocess locking calls should I monitor?

I'm monitoring a process with strace/ltrace in the hope of finding and intercepting a call that checks, and potentially activates, some kind of globally shared lock.
While I've dealt with and read about several forms of interprocess locking on Linux before, I'm drawing a blank on what calls to look for.
Currently my only suspect is futex() which comes up very early on in the process' execution.
Update0
There is some confusion about what I'm after. I'm monitoring an existing process for calls that touch persistent interprocess memory or equivalent. I'd like to know what system and library calls to look for. I have no intention of calling these myself, so naturally futex() will come up; I'm sure many libraries implement their locking calls in terms of it, etc.
Update1
I'd like a list of function names, or a link to documentation, that I should monitor at the ltrace and strace levels (specifying which level each belongs to). Any other good advice about how to track down and locate the global lock in question would be great.
If you can start the monitored process under valgrind, then there are two projects:
http://code.google.com/p/data-race-test/wiki/ThreadSanitizer
and Helgrind
http://valgrind.org/docs/manual/hg-manual.html
Helgrind is aware of all the pthread abstractions and tracks their effects as accurately as it can. On x86 and amd64 platforms, it understands and partially handles implicit locking arising from the use of the LOCK instruction prefix.
So these tools can detect even atomic memory accesses, and they will check pthread usage.
flock is another good one
There are many system calls that can be used for locking: flock, fcntl, and even creat.
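For instance, an fcntl(2) advisory lock is easy to spot in strace (a minimal sketch):

    #include <fcntl.h>
    #include <unistd.h>

    /* Advisory whole-file write lock; the fcntl call shows up in strace. */
    int lock_file(int fd) {
        struct flock fl = {
            .l_type = F_WRLCK, .l_whence = SEEK_SET,
            .l_start = 0, .l_len = 0, /* len 0 = to end of file */
        };
        return fcntl(fd, F_SETLKW, &fl); /* blocks until the lock is acquired */
    }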
When you are using pthreads/sem_* locks, they may be executed entirely in user space, so you'll never see them in strace: futex is called only for pending operations, i.e. when something actually needs to wait.
Some operations can be done in user space only, like spinlocks; you'll never see them unless they back off by waiting on a timer, in which case you may only see things like nanosleep while one lock waits for another.
So there is no "generic" way to trace them.
On systems with glibc ~ >= 2.5 (glibc + NPTL) you can use:
- process-shared semaphores (last parameter to sem_init), more precisely POSIX unnamed semaphores
- POSIX mutexes (with PTHREAD_PROCESS_SHARED passed to pthread_mutexattr_setpshared)
- POSIX named semaphores (obtained via sem_open/sem_unlink)
- System V (SysV) semaphores: semget, semop
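A sketch of the first item, a POSIX unnamed semaphore shared between processes ("/demo_shm" is a placeholder name; error handling omitted):

    #include <semaphore.h>
    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(sem_t));
        sem_t *s = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
        sem_init(s, 1 /* pshared != 0: visible across processes */, 1);

        sem_wait(s);   /* under strace, futex() appears only if this blocks */
        /* critical section */
        sem_post(s);
        return 0;
    }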
On older systems with glibc 2.2 or 2.3 with LinuxThreads, or on embedded systems with uClibc, you can use ONLY System V (SysV) semaphores for interprocess communication.
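And the SysV flavor for those older systems (a sketch; 0x1234 is an arbitrary example key, ftok(3) is the usual way to pick one):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    union semun { int val; };  /* the caller must define this on Linux */

    int main(void) {
        int id = semget(0x1234, 1, IPC_CREAT | 0600);
        semctl(id, 0, SETVAL, (union semun){ .val = 1 }); /* start unlocked */

        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        semop(id, &op, 1);   /* P: decrement, blocks at zero - shows in strace */
        /* critical section */
        op.sem_op = 1;
        semop(id, &op, 1);   /* V: increment */
        return 0;
    }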
Update 1: any IPC and sockets must be checked as well.
