How to detect if a thread is blocked because of mutex - multithreading

#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>

std::mutex mtx;
int i = 0;

void func()
{
    std::lock_guard<std::mutex> lock(mtx);
    std::cout << ++i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(8000));
}

int main() {
    std::thread t1(func);
    std::thread t2(func);
    std::thread t3(func);
    std::thread t4(func);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return 0;
}
Here is my C++ code. As you can see, only one thread has a chance to execute at any given time because of the mutex; in other words, most of the threads are blocked by it.
My question: is there a tool or technique to detect how many threads are blocked on a mutex in an executable, without reading the source code?

If it’s for debugging, use a debugger. On Windows, MSVC compiles std::mutex into either critical sections or SRW locks, depending on the environment. Break execution of the app, and you’ll see how many threads are sleeping in the EnterCriticalSection or AcquireSRWLockExclusive WinAPI calls.
If you want that info because you want to do something different at runtime based on it, I’m afraid std::mutex can’t do that. You should probably use some other synchronization primitive. I don’t know what it is you’re doing, but when I write complex multithreading code I often use condition variables, and/or atomic operations, i.e. Interlocked* on Windows or std::atomic in modern C++.


The advantage of c++11 threads

Threads were added to the C++11 language.
I am wondering what the difference, the advantages, and the impact are.
Here is code using C++03 with pthreads:
#include <iostream>
#include <pthread.h>

void *call_from_thread(void *)
{
    std::cout << "Launched by thread" << std::endl;
    return NULL;
}

int main()
{
    pthread_t t;
    pthread_create(&t, NULL, call_from_thread, NULL);
    pthread_join(t, NULL);
    return 0;
}
and here is the same thing using C++11:
#include <iostream>
#include <thread>

void call_from_thread()
{
    std::cout << "Hello, World" << std::endl;
}

int main()
{
    std::thread t1(call_from_thread);
    t1.join();
    return 0;
}
Then, I see no fundamental advantage.
Also, when it is said to be part of the language, I am confused, as I see no new keyword and no new syntax, just a new standard library. Is it anything beyond that? Is it just a paraphrase of pthread?
Besides being much more portable, C++11 threads also provides other benefits:
it allows passing arguments (and more than one) to the thread handler in a type-safe way: pthread_create passes a single void*, whereas with std::thread you get compile-time errors instead of runtime errors if something is wrong
the thread handler can be a lambda
a std::thread is an object, not a pointer, which makes managing object lifetimes easier and reduces the risk of dangling pointers, especially when combined with std::unique_ptr or std::shared_ptr if pointer juggling is needed at all.
Those are the immediate benefits that come to mind.
As for standard library vs. language spec: they are both part of the same standard, so they are both considered "C++11". Note that std::thread cannot be implemented in C++03, since move semantics are new in C++11 and std::thread implements move.
The primary advantage of the C++ thread library is portability. As with many other C++ standard library facilities, platform-dependent libraries like pthreads or the Win32 API provide more control over your threads than the C++ thread library does. For instance, the Win32 API lets you set the thread stack size, which you can't do with the C++ thread library without platform-dependent code. API functions like TerminateThread allow developers to terminate their running threads (a very dangerous operation), and SetThreadPriority lets them set a thread's priority.
But using the C++ thread library makes your code platform independent. And it's not just about class thread: other facilities like mutexes, condition variables, and locks have been standardized, so every C++ implementation is supposed to provide them to comply with the C++ standard.
So, using C++ thread library is always a trade-off. You are losing some degree of control over threads but your code is being portable. And if you really need some low level feature, you can use std::thread::native_handle that allows you to mix standard and platform dependent code.
The documentation for std::thread::native_handle provides a nice example of how to mix class thread and the pthread library.
It sounds funny in 2019, but one small disadvantage of std::thread (since you asked about it) is ~130kb of code added to your binary.

What difference between C++ std::mutex and windows CreateMutex

I'm programming in Windows on c++ (Visual Studio)
I can create mutex using either std::mutex or CreateMutex.
What is the difference between them? Which one should I prefer, and which one is faster? Do they differ in usage or implementation? Or is std::mutex just a shell that uses CreateMutex inside?
Besides the fact that std::mutex is cross-platform and CreateMutex is not, another difference is that WinAPI mutexes (created through CreateMutex) can be used for synchronization between different processes, while std::mutex cannot. In that sense, std::mutex is closer to a WinAPI critical section.
However, there are also other things to consider, e.g. whether you need to interoperate with std::condition_variable or with WinAPI events (e.g. in order to use WaitForMultipleObjects).
std::mutex is provided by the C++ standard library; on Windows it is typically implemented on top of a lighter-weight primitive such as a critical section or an SRW lock, rather than a kernel mutex from CreateMutex.
Since CreateMutex is not portable, std::mutex is generally preferred.
A difference in behavior is that a mutex from CreateMutex on Windows is recursive, while std::mutex is non-recursive; for recursive locking you'd have to use std::recursive_mutex, which was also added in C++11.

How is pthread implemented in Linux kernel 3.2?

The code below comes from "Advanced Programming in the UNIX Environment"; it creates a new thread and prints the process ID and thread ID for the main and new threads.
The book says that on Linux the output of this code would show two different process IDs, because pthread uses a lightweight process to emulate a thread. But when I ran this code on Ubuntu 12.04, which has kernel 3.2, it printed the same PID.
So, did the newer Linux kernel change the internal implementation of pthread?
#include "apue.h"
#include <pthread.h>

pthread_t ntid;

void printids(const char *s) {
    pid_t pid;
    pthread_t tid;
    pid = getpid();
    tid = pthread_self();
    printf("%s pid %u tid %u (0x%x)\n",
           s, (unsigned int)pid, (unsigned int)tid, (unsigned int)tid);
}

void *thread_fn(void *arg) {
    printids("new thread: ");
    return (void *)0;
}

int main(void) {
    int err;
    err = pthread_create(&ntid, NULL, thread_fn, NULL);
    if (err != 0)
        err_quit("can't create thread: %s\n", strerror(err));
    printids("main thread: ");
    sleep(1);
    return 0;
}
On Linux, pthreads are created with the clone syscall using the special flag CLONE_THREAD.
See the documentation of clone syscall.
CLONE_THREAD (since Linux 2.4.0-test8)
If CLONE_THREAD is set, the child is placed in the same thread group as the calling process. To make the remainder of the discussion of CLONE_THREAD more readable, the term "thread" is used to refer to the processes within a thread group.
Thread groups were a feature added in Linux 2.4 to support the POSIX threads notion of a set of threads that share a single PID. Internally, this shared PID is the so-called thread group identifier (TGID) for the thread group. Since Linux 2.4, calls to getpid(2) return the TGID of the caller.
And in fact, Linux did change its thread implementation, since POSIX.1 requires threads to share the same process ID.
In the obsolete LinuxThreads implementation, each of the threads in a process
has a different process ID. This is in violation of the POSIX threads
specification, and is the source of many other nonconformances to the
standard; see pthreads(7).
Linux has had two implementations of pthreads: LinuxThreads and the Native POSIX Thread Library (NPTL), although the former is now largely obsolete. Kernels from 2.6 onward provide NPTL, which conforms much more closely to SUSv3 and performs better, especially when there are many threads.
You can query the specific implementation of pthreads under shell using command:
getconf GNU_LIBPTHREAD_VERSION
You can also find a more detailed discussion of the implementation differences in The Linux Programming Interface.

Wake up a thread without kernel support

Is there any mechanism through which I can wake up a thread in another process without going through the kernel? The waiting thread might spin in a loop, no problem (each thread is pegged to a separate core), but in my case the sending thread has to be quick, and can't afford to go through the kernel to wake up the waiting thread.
No, not if the other thread is sleeping (not on a CPU). To wake such a thread you need to change its state to "RUNNING" by calling the scheduler, which is part of the kernel.
Yes, you can synchronize two threads or processes if both are running on different CPUs and there is shared memory between them. You should bind all threads to different CPUs. Then you may use a spinlock: the pthread_spin_lock and pthread_spin_unlock functions from the optional part of POSIX pthreads ("ADVANCED REALTIME THREADS"; [THR SPI]), or any custom spinlock. A custom spinlock will most likely use some atomic operations and/or memory barriers.
The sending thread changes a value in memory, which the receiving thread checks in a loop.
E.g.
init:
    pthread_spinlock_t lock;
    pthread_spin_init(&lock, PTHREAD_PROCESS_SHARED); // initialize before first use
    pthread_spin_lock(&lock); // close the "mutex"
then start threads.
waiting thread:
{
    pthread_spin_lock(&lock); // wait for event;
    work();
}
main thread:
{
    do_smth();
    pthread_spin_unlock(&lock); // open the mutex; the other thread will see this change
                                // in ~150 CPU ticks (checked on Pentium 4 and Intel Core 2
                                // single-socket systems); the operation itself takes about
                                // the same time (didn't measure it)
    continue_work();
}
To signal to another process that it should continue, without forcing the sender to spend time in a kernel call, one mechanism comes to mind right away. Without kernel calls, all a process can do is modify memory; so the solution is inter-process shared memory. Once the sender writes to shared memory, the receiver should see the change without any explicit kernel calls, and naive polling by the receiver should work fine.
One cheap (but maybe not cheap enough) alternative is delegating the sending to a helper thread in the same process, and have the helper thread make a proper inter-process "semaphore release" or pipe write call.
I understand that you want to avoid using the kernel in order to avoid kernel-related overheads. Most of such overheads are context-switch related. Here is a demonstration of one way to accomplish what you need using signals without spinning, and without context switches:
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <pthread.h>
#include <iostream>
#include <thread>

using namespace std;

void sigRtHandler(int sig) {
    cout << "Received signal" << endl;
}

int main() {
    constexpr static int kIter = 100000;
    thread t([]() {
        signal(SIGRTMIN, sigRtHandler);
        for (int i = 0; i < kIter; ++i) {
            usleep(1000);
        }
        cout << "Done" << endl;
    });
    usleep(1000); // Give the child time to set up the signal handler.
    auto handle = t.native_handle();
    for (int i = 0; i < kIter; ++i)
        pthread_kill(handle, SIGRTMIN);
    t.join();
    return 0;
}
If you run this code, you'll see that the child thread keeps receiving SIGRTMIN. While the process is running, if you look at the files /proc/(PID)/task/*/status for this process, you'll see that the parent thread does not incur context switches from calling pthread_kill().
The advantage of this approach is that the waiting thread doesn't need to spin. If the waiting thread's job is not time-sensitive, this approach allows you to save CPU.

What is the Re-entrant lock and concept in general?

I always get confused. Would someone explain what "reentrant" means in different contexts? And why would you want to use reentrant vs. non-reentrant?
Take the pthread (POSIX) locking primitives: are they re-entrant or not? What pitfalls should be avoided when using them?
Is a mutex re-entrant?
Re-entrant locking
A reentrant lock is one where a process can claim the lock multiple times without blocking on itself. It's useful in situations where it's not easy to keep track of whether you've already grabbed a lock. If a lock is non re-entrant you could grab the lock, then block when you go to grab it again, effectively deadlocking your own process.
Reentrancy in general is a property of code where it has no central mutable state that could be corrupted if the code was called while it is executing. Such a call could be made by another thread, or it could be made recursively by an execution path originating from within the code itself.
If the code relies on shared state that could be updated in the middle of its execution it is not re-entrant, at least not if that update could break it.
A use case for re-entrant locking
A (somewhat generic and contrived) example of an application for a re-entrant lock might be:
You have some computation involving an algorithm that traverses a graph (perhaps with cycles in it). A traversal may visit the same node more than once due to the cycles or due to multiple paths to the same node.
The data structure is subject to concurrent access and could be updated for some reason, perhaps by another thread. You need to be able to lock individual nodes to deal with potential data corruption due to race conditions. For some reason (perhaps performance) you don't want to globally lock the whole data structure.
Your computation can't retain complete information on what nodes you've visited, or you're using a data structure that doesn't allow 'have I been here before' questions to be answered quickly. An example of this situation would be a simple implementation of Dijkstra's algorithm with a priority queue implemented as a binary heap or a breadth-first search using a simple linked list as a queue. In these cases, scanning the queue for existing insertions is O(N) and you may not want to do it on every iteration.
In this situation, keeping track of what locks you've already acquired is expensive. Assuming you want to do the locking at the node level a re-entrant locking mechanism alleviates the need to tell whether you've visited a node before. You can just blindly lock the node, perhaps unlocking it after you pop it off the queue.
Re-entrant mutexes
A simple mutex is not re-entrant, as only one thread can be in the critical section at a given time. If you grab the mutex and then try to grab it again, a simple mutex doesn't have enough information to tell who was holding it previously. To do this recursively you need a mechanism where each thread has a token, so you can tell who has grabbed the mutex. This makes the mutex mechanism somewhat more expensive, so you may not want to do it in all situations.
The POSIX threads API does offer the choice between re-entrant and non-re-entrant mutexes, via the mutex type attribute (PTHREAD_MUTEX_RECURSIVE vs. the non-recursive types).
A re-entrant lock lets you write a method M that puts a lock on resource A and then call M recursively or from code that already holds a lock on A.
With a non re-entrant lock, you would need 2 versions of M, one that locks and one that doesn't, and additional logic to call the right one.
Reentrant lock is very well described in this tutorial.
The example in the tutorial is far less contrived than in the answer about traversing a graph. A reentrant lock is useful in very simple cases.
The what and why of the recursive mutex should not be as complicated as the accepted answer makes it.
I would like to write down my understanding after some digging around the net.
First, you should realize that talking about a mutex necessarily involves multithreading concepts too (a mutex is used for synchronization; I don't need a mutex if I only have one thread in my program).
Secondly, you should know the difference between a normal mutex and a recursive mutex.
Quoted from APUE:
(A recursive mutex is a) A mutex type that allows the same thread to lock
it multiple times without first unlocking it.
The key difference is that, within the same thread, relocking a recursive mutex does not lead to deadlock, nor does it block the thread.
Does this mean that a recursive mutex never causes deadlock?
No, it can still cause deadlock, just like a normal mutex, if you have locked it in one thread without unlocking it and then try to lock it from another thread.
Let's see some code as proof.
normal mutex with deadlock
#include <pthread.h>
#include <stdio.h>

pthread_mutex_t lock;

void *func1(void *arg) {
    printf("thread1\n");
    pthread_mutex_lock(&lock);
    printf("thread1 hey hey\n");
    return NULL;
}

void *func2(void *arg) {
    printf("thread2\n");
    pthread_mutex_lock(&lock);
    printf("thread2 hey hey\n");
    return NULL;
}

int main() {
    pthread_mutexattr_t lock_attr;
    int error;
    pthread_mutexattr_init(&lock_attr); // initialize the attribute object before setting its type
    // error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_RECURSIVE);
    error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_DEFAULT);
    if (error) {
        perror(NULL);
    }
    pthread_mutex_init(&lock, &lock_attr);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, func1, NULL);
    pthread_create(&t2, NULL, func2, NULL);
    pthread_join(t2, NULL);
    return 0;
}
output:
thread1
thread1 hey hey
thread2
This is the common deadlock example; no problem here.
recursive mutex with deadlock
Just uncomment this line
error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_RECURSIVE);
and comment out the other one.
output:
thread1
thread1 hey hey
thread2
Yes, a recursive mutex can also cause deadlock.
normal mutex, relock in the same thread
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

pthread_mutex_t lock;

void func3() {
    printf("func3\n");
    pthread_mutex_lock(&lock);
    printf("func3 hey hey\n");
}

void *func1(void *arg) {
    printf("thread1\n");
    pthread_mutex_lock(&lock);
    func3();
    printf("thread1 hey hey\n");
    return NULL;
}

void *func2(void *arg) {
    printf("thread2\n");
    pthread_mutex_lock(&lock);
    printf("thread2 hey hey\n");
    return NULL;
}

int main() {
    pthread_mutexattr_t lock_attr;
    int error;
    pthread_mutexattr_init(&lock_attr); // initialize the attribute object before setting its type
    // error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_RECURSIVE);
    error = pthread_mutexattr_settype(&lock_attr, PTHREAD_MUTEX_DEFAULT);
    if (error) {
        perror(NULL);
    }
    pthread_mutex_init(&lock, &lock_attr);
    pthread_t t1, t2;
    pthread_create(&t1, NULL, func1, NULL);
    sleep(2);
    pthread_create(&t2, NULL, func2, NULL);
    pthread_join(t2, NULL);
    return 0;
}
output:
thread1
func3
thread2
Deadlock in thread t1, in func3.
(I use sleep(2) to make it easier to see that the deadlock is caused first by the relock in func3.)
recursive mutex, relock in the same thread
Again, uncomment the recursive mutex line and comment out the other line.
output:
thread1
func3
func3 hey hey
thread1 hey hey
thread2
Deadlock in thread t2, in func2. See? func3 finishes and exits; relocking does not block the thread or lead to deadlock.
So, the last question: why do we need it?
For recursive functions, called in multithreaded programs where you want to protect some resource/data.
E.g. you have a multithreaded program and call a recursive function in thread A. You have some data you want to protect in that recursive function, so you use the mutex mechanism. The execution of that function is sequential within thread A, so you would definitely relock the mutex during recursion. Using a normal mutex causes deadlock; the recursive mutex was invented to solve this.
See an example from the accepted answer
When to use recursive mutex?.
Wikipedia explains the recursive mutex very well and is definitely worth a read: Wikipedia: Reentrant_mutex
