Threads were added to the C++11 language.
I am wondering: what are the differences, the advantages, and the impact?
If this code is written in C++03:
#include <iostream>
#include <pthread.h>

void *call_from_thread(void *)
{
    std::cout << "Launched by thread" << std::endl;
    return NULL;
}

int main()
{
    pthread_t t;
    pthread_create(&t, NULL, call_from_thread, NULL);
    pthread_join(t, NULL);
    return 0;
}
and this one in C++11:
#include <iostream>
#include <thread>

void call_from_thread()
{
    std::cout << "Hello, World" << std::endl;
}

int main()
{
    std::thread t1(call_from_thread);
    t1.join();
    return 0;
}
then I see no fundamental advantage.
Also, when it is said to be part of the language, I am confused, because I see no new keyword and no new syntax; I just see a new standard library. Is it anything beyond that? And is this just a paraphrase of pthreads?
Besides being much more portable, C++11 threads also provide other benefits:
- they allow passing arguments (and more than one) to the thread handler in a type-safe way; pthread_create passes a single void*, whereas with std::thread you get compile-time errors instead of runtime errors if something is wrong
- the thread handler can be a lambda
- a std::thread is an object, not a pointer, which makes managing object lifetimes easier and reduces the risk of dangling pointers, especially when combined with std::unique_ptr or std::shared_ptr if pointer juggling is needed at all.
Those are the immediate benefits that come to mind.
As for standard library vs. language spec: they are both part of the same standard, so they are both considered "C++11". Note that std::thread cannot be implemented in C++03, since move semantics are new in C++11 and std::thread is movable.
The primary advantage of the C++ thread library is portability. As with many other C++ standard library facilities, platform-dependent libraries like pthreads or the Win32 API provide more control over your threads than the C++ thread library does. For instance, on Windows the Win32 thread API lets you set the thread stack size, which you can't do with the C++ thread library without platform-dependent code. API functions like TerminateThread allow developers to terminate running threads (a very dangerous operation), and SetThreadPriority lets them set a thread's priority.
But using the C++ thread library makes your code platform-independent. And it's not just about class thread: other facilities like mutexes, condition variables, and locks have also been standardized, so every C++ implementation is supposed to provide them to comply with the C++ standard.
So using the C++ thread library is always a trade-off: you lose some degree of control over threads, but your code becomes portable. And if you really need some low-level feature, you can use std::thread::native_handle, which allows you to mix standard and platform-dependent code.
The std::thread::native_handle reference provides a nice example of how to mix class thread and the pthread library.
It sounds funny in 2019, but one small disadvantage of std::thread (since you asked about impact) is the ~130 kB of code it adds to your binary.
#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>

std::mutex mtx;
int i = 0;

void func()
{
    std::lock_guard<std::mutex> lock(mtx);
    std::cout << ++i << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(8000));
}

int main() {
    std::thread t1(func);
    std::thread t2(func);
    std::thread t3(func);
    std::thread t4(func);
    t1.join();
    t2.join();
    t3.join();
    t4.join();
    return 0;
}
Here is my C++ code. As you can see, only one thread has the chance to execute at any time because of the mutex. In other words, most of the threads are blocked by the mutex.
My question is: is there some tool or technique to detect how many threads are blocked on a mutex in an executable, without reading the source code?
If it’s for debugging, use a debugger. On Windows, MSVC compiles std::mutex into either critical sections or SRW locks (depending on the environment). Break execution of the app, and you’ll see how many threads are sleeping in the EnterCriticalSection or AcquireSRWLockExclusive WinAPI calls.
If you want that info because you want to do something different at runtime based on it, I’m afraid std::mutex can’t provide it. You should probably use some other synchronization primitive. I don’t know what it is you’re doing, but when I write complex multithreading code I often use condition variables and/or atomic operations (Interlocked* on Windows, std::atomic in modern C++).
I'm programming on Windows in C++ (Visual Studio).
I can create a mutex using either std::mutex or CreateMutex.
What is the difference between them? Which one should I prefer, and which one is faster? Do they differ in usage or implementation? Or is std::mutex just a wrapper that uses CreateMutex inside?
Besides the fact that std::mutex is cross-platform and CreateMutex is not, another difference is that WinAPI mutexes (created through CreateMutex) can be used for synchronization between different processes, while std::mutex cannot. In that sense std::mutex is closer to a WinAPI critical section.
However, there are also other things to consider, e.g. whether you need to interoperate with std::condition_variable or with WinAPI events (e.g. in order to use WaitForMultipleObjects).
std::mutex is provided by the C++ standard library. On Windows it is typically implemented on top of lighter-weight primitives such as critical sections or SRW locks, rather than a kernel mutex object created with CreateMutex.
Since CreateMutex is not portable, std::mutex is generally preferred.
A difference in behavior is that a mutex created with CreateMutex is recursive, while std::mutex is non-recursive. If you need recursion, use std::recursive_mutex, which was also added in C++11.
I was studying the raw_spinlock struct, which is in /usr/src/linux/include/linux/spinlock_types.h:
typedef struct raw_spinlock {
    arch_spinlock_t raw_lock;
#ifdef CONFIG_GENERIC_LOCKBREAK
    unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
    unsigned int magic, owner_cpu;
    void *owner;
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
    struct lockdep_map dep_map;
#endif
} raw_spinlock_t;
I think raw_lock is an architecture-dependent lock and dep_map is a data structure used to avoid deadlocks, but what do break_lock, magic, owner_cpu, and owner mean?
spinlock
spinlock is the public API for spinlocks in kernel code.
See Documentation/locking/spinlocks.txt.
raw_spinlock
raw_spinlock is the actual implementation of normal spinlocks. On non-RT kernels, spinlock is just a wrapper for raw_spinlock. On RT kernels, spinlock does not always use raw_spinlock.
See this article on LWN.
arch_spinlock
arch_spinlock is the platform-specific part of the spinlock implementation. raw_spinlock is generally platform-independent and delegates low-level operations to arch_spinlock.
lockdep_map
lockdep_map is a dependency map for the locking correctness validator (lockdep).
See Documentation/locking/lockdep-design.txt.
break_lock
On SMP kernels, when spin_lock() on one CPU starts spinning while the lock is held on another CPU, it sets this flag to 1. The CPU that holds the lock can periodically check this flag using spin_is_contended() and then call spin_unlock().
This allows achieving two goals at the same time:
avoid frequent locking/unlocking;
avoid holding a lock for a long time, which prevents others from acquiring it.
See also this article.
magic, owner, owner_cpu
These fields are enabled when CONFIG_DEBUG_SPINLOCK is set and help to detect common bugs:
magic is set to a randomly chosen constant (SPINLOCK_MAGIC, which is 0xdead4ead) when the spinlock is created;
owner is set to the current process in spin_lock();
owner_cpu is set to the current CPU id in spin_lock().
spin_unlock() checks that the current process and CPU are the same as they were when spin_lock() was called.
spin_lock() checks that magic equals SPINLOCK_MAGIC, to ensure that the caller passed a pointer to a correctly initialized spinlock and that (hopefully) no memory corruption has occurred.
See kernel/locking/spinlock_debug.c.
I'm trying to implement my own read/write lock using atomic types. I can easily define exclusive locks, but I fail to create a lock that allows shared reader threads, like SRWLock does (see SRWLock). My question is how to implement a lock that can be used in exclusive mode (one reader/writer thread at a time) or in shared mode (multiple reader threads at a time).
I can't use std::mutex because it doesn't support multiple readers. Also, I don't use Boost, so no shared_mutex either.
The shared timed mutex
There is no equivalent of that kind of read-write lock in the C++11 standard library. The good news is that there is one in C++14, called std::shared_timed_mutex.
Take a look here:
http://en.cppreference.com/w/cpp/thread/shared_timed_mutex
Compiler support
Recent versions of GCC support shared_timed_mutex, according to its documentation, if you use the -std=c++14 compiler flag. The bad news is that Visual C++ doesn't support it yet, or at least I haven't been able to find any concrete info about it; the closest thing I found is this feature table, which lists Shared Locking in C++ as missing.
Possible alternatives
You can implement this kind of thing using a mutex and a semaphore, as described in this tutorial, if you use a library that has these primitives.
If you prefer to stay with the standard library, you can implement it with a std::mutex and a std::condition_variable, similarly to how it's done here or here.
There is also shared_mutex in Boost (as you already noted), uv_rwlock_t in libuv, and pthread_rwlock_t in Unix-like OSes.
Is there any mechanism through which I can wake up a thread in another process without going through the kernel? The waiting thread might spin in a loop, no problem (each thread is pinned to a separate core), but in my case the sending thread has to be quick and can't afford to go through the kernel to wake the waiting thread.
No, not if the other thread is sleeping (not on a CPU). To wake up such a thread you need to change its state to "RUNNING" by calling the scheduler, which is part of the kernel.
Yes, you can synchronize two threads or processes if both are running on different CPUs and there is shared memory between them. You should bind all threads to different CPUs. Then you may use a spinlock: the pthread_spin_lock and pthread_spin_unlock functions from the optional part of POSIX threads (Advanced Realtime Threads, [THR SPI]), or any custom spinlock. A custom spinlock will most likely use some atomic operations and/or memory barriers.
The sending thread changes a value in memory, which the receiving thread checks in a loop.
E.g.
init:
pthread_spinlock_t lock;
pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE); // initialize before use
pthread_spin_lock(&lock);                          // "close" the mutex
then start the threads.
waiting thread:
{
    pthread_spin_lock(&lock);   // wait for the event
    work();
}
main thread:
{
    do_smth();
    pthread_spin_unlock(&lock); // "open" the mutex; the other thread will see this change
                                // in ~150 CPU ticks (checked on Pentium 4 and Intel Core 2
                                // single-socket systems); the operation itself is of the
                                // same order; didn't measure it.
    continue_work();
}
To signal to another process that it should continue, without forcing the sender to spend time in a kernel call, one mechanism comes to mind right away. Without kernel calls, all a process can do is modify memory; so the solution is inter-process shared memory. Once the sender writes to shared memory, the receiver should see the change without any explicit kernel calls, and naive polling by the receiver should work fine.
One cheap (but maybe not cheap enough) alternative is to delegate the sending to a helper thread in the same process, and have the helper thread make the proper inter-process "semaphore release" or pipe write call.
I understand that you want to avoid using the kernel in order to avoid kernel-related overheads. Most such overheads are context-switch related. Here is a demonstration of one way to accomplish what you need using signals, without spinning and without context switches:
#include <signal.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <pthread.h>
#include <iostream>
#include <thread>

using namespace std;

void sigRtHandler(int sig) {
    // Only async-signal-safe functions may be called from a signal
    // handler, so use write() here rather than cout.
    const char msg[] = "Received signal\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
}

int main() {
    constexpr static int kIter = 100000;
    thread t([]() {
        signal(SIGRTMIN, sigRtHandler);
        for (int i = 0; i < kIter; ++i) {
            usleep(1000);
        }
        cout << "Done" << endl;
    });
    usleep(1000);  // Give the child time to set up the signal handler.
    auto handle = t.native_handle();
    for (int i = 0; i < kIter; ++i)
        pthread_kill(handle, SIGRTMIN);
    t.join();
    return 0;
}
If you run this code, you'll see that the child thread keeps receiving SIGRTMIN. While the process is running, if you look in the files /proc/(PID)/task/*/status for this process, you'll see that the parent thread does not incur context switches from calling pthread_kill().
The advantage of this approach is that the waiting thread doesn't need to spin. If the waiting thread's job is not time-sensitive, this approach allows you to save CPU.