Double-Checked Locking Pattern in C++11? - multithreading

The new machine model of C++11 allows for multi-processor systems to work reliably, wrt. to reorganization of instructions.
As Meyers and Alexandrescu pointed out the "simple" Double-Checked Locking Pattern implementation is not safe in C++03
Singleton* Singleton::instance() {
if (pInstance == 0) { // 1st test
Lock lock;
if (pInstance == 0) { // 2nd test
pInstance = new Singleton;
}
}
return pInstance;
}
They showed in their article that no matter what you do as a programmer, in C++03 the compiler has too much freedom: It is allowed to reorder the instructions in a way that you can not be sure that you end up with only one instance of Singleton.
My question is now:
Do the restrictions/definitions of the new C++11 machine model now constrain the sequence of instructions, that the above code would always work with a C++11 compiler?
How does a safe C++11-Implementation of this Singleton pattern now looks like, when using the new library facilities (instead of the mock Lock here)?

If pInstance is a regular pointer, the code has a potential data race -- operations on pointers (or any builtin type, for that matter) are not guaranteed to be atomic (EDIT: or well-ordered)
If pInstance is an std::atomic<Singleton*> and Lock internally uses an std::mutex to achieve synchronization (for example, if Lock is actually std::lock_guard<std::mutex>), the code should be data race free.
Note that you need both explicit locking and an atomic pInstance to achieve proper synchronization.

Since static variable initialization is now guaranteed to be threadsafe, the Meyer's singleton should be threadsafe.
Singleton* Singleton::instance() {
static Singleton _instance;
return &_instance;
}
Now you need to address the main problem: there is a Singleton in your code.
EDIT: based on my comment below: This implementation has a major drawback when compared to the others. What happens if the compiler doesn't support this feature? The compiler will spit out thread unsafe code without even issuing a warning. The other solutions with locks will not even compile if the compiler doesn't support the new interfaces. This might be a good reason not to rely on this feature, even for things other than singletons.

C++11 doesn't change the meaning of that implementation of double-checked locking. If you want to make double-checked locking work you need to erect suitable memory barriers/fences.

Related

C++11 thread safe singleton using lambda and call_once: main function (g++, clang++, Ubuntu 14.04)

All!
I am new to C++11 and many of its features.
I am looking for a C++11 (non boost) implementation of a thread safe singleton, using lambda and call_once (Sorry... I have no rights to include the call_once tag in the post).
I have investigated quite a lot (I am using g++ (4.8, 5.x, 6.2), clang++3.8, Ubuntu 14.04, trying to avoid using boost), and I have found the following links:
http://www.nuonsoft.com/blog/2012/10/21/implementing-a-thread-safe-singleton-with-c11/comment-page-1/
http://silviuardelean.ro/2012/06/05/few-singleton-approaches/ (which seems to be very similar to the previous one, but it is more complete, and provides at the end its own implementation).
But: I am facing these problems with the mentioned implementations: Or I am writing a wrong implementation of main function (probable), or there are mistakes in the posted codes (less probable), but I am receiving different compiling / linking errors (or both things at the same time, of course...).
Similar happens with following code, which seems to compile according to comments (but this one does not use lambda, neither call_once):
How to ensure std::call_once really is only called once (In this case, it compiles fine, but throws the following error in runtime):
terminate called after throwing an instance of 'std::system_error'
what(): Unknown error -1
Aborted (core dumped)
So, could you help me, please, with the correct way to call the getInstance() in the main function, to get one (and only one object) and then, how to call other functions that I might include in the Singleton? (Something like: Singleton::getInstance()->myFx(x, y, z);?
(Note: I have also found several references in StackOverflow, which are resolved as "thread safe", but there are similar implementations in other StackOverflow posts and other Internet places which are not considered "thread safe"; here are a few example of both (these do not use lambda) ):
Thread-safe singleton in C++11
c++ singleton implementation STL thread safe
Thread safe singleton in C++
Thread safe singleton implementation in C++
Thread safe lazy construction of a singleton in C++
Finally, I will appreciate very much if you can suggest to me the best books to study about these subjects. Thanks in advance!!
I just ran across this issue. In my case, I needed to add -lpthread to my compilation options.
Implementing a singleton with a static variable as e. g. suggested by Thread safe singleton implementation in C++ is thread safe with C++11. With C++11 the initialization of static variables is defined to happen on
exactly one thread, and no other threads will proceed until that initialization is complete. (I can also backup that with problems we recently had on an embedded platform when we used call_once to implement a singleton and it worked after we returned to the "classic" singleton implementation with the static variable.)
ISO/IEC 14882:2011 defines in §3.6.2 e. g. that
Static initialization shall be performed before any dynamic initialization takes place.
and as part of §6.7:
The zero-initialization (8.5) of all block-scope variables with static
storage duration (3.7.1) or thread storage duration (3.7.2) is
performed before any other initialization takes place.
(See also this answer)
A very good book I can recommend is "C++ Concurrency in Action" by A. Williams. (As part of Chapter 3 call_once and the Singleton pattern is discussed - that is why I know that the "classic Singleton" is thread safe since C++11.)

A thread_guard Equivalent To lock_guard / unique_lock

The standard library provides a mutex class, with the ability to manually lock and unlock it:
std::mutex m;
m.lock();
// ...
m.unlock();
However, the library apparently also recognizes that a common case is just to lock the mutex at some point, and unlock it when leaving a block. For this it provides std::lock_guard and std::unique_lock:
std::mutex m;
std::lock_guard<std::mutex> lock(m);
// ...
// Automatic unlock
I think a fairly common pattern for threads, is to create one (either as a stack variable, or a member), then join it before destructing it:
std::thread t(foo);
// ...
t.join();
It seems easy to write a thread_guard, which would take a thread (or a sequence of threads), and would just call join on its own destruction:
std::thread t(foo);
thread_guard<std::thread> g(t);
// ...
// Join automatically
Is there a standard-library class like it?
If not, is there some reason to avoid this?
This issue is discussed in Scott Meyer's book "Modern Effective c++"
The problem is that if there would be another default behavior (detach or join) would cause hard to find errors in case you forget that there is a implicit operation. So the actual default behavior on destruction is asserting if not explicitly joined or detached. And no "Guard" class is there also because of that reason.
If you always want to join it's safe to write such a class yourself. But when someone uses it and wants to detach people can forget that the destructor will implicitly join it. So that's the risk in writing such function.
As an alternative you can use a scope_guard by boost or the folly library (which I personally prefer more) and declare in the beginning explicitly your intention and it will be executed. Or you can write a policy based "Guard" class where you have to explicitly state what you want to do on destruction.

Difference between std::mutex lock function and std::lock_guard<std::mutex>?

Basically, the title is self-explanatory.
I use it in following way:
The code is in Objective-C++.
Objective-C classes make concurrent calls to different purpose functions.
I use std::mutex to lock and unlock std::vector<T> editing option across entire class, as C++ std containers are not thread safe.
Using lock_guard automatically unlocks the mutex again when it goes out of scope. That makes it impossible to forget to unlock it, when returning, or when an exception is thrown. You should always prefer to use lock_guard or unique_lock instead of using mutex::lock(). See http://kayari.org/cxx/antipatterns.html#locking-mutex
lock_guard is an example of an RAII or SBRM type.
The std::lock_guard is only used for two purposes:
Automate mutex unlock during destruction (no need to call .unlock()).
Allow simultaneous lock of multiple mutexes to overcome deadlock problem.
For the last use case you will need std::adopt_lock flag:
std::lock(mutex_one, mutex_two);
std::lock_guard<std::mutex> lockPurposeOne(mutex_one, std::adopt_lock);
std::lock_guard<std::mutex> lockPurposeTwo(mutex_two, std::adopt_lock);
On the other hand, you will need allocate yet another class instance for the guard every time you need to lock the mutex, as std::lock_guard has no member functions. If you need guard with unlocking functionality take a look at std::unique_lock class. You may also consider using std::shared_lock for parallel reading of your vector.
You may notice, that std::shared_lock class is commented in header files and will be only accessible with C++17. According to header file you can use std::shared_timed_mutex, but when you will try to build the app it will fail, as Apple had updated the header files, but not the libc++ itself.
So for Objective-C app it may be more convenient to use GCD, allocate a couple of queue for all your C++ containers at the same time and put semaphores where needed. Take a look at this excellent comparison.

is Ccriticalsection usable in production?

We are a couple of newbies in MFC and we are building a multi-threded application. We come across the article in the URL that warns us not to use CCriticalSection since its implementation is broken. We are interested to know if anyone has any experience in using CCriticalSection and do you come across any problems or bugs? Is CCriticalSection usable and production ready if we use VC++ 2008 to build our application?
http://www.flounder.com/avoid_mfc_syncrhonization.htm
thx
I think that article is based on a fundamental misunderstanding of what CSingleLock is for and how to use it.
You cannot lock the same CSingleLock multiple times, but you are not supposed to. CSingleLock, as its name suggests, is for locking something ONCE.
Each CSingleLock just manages one lock on some other object (e.g. a CCriticalSection which you pass it during construction), with the aim of automatically releasing that lock when the CSingleLock goes out of scope.
If you want to lock the underlying object multiple times you would use multiple CSingleLocks; you would not use a single CSingleLock and try to lock it multiple times.
Wrong (his example):
CCriticalSection crit;
CSingleLock lock(&crit);
lock.Lock();
lock.Lock();
lock.Unlock();
lock.Unlock();
Right:
CCriticalSection crit;
CSingleLock lock1(&crit);
CSingleLock lock2(&crit);
lock1.Lock();
lock2.Lock();
lock2.Unlock();
lock1.Unlock();
Even better (so you get RAII):
CCriticalSection crit;
// Scope the objects
{
CSingleLock lock1(&crit, TRUE); // TRUE means it (tries to) locks immediately.
// Do stuff which needs the lock (if IsLocked returns success)
CSingleLock lock2(&crit, TRUE);
// Do stuff which needs the lock (if IsLocked returns success)
}
// crit is unlocked now.
(Of course, you would never intentionally get two locks on the same underlying critical section in a single block like that. That'd usually only happen as a result of calling functions which get a lock while inside something else that already has its own lock.)
(Also, you should check CSingleLock.IsLocked to see if the lock was successful. I've left those checks out for brevity, and because they were left out of the original example.)
If CCriticalSection itself suffers from the same problem then that certainly is a problem, but he's presented no evidence of that that I can see. (Maybe I missed something. I can't find the source to CCriticalSection in my MFC install to verify that way, either.)
That article suggests that a simple situation of using those primitives are fine, except that the implementation of them violates the semantics that should be expected of them.
basically, it suggests that if you use it as a non-recursive lock, where you take care to always ensure that the lock is valid (ie, not abandoned), then you should be fine.
The article does complain, however, that the limitations are inexcusable.

Is this a safe version of double-checked locking?

Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
private Helper helper = null;
public Helper getHelper() {
if (helper == null) {
synchronized(this) {
if (helper == null) {
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper();
// Atomically publish this instance.
atomicSet(helper, myHelper);
}
}
}
return helper;
}
}
Does simply making the publishing of the newly created Helper instance atomic make this double checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java, one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article
It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.
I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.
The answer is language dependent - it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be spread out after the atomicSet() then it doesn't matter how the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT UNINITIALISED MEMORY from helper
// other thread gets run at this time and tries to use helper object
// AT THE PROGRAMS LEISURE INITIALISE Helper object.
If this is allowed by the language then the double checking will not work.
Using volatile would not prevent a multiple instantiations - however using the synchronize will prevent multiple instances being created. However with your code it is possible that helper is returned before it has been setup (thread 'A' instantiates it, but before it is setup thread 'B' comes along, helper is non-null and so returns it straight away. To fix that problem, remove the first if (helper == null).
Most likely it is broken, because the problem of a partially constructed object is not addressed.
To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including it's subclass) or it's members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.

Resources