Is CCriticalSection usable in production? - multithreading

We are a couple of MFC newbies and we are building a multi-threaded application. We came across the article at the URL below, which warns against using CCriticalSection because its implementation is broken. We are interested to know if anyone has experience using CCriticalSection: did you come across any problems or bugs? Is CCriticalSection usable and production-ready if we use VC++ 2008 to build our application?
http://www.flounder.com/avoid_mfc_syncrhonization.htm
thx

I think that article is based on a fundamental misunderstanding of what CSingleLock is for and how to use it.
You cannot lock the same CSingleLock multiple times, but you are not supposed to. CSingleLock, as its name suggests, is for locking something ONCE.
Each CSingleLock just manages one lock on some other object (e.g. a CCriticalSection which you pass it during construction), with the aim of automatically releasing that lock when the CSingleLock goes out of scope.
If you want to lock the underlying object multiple times you would use multiple CSingleLocks; you would not use a single CSingleLock and try to lock it multiple times.
Wrong (his example):
CCriticalSection crit;
CSingleLock lock(&crit);
lock.Lock();
lock.Lock();
lock.Unlock();
lock.Unlock();
Right:
CCriticalSection crit;
CSingleLock lock1(&crit);
CSingleLock lock2(&crit);
lock1.Lock();
lock2.Lock();
lock2.Unlock();
lock1.Unlock();
Even better (so you get RAII):
CCriticalSection crit;
// Scope the objects
{
    CSingleLock lock1(&crit, TRUE); // TRUE means it (tries to) lock immediately.
    // Do stuff which needs the lock (if IsLocked returns success)
    CSingleLock lock2(&crit, TRUE);
    // Do stuff which needs the lock (if IsLocked returns success)
}
// crit is unlocked now.
(Of course, you would never intentionally get two locks on the same underlying critical section in a single block like that. That'd usually only happen as a result of calling functions which get a lock while inside something else that already has its own lock.)
(Also, you should check CSingleLock.IsLocked to see if the lock was successful. I've left those checks out for brevity, and because they were left out of the original example.)
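For completeness, here is a minimal sketch of the scoped form with the IsLocked check included (my own illustration, not code from the article):
CCriticalSection crit;
{
    CSingleLock lock(&crit, TRUE); // try to acquire immediately
    if (lock.IsLocked())
    {
        // Do stuff which needs the lock.
    }
} // if the lock was acquired, it is released when 'lock' goes out of scope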
If CCriticalSection itself suffers from the same problem then that certainly is a problem, but he's presented no evidence of that that I can see. (Maybe I missed something. I can't find the source to CCriticalSection in my MFC install to verify that way, either.)

That article suggests that simple uses of those primitives are fine, except that their implementation violates the semantics that should be expected of them.
Basically, it suggests that if you use it as a non-recursive lock, and you take care to always ensure that the lock is valid (i.e., not abandoned), then you should be fine.
The article does complain, however, that the limitations are inexcusable.

Related

Confusion about C++11 lock free stack push() function

I'm reading C++ Concurrency in Action by Anthony Williams, and I don't understand its push implementation of the lock_free_stack class (Listing 7.12, to be precise):
void push(T const& data)
{
    counted_node_ptr new_node;
    new_node.ptr = new node(data);
    new_node.external_count = 1;
    new_node.ptr->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(new_node.ptr->next, new_node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed));
}
So imagine two threads (A and B) calling push. Both of them reach the while loop but have not yet started it, so they both read the same value from head.load(std::memory_order_relaxed).
Then we have the following things going on:
Thread B gets swapped out for some reason.
Thread A starts the loop and obviously succeeds in adding a new node to the stack.
Thread B gets back on track and also starts the loop.
And this is where it gets interesting, it seems to me.
Because the load used std::memory_order_relaxed, and compare_exchange_weak uses std::memory_order_release on success, it looks like there is no synchronization between the threads whatsoever.
I mean, it's a std::memory_order_relaxed / std::memory_order_release pairing, and not std::memory_order_acquire / std::memory_order_release.
So it seems thread B would simply add its new node on top of the stack's initial state, as if there were no nodes in the stack yet, and reset head to this new node.
I was doing my research all around this subject, and the best I could find was this post: Does exchange or compare_and_exchange reads last value in modification order?
So the question is: is it true that all RMW operations see the last value in the modification order? And no matter what std::memory_order we use, will an RMW operation synchronize with all threads (CPUs etc.) and find the last value written to the atomic it is called upon?
After some research and asking a bunch of people, I believe I found the proper answer to this question; I hope it'll help someone.
So the question is: is it true that all RMW operations see the last value in the modification order?
Yes, it is true.
And no matter what std::memory_order we use, will an RMW operation synchronize with all threads (CPUs etc.) and find the last value written to the atomic it is called upon?
Yes, it is also true, however there is something that needs to be highlighted.
An RMW operation will synchronize only the atomic variable it works with; in our case that is head.
Perhaps you would like to ask why we need release-acquire semantics at all, if an RMW operation does the synchronization even with the relaxed memory order.
The answer is that an RMW operation synchronizes only the variable it works on; other operations which occurred before the RMW might not be seen by the other thread.
Let's look at the push function again:
void push(T const& data)
{
    counted_node_ptr new_node;
    new_node.ptr = new node(data);
    new_node.external_count = 1;
    new_node.ptr->next = head.load(std::memory_order_relaxed);
    while (!head.compare_exchange_weak(new_node.ptr->next, new_node,
                                       std::memory_order_release,
                                       std::memory_order_relaxed));
}
In this example, with two pushing threads, the threads won't be fully synchronized with each other, but that is acceptable here.
Both threads will always see the newest head, because compare_exchange_weak provides this, and a new node will always be added to the top of the stack.
However, if we tried to read through the pointer, as in *(new_node.ptr->next), after the line new_node.ptr->next = head.load(std::memory_order_relaxed), things could easily turn ugly: we might dereference a node whose initialization is not yet visible to our thread.
This might happen because the processor can reorder instructions, and since there is no synchronization between the threads, the second thread could see the pointer to the top node even before that node was initialized!
And this is exactly where release-acquire semantics come to help. They ensure that all operations which happened before the release operation will be seen by the acquire side!
Check out and compare listings 5.5 and 5.8 in the book.
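Here is a minimal sketch of release-acquire publication in that spirit (my own illustration, not a listing from the book):
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<bool> ready{false};
int payload = 0; // plain (non-atomic) data published through 'ready'

void producer()
{
    payload = 42;                                 // happens before the release store
    ready.store(true, std::memory_order_release); // release: publish the payload
}

void consumer()
{
    while (!ready.load(std::memory_order_acquire)) // acquire: synchronizes with the release
        ;
    assert(payload == 42); // guaranteed once the acquire load has seen 'true'
}

int main()
{
    std::thread a(producer), b(consumer);
    a.join();
    b.join();
}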
I also recommend reading this article about how processors work; it might provide some essential information for better understanding.
memory barriers

How can we know we are inside a spinlock?

I have a function which is called by multiple functions. Some callers hold a spinlock when they call it and some hold no lock at all. How can I know if my function is called with a spinlock held?
I have a big piece of code written some time back. It has some functions which are called both with and without locks from different code paths. The functions allocate skbs with the GFP_KERNEL flag, considering only the lock-free cases. That causes issues when they are called under spin_lock(). I need to handle both cases to avoid sleeping inside a spinlock.
How can I know if my function is called with spinlock held?
You cannot, not directly. You would need to set a flag in some structure yourself that indicates whether you hold the lock or not.
You are better off creating two functions: one that you call if you hold the lock, and one that you call if you don't.
/* b->lck must already be held by the caller */
void foo_unlocked(struct bar *b)
{
    /* do your thing; assume the relevant lock is held */
}

/* b->lck must NOT be held by the caller */
void foo(struct bar *b)
{
    spin_lock(&b->lck);
    foo_unlocked(b);
    spin_unlock(&b->lck);
}
I only need to check whether preemption or IRQs are disabled; based on that I can allocate memory with GFP_KERNEL or GFP_ATOMIC, so I don't need to know whether it's a spin_lock or another lock. I can achieve that with the in_atomic() and irqs_disabled() functions. Thanks
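For illustration, a minimal sketch of that flag selection (my own sketch, not code from the answer; 'len' is a placeholder size, and note that in_atomic() is not a fully reliable indicator on non-preemptible kernels):
gfp_t flags = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;
struct sk_buff *skb = alloc_skb(len, flags); /* will not sleep when flags == GFP_ATOMIC */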

c++ multi threading - lock one pointer assignment?

I have a method as below
SomeStruct* abc;
void NullABC()
{
    abc = NULL;
}
This is just an example and not very interesting.
Many threads could call this method at the same time.
Do I need to lock the abc = NULL line?
I think it is just a pointer, so it would be done in one shot and there isn't really a need for a lock, but I just wanted to make sure.
Thanks
It depends on the platform on which you are running. On many platforms, as long as abc is correctly aligned, the write will be atomic.
However, if your platform does not have such a guarantee, you need to synchronize access to the variable, using a lock, an atomic variable, or an interlocked operation.
No, you do not need a lock, at least not on x86. A memory barrier is required in many real-world situations though, and locking is one way to get one (the other would be an explicit barrier). You may also consider using an interlocked operation, like Visual C++'s InterlockedExchangePointer, if you need access to the original pointer. Equivalent intrinsics are supported by most compilers.
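If C++11 is available, here is a minimal sketch of the same idea with a portable atomic (my own illustration; std::atomic takes care of the alignment and atomicity guarantees):
#include <atomic>

struct SomeStruct; // forward declaration; defined elsewhere
std::atomic<SomeStruct*> abc{nullptr};

void NullABC()
{
    // Atomic swap; returns the previous pointer, like InterlockedExchangePointer.
    SomeStruct* old = abc.exchange(nullptr);
    (void)old; // the old pointer is available here if needed
}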
If no other threads are ever using abc for any other purpose, then the code as shown is fine... but of course it's a bit silly to have a pointer that never gets used except to set it to NULL.
If there is some other code somewhere that does something like this, OTOH:
if (abc != NULL)
{
    abc->DoSomething();
}
Then in this case both the code that uses the abc pointer (above) and the code that changes it (that you posted) need to lock a mutex before accessing abc. Otherwise the code above risks crashing if the value of abc gets set to NULL after the if statement but before the DoSomething() call.
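A rough sketch of that mutex-protected arrangement (names like abcMutex and UseABC are mine, purely for illustration):
#include <mutex>

struct SomeStruct { void DoSomething() {} };

std::mutex abcMutex;
SomeStruct* abc = NULL;

void NullABC()
{
    std::lock_guard<std::mutex> guard(abcMutex);
    abc = NULL;
}

void UseABC()
{
    std::lock_guard<std::mutex> guard(abcMutex);
    if (abc != NULL)
        abc->DoSomething(); // abc can no longer be set to NULL mid-use
}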
A borderline case would be if the other code does this:
SomeStruct * my_abc = abc;
if (my_abc != NULL)
{
    my_abc->DoSomething();
}
That will probably work, because at the time the abc pointer's value is copied over to my_abc, the value of abc is either NULL or it isn't... and my_abc is a local variable, so other threads won't be able to change it before DoSomething() is called. The above could theoretically break on platforms where copying a pointer isn't atomic, though (in which case my_abc might end up being an invalid pointer, with half of abc's bits and half of NULL's bits)... but common PC hardware will copy pointers atomically, so it shouldn't be an issue there. It might be worthwhile to use a mutex anyway, just for paranoia's sake.

Double-Checked Locking Pattern in C++11?

The new machine model of C++11 allows multi-processor systems to work reliably with respect to the reordering of instructions.
As Meyers and Alexandrescu pointed out, the "simple" Double-Checked Locking Pattern implementation is not safe in C++03:
Singleton* Singleton::instance() {
    if (pInstance == 0) { // 1st test
        Lock lock;
        if (pInstance == 0) { // 2nd test
            pInstance = new Singleton;
        }
    }
    return pInstance;
}
They showed in their article that, no matter what you do as a programmer, in C++03 the compiler has too much freedom: it is allowed to reorder the instructions in such a way that you cannot be sure you end up with only one instance of Singleton.
My questions now are:
Do the restrictions/definitions of the new C++11 machine model constrain the sequence of instructions so that the above code would always work with a C++11 compiler?
What does a safe C++11 implementation of this singleton pattern look like when using the new library facilities (instead of the mock Lock here)?
If pInstance is a regular pointer, the code has a potential data race -- operations on pointers (or any builtin type, for that matter) are not guaranteed to be atomic (EDIT: or well-ordered).
If pInstance is an std::atomic<Singleton*> and Lock internally uses an std::mutex to achieve synchronization (for example, if Lock is actually std::lock_guard<std::mutex>), the code should be data race free.
Note that you need both explicit locking and an atomic pInstance to achieve proper synchronization.
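A sketch of that combination (my illustration of the answer above, assuming a Singleton class with a private constructor):
#include <atomic>
#include <mutex>

class Singleton {
public:
    static Singleton* instance() {
        Singleton* p = pInstance.load(std::memory_order_acquire); // 1st test
        if (p == nullptr) {
            std::lock_guard<std::mutex> lock(mtx);
            p = pInstance.load(std::memory_order_relaxed); // 2nd test, under the lock
            if (p == nullptr) {
                p = new Singleton;
                pInstance.store(p, std::memory_order_release); // publish the instance
            }
        }
        return p;
    }
private:
    Singleton() {}
    static std::atomic<Singleton*> pInstance;
    static std::mutex mtx;
};

std::atomic<Singleton*> Singleton::pInstance{nullptr};
std::mutex Singleton::mtx;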
Since static local variable initialization is now guaranteed to be thread-safe, the Meyers singleton should be thread-safe:
Singleton* Singleton::instance() {
    static Singleton _instance;
    return &_instance;
}
Now you need to address the main problem: there is a Singleton in your code.
EDIT (based on my comment below): This implementation has a major drawback when compared to the others. What happens if the compiler doesn't support this feature? The compiler will emit thread-unsafe code without even issuing a warning. The other solutions, with locks, will not even compile if the compiler doesn't support the new interfaces. This might be a good reason not to rely on this feature, even for things other than singletons.
C++11 doesn't change the meaning of that implementation of double-checked locking. If you want to make double-checked locking work, you need to erect suitable memory barriers/fences.
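One way to erect those fences is sketched below (my own illustration; it assumes pInstance is a std::atomic<Singleton*> and mtx a std::mutex, as in the earlier answer):
Singleton* Singleton::instance() {
    Singleton* p = pInstance.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire); // pairs with the release fence below
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(mtx);
        p = pInstance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Singleton;
            std::atomic_thread_fence(std::memory_order_release); // construction completes before publication
            pInstance.store(p, std::memory_order_relaxed);
        }
    }
    return p;
}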

Is this a safe version of double-checked locking?

Slightly modified version of canonical broken double-checked locking from Wikipedia:
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    // Create new Helper instance and store reference on
                    // stack so other threads can't see it.
                    Helper myHelper = new Helper();
                    // Atomically publish this instance.
                    atomicSet(helper, myHelper);
                }
            }
        }
        return helper;
    }
}
Does simply making the publishing of the newly created Helper instance atomic make this double-checked locking idiom safe, assuming that the underlying atomic ops library works properly? I realize that in Java one could just use volatile, but even though the example is in pseudo-Java, this is supposed to be a language-agnostic question.
See also:
Double checked locking Article
It entirely depends on the exact memory model of your platform/language.
My rule of thumb: just don't do it. Lock-free (or reduced lock, in this case) programming is hard and shouldn't be attempted unless you're a threading ninja. You should only even contemplate it when you've got profiling proof that you really need it, and in that case you get the absolute best and most recent book on threading for that particular platform and see if it can help you.
I don't think you can answer the question in a language-agnostic fashion without getting away from code completely. It all depends on how synchronized and atomicSet work in your pseudocode.
The answer is language-dependent - it comes down to the guarantees provided by atomicSet().
If the construction of myHelper can be reordered to after the atomicSet(), then it doesn't matter how atomically the variable is assigned to the shared state.
i.e.
// Create new Helper instance and store reference on
// stack so other threads can't see it.
Helper myHelper = new Helper(); // ALLOCATE MEMORY HERE BUT DON'T INITIALISE
// Atomically publish this instance.
atomicSet(helper, myHelper); // ATOMICALLY POINT helper AT THE UNINITIALISED MEMORY
// other thread gets run at this time and tries to use the helper object
// AT THE PROGRAM'S LEISURE, INITIALISE the Helper object
If this is allowed by the language, then the double checking will not work.
Using volatile would not prevent multiple instantiations - however, using synchronized will prevent multiple instances being created. However, with your code it is possible that helper is returned before it has been set up (thread 'A' instantiates it, but before it is set up, thread 'B' comes along, sees that helper is non-null and so returns it straight away). To fix that problem, remove the first if (helper == null).
Most likely it is broken, because the problem of a partially constructed object is not addressed.
To all the people worried about a partially constructed object:
As far as I understand, the problem of partially constructed objects is only a problem within constructors. In other words, within a constructor, if an object references itself (including its subclass) or its members, then there are possible issues with partial construction. Otherwise, when a constructor returns, the class is fully constructed.
I think you are confusing partial construction with the different problem of how the compiler optimizes the writes. The compiler can choose to A) allocate the memory for the new Helper object, B) write the address to myHelper (the local stack variable), and then C) invoke any constructor initialization. Anytime after point B and before point C, accessing myHelper would be a problem.
It is this compiler optimization of the writes, not partial construction that the cited papers are concerned with. In the original single-check lock solution, optimized writes can allow multiple threads to see the member variable between points B and C. This implementation avoids the write optimization issue by using a local stack variable.
The main scope of the cited papers is to describe the various problems with the double-check lock solution. However, unless the atomicSet method is also synchronizing against the Foo class, this solution is not a double-check lock solution. It is using multiple locks.
I would say this all comes down to the implementation of the atomic assignment function. The function needs to be truly atomic, it needs to guarantee that processor local memory caches are synchronized, and it needs to do all this at a lower cost than simply always synchronizing the getHelper method.
Based on the cited paper, in Java, it is unlikely to meet all these requirements. Also, something that should be very clear from the paper is that Java's memory model changes frequently. It adapts as better understanding of caching, garbage collection, etc. evolve, as well as adapting to changes in the underlying real processor architecture that the VM runs on.
As a rule of thumb, if you optimize your Java code in a way that depends on the underlying implementation, as opposed to the API, you run the risk of having broken code in the next release of the JVM. (Although, sometimes you will have no choice.)
dsimcha:
If your atomicSet method is real, then I would try sending your question to Doug Lea (along with your atomicSet implementation). I have a feeling he's the kind of guy that would answer. I'm guessing that for Java he will tell you that it's cheaper to always synchronize and to look to optimize somewhere else.
