C++11 thread deadlock

I have written a simple synchronization for threads but it deadlocks and I don't know how to fix it in a clever way.
I have a
std::atomic<int> count_;
declared in my main (it is initialized to 0), and each thread runs this code:
count_++;
while(!(count_%size_==0)); // wait until all the threads have reached this point
where size_ is the number of threads launched (in my case 2). This synchronization code is run several times inside the threads (in my case 3).
Sometimes it works fine, but sometimes it deadlocks.
I think that sometimes, though not on the first call of this barrier, one thread increments count_ again before the other thread has tested the condition, which leads to a deadlock.
How can I fix this issue without adding a delay function? Is there a better way to create a checkpoint inside a thread?

The race exhibited by your code is as follows: Once the last thread reaches the barrier, all threads are allowed to continue. However, as soon as the first thread reaches the barrier again and increments count_, the condition for continuing is no longer true. Any threads that have not left the barrier by that point will be stuck.
An easy solution to this problem is the use of C++11's condition_variable. I posted a sample implementation not too long ago as part of this answer.
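Since the linked sample is not reproduced here, the following is only a minimal sketch of a reusable barrier built on std::condition_variable. The Barrier class and the run_barrier_demo driver are hypothetical names, not the code from the linked answer. A generation counter separates successive rounds, so a fast thread that re-enters the barrier cannot trap a slow thread that is still leaving it.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical reusable barrier for a fixed number of threads.
class Barrier {
public:
    explicit Barrier(std::size_t size) : size_(size), waiting_(0), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t my_generation = generation_;
        if (++waiting_ == size_) {
            // Last thread to arrive: start a new round and release everyone.
            waiting_ = 0;
            ++generation_;
            cv_.notify_all();
        } else {
            // Sleep until the round this thread arrived in is over.
            cv_.wait(lock, [&] { return generation_ != my_generation; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    const std::size_t size_;
    std::size_t waiting_;
    std::size_t generation_;
};

// Hypothetical driver: every thread passes the barrier `rounds` times.
// Returns the total number of arrivals counted across all threads.
int run_barrier_demo(int num_threads, int rounds) {
    Barrier barrier(static_cast<std::size_t>(num_threads));
    std::atomic<int> arrivals(0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&] {
            for (int r = 0; r < rounds; ++r) {
                ++arrivals;
                barrier.wait();
                // Nobody leaves round r before every thread has arrived in it.
                assert(arrivals.load() >= (r + 1) * num_threads);
            }
        });
    }
    for (auto& th : threads) th.join();
    return arrivals.load();
}
```

Calling run_barrier_demo(2, 3) mirrors the setup in the question (2 threads, 3 checkpoints). Unlike the spinning loop, waiting threads sleep instead of burning CPU.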


Ways to detect deadlock in a live application

What are the ways to detect deadlocks in a live multi-threaded application?
If we found there is a deadlock, are there any ways to resolve it, without taking down/restarting the application?
There are two popular ways to detect deadlocks.
One is to have threads set checkpoints. For example, if you have a thread that has a work loop, you set a timer at the beginning of doing work that's set for longer than you think the work could possibly take. If the timer fires, you assume the thread is deadlocked. When the work is done, you cancel the timer.
Another (sometimes used in combination) is to have things that a thread might block on track what other resources a thread might hold. This can directly detect an attempt to acquire one lock while holding another one when other threads have acquired those locks in the opposite order.
This can even detect deadlock risk without the deadlock actually occurring. If one thread acquires lock A then B and another acquires lock B then A, there is no deadlock unless they overlap. But this method can detect it.
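The timer-based checkpoint described above could look roughly like this in C++11. This is a sketch under assumptions: finished_in_time is a hypothetical helper, and the timeout value is whatever you consider longer than the work could possibly take. The caller plays the role of the timer and waits on a condition variable with a deadline, and the worker "cancels the timer" by setting a flag when it is done.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Runs `work` on its own thread and reports whether it finished before the
// timeout. A false result means "the timer fired", i.e. a suspected deadlock.
bool finished_in_time(const std::function<void()>& work,
                      std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread worker([&] {
        work();
        {
            std::lock_guard<std::mutex> lock(m);
            done = true;  // the work is done: cancel the timer
        }
        cv.notify_one();
    });

    bool in_time;
    {
        std::unique_lock<std::mutex> lock(m);
        // Returns false if the deadline passes while `done` is still false.
        in_time = cv.wait_for(lock, timeout, [&] { return done; });
    }

    // Demo only: the work here always finishes eventually, so joining is safe.
    // With a real deadlock you would log and tear down instead of joining.
    worker.join();
    return in_time;
}
```

A timeout is only a heuristic: slow work and deadlocked work look the same to the timer, so the limit has to be chosen generously.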
Advanced deadlock detection is typically only used during debugging. Other than coding the application to check each blocking lock for a possible deadlock and knowing what to do if it happens, the only thing you can do after a deadlock is tear the application down. You can't release locks blindly because the resources they protect may be in an inconsistent state.
Sometimes you deliberately write code that you know can deadlock and specifically code it to avoid the problem. For example, if you know lots of threads take lock A and then try to acquire lock B, and some other thread needs to do the reverse, you can code it to do a non-blocking attempt to lock B and release lock A if it fails.
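A minimal sketch of that back-off pattern, assuming C++11 mutexes. SharedState and the function names are hypothetical. The thread that already holds A makes a non-blocking attempt on B and gives A up if the attempt fails, so it never sleeps while holding a lock the opposite-order thread needs.

```cpp
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

struct SharedState {
    std::mutex lock_a;
    std::mutex lock_b;
    int value = 0;  // only modified while both locks are held
};

// Takes A, then tries B without blocking. On failure it releases A and retries.
void a_then_b_with_backoff(SharedState& s, int iterations) {
    int done = 0;
    while (done < iterations) {
        std::unique_lock<std::mutex> a(s.lock_a);
        std::unique_lock<std::mutex> b(s.lock_b, std::try_to_lock);
        if (!b.owns_lock()) {
            a.unlock();                 // back off: give up A
            std::this_thread::yield();  // let the holder of B make progress
            continue;
        }
        ++s.value;
        ++done;
    }
}

// Takes the locks in the opposite order and blocks normally.
void b_then_a(SharedState& s, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> b(s.lock_b);
        std::lock_guard<std::mutex> a(s.lock_a);
        ++s.value;
    }
}

// Hypothetical driver: two back-off threads against one reverse-order thread.
int run_backoff_demo(int iterations) {
    SharedState s;
    std::vector<std::thread> threads;
    threads.emplace_back(a_then_b_with_backoff, std::ref(s), iterations);
    threads.emplace_back(a_then_b_with_backoff, std::ref(s), iterations);
    threads.emplace_back(b_then_a, std::ref(s), iterations);
    for (auto& t : threads) t.join();
    return s.value;
}
```

Which side backs off is a design choice. The retry loop trades a possible deadlock for some wasted spinning, so it fits best where contention on B is rare.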
Typically, it's more useful to spend your effort making deadlocks impossible rather than making the code detect and work around deadlocks.
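One way to make this particular kind of deadlock impossible in C++11 is to take both locks in a single step with std::lock, which applies a deadlock-avoidance algorithm regardless of argument order. The Account type and transfer function below are hypothetical illustrations, not something from the question.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

struct Account {
    explicit Account(long initial) : balance(initial) {}
    std::mutex m;
    long balance;  // protected by m
};

void transfer(Account& from, Account& to, long amount) {
    assert(&from != &to);  // locking the same mutex twice would be undefined
    // Acquire both mutexes together; the order of arguments does not matter.
    std::lock(from.m, to.m);
    // Adopt the already-held locks so they are released on scope exit.
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Hypothetical driver: two threads transfer in opposite directions.
// Returns the final balance of the first account.
long run_transfer_demo(int transfers) {
    Account x(100);
    Account y(100);
    std::thread t1([&] { for (int i = 0; i < transfers; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < transfers; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance;
}
```

With plain nested lock_guards, the two threads here would acquire the mutexes in opposite orders and could deadlock.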
Python has a feature called the faulthandler that's very useful for dealing with deadlocks:
import faulthandler
If you're using C++ or any compiler that uses glibc, you can use the backtrace() functions in execinfo.h to print a stacktrace and exit gracefully when you get a signal. You can take a deadlocked program, send it a signal and get a list of all the threads.
In Java, use jstack <pid> on the stuck process.

Thread.yield and sleep

I'm new to multithreading and I ran into two questions about thread scheduling with Thread.yield and sleep that I couldn't find a clear answer to in my book or by googling. I'm going to skip the pseudocode and real code because I think I already understand the possible starvation problem if my assumptions aren't right.
I'm going to refer to 3 pseudo threads in my questions:
My first question: if I call yield or sleep in one of my 3 threads, is it guaranteed that the CPU tries to schedule and process the other 2 threads before it comes back to the thread that called yield? In other words, are threads in a strict queue, so that the yielding thread goes to the end of the queue?
I know that yield should give other threads a chance to run, but is it possible, for example, that after the yielding thread one of the 2 other threads runs and then control goes back to the original thread that called yield, skipping the last thread and not giving it a chance to run at all?
My second question is related to the first. Do yield and sleep both have the same property of going to the end of the queue when called, as I assumed in my first question, or are there any other differences between them besides the sleeping time in sleep?
If these questions don't make sense, the possible problem in my code is this: before a thread goes to sleep, it unlocks a mutex that one of the other threads had already tried to lock, failed, and started waiting on. So after the first thread has gone to sleep, is it guaranteed that the waiting thread will lock the mutex before the sleeping thread does?
Thread.yield() is a hint to the thread scheduler that means "hey, right now I'm fine with being put to sleep so another thread can run". There are no guarantees; it is only a hint. The assumption about threads being ordered in a "queue" is also incorrect, because thread scheduling is also done by the OS, and it is very hard to predict a particular execution order without additional thread interaction mechanisms.
Thread.sleep() puts current thread to sleep for a specified amount of time, so the answer to your second question is - no, they do different things.
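If a specific order really matters, it has to be expressed with an explicit interaction mechanism rather than with yield or sleep. The sketch below uses C++11 rather than the Java API discussed above, and run_in_turns is a hypothetical helper. A shared turn variable guarded by a mutex and condition variable forces the threads to take strictly alternating turns, whatever the scheduler does.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Runs `num_threads` threads that each take `rounds` turns in strict
// round-robin order. Returns the ids in the order the turns were taken.
std::vector<int> run_in_turns(int num_threads, int rounds) {
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;            // id of the thread allowed to proceed
    std::vector<int> order;  // protected by m

    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) {
        threads.emplace_back([&, id] {
            for (int r = 0; r < rounds; ++r) {
                std::unique_lock<std::mutex> lock(m);
                // Block until it is this thread's turn.
                cv.wait(lock, [&] { return turn == id; });
                order.push_back(id);
                turn = (turn + 1) % num_threads;
                cv.notify_all();  // wake the others so the next one can go
            }
        });
    }
    for (auto& t : threads) t.join();
    return order;
}
```

The same reasoning applies to the mutex case in the question: unlocking and then sleeping does not hand the mutex to the waiter, so a guaranteed hand-off needs a condition like the turn variable here.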

Confused about threads

I'm studying threads in C and I have this theoretical question in mind that is driving me crazy. Assume the following code:
1) void main() {
2) createThread(...); // create a new thread that does "something"
3) }
After line 2 is executed, two paths of execution exist. However, I believe that immediately after line 2 is executed it doesn't even matter what the new thread does, because the original thread that executed line 2 will end the entire program at its next instruction. Am I wrong? Is there any chance the original thread gets suspended somehow and the new thread gets its chance to do something? (Assume the code as is: no sync between threads and no join operations are performed.)
It can work out either way. If you have more than one core, the new thread might get its own core. Even if you don't, the scheduler might give the new thread priority over the existing one. The original thread might exhaust its timeslice right after it creates a new thread.
So that code creates a race condition -- one thread is trying to do work, another thread is trying to terminate the process. Which one wins will depend on the threading implementation, the hardware, and perhaps even some random chance.
If main() returns before the spawned threads finish, the process exits and all those threads are terminated with it.
Calling pthread_exit() at the end of main() instead ends only the main thread, so the process stays alive until the threads it created complete execution.
You can learn more about this here: https://computing.llnl.gov/tutorials/pthreads/
Assuming you are using POSIX pthreads (not clear from your example) then you are right. If you don't want that then indeed pthread_exit from main will mean the program will continue to run until all the threads finish. The "main thread" is special in this regard, as its exit normally causes all threads to terminate.
More typically, you'll do something useful in the main thread after a new thread has been forked. Otherwise, what's the point? So you'll do your own processing, wait on some events, etc. If you want main (or any other thread) to wait for a thread to complete before proceeding, you can call pthread_join() with the handle of the thread of interest.
All of this may be beside the point, however, since you are not explicitly using POSIX threads in your example, so I don't know if that's pseudo-code for the purpose of example or literal code. In Windows, CreateThread has different semantics from POSIX pthreads. However, you didn't use that capitalization for the call in your example, so I don't know if that's what you intended either. Personally I use the pthreads-win32 library even on Windows.
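For comparison, here is a minimal sketch of waiting for a thread before main continues, using C++11's std::thread::join as the counterpart of the pthread_join() call mentioned above. The run_and_join function is a hypothetical stand-in for main, and the value 42 is just a placeholder for "something" the thread does.

```cpp
#include <thread>

// Starts a worker, waits for it, and returns what the worker produced.
int run_and_join() {
    int result = 0;
    std::thread worker([&result] {
        result = 42;  // the "something" the new thread does
    });
    // Without this join, the creating thread could return (and, in main,
    // end the whole process) before the worker has run at all.
    worker.join();
    return result;
}
```

Because of the join, the race between "do work" and "terminate the process" is gone: the creating thread cannot get past that line until the worker has finished.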

Know how many are waiting on a pthread mutex lock

I would like to know how many threads are waiting on a lock so I would be able to destroy it safely.
The problem is that I can't destroy the lock when someone holds it or someone is waiting on it.
My program can make sure that no new requests are made to acquire the lock, but how can I know when all the threads that waited on it are done with it?
I thought about a condition variable but I suspect it will create problems.
dlv, could you add a code snippet to your description?
I think you should be using condition variables.
Each thread will block in pthread_cond_wait() until the other thread signals it to wake up. This will not cause a deadlock. It can easily be extended to many threads, by allocating one int, pthread_cond_t and pthread_mutex_t per thread.
pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while mutex is locked, and it will automatically release the mutex while it waits. After signal is received and thread is awakened, mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking mutex when the thread is finished with it.
The pthread_cond_signal() routine is used to signal (or wake up) another thread which is waiting on the condition variable. It should be called after mutex is locked, and must unlock mutex in order for pthread_cond_wait() routine to complete.
The pthread_cond_broadcast() routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).
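The locking rules above can be sketched with C++11's std::condition_variable, which follows the same protocol as the pthread routines. This is an illustration under assumptions, not the pthread API itself, and Mailbox, wait_for_value and post_value are hypothetical names. The waiter checks a predicate under the mutex, so a signal sent before the wait begins is not lost.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

struct Mailbox {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the condition being waited on, protected by m
    int value = 0;       // protected by m
};

int wait_for_value(Mailbox& box) {
    std::unique_lock<std::mutex> lock(box.m);  // lock before waiting
    // wait() releases the mutex while blocked and re-locks it before
    // returning. The predicate also covers spurious wakeups.
    box.cv.wait(lock, [&box] { return box.ready; });
    return box.value;
}  // the waiter unlocks here, when it is finished with the data

void post_value(Mailbox& box, int v) {
    {
        std::lock_guard<std::mutex> lock(box.m);  // lock before changing state
        box.value = v;
        box.ready = true;
    }  // unlock so the woken waiter can re-acquire the mutex
    box.cv.notify_one();
}

// Hypothetical driver: one consumer waits, the caller posts a value.
int run_mailbox_demo() {
    Mailbox box;
    int received = 0;
    std::thread consumer([&] { received = wait_for_value(box); });
    post_value(box, 7);
    consumer.join();
    return received;
}
```

If the poster runs first, the consumer finds ready already set and never blocks, which is why the flag matters more than the signal itself.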
If threads that can use the mutex still exist or might be created in the future then don't delete it.
You do know and are tracking what threads are created, right?
If, for some reason, you cannot keep track of the threads using a resource, your only way out is to leak the resource. It can never be safely deleted because you never know when you are done using it.
Say you had a counter that counted the threads using a mutex. That counter would need its own mutex. Then how do you decide when to delete that one?
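One possible way around that regress, offered as an assumption rather than what the answer above proposes, is to let an atomic reference count own the mutex. In C++11 a std::shared_ptr does this: each thread holds its own reference, and the object is destroyed only when the last reference goes away. Guarded and run_shared_owner_demo are hypothetical names.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

struct Guarded {
    std::mutex m;
    int counter = 0;  // protected by m
};

// Each thread gets its own shared_ptr copy, so the mutex cannot be destroyed
// while any thread that might use it still exists.
int run_shared_owner_demo(int num_threads) {
    auto owner = std::make_shared<Guarded>();
    std::vector<std::thread> threads;
    for (int i = 0; i < num_threads; ++i) {
        // Capturing by value gives the thread its own reference.
        threads.emplace_back([owner] {
            std::lock_guard<std::mutex> lock(owner->m);
            ++owner->counter;
        });
    }
    // Joining is the simplest form of "tracking the threads you created".
    for (auto& t : threads) t.join();
    std::lock_guard<std::mutex> lock(owner->m);
    return owner->counter;
    // The Guarded object is destroyed once the last shared_ptr is released.
}
```

The reference count is updated atomically, so it needs no mutex of its own. It only answers "when can this be freed"; it does not tell you how many threads are currently blocked on the lock.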
That way of thinking is the road that leads to hell. You could do what you want with condition variables, but the result would be an extremely weak design.
Assuming you managed to create such a monster, it would basically allow you to kill "safely" any other thread regardless of its internal state. Except for a quick and dirty panic exit (in case of some internal software error), this is the worst possible way of solving synchronization issues.
A design relying on such tricks would have to create implicit synchronizations between tasks to make sure the terminations occur in the proper order. A lot of software is designed that way, and most of it allows mediocre programmers to make a living by maintaining the pile of crap they created in the first place.
Task termination should be an issue solved at global design level, not by a toolbox of wonky objects that allow you to twist synchronization any odd way.

Sleeping a PThread other than the one doing the calling

So I have a bunch of pthreads, where one is the "main" thread and determines if a worker thread should be running or sleeping. But the POSIX definition for sleep says: "The sleep() function shall cause the calling thread to be suspended from execution..."
Obviously I could do something clumsy like have each worker thread check to see if a flag is set, but I'm looking for something a little better. I'm hoping I'm missing something obvious, because this is throwing a wrench in my plans.
If you're hacking Cilk anyway, I guess you can do whatever you want
How about having each pthread acquire a semaphore unit before dequeueing, (or stealing), a work object, and releasing it after doing the work? There may be a little latency, sure, but the number of threads available to do work will match the number of units signaled to the semaphore. To reduce the number of available threads by N from your control thread, wait for and acquire N units, so choking off N work threads. To start 'em again, signal N units.
Would this work for your system?
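A rough sketch of that semaphore scheme follows. C++11 has no standard semaphore, so the Semaphore class below is a hypothetical one built from a mutex and a condition variable, and run_throttle_demo is an invented driver that records how many workers were ever active at once.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (hypothetical helper).
class Semaphore {
public:
    explicit Semaphore(int units) : units_(units) { assert(units >= 0); }

    void acquire() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return units_ > 0; });
        --units_;
    }

    void release() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++units_;
        }
        cv_.notify_one();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int units_;
};

// Each worker takes a unit before a job and returns it afterwards.
// Returns the largest number of workers seen doing a job at the same time.
int run_throttle_demo(int workers, int units, int jobs_per_worker) {
    Semaphore sem(units);
    std::atomic<int> active(0);
    std::atomic<int> max_active(0);
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&] {
            for (int j = 0; j < jobs_per_worker; ++j) {
                sem.acquire();  // blocks while no unit is available
                const int now = ++active;
                int seen = max_active.load();
                while (now > seen && !max_active.compare_exchange_weak(seen, now)) {
                }
                std::this_thread::yield();  // stand-in for the real work object
                --active;
                sem.release();
            }
        });
    }
    for (auto& t : threads) t.join();
    return max_active.load();
}
```

In this sketch the control thread would call acquire() N times to park N workers and release() N times to start them again. Workers only stop between jobs, which is the latency mentioned above.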
