Can possible deadlocks in multithreaded code be avoided this way? - multithreading

We know that multi-threaded code suffers from possible deadlocks if a thread acquires a mutex lock but, before it gets a chance to release it, is suspended by the main thread or pre-empted by the scheduler.
I am a beginner with the pthread library, so please bear with me if the query/proposed solution below is unfeasible or outright wrong.
void main()
{
    thread_create(T1, NULL, thr_function, NULL);
    suspend_thread(T1);
    acquire_lock(Lock1); // <-- possible deadlock here if thr_function acquired Lock1 before main, and main suspended T1 before its release
    // Do something further;
}
void *thr_function(void *val)
{
    // do something;
    acquire_lock(Lock1);
    // do some more things;
    // do some more things;
    release_lock(Lock1);
}
Given the pseudo code segment above, can't the thread runtime/compiler work together to make sure that if a thread which has acquired a mutex lock is suspended/pre-empted, it first executes some 'cleanup code' that releases all locks it holds before it is switched out? The compiler/linker can identify the places inside a thread function where a lock is acquired and released; when a thread is suspended between those two places (i.e. after acquire but before release), execution in the thread function would jump, via some kind of 'goto label;' inserted by the runtime, to a label where the thread releases the lock, and only then would the thread get blocked or context-switched. [I know that if a thread acquires more than one lock it might get messy to jump across those points to release those locks...]
But the basic question is: can the thread function not do the necessary releases of acquired locks (mutexes, semaphores) before it gets blocked or leaves the running state to wait in some other state?

No. The reason a thread holds a lock is so that it can make data temporarily inconsistent or see a consistent view of that data itself. If some scheme were to automatically release that lock before the thread made the data consistent again, other threads would acquire the lock, see the inconsistent data, and fail. Or when that thread was resumed, it would either not have the lock or have the lock and see inconsistent data itself. This is why you can only reliably suspend a thread with that thread's cooperation.
Consider this logic to add an object to a linked list protected by a mutex:
1. Acquire the lock protecting the linked list.
2. Modify the list's head pointer.
3. Modify the object's next pointer.
4. Release the lock.
Now imagine if something were to suspend the thread between steps 2 and 3. If the lock were released, other threads would see the list's head pointer pointing to an object that had not yet been linked into the list. And when the thread resumed, it might set the object's next pointer incorrectly because the list had changed in the meantime.
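To make that window concrete, here is a minimal C++ sketch of the insertion (Node, List, and push_front are illustrative names, not code from the question); the comment marks the point where a forced suspension plus automatic lock release would expose the half-linked list:
#include <mutex>

struct Node { Node *next; };

struct List {
    std::mutex lock;
    Node *head = nullptr;

    void push_front(Node *obj) {
        lock.lock();          // step 1: acquire the lock
        Node *old = head;
        head = obj;           // step 2: modify the list's head pointer
        // <-- If the thread were suspended here and the lock auto-released,
        //     other threads would see head pointing at a half-linked node.
        obj->next = old;      // step 3: modify the object's next pointer
        lock.unlock();        // step 4: release the lock
    }
};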
The general consensus is that suspending threads is so evil that even a feeling that you might want to suspend a thread suggests an incorrect application design. There is practically no reason a properly-designed application would ever want to suspend a thread. (If you didn't want that thread to continue doing the work it was doing, why did you code it to continue doing that work in the first place?)
By the way, scheduler pre-emption is not a problem. Eventually, the thread will be scheduled again and release the lock. So long as there are other threads that can make forward progress, no harm is done. And if there are no other threads that can make forward progress, the only thing the system can do is schedule the thread that was pre-empted.

One way to avoid this kind of deadlock is to have a global, mutex-protected variable should_stop_thread which eventually gets set to true by the master thread.
The child thread checks the variable regularly and terminates in a controlled manner if it is true. "Controlled" in this sense means that all data (pointers) are valid (again) and mutex locks are released.
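A minimal C++ sketch of that idea, assuming a simple worker loop (all names are illustrative):
#include <mutex>
#include <thread>

std::mutex stop_mtx;
bool should_stop_thread = false;    // protected by stop_mtx

bool stop_requested() {
    std::lock_guard<std::mutex> g(stop_mtx);
    return should_stop_thread;
}

void worker() {
    while (!stop_requested()) {     // check the flag regularly
        // ... acquire locks, do one unit of work, release locks ...
    }
    // reaching here, no locks are held and all data is consistent again
}

void master(std::thread &child) {
    {
        std::lock_guard<std::mutex> g(stop_mtx);
        should_stop_thread = true;  // ask the child to terminate
    }
    child.join();                   // wait for the controlled shutdown
}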

Related

Why does std::condition_variable wait() require a std::unique_lock arg?

My thread does not need to be locked. std::unique_lock locks thread on construction. I am simply using cond_var.wait() as a way to avoid busy waiting. I have essentially circumvented the auto-locking by putting the unique_lock within a tiny scope and hence destroying the unique lock after it leaves the tiny scope. Additionally, there is only a single consumer thread if that's relevant.
{
    std::unique_lock<std::mutex> dispatch_ul(dispatch_mtx);
    pq_cond.wait(dispatch_ul);
}
Is there possibly a better option to avoid the unnecessary auto-lock functionality of the unique_lock? I'm looking for a mutexless option to simply signal the thread. I am aware of std::condition_variable_any, but that requires a mutex of sorts, which is again unnecessary in my case.
You need a lock to prevent this common newbie mistake:
Producer thread produces something,
Producer thread calls some_condition.notify_all(),
Producer thread goes idle for a while,
meanwhile:
Consumer thread calls some_condition.wait(...)
Consumer thread waits,...
And waits,...
And waits.
A condition variable is not a flag. It does not remember that it was notified. If the producer calls notify_one() or notify_all() before the consumer has entered the wait() call, then the notification is "lost."
In order to prevent lost notifications, there must be some shared data that tells the consumer whether or not it needs to wait, and there must be a lock to protect the shared data.
The producer should:
Lock the lock,
update the shared data,
notify the condition variable,
release the lock
The consumer must then:
Lock the lock,
Check the shared data to see if it needs to wait,
Wait if needed,
consume whatever,
release the lock.
The consumer needs to pass the lock in to the wait(...) call so that wait(...) can temporarily unlock it, and then re-lock it before returning. If wait(...) did not unlock the lock, then the producer would never be able to reach the notify() call.
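Putting those steps together, a minimal C++ sketch could look like this (the data_ready flag stands in for whatever shared data your program actually uses):
#include <condition_variable>
#include <mutex>

std::mutex m;
std::condition_variable cv;
bool data_ready = false;                // the shared data; it remembers the notification

void producer() {
    std::lock_guard<std::mutex> lk(m);  // lock the lock
    data_ready = true;                  // update the shared data
    cv.notify_one();                    // notify the condition variable
}                                       // release the lock

void consumer() {
    std::unique_lock<std::mutex> lk(m);      // lock the lock
    cv.wait(lk, [] { return data_ready; });  // checks the data; waits only if needed
    // consume whatever...
    data_ready = false;
}                                            // release the lock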

Know how many are waiting on a pthread mutex lock

I would like to know how many threads are waiting on a lock so I would be able to destroy it safely.
The problem is that I can't destroy the lock when someone holds it or someone is waiting on it.
My program can make sure that no new requests are made to acquire the lock, but how can I know when all the threads that waited on it are done with it?
I thought about a condition variable, but I suspect it will create problems...
dlv, could you add a code snippet to your description?
I think you should be using condition variables.
Each thread will block in pthread_cond_wait() until the other thread signals it to wake up. This will not cause a deadlock. It can easily be extended to many threads, by allocating one int, pthread_cond_t and pthread_mutex_t per thread.
pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while mutex is locked, and it will automatically release the mutex while it waits. After signal is received and thread is awakened, mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking mutex when the thread is finished with it.
The pthread_cond_signal() routine is used to signal (or wake up) another thread which is waiting on the condition variable. It should be called after mutex is locked, and must unlock mutex in order for pthread_cond_wait() routine to complete.
The pthread_cond_broadcast() routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).
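As a sketch of the rules above, the canonical pthread pattern looks something like this (the ready flag is an assumed predicate; adapt it to your own shared state):
#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;                          // shared predicate, protected by mtx

void *waiter(void *arg) {
    pthread_mutex_lock(&mtx);           // lock the mutex before waiting
    while (!ready)                      // loop guards against spurious wakeups
        pthread_cond_wait(&cond, &mtx); // atomically releases mtx while waiting
    // ... use the shared data ...
    pthread_mutex_unlock(&mtx);
    return NULL;
}

void *signaler(void *arg) {
    pthread_mutex_lock(&mtx);
    ready = 1;                          // update the predicate first
    pthread_cond_signal(&cond);         // wake one waiting thread
    pthread_mutex_unlock(&mtx);         // unlock so the wait can complete
    return NULL;
}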
If threads that can use the mutex still exist or might be created in the future then don't delete it.
You do know and are tracking what threads are created, right?
If, for some reason, you cannot keep track of the threads using a resource, your only way out is to leak the resource. It can never be safely deleted because you never know when you are done using it.
Say you had a counter that counted the threads using a mutex. That counter would need its own mutex. Then how do you decide when to delete that one?
That way of thinking is the road that leads to hell. You could do what you want with condition variables, but the result would be an extremely weak design.
Assuming you managed to create such a monster, it would basically allow you to kill "safely" any other thread regardless of its internal state. Except for a quick and dirty panic exit (in case of some internal software error), this is the worst possible way of solving synchronization issues.
A design relying on such tricks would have to create implicit synchronizations between tasks to make sure the terminations occur in the proper order. A lot of software is designed that way, and most of it allows mediocre programmers to make a living by maintaining the pile of crap they created in the first place.
Task termination should be an issue solved at global design level, not by a toolbox of wonky objects that allow you to twist synchronization any odd way.

Deadlock Delphi explanation/solution

On a server application we have the following:
A class called a JobManager that is a singleton.
Another class, the Scheduler, that keeps checking if it is time to add any sort of job to the JobManager.
When it is time to do so, the Scheduler does something like:
TJobManager.Singleton.NewJobItem(parameterlist goes here...);
At the same time, in the client application, the user does something that generates a call to the server. Internally, the server sends a message to itself, and one of the classes listening for that message is the JobManager.
The JobManager handles the message, and knows that it is time to add a new job to the list, calling its own method:
NewJobItem(parameter list...);
In the NewJobItem method, I have something like this:
CS.Acquire;
try
  DoSomething;
  CallAMethodWithAnotherCriticalSessionInternally;
finally
  CS.Release;
end;
It happens that the system reaches a deadlock at this point (CS.Acquire).
The communication between the client and server applications is made via Indy 10.
I think the RPC call that fires the server application method that sends a message to the JobManager is running in the context of the Indy thread.
The Scheduler has its own thread running, and it makes a direct call to the JobManager method. Is this situation prone to deadlocks?
Can someone help me understand why a deadlock is happening here?
We knew that, sometimes, when the client did a specific action, the system would lock up. That is how I finally found this point, where the critical section on the same class is reached twice, from different places (the Scheduler and the message handler method of the JobManager).
Some more info
I want to add that (this may be silly, but anyway...) inside the DoSomething there is another
CS.Acquire;
try
  // Do other stuff...
finally
  CS.Release;
end;
Does this internal CS.Release do anything to the external CS.Acquire? If so, this could be the point where the Scheduler is entering the critical section, and all the locking and unlocking becomes a mess.
There isn't enough information about your system to be able to tell you definitively if your JobManager and Scheduler are causing a deadlock, but if they are both calling the same NewJobItem method, then this should not be the problem since they will both acquire the locks in the same order.
For your question whether the NewJobItem CS.Acquire and the DoSomething CS.Acquire interact with each other: it depends. If the lock object used in the two methods is different, then no, the two calls should be independent. If it's the same object, then it depends on the type of lock. If your locks are re-entrant (i.e. they allow Acquire to be called multiple times from the same thread and count how many times they have been acquired and released), then this should not be a problem. On the other hand, if you have simple lock objects that don't support re-entry, then the DoSomething CS.Release could release the lock for that thread, and CallAMethodWithAnotherCriticalSessionInternally would then be running without the protection of the CS lock that was acquired in NewJobItem.
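For illustration, here is a minimal C++ analogue of the re-entrant case using std::recursive_mutex (the function names are hypothetical stand-ins for the question's methods):
#include <mutex>

std::recursive_mutex cs;   // re-entrant: the holding thread may lock it again

void do_something() {
    std::lock_guard<std::recursive_mutex> inner(cs);  // nested acquire just
    // ... do other stuff ...                         // increments a count
}

void new_job_item() {
    std::lock_guard<std::recursive_mutex> outer(cs);
    do_something();        // re-enters cs without deadlocking or releasing it
    // call_a_method_with_another_critical_section_internally();
}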
Deadlocks occur when there are two or more threads running and each thread is waiting for another thread to finish its current job before it can continue itself.
For Example:
Thread 1 executes:
    lock_a.acquire()
    lock_b.acquire()
    lock_b.release()
    lock_a.release()
Thread 2 executes:
    lock_b.acquire()
    lock_a.acquire()
    lock_a.release()
    lock_b.release()
Notice that the locks in thread 2 are acquired in the opposite order from thread 1. Now suppose thread 1 acquires lock_a and is then interrupted; thread 2 runs, acquires lock_b, and then starts waiting for lock_a to become available before it can continue. When thread 1 continues running, the next thing it does is try to acquire lock_b, but it is already taken by thread 2, so it waits. Finally we are in a situation in which thread 1 is waiting for thread 2 to release lock_b and thread 2 is waiting for thread 1 to release lock_a.
This is a deadlock.
There are several common solutions:
1. Only use one shared global lock in all your code. This way it is impossible to have two threads waiting for two locks. This makes your code wait a lot for the lock to be available.
2. Only ever allow your code to hold one lock at a time. This is usually too hard to control, since you might not know or control the behavior of method calls.
3. Only allow your code to acquire multiple locks all at the same time, and release them all at the same time, and disallow acquiring new locks while you already have locks acquired.
4. Make sure that all locks are acquired in the same global order. This is a more common technique.
With solution 4 you need to program carefully and always make sure that you acquire the locks/critical sections in the same order. To help with debugging, you can place a global order on all the locks in your system (e.g. just a unique integer for each lock) and then throw an error if you try to acquire a lock that has a lower ranking than a lock the current thread has already acquired (e.g. if new_lock.id < lock_already_acquired.id then throw exception).
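A minimal C++ sketch of that debugging aid (RankedMutex and the rank rule are illustrative, not a library API):
#include <cassert>
#include <mutex>
#include <vector>

struct RankedMutex {
    std::mutex m;
    int id;                                    // unique global rank assigned at creation
};

thread_local std::vector<int> held_ranks;      // ranks this thread currently holds

void acquire(RankedMutex &rm) {
    // enforce ascending order: a new lock must outrank every lock we hold
    assert(held_ranks.empty() || rm.id > held_ranks.back());
    rm.m.lock();
    held_ranks.push_back(rm.id);
}

void release(RankedMutex &rm) {
    assert(!held_ranks.empty() && held_ranks.back() == rm.id);
    held_ranks.pop_back();
    rm.m.unlock();
}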
If you can't put in a global debugging aid to help find which locks have been acquired out of order, then I'd suggest that you find all the places in your code that acquire any lock and print a debugging message with the current time, the method calling acquire/release, the thread id, and the id of the lock being acquired. Do the same thing with all the release calls. Then run your system until you get the deadlock, and find in your log file which locks have been acquired by which threads and in which order. Then decide which thread is acquiring its locks in the wrong order and change it.

Mutex lock: what does "blocking" mean?

I've been reading up on multithreading and shared resource access, and one of the many (for me) new concepts is the mutex lock. What I can't seem to find out is what actually happens to the thread that finds a "critical section" locked. It says in many places that the thread gets "blocked", but what does that mean? Is it suspended, and will it resume when the lock is lifted? Or will it try again in the next iteration of the "run loop"?
The reason I ask is that I want system-supplied events (mouse, keyboard, etc.), which (apparently) are delivered on the main thread, to be handled in a very specific part of the run loop of my secondary thread. So whatever event is delivered, I queue it in my own data structure. Obviously, the data structure needs a mutex lock because it's being modified by both threads. The missing puzzle piece is: what happens when an event gets delivered in a function on the main thread, I want to queue it, but the queue is locked? Will the main thread be suspended, or will it just jump over the locked section and go out of scope (losing the event)?
Blocked means execution gets stuck there; generally, the thread is put to sleep by the system and yields the processor to another thread. When a thread is blocked trying to acquire a mutex, execution resumes when the mutex is released, though the thread might block again if another thread grabs the mutex before it can.
There is generally a try-lock operation that grabs the mutex if possible and returns an error if not. But you are eventually going to have to move the current event into that queue. Also, if you delay moving the events to the thread where they are handled, the application will become unresponsive regardless.
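For instance, with a std::mutex the try-lock pattern might look like this (the queue itself is omitted; only the locking is shown):
#include <mutex>

std::mutex queue_mtx;              // guards the hypothetical event queue

bool try_enqueue_event(/* const Event &e */) {
    if (queue_mtx.try_lock()) {    // returns immediately instead of blocking
        // ... push the event onto the queue ...
        queue_mtx.unlock();
        return true;               // event queued
    }
    return false;                  // lock was busy; caller must retry later
}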
A queue is actually one case where you can get away with not using a mutex. For example, Mac OS X (and possibly also iOS) provides the OSAtomicEnqueue() and OSAtomicDequeue() functions (see man atomic or <libkern/OSAtomic.h>) that exploit processor-specific atomic operations to avoid using a lock.
But, why not just process the events on the main thread as part of the main run loop?
The simplest way to think of it is that the blocked thread is put in a wait ("sleeping") state until the mutex is released by the thread holding it. At that point the operating system will "wake up" one of the threads waiting on the mutex and let it acquire it and continue. It's as if the OS simply puts the blocked thread on a shelf until it has the thing it needs to continue. Until the OS takes the thread off the shelf, it's not doing anything. The exact implementation -- which thread gets to go next, whether they all get woken up or they're queued -- will depend on your OS and what language/framework you are using.
Too late to answer, but this may help with understanding. I am speaking more from an implementation perspective than from theoretical texts.
The word "blocking" is a kind of technical homonym. People may use it for sleeping or for mere waiting. The term has to be understood in the context of its usage.
Blocking means Waiting - Assume on an SMP system that a thread B wants to acquire a spinlock held by some other thread A. One mechanism is to disable preemption and keep spinning on the processor until B gets the lock. Another mechanism, probably a more efficient one, is to allow other threads to use the processor if B does not get the lock in a few easy attempts. Therefore we schedule out thread B (as preemption is enabled) and give the processor to some other thread C. In this case thread B just waits in the scheduler's queue and comes back when its turn arrives. Understand that B is not sleeping, just waiting passively instead of busy-waiting and burning processor cycles. On BSD and Solaris systems there are data structures like turnstiles to implement this situation.
Blocking means Sleeping - If thread B had instead made a system call like read(), waiting for data from a network socket, it cannot proceed until it gets the data. Therefore some texts casually use the term blocking, as in "... blocked for I/O" or "... in a blocking system call". Actually, thread B is rather sleeping. There are specific data structures known as sleep queues - much like luxury waiting rooms at airports :-). The thread will be woken up when the OS detects the availability of data, much like an attendant of the waiting room.
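To illustrate the first sense (busy versus passive waiting) in userspace terms, here is a toy C++ spinlock built on std::atomic_flag; real kernel mechanisms such as turnstiles are more involved:
#include <atomic>
#include <thread>

std::atomic_flag flag = ATOMIC_FLAG_INIT;

void lock_busy_waiting() {
    while (flag.test_and_set(std::memory_order_acquire)) {
        // spin: burn processor cycles until the flag is released
    }
}

void lock_passively_waiting() {
    while (flag.test_and_set(std::memory_order_acquire)) {
        std::this_thread::yield();  // give the processor to another thread
    }
}

void unlock() {
    flag.clear(std::memory_order_release);
}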
Blocking means just that. It is blocked. It will not proceed until able. You don't say which language you're using, but most languages/libraries have lock objects where you can "attempt" to take the lock and then carry on and do something different depending on whether you succeeded or not.
But in, for example, Java synchronized blocks, your thread will stall until it is able to acquire the monitor (mutex, lock). The java.util.concurrent.locks.Lock interface describes lock objects which have more flexibility in terms of lock acquisition.

synchronising threads with mutexes

In Qt, I have a method which contains a mutex lock and unlock. The problem is that when the mutex is unlocked, it sometimes takes long before the other thread gets the lock. In other words, it seems the same thread can get the lock back (the method is called in a loop) even though another thread is waiting for it. What can I do about this? One thread is a QThread and the other thread is the main thread.
You can have the thread that just unlocked the mutex relinquish the processor. On POSIX you do that by calling sched_yield() (some systems also offer the non-standard pthread_yield()), and on Windows by calling Sleep(0).
That said, there is no guarantee that the thread waiting on the lock will be scheduled before your thread wakes up again.
It shouldn't be possible to release a lock and then get it back if some other thread is already waiting on it.
Check that you are actually releasing the lock when you think you do. Check that the waiting thread actually waits (and doesn't spin in a loop with trylock tests and sleeps; I actually did that once and was very puzzled at first :)).
Or, if the waiting thread really never gets time to even reach the locking code, try QThread::yieldCurrentThread(). This will stop the current thread and give the scheduler a chance to hand execution to somebody else. It might cause unnecessary switching, depending on the tightness of your loop.
If you want to make sure that one thread has priority over the others, an option is to use a QReadWriteLock. It's adapted to a typical scenario where n threads read a value in an infinite loop, with only one thread updating it. I think that's the scenario you described.
QReadWriteLock offers two ways to lock: lockForRead() and lockForWrite(). The threads depending on the value will use the former (lockForRead()), while the thread updating the value (typically from the GUI) will use the latter (lockForWrite()) and will have top priority. You won't need to sleep or yield or whatever.
Example code
Let's say you have a QReadWriteLock lock; declared somewhere.
"Reader" thread
forever {
    lock.lockForRead();
    if (condition) {
        do_stuff();
    }
    lock.unlock();
}
"Writer" thread
// external input (e.g. from the user) changes the condition
lock.lockForWrite(); // blocks until the current read lock is released
update_condition();
lock.unlock();
