Avoid deadlocks in a multithreaded process

What best practices/idioms should someone follow in order to avoid deadlocks?

Please see What are common reasons for deadlocks?

There are four conditions that must hold for a deadlock to occur:
Mutual exclusion condition: a resource cannot be used by more than one process at a time.
Hold and wait condition: processes already holding resources may request new resources.
No preemption condition: no resource can be forcibly removed from a process holding it; resources can be released only by the explicit action of the process.
Circular wait condition: two or more processes form a circular chain where each process waits for a resource that the next process in the chain holds.
Avoid at least one of these, and preferably more, and you shouldn't have too many problems.

There is the so-called Banker's algorithm for deadlock avoidance. You can also consider using a watchdog in order to break out of a deadlock.

The canonical technique for deadlock avoidance is to have a lock hierarchy. Make sure that all threads acquire locks or other resources in the same order. This avoids the deadlock scenario where thread 1 holds lock A and needs lock B while thread 2 holds lock B and needs lock A. With a lock hierarchy, both threads would have to acquire the locks in the same order (say, A before B).
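As a minimal sketch of such a hierarchy, assuming POSIX threads (the lock names and helper are illustrative): every thread that needs both locks goes through a helper that fixes the order, so no cycle can form.

#include <pthread.h>

pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;  /* level 1: always first */
pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;  /* level 2: always second */

/* Every thread that needs both locks uses this helper, so the
   A-before-B order can never be violated. */
void with_both_locks(void (*critical_section)(void)) {
    pthread_mutex_lock(&lock_a);   /* acquire in hierarchy order */
    pthread_mutex_lock(&lock_b);
    critical_section();
    pthread_mutex_unlock(&lock_b); /* release in reverse order */
    pthread_mutex_unlock(&lock_a);
}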

One good practice is to define a class for your thread and use only non-static fields from this class in your thread, so your threads won't be sharing any memory.
Of course, to avoid deadlocks you could also avoid the use of semaphores, critical sections, and mutexes. Less is better if you want to avoid deadlocks. Unfortunately, these are required if some memory or another resource is shared between two threads, or else you risk corruption of data.

Among the various methods of entering critical sections, semaphores and mutexes are the most popular.
A semaphore is a waiting mechanism and a mutex is a locking mechanism. The concept confuses most people, but in short: a mutex can only be unlocked by the thread that locked it. With this in mind:
Don't allow any process to lock a partial number of resources; if a process needs 5 resources, wait until all of them are available (see the all-or-nothing sketch after this answer).
If you use a semaphore there, you can unblock/un-wait a resource occupied by another thread; that is, semaphores allow a form of preemption, which removes another of the conditions.
These two, in my view, are the basic conditions; the remaining two of the common four precautions can be related to these.
If you don't agree, please add comments.
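A minimal sketch of that all-or-nothing acquisition, assuming POSIX threads (the helper name is illustrative): try each lock, and if any one is unavailable, release everything acquired so far and start over.

#include <pthread.h>
#include <sched.h>

/* Acquire all 'n' mutexes or none: back out on the first failure. */
void lock_all_or_nothing(pthread_mutex_t *locks[], int n) {
    for (;;) {
        int i;
        for (i = 0; i < n; i++)
            if (pthread_mutex_trylock(locks[i]) != 0)
                break;                       /* one lock unavailable */
        if (i == n)
            return;                          /* got the whole set */
        while (i-- > 0)
            pthread_mutex_unlock(locks[i]);  /* release the partial set */
        sched_yield();                       /* back off before retrying */
    }
}

Note that trylock-and-back-off like this can itself live-lock under contention (see the Java live-lock question below); combining it with a fixed acquisition order avoids the need to back off at all.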


Terminology question: mutex lock, spin lock, sleepable lock

All over StackOverflow and the net I see folks distinguish mutexes and spinlocks along these lines: a mutex is a mutual exclusion lock providing acquire() and release() functions, and if the lock is taken, acquire() will allow the process to be preempted.
Nevertheless, A. Silberschatz in his Operating System Concepts says in the section 6.5:
... The simplest of these tools is the mutex lock. (In fact, the term mutex is short for mutual exclusion.) We use the mutex lock to protect critical sections and thus prevent race conditions. That is, a process must acquire the lock before entering a critical section; it releases the lock when it exits the critical section. The acquire() function acquires the lock, and the release() function releases the lock.
and then he describes a spinlock, though adding a bit later
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
So a spinlock is just a type of mutex lock, as opposed to sleepable locks that allow a process to be preempted. That is, spinlocks and sleepable locks are all mutexes: locks operated by means of acquire() and release() functions.
It seems totally logical to me to define mutex locks the way Silberschatz did (though a bit implicitly).
What approach would you agree with?
The type of mutex lock we have been describing is also called a spinlock because the process “spins” while waiting for the lock to become available.
Maybe you're misreading the book (that is, "The type of mutex lock we have been describing" might not refer to the exact passage you think it does), or the book is outdated. Modern terminology is quite clear in what a mutex is, but spinlocks get a bit muddy.
A mutex is a concurrency primitive that allows one agent at a time access to its resource, while the others have to wait until the exclusive access is released. How they wait is not specified and is irrelevant: their process might go to sleep, get written to disk, spin in a loop, or, if you are using cooperative concurrency (also known as "asynchronous programming"), pass control to the event loop as its 'waiting operation'.
A spinlock does not have a clear definition. It might be used to refer to:
A synonym for mutex (this is in my opinion wrong, but it happens).
A specific mutex implementation that always waits in a busy loop.
Any sort of busy-waiting loop waiting for a resource. A semaphore, for example, might also get implemented using a 'spinlock'.
I would consider any use of the word to refer to a (part of a) specific implementation of a concurrency primitive that waits in a busy loop to be correct, if a more general term is not appropriate. That is, use mutex (or whatever primitive you desire) unless you specifically want to talk about a busy-waiting concurrency primitive.
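For concreteness, here is a minimal sketch of the busy-waiting kind of lock, using C11 atomics (a real implementation would add backoff and fairness):

#include <stdatomic.h>

/* A bare-bones spinlock: acquire() burns CPU in a loop until the
   flag is free, instead of putting the thread to sleep.
   Initialize a spinlock_t with { ATOMIC_FLAG_INIT }. */
typedef struct { atomic_flag taken; } spinlock_t;

void spin_acquire(spinlock_t *s) {
    while (atomic_flag_test_and_set_explicit(&s->taken, memory_order_acquire))
        ;  /* busy-wait: this loop is the "spinning" */
}

void spin_release(spinlock_t *s) {
    atomic_flag_clear_explicit(&s->taken, memory_order_release);
}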
The words that one author uses in one book or manual will not always have the same exact meaning in every book and every manual. The meanings of the words evolve over time, and it can happen fast when the words are names for new ideas.
Not every book was written at the same time. Not every author is the same age or had the same teachers. It's just something you'll have to get used to.
"Mutex" was a name for a new idea not so very long ago.
In one book, it might mean nothing more than a thing that keeps two or more threads from entering the same critical section at the same time. In another book, it might refer to a specific type of object in a certain operating system or library that is used for that same purpose.
A spinlock is a lock/mutex whose implementation relies mainly on a spinning loop.
More advanced locks/mutexes may have spinning parts in their implementation, however those often last for no more than a few microseconds or so.

Know how many are waiting on a pthread mutex lock

I would like to know how many threads are waiting on a lock so I would be able to destroy it safely.
The problem is that I can't destroy the lock when someone holds it or someone is waiting on it.
My program can make sure that no new requests are made to acquire the lock, but how can I know when all the threads that waited on it are done with it?
I thought about a condition variable, but I suspect it will create problems...
I think you should be using condition variables.
Each thread will block in pthread_cond_wait() until the other thread signals it to wake up. This will not cause a deadlock. It can easily be extended to many threads, by allocating one int, pthread_cond_t and pthread_mutex_t per thread.
pthread_cond_wait() blocks the calling thread until the specified condition is signalled. This routine should be called while the mutex is locked, and it will automatically release the mutex while it waits. After the signal is received and the thread is awakened, the mutex will be automatically locked for use by the thread. The programmer is then responsible for unlocking the mutex when the thread is finished with it.
The pthread_cond_signal() routine is used to signal (or wake up) another thread which is waiting on the condition variable. It should be called after the mutex is locked, and the caller must then unlock the mutex in order for the pthread_cond_wait() routine to complete.
The pthread_cond_broadcast() routine should be used instead of pthread_cond_signal() if more than one thread is in a blocking wait state.
It is a logical error to call pthread_cond_signal() before calling pthread_cond_wait().
Proper locking and unlocking of the associated mutex variable is essential when using these routines. For example:
Failing to lock the mutex before calling pthread_cond_wait() may cause it NOT to block.
Failing to unlock the mutex after calling pthread_cond_signal() may not allow a matching pthread_cond_wait() routine to complete (it will remain blocked).
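A minimal sketch of one way to answer the original question, assuming POSIX threads: wrap the mutex with a use counter protected by its own mutex, and let a destroyer wait on a condition variable until the counter reaches zero. All names are illustrative.

#include <pthread.h>

/* Illustrative wrapper: 'users' counts threads still using 'lock'. */
typedef struct {
    pthread_mutex_t lock;   /* the lock we eventually want to destroy */
    pthread_mutex_t guard;  /* protects 'users' */
    pthread_cond_t  idle;   /* signalled when 'users' drops to zero */
    int users;
} guarded_lock_t;

void guarded_acquire(guarded_lock_t *g) {
    pthread_mutex_lock(&g->guard);
    g->users++;
    pthread_mutex_unlock(&g->guard);
    pthread_mutex_lock(&g->lock);
}

void guarded_release(guarded_lock_t *g) {
    pthread_mutex_unlock(&g->lock);
    pthread_mutex_lock(&g->guard);
    if (--g->users == 0)
        pthread_cond_signal(&g->idle);   /* wake a waiting destroyer */
    pthread_mutex_unlock(&g->guard);
}

/* Call only after ensuring no new guarded_acquire() calls can start. */
void guarded_destroy(guarded_lock_t *g) {
    pthread_mutex_lock(&g->guard);
    while (g->users != 0)
        pthread_cond_wait(&g->idle, &g->guard);  /* wait for the last user */
    pthread_mutex_unlock(&g->guard);
    pthread_mutex_destroy(&g->lock);
    pthread_mutex_destroy(&g->guard);
    pthread_cond_destroy(&g->idle);
}

Note this also illustrates the objection raised in the next answer: the counter needs its own mutex and condition variable, which can themselves only be destroyed once you know no thread can still call guarded_acquire().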
If threads that can use the mutex still exist or might be created in the future then don't delete it.
You do know and are tracking what threads are created, right?
If, for some reason, you cannot keep track of the threads using a resource, your only way out is to leak the resource. It can never be safely deleted because you never know when you are done using it.
Say you had a counter that counted the threads using a mutex. That counter would need its own mutex. Then how do you decide when to delete that one?
That way of thinking is the road that leads to hell. You could do what you want with condition variables, but the result would be an extremely weak design.
Assuming you managed to create such a monster, it would basically allow you to kill "safely" any other thread regardless of its internal state. Except for a quick and dirty panic exit (in case of some internal software error), this is the worst possible way of solving synchronization issues.
A design relying on such tricks would have to create implicit synchronizations between tasks to make sure the terminations occur in the proper order. A lot of software is designed that way, and most of it allows mediocre programmers to make a living by maintaining the pile of crap they created in the first place.
Task termination should be an issue solved at global design level, not by a toolbox of wonky objects that allow you to twist synchronization any odd way.

Java Thread Live Lock

I have an interesting problem related to Java thread live lock. Here it goes.
There are four global locks - L1,L2,L3,L4
There are four threads - T1, T2, T3, T4
T1 requires locks L1,L2,L3
T2 requires locks L2
T3 requires locks L3,L4
T4 requires locks L1,L2
So the pattern of the problem is: any of the threads can run and acquire the locks in any order. If a thread detects that a lock it needs is not available, it releases all the locks it had previously acquired and waits for a fixed time before retrying. The cycle repeats, giving rise to a live-lock condition.
So, to solve this problem, I have two solutions in mind
1) Let each thread wait for a random period of time before retrying.
OR,
2) Let each thread acquire all the locks in a particular order (even if a thread does not require all the locks)
I am not convinced that these are the only two options available to me. Please advise.
Have all the threads enter a single mutex-protected state-machine whenever they require and release their set of locks. The threads should expose methods that return the set of locks they require to continue and also to signal/wait for a private semaphore signal. The SM should contain a bool for each lock and a 'Waiting' queue/array/vector/list/whatever container to store waiting threads.
If a thread enters the SM mutex to get locks and can immediately get its lock set, it can reset its bool set, exit the mutex and continue on.
If a thread enters the SM mutex and cannot immediately get its lock set, it should add itself to 'Waiting', exit the mutex and wait on its private semaphore.
If a thread enters the SM mutex to release its locks, it sets the lock bools to 'return' its locks and iterates 'Waiting' in an attempt to find a thread that can now run with the set of locks available. If it finds one, it resets the bools appropriately, removes the thread it found from 'Waiting' and signals the 'found' thread semaphore. It then exits the mutex.
You can twiddle with the algorithm that you use to match up the available set lock bools with waiting threads as you wish. Maybe you should release the thread that requires the largest set of matches, or perhaps you would like to 'rotate' the 'Waiting' container elements to reduce starvation. Up to you.
A solution like this requires no polling (with its performance-sapping CPU use and latency), and no continual acquire/release of multiple locks.
It's much easier to develop such a scheme with an OO design. The methods/member functions to signal/wait the semaphore and return the set of locks needed can usually be stuffed somewhere in the thread class inheritance chain.
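A minimal sketch of that state machine, assuming POSIX threads, with lock sets represented as bitmasks and a condition variable playing the role of each thread's private semaphore (all names are illustrative; bounds and error checks are omitted):

#include <pthread.h>
#include <stdbool.h>

#define MAX_WAITERS 16

typedef struct {
    unsigned need;           /* lock set this thread is waiting for */
    pthread_cond_t wake;     /* the thread's private "semaphore" */
    bool signalled;
} waiter_t;

typedef struct {
    pthread_mutex_t sm;      /* the single mutex guarding the state machine */
    unsigned held;           /* bitmask of locks currently taken */
    waiter_t *waiting[MAX_WAITERS];
    int nwaiting;
} lockmgr_t;

void lockmgr_acquire(lockmgr_t *m, unsigned want) {
    pthread_mutex_lock(&m->sm);
    if ((m->held & want) == 0) {        /* whole set free: take it and go */
        m->held |= want;
        pthread_mutex_unlock(&m->sm);
        return;
    }
    waiter_t self = { .need = want, .signalled = false };
    pthread_cond_init(&self.wake, NULL);
    m->waiting[m->nwaiting++] = &self;  /* join the Waiting container */
    while (!self.signalled)
        pthread_cond_wait(&self.wake, &m->sm);
    pthread_cond_destroy(&self.wake);
    pthread_mutex_unlock(&m->sm);       /* our locks were assigned by the releaser */
}

void lockmgr_release(lockmgr_t *m, unsigned have) {
    pthread_mutex_lock(&m->sm);
    m->held &= ~have;                   /* return the locks */
    for (int i = 0; i < m->nwaiting; i++) {
        waiter_t *w = m->waiting[i];
        if ((m->held & w->need) == 0) { /* this waiter can now run */
            m->held |= w->need;
            w->signalled = true;
            pthread_cond_signal(&w->wake);
            m->waiting[i] = m->waiting[--m->nwaiting];  /* drop from queue */
            break;
        }
    }
    pthread_mutex_unlock(&m->sm);
}

The matching policy in lockmgr_release() here is simply "first waiter that fits"; as the answer says, you can rotate or prioritize the Waiting container to tune fairness.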
Unless there is a good reason (performance-wise) not to do so, I would unify all the locks into one lock object. This is similar to solution 2 you suggested, only simpler in my opinion. And by the way, not only is this solution simpler and less bug-prone, its performance might also be better than solution 1.
Personally, I have never heard of Option 1, but I am by no means an expert on multithreading. After thinking about it, it sounds like it will work fine.
However, the standard way to deal with threads and resource locking is somewhat related to Option 2. To prevent deadlocks, resources need to always be acquired in the same order. For example, if you always lock the resources in the same order, you won't have any issues.
Go with 2a): let each thread acquire all of the locks that it needs (NOT all of the locks) in a particular order; if a thread encounters a lock that isn't available, it releases all of its locks.
As long as threads acquire their locks in the same order you can't have deadlock; however, you can still have starvation (a thread might run into a situation where it keeps releasing all of its locks without making forward progress). To ensure that progress is made you can assign priorities to threads (0 = lowest priority, MAX_INT = highest priority) - increase a thread's priority when it has to release its locks, and reduce it to 0 when it acquires all of its locks. Put your waiting threads in a queue, and don't start a lower-priority thread if it needs the same resources as a higher-priority thread - this way you guarantee that the higher-priority threads will eventually acquire all of their locks. Don't implement this thread queue unless you're actually having problems with thread starvation, though, because it's probably less efficient than just letting all of your threads run at once.
You can also simplify things by implementing omer schleifer's condense-all-locks-to-one solution; however, unless threads other than the four you've mentioned are contending for these resources (in which case you'll still need to lock the resources from the external threads), you can more efficiently implement this by removing all locks and putting your threads in a circular queue (so your threads just keep running in the same order).

Is Deadlock recovery possible in MultiThread programming?

A process has some 10 threads, and all 10 threads have entered a deadlocked state (assume all are waiting on mutex variables).
How can you free the process (threads) from the deadlocked state?
Is there any way to kill a lower-priority thread? (In the multi-process case we can kill a lower-priority process when all processes are in a deadlocked state.)
Can we attach the deadlocked process to a debugger and assign a proper value to the mutex variable? (Assume all the threads are waiting on a mutex variable MUT whose value is 0; can we set MUT to 1 through the debugger?)
If every thread in the app is waiting on every other, and none are set to time out, you're rather screwed. You might be able to run the app in a debugger or something, but locks are generally acquired for a reason, and manually forcing a mutex to be owned by a thread that didn't legitimately acquire it can cause big problems: the thread that previously owned it is still going to try to release it, and the results can be unpredictable if the mutex is unexpectedly yanked away. It could cause an unexpected exception, or cause the mutex to be unlocked while still in use. Anyway, it defeats the whole purpose of mutexes, so you'd just be covering up a much bigger problem.
There are two common solutions:
Instead of having threads wait forever, set a timeout. This is slightly harder to do in languages like Java that embed mutexes into the language via synchronized or lock blocks, but it's almost always possible. If you time out waiting on the lock, release all the locks/mutexes you hold and try again later (see the timed-lock sketch after these two points).
Better, but potentially much more complex, is to figure out why everything's fighting for the resource and remove that contention. If you must lock, lock consistently. But if there's 10 threads blocking on a single mutex, that could be a clue either that your operations are badly chunked (ie: that your threads are doing too much or too little at once before trying to acquire a lock), or that there's unnecessary locking going on. Don't lock unless you have to. Some synchronization could be obviated by using collections and algorithms specifically designed to be "lock-free" while still offering thread-safety.
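A minimal sketch of the timeout approach, assuming POSIX threads (in Java, java.util.concurrent.locks.ReentrantLock.tryLock(timeout, unit) plays the same role); the helper name is illustrative:

#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Try to take 'm' for up to 'ms' milliseconds.
   Returns 0 on success, ETIMEDOUT if the wait timed out. */
int lock_with_timeout(pthread_mutex_t *m, long ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* timedlock takes an absolute CLOCK_REALTIME time */
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {     /* normalize nanoseconds */
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }
    return pthread_mutex_timedlock(m, &deadline);
}

A caller that gets ETIMEDOUT should release every lock it already holds before retrying; otherwise the timeout buys nothing.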
Adding another answer because I don't agree with the solutions proposed by cHao earlier - the analysis is fine.
First, why I disagree with the two solutions offered:
Reduce contention
Contention doesn't lead to deadlocks. It just causes poor performance. Deadlock means no performance whatsoever. Therefore, reducing contention does not solve deadlocks.
Timeout on mutex
A mutex protects a resource, and a thread locks the mutex because it needs the resource. With a timeout, you won't be able to acquire the resource, and your thread fails. Does it solve the deadlock problem? Only if the failing thread releases another resource that was blocking the other threads.
But in that case, there's a much better solution. Mutexes should have a partial ordering. If there is at least one thread that can hold both mutex A and mutex B, you should decide whether A or B is acquired first, and then stick with that. This must be a transitive order: if you lock A before B, and B before C, then obviously you must lock A before C.
This is a perfect solution to deadlocks. Look back at the timeout example: it only works if the thread that times out waiting on A then releases its lock on B, to release another thread that was waiting on B. In the most simple case, that other thread was itself directly locking A. Thus, the mutexes A and B are not properly ordered. You should have consistently locked either A or B first.
The timeout case could also be the result of a cyclic order problem; one thread locks A then B, another B then C, and a third C then A, with the deadlock happening when each thread owns one lock. The solution again is the same; order the locks.
Alternatively said, mutex lock orders can be described by a directed graph. If a thread locks A before B, there's an arc from A to B. Deadlocks can appear only if the directed graph is cyclic, and the arcs of such a cycle correspond to the deadlocked threads.
This theory can be a bit complex, but there are some simple insights to be found. For instance, from the graph theory, we know that trees are acyclic graphs. Hence, neither "leaf mutexes" (those that are always locked last) nor "root mutexes" (those that are always locked first) can cause deadlocks. Leaf mutexes are excluded because no thread ever blocks holding them, and root mutexes are excluded because the thread that holds them will be able to lock all subsequent mutexes in due time.

Is a lock (threading) atomic?

This may sound like a stupid question, but if one locks a resource in a multi-threaded app, then the operation that happens on the resource, is that done atomically?
I.e.: can the processor be interrupted, or can a context switch occur, while that resource has a lock on it? If it does, then nothing else can access this resource until the lock holder is scheduled back in to finish off its work. Sounds like an expensive operation.
The processor can very definitely still switch to another thread, yes. Indeed, in most modern computers there can be multiple threads running simultaneously anyway. The locking just makes sure that no other thread can acquire the same lock, so you can make sure that an operation on that resource is atomic in terms of that resource. Code using other resources can operate completely independently.
You should usually lock for short operations wherever possible. You can also choose the granularity of locks... for example, if you have two independent variables in a shared object, you could use two separate locks to protect access to those variables. That will potentially provide better concurrency - but at the same time, more locks means more complexity and more potential for deadlock. There's always a balancing act when it comes to concurrency.
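As a small illustration of that granularity idea, assuming POSIX threads (the shared object and field names are hypothetical):

#include <pthread.h>
#include <string.h>

/* Two independent fields, each guarded by its own lock, so threads
   updating 'hits' never contend with threads updating 'name'. */
struct shared_obj {
    pthread_mutex_t hits_lock;
    long hits;

    pthread_mutex_t name_lock;
    char name[64];
};

void add_hit(struct shared_obj *s) {
    pthread_mutex_lock(&s->hits_lock);   /* touches only 'hits' */
    s->hits++;
    pthread_mutex_unlock(&s->hits_lock);
}

void set_name(struct shared_obj *s, const char *n) {
    pthread_mutex_lock(&s->name_lock);   /* touches only 'name' */
    strncpy(s->name, n, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    pthread_mutex_unlock(&s->name_lock);
}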
You're exactly right. That's one reason why it's so important to lock for short period of time. However, this isn't as bad as it sounds because no other thread that's waiting on the lock will get scheduled until the thread holding the lock releases it.
Yes, a context switch can definitely occur.
This is exactly why, when accessing a shared resource, it is important to lock it from the other threads as well. When thread A has the lock, thread B cannot enter the code protected by that lock.
For example if two threads run the following code:
1. lock(l);
2. -- change shared resource S here --
3. unlock(l);
A context switch can occur after step 1, but the other thread cannot hold the lock at that time, and therefore, cannot change the shared resource. If access to the shared resource on one of the threads is done without a lock - bad things can happen!
Regarding the wastefulness: yes, it is a wasteful method. This is why there are methods that try to avoid locks altogether. These methods are called lock-free, and some of them are based on strong atomic primitives such as CAS (Compare-And-Swap).
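A minimal sketch of the CAS pattern using C11 atomics (in practice atomic_fetch_add would do this particular job directly, but the load/CAS retry loop is the general shape of lock-free updates):

#include <stdatomic.h>

/* Lock-free add: retry until no other thread modified *target
   between our load and our compare-and-swap. */
void lockfree_add(atomic_int *target, int delta) {
    int old = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &old, old + delta)) {
        /* CAS failed: 'old' was refreshed with the current value; try again */
    }
}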
No, it's not really expensive. There are typically only two possibilities:
1) The system has other things it can do: In this case, the system is still doing useful work with all available cores.
2) The system doesn't have anything else to do: In this case, the thread that holds the lock will be scheduled. A sane system won't leave a core unused while there's a ready-to-run thread that's not scheduled.
So, how can it be expensive? If there's nothing else for the system to do that doesn't require acquiring that lock (or not enough other things to occupy all cores) and the thread holding the lock is not ready-to-run. So that's the case you have to avoid, and the context switch or pre-empt issue doesn't matter (since the thread would be ready-to-run).
