Spin lock acquisition in Linux - multithreading

I was just wondering: suppose a PC has multiple cores, and there are three threads running on three different cores. Thread T1, running on core C1, has acquired spin lock S. At the same time, threads T2 and T3, running on cores C2 and C3, try to acquire the lock and wait for it to be released. Once T1 releases the lock, which thread will acquire it, T2 or T3? Assume T2 and T3 have the same priority and started waiting on their different cores at the same time.

The Linux kernel uses MCS spin locks. The gist is that waiters add themselves to a queue. However, if two threads are doing this at the same time, there is no guarantee as to which will succeed first.
Similarly, for more simplistic spin locks where the code just tries to flip the "taken" bit, there are no guarantees whatsoever. However, certain hardware characteristics can give some cores an easier time than others (for example, if they share a socket with the current owner).
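To make the queueing idea concrete, here is a minimal MCS-style lock sketch using C11 atomics. This is illustrative only, not the kernel's actual code (the kernel's qspinlock is considerably more involved). The atomic exchange on the tail pointer is the single contended step, and whichever waiter's exchange lands first goes first in the queue; nothing says which core that will be.

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

/* The lock itself is just a tail pointer; NULL means free. */
typedef _Atomic(struct mcs_node *) mcs_lock_t;

void mcs_acquire(mcs_lock_t *lock, struct mcs_node *me)
{
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* The only contended step: atomically swap ourselves in as the new
       tail. Whichever waiter's exchange happens first is first in line. */
    struct mcs_node *prev = atomic_exchange(lock, me);
    if (prev != NULL) {
        atomic_store(&prev->next, me);      /* link behind our predecessor */
        while (atomic_load(&me->locked))
            ;                               /* spin on our own node only */
    }
}

void mcs_release(mcs_lock_t *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        struct mcs_node *expected = me;
        /* No successor visible: try to mark the lock free. */
        if (atomic_compare_exchange_strong(lock, &expected, NULL))
            return;
        /* Someone swapped in behind us but has not linked yet: wait. */
        while ((succ = atomic_load(&me->next)) == NULL)
            ;
    }
    atomic_store(&succ->locked, false);     /* hand over, FIFO order */
}

Each thread passes its own mcs_node, so every waiter spins on a flag in its own cache line instead of all hammering one shared location; once a waiter is enqueued, handover is FIFO.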
You want to read https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html
I repeat: if two different threads compete for a lock, there is no guaranteed order in which they will take it, and looking for one is wrong in the first place.

Related

Do we count the main thread when we compute the recommended number of threads that we can create in C using Pthreads?

I have a computer with 1 CPU, 4 cores, and 2 threads per core, so I get efficiency with a maximum of 8 running threads.
When I write a program in C and create threads using the pthread_create function, how many threads should I create: 7 or 8? Do I have to subtract the main thread, and thus create 7, or should the main thread not be counted, so that I can efficiently create 8? I know that in theory you can create many more, like thousands, but I want this to be planned efficiently, according to my computer's architecture.
Which thread started which is not especially relevant. A program's initial thread is a thread: while it is scheduled on an execution unit, no other thread can use that execution unit. You cannot have more threads executing concurrently than you have execution units, and if you have more than that eligible to run at any given time then you will pay the cost of extra context switches without receiving any offsetting gain from additional concurrency.
To a first approximation, then, yes, you must count the initial thread. But read the above carefully. The relevant metric is not how many threads exist at any given time, but rather how many are contending for execution resources. Threads that are currently blocked (on I/O, on acquiring a mutex, on pthread_join(), etc.) do not contend for execution resources.
More precisely, then, it depends on your threads' behavior. For example, if the initial thread follows the pattern of launching a bunch of other threads and then joining them all without itself performing any other work, then no, you do not count that thread, because it does not contend for CPU to any significant degree while the other threads are doing so.
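As a sketch of that last pattern, with a hypothetical CPU-bound worker (sysconf(_SC_NPROCESSORS_ONLN) reports the number of online logical CPUs, 8 on the machine described above):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    /* CPU-bound work would go here. */
    return arg;
}

int main(void)
{
    /* Online logical CPUs: cores x hardware threads per core. */
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncpu < 1 || ncpu > 64)
        ncpu = 8;                 /* fall back to the figure above */
    pthread_t tid[64];

    /* The initial thread only launches and joins, so it barely contends
       for CPU: create ncpu workers, not ncpu - 1. */
    for (long i = 0; i < ncpu; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (long i = 0; i < ncpu; i++)
        pthread_join(tid[i], NULL);   /* main sleeps here, off the CPU */

    printf("ran %ld workers on %ld logical CPUs\n", ncpu, ncpu);
    return 0;
}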

Understanding issues with atomic lock operations in the case of multiprocessors

On a uniprocessor, we disable interrupts before performing a lock operation (lock acquire, lock release) to prevent context switching, then re-enable them after the operation.
But in the case of a multiprocessor, disabling interrupts alone is not sufficient to make lock operations atomic.
I read from a source that,
"It happens as each processor has a cache, and they can write to the same memory even with the interrupts being disabled."
Q1. Why does this even matter for atomic lock operations?
Q2. What other issues arise when implementing lock operations in a multiprocessor environment by only disabling interrupts?
Disabling interrupts alone is insufficient, because threads running on different processors can still execute the code inside a synchronization object's functions, and touch its data structures, at the same time; atomicity therefore cannot be achieved just by disabling interrupts.
For example, let L be a LOCK object with L.status == "FREE", and let X be a process with four threads T1, T2, T3, T4, each running on a separate processor P1, P2, P3, P4.
Let's assume the pseudocode for LOCK::acquire() is as follows:
LOCK::acquire() {
    if (status == BUSY) {
        waitList.add(RunningThread);      // block: queue ourselves on this lock
        TCB t = readyList.remove();       // pick the next ready thread
        t.state = RUNNING;
        thread_switch(RunningThread, t);  // run someone else until we are woken
    } else {
        status = BUSY;                    // lock was free: take it
    }
}
If we only disable interrupts, the code of T1, T2, T3, T4 can still run on the corresponding processors. Let's assume the lock is free at one moment.
If all four threads try to acquire lock L at the same time, they might all check the status of the lock simultaneously; in that case each thread finds status == "FREE", and every thread acquires the lock, which defeats the purpose of this lock implementation.
That is why atomic operations such as test_and_set are used when implementing lock objects for multiprocessors. Such an operation allows only one thread, on one processor, to take the lock at a time.
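For illustration, a minimal test_and_set-style spin lock written with C11 atomics (a sketch, not any particular kernel's implementation). Because reading the old value and writing the new one happen as one indivisible step, only one of the four threads above could ever observe "was free":

#include <stdatomic.h>

typedef struct {
    atomic_flag taken;   /* clear = FREE, set = BUSY */
} spinlock_t;            /* initialize: spinlock_t l = { ATOMIC_FLAG_INIT }; */

void spin_acquire(spinlock_t *l)
{
    /* Atomically set the flag and get its previous value. At most one
       thread can see "was clear", so the FREE/FREE race cannot occur. */
    while (atomic_flag_test_and_set_explicit(&l->taken, memory_order_acquire))
        ;                /* someone else holds the lock: keep spinning */
}

void spin_release(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->taken, memory_order_release);
}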

What guarantees that a thread spinning on a spin lock runs on a different processor?

I know spin locks only work on multiprocessors. But if two threads try to acquire the same resource and one of them spins on the lock, what prevents that one from running on the same processor as the other? If that happens, the spinning thread will prevent the thread holding the resource from proceeding. In this case it becomes a deadlock. How does the OS prevent this from happening?
Some background facts first:
spin-locks (and locks generally) are not limited to multiprocessor systems. They work fine on a single processor, and even a single-threaded application can use them without any harm.
spin-locks are not only provided by the OS; they have pure user-space implementations as well. For example, tbb provides tbb::spin_mutex.
By default, nothing prevents a thread from running on any available CPU (regardless of the locks they use).
There are reentrant/recursive types of locks: if a thread has acquired one and tries to acquire it again without releasing, it will succeed rather than deadlock as usual locks do. But that does not mean the same applies to different threads just because they are scheduled on the same CPU. With any type of lock, if one software thread has locked a mutex, other threads have to wait.
It is possible for one thread to acquire the lock and be preempted (i.e. interrupted by the OS timer) before it releases the lock. Another thread can be scheduled onto the same CPU and may want to acquire the same lock. In the case of a pure spin-lock, this thread will uselessly spin until it exhausts the time-slice allowed by the OS and is preempted. Eventually the first thread gets a chance to run and releases its lock, so the other thread is able to acquire it.
As you can see, it is not very efficient to spend time on this hopeless waiting. Thus, more sophisticated implementations, after a number of failed attempts to acquire the spinlock, call the OS for help, voluntarily giving away the rest of the time-slice to other threads, possibly including the one that can unlock the current thread.
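A sketch of that spin-then-yield idea, built on a C11 atomic_flag (the spin threshold is arbitrary; real implementations tune or grow it):

#include <sched.h>
#include <stdatomic.h>

void spin_acquire_yielding(atomic_flag *taken)
{
    for (;;) {
        /* Bounded spinning: cheap if the lock is released quickly. */
        for (int i = 0; i < 1000; i++)
            if (!atomic_flag_test_and_set_explicit(taken,
                                                   memory_order_acquire))
                return;  /* got the lock */
        /* Still held: give the rest of our time-slice away so the
           holder (perhaps on this very CPU) can run and release it. */
        sched_yield();
    }
}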

Java Thread Live Lock

I have an interesting problem related to Java thread live lock. Here it goes.
There are four global locks - L1,L2,L3,L4
There are four threads - T1, T2, T3, T4
T1 requires locks L1,L2,L3
T2 requires locks L2
T3 requires locks L3,L4
T4 requires locks L1,L2
So, the pattern of the problem is: any of the threads can run and acquire its locks in any order. If a thread detects that a lock it needs is not available, it releases all the locks it had previously acquired and waits for a fixed time before retrying. The cycle repeats, giving rise to a livelock condition.
So, to solve this problem, I have two solutions in mind
1) Let each thread wait for a random period of time before retrying.
OR,
2) Let each thread acquire all the locks in a particular order (even if a thread does not require all the locks).
I am not convinced that these are the only two options available to me. Please advise.
Have all the threads enter a single mutex-protected state-machine whenever they require and release their set of locks. The threads should expose methods that return the set of locks they require to continue and also to signal/wait for a private semaphore signal. The SM should contain a bool for each lock and a 'Waiting' queue/array/vector/list/whatever container to store waiting threads.
If a thread enters the SM mutex to get locks and can immediately get its lock set, it can reset its bool set, exit the mutex and continue on.
If a thread enters the SM mutex and cannot immediately get its lock set, it should add itself to 'Waiting', exit the mutex and wait on its private semaphore.
If a thread enters the SM mutex to release its locks, it sets the lock bools to 'return' its locks and iterates 'Waiting' in an attempt to find a thread that can now run with the set of locks available. If it finds one, it resets the bools appropriately, removes the thread it found from 'Waiting' and signals the 'found' thread semaphore. It then exits the mutex.
You can twiddle with the algorithm that you use to match up the available set lock bools with waiting threads as you wish. Maybe you should release the thread that requires the largest set of matches, or perhaps you would like to 'rotate' the 'Waiting' container elements to reduce starvation. Up to you.
A solution like this requires no polling (with its performance-sapping CPU use and latency) and no continual acquire/release of multiple locks.
It's much easier to develop such a scheme with an OO design. The methods/member functions to signal/wait the semaphore and return the set of locks needed can usually be stuffed somewhere in the thread class inheritance chain.
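For concreteness, a small sketch of this scheme in C with POSIX primitives (the question is Java, where java.util.concurrent.Semaphore would play the role of sem_t; the bitmask encoding and names are illustrative):

#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stddef.h>

#define NLOCKS 4

static pthread_mutex_t sm = PTHREAD_MUTEX_INITIALIZER; /* the SM mutex */
static bool taken[NLOCKS];                             /* one bool per lock */

struct waiter {
    unsigned need;            /* bitmask of the locks this thread requires */
    sem_t sem;                /* private semaphore, sem_init'ed by its owner */
    struct waiter *next;
};
static struct waiter *waiting;                         /* the 'Waiting' list */

static bool available(unsigned need)
{
    for (int i = 0; i < NLOCKS; i++)
        if ((need & (1u << i)) && taken[i])
            return false;
    return true;
}

static void mark(unsigned need, bool v)
{
    for (int i = 0; i < NLOCKS; i++)
        if (need & (1u << i))
            taken[i] = v;
}

void acquire_set(struct waiter *me, unsigned need)
{
    pthread_mutex_lock(&sm);
    if (available(need)) {            /* immediate grant: set bools, go */
        mark(need, true);
        pthread_mutex_unlock(&sm);
        return;
    }
    me->need = need;                  /* otherwise join 'Waiting'... */
    me->next = waiting;
    waiting = me;
    pthread_mutex_unlock(&sm);
    sem_wait(&me->sem);               /* ...and sleep; a releaser grants us */
}

void release_set(unsigned held)
{
    pthread_mutex_lock(&sm);
    mark(held, false);                /* 'return' our locks */
    for (struct waiter **p = &waiting; *p; ) {
        if (available((*p)->need)) {  /* this waiter can now run */
            struct waiter *w = *p;
            mark(w->need, true);
            *p = w->next;
            sem_post(&w->sem);        /* wake exactly that thread */
        } else {
            p = &(*p)->next;
        }
    }
    pthread_mutex_unlock(&sm);
}

The matching policy here simply grants to the first waiters whose whole set is free; as noted above, you can swap in largest-set-first, rotation, or any other anti-starvation rule.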
Unless there is a good reason (performance-wise) not to do so, I would unify all the locks into one lock object. This is similar to solution 2 you suggested, only simpler in my opinion. And by the way, not only is this solution simpler and less bug-prone, its performance might also be better than that of solution 1.
Personally, I have never heard of Option 1, but I am by no means an expert on multithreading. After thinking about it, it sounds like it will work fine.
However, the standard way to deal with threads and resource locking is somewhat related to Option 2. To prevent deadlocks, resources need to always be acquired in the same order. For example, if you always lock the resources in the same order, you won't have any issues.
Go with 2a) Let each thread acquire all of the locks that it needs (NOT all of the locks) in a particular order; if a thread encounters a lock that isn't available then it releases all of its locks
As long as threads acquire their locks in the same order you can't have deadlock; however, you can still have starvation (a thread might run into a situation where it keeps releasing all of its locks without making forward progress). To ensure that progress is made you can assign priorities to threads (0 = lowest priority, MAX_INT = highest priority) - increase a thread's priority when it has to release its locks, and reduce it to 0 when it acquires all of its locks. Put your waiting threads in a queue, and don't start a lower-priority thread if it needs the same resources as a higher-priority thread - this way you guarantee that the higher-priority threads will eventually acquire all of their locks. Don't implement this thread queue unless you're actually having problems with thread starvation, though, because it's probably less efficient than just letting all of your threads run at once.
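For concreteness, a sketch of 2a's core with POSIX mutexes (illustrative names; each thread's 'need' array must already be sorted in the single global order):

#include <pthread.h>

/* The four global locks; every thread takes them in ascending index order. */
static pthread_mutex_t locks[4] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Try to take only the locks this thread needs, in global order.
   If any one is busy, back out completely so the caller can wait
   (with escalating priority, as described above) and retry. */
int acquire_in_order(const int *need, int n)
{
    for (int i = 0; i < n; i++) {
        if (pthread_mutex_trylock(&locks[need[i]]) != 0) {
            while (i-- > 0)
                pthread_mutex_unlock(&locks[need[i]]);
            return -1;                /* back off and retry later */
        }
    }
    return 0;                         /* all needed locks are held */
}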
You can also simplify things by implementing omer schleifer's condense-all-locks-to-one solution; however, unless threads other than the four you've mentioned are contending for these resources (in which case you'll still need to lock the resources from the external threads), you can more efficiently implement this by removing all locks and putting your threads in a circular queue (so your threads just keep running in the same order).

Do mutexes block ALL threads at any point?

In Linux, say I have code with 100 threads, 5 of which compete over a shared resource protected by a mutex. I know that when the critical section is actually being run, only those 5 threads are subject to having their execution stopped if they try to obtain the lock, and the other 95 threads will be running without issues.
My question is: is there any point at which those other 95 threads' execution will be paused or affected, e.g. while the mutex/kernel/whatever is determining which threads are blocked on the mutex, which thread should get the lock, and which threads should be able to run because they're not asking for the lock?
No, other threads are not affected.
The kernel doesn't ask which threads are affected by the lock. Each thread tells the kernel when it tries to acquire the lock.
When threads do that, they go to sleep and get into a special wake-up queue associated with the lock.
Threads that don't use the lock won't get into the same queue as those that do, so their blocking behavior is unrelated.
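On Linux this is concretely visible in how mutexes are built on futexes. A minimal sketch in the style of Ulrich Drepper's paper "Futexes Are Tricky" (simplified; the lock word is 0 = free, 1 = taken, 2 = taken with possible waiters):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

void mutex_lock(atomic_int *m)
{
    int c = 0;
    /* Fast path: uncontended acquire, no system call at all. */
    if (atomic_compare_exchange_strong(m, &c, 1))
        return;
    /* Slow path: mark the word contended, then sleep on the kernel's
       wait queue for this address until a holder wakes us. */
    if (c != 2)
        c = atomic_exchange(m, 2);
    while (c != 0) {
        syscall(SYS_futex, m, FUTEX_WAIT, 2, NULL, NULL, NULL);
        c = atomic_exchange(m, 2);
    }
}

void mutex_unlock(atomic_int *m)
{
    /* Wake one sleeper only if someone actually queued up. */
    if (atomic_exchange(m, 0) == 2)
        syscall(SYS_futex, m, FUTEX_WAKE, 1, NULL, NULL, NULL);
}

The other 95 threads never execute any of this code for this particular mutex, so nothing here can pause them; even the 5 competing threads enter the kernel only on the slow path.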
