Why do I get a thread context switch every time I synchronize with a mutex?

I have multiple threads updating a single array in tight loops (10 threads on a dual-core processor, roughly 100,000 updates per second). Each update is done under the protection of a mutex (WaitForSingleObject / ReleaseMutex). I have noticed that no thread ever does two consecutive updates to the array, which means there must be some sort of yield related to the synchronization. That amounts to about 100,000 context switches every second, which seems sub-optimal. Why does this happen?

The problem here is that all waiting threads are kept in an order.
Each thread blocked in WaitForSingleObject goes into a queue and is then suspended by the scheduler so that it no longer consumes execution time. When the mutex is released, one of the waiting threads is resumed by the scheduler. The exact order in which threads are woken from the queue is unspecified, but in many cases it will be simple first-in, first-out.
Now, if the same thread releases the mutex and then calls WaitForSingleObject on the same mutex again, it is re-inserted into the queue, and it is quite unlikely to be inserted at the front if other threads are already waiting. This makes sense, because letting it skip to the front of the queue could starve the other threads. So the scheduler will probably just suspend it and wake the thread at the front of the queue instead.
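To make the queue argument concrete, here is a toy, single-threaded Python model (Python is used only for illustration; the question itself concerns Win32). It assumes strict FIFO ordering and direct hand-off to the head waiter on release, which the answer describes as common but unspecified. `FifoMutexModel` and `simulate` are hypothetical names, not part of any Windows API.

```python
from collections import deque

class FifoMutexModel:
    """Toy single-threaded model of a mutex that hands ownership to waiters
    in strict FIFO order (an assumption; real mutexes may not guarantee it)."""

    def __init__(self):
        self.owner = None        # thread id currently holding the mutex
        self.waiters = deque()   # thread ids suspended waiting for the mutex

    def acquire(self, tid):
        if self.owner is None:
            self.owner = tid          # uncontended: take it immediately
        else:
            self.waiters.append(tid)  # contended: join the back of the queue

    def release(self, tid):
        assert self.owner == tid, "only the owner may release"
        # Hand ownership straight to the longest waiter, if there is one.
        self.owner = self.waiters.popleft() if self.waiters else None


def simulate(n_threads, n_updates):
    """Each owner does one update, releases, and immediately re-acquires."""
    m = FifoMutexModel()
    for tid in range(n_threads):
        m.acquire(tid)           # thread 0 owns it, the rest queue up
    order = []
    for _ in range(n_updates):
        tid = m.owner
        order.append(tid)        # one array update under the mutex
        m.release(tid)           # ownership passes to the head of the queue
        m.acquire(tid)           # re-request: goes to the back of the line
    return order
```

For example, `simulate(3, 6)` returns `[0, 1, 2, 0, 1, 2]`: under contention the owner changes on every update, which is exactly one context switch per update. With a single thread there is no queue, and `simulate(1, 3)` returns `[0, 0, 0]`.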

I suspect this is an effect of having multiple processors.
When the first thread (running on the first processor) releases the mutex, the second thread (on the second processor) acquires it; then when the first thread tries to acquire the mutex again, it cannot. When the mutex is finally released by the second thread, it is taken by the third thread (on the first processor).

Related

How are epoll(), mutexes, semaphores and similar system calls implemented behind the scenes?

This question has confused me for a long time. I have searched a lot but still don't quite understand it. My question is this:
system calls such as epoll(), mutexes and semaphores have one thing in common: as soon as something happens (for a mutex, a thread releases the lock), another thread gets woken up (the thread that is waiting for the lock).
I'm wondering how this mechanism (an event happens in one thread, and another thread is notified about it) is actually implemented behind the scenes. I can only come up with two ways:
Hardware-level interrupt: for example, as soon as another thread releases the lock, an edge trigger fires.
Busy waiting at a very low level: for example, as soon as another thread releases the lock, it changes a bit from 0 to 1 so that threads waiting for the lock can check this bit.
I'm not sure which of my guesses, if either, is correct. I guess reading the Linux source code would help, but it's rather hard for a beginner like me. It would be great to have a general idea here plus some pseudocode.
The Linux kernel has a built-in object class called a "wait queue" (other OSes have similar mechanisms). Wait queues are created for all types of "waitable" resources, so there are quite a few of them around the kernel. When a thread detects that it must wait for a resource, it joins the relevant wait queue. The process goes roughly as follows:
The thread adds its control structure to the linked list associated with the desired wait queue.
The thread calls the scheduler, which marks the calling thread as sleeping, removes it from the "ready to run" list and saves its context away from the CPU. The scheduler is then free to load any other thread context onto the CPU instead.
When the resource becomes available, another thread (a user or kernel thread, or a task scheduled by an interrupt handler; those usually piggyback on special "work queue" threads) invokes a "wake up" call on the relevant wait queue. "Wake up" means that the scheduler removes one or more thread control structures from the wait queue's linked list and adds those threads to the "ready to run" list, so that they will be scheduled in due course.
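The three steps can be sketched in user space with Python's `threading` module. This is an analogy, not kernel code: a per-sleeper `threading.Event` stands in for "mark as sleeping and call the scheduler", and the method names `wait_event` and `wake_up` only loosely echo the kernel's `wait_event()`/`wake_up()` pair covered in the linked chapter. As in the kernel pattern, the waker is assumed to change the shared state before calling `wake_up`.

```python
import threading
from collections import deque

class WaitQueue:
    """User-space analogue of a kernel wait queue (illustrative only)."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the list of sleepers
        self._sleepers = deque()         # one Event per sleeping thread

    def wait_event(self, condition):
        """Sleep until condition() is true."""
        while True:
            with self._guard:
                if condition():
                    return
                ev = threading.Event()
                self._sleepers.append(ev)   # step 1: join the wait queue
            ev.wait()                       # step 2: give up the CPU until woken
            # Being woken only makes us runnable again; loop and re-check.

    def wake_up(self, n=1):
        """Step 3: make up to n sleepers runnable again, oldest first."""
        with self._guard:
            for _ in range(min(n, len(self._sleepers))):
                self._sleepers.popleft().set()
```

Checking the condition and enqueuing under the same guard lock is what prevents a lost wake-up: a waker that changes the state after the check cannot run `wake_up` until the sleeper is already on the list.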
A more technical overview is here:
http://www.makelinux.net/ldd3/chp-6-sect-2

Difference between a semaphore and a conditional variable

I am implementing a conditional wait, and either a semaphore or a conditional variable can be used to implement it. Is there any difference between the two, particularly from a performance point of view?
I have heard that when a thread waits on a conditional variable, it is not scheduled until it is signaled, so it does not consume CPU cycles. Is it true that this does not hold for a semaphore, and that a thread waiting on a semaphore consumes CPU cycles?
If all of your threads are waiting for some event, e.g., the submission of a task, you can wake them all up when the event occurs by broadcasting on a condition variable.
If you have a limited resource, say 10 pages of memory reserved for your threads, you need them to wait until a page is available, and when one becomes available you should let just one thread proceed. In this case you can use a semaphore, which lets exactly as many threads proceed as there are available pages.
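A minimal Python sketch of the page-pool case, using `threading.BoundedSemaphore` as the counting semaphore. `run_workers` is a hypothetical helper, and `time.sleep` stands in for real work with a page; the peak count it reports should never exceed the number of pages.

```python
import threading
import time

def run_workers(n_workers, n_pages):
    """Run n_workers threads that each need one 'page' for a short time and
    return the peak number of threads that held a page simultaneously."""
    pages = threading.BoundedSemaphore(n_pages)  # count of free pages
    stats_lock = threading.Lock()                # protects the counters below
    in_use = 0
    peak = 0

    def worker():
        nonlocal in_use, peak
        # Entering 'with pages' takes one unit (sleeping if none are free);
        # leaving it gives the unit back and may wake one waiting thread.
        with pages:
            with stats_lock:
                in_use += 1
                peak = max(peak, in_use)
            time.sleep(0.01)                     # stand-in for using the page
            with stats_lock:
                in_use -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With, say, 20 workers and 3 pages, at most 3 workers run their critical section at once; the rest sleep inside the semaphore rather than spinning.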
A semaphore has extra state (a count of units held) as well as a queue of threads waiting on it. This allows a semaphore to record how many times it has been signaled even if no thread is currently waiting on it. If a thread loops around a semaphore wait() and the semaphore is signaled N times, the thread will eventually loop N times, even if it is sometimes busy when the semaphore is signaled. This is very useful for producer-consumer queues.
A condvar does not have this extra count state, but waiting on it atomically releases the mutex it is bound to until another thread signals it, which is also very useful for producer-consumer queues.
Sometimes I wish for a combination of the two, a condvar with a count, but this does not seem to be forthcoming from OS developers :(
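The difference in stored state can be checked directly in Python (`semaphore_remembers` and `condvar_forgets` are illustrative names): releases on a `threading.Semaphore` accumulate while nobody waits, whereas a `notify()` on a `threading.Condition` with no waiters is simply lost, so a later `wait()` times out.

```python
import threading

def semaphore_remembers(n):
    """Signal a semaphore n times with nobody waiting, then count how many
    waits succeed without blocking."""
    sem = threading.Semaphore(0)
    for _ in range(n):
        sem.release()                       # no waiter: the count just goes up
    taken = 0
    while sem.acquire(blocking=False):      # consume each recorded signal
        taken += 1
    return taken

def condvar_forgets():
    """Notify a condition variable with nobody waiting, then wait on it."""
    cond = threading.Condition()
    with cond:
        cond.notify()                       # no waiter: the notification is lost
    with cond:
        return cond.wait(timeout=0.1)       # False: timed out, nothing was stored
```

This is why a condvar is always paired with a predicate kept under its mutex (for example "the queue is not empty"): the predicate, not the notification, carries the state.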
A semaphore and a condvar are the same in that they are both synchronization primitives. Apart from that, they differ.
A condition variable and a binary semaphore both block a thread until a specified condition is signaled, and in that respect you can use either one, but a condition variable is always used together with a mutex. In both cases the waiting thread is scheduled again only when it is signaled; without the signal it is not scheduled. If you need to keep track of a number of resources, use a counting semaphore instead.
When used with a mutex, neither one consumes CPU while waiting.

Mechanics of Condition.Signal()

Suppose I have threads like the following:
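void thread() {
    while (true) {
        lock.acquire();
        while (condition not true) {  // re-check after every wake-up
            Cond.wait();              // releases lock while asleep, reacquires it before returning
        }
        // blah blah
        Cond.Signal();
        lock.release();
    }
}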
My main question is whether the signalling thread continues running for a while after Cond.Signal() or immediately gives up the CPU. In some cases I would like it not to release the lock before the woken thread finishes executing, and in other cases it may be beneficial to release the lock immediately after signalling, without waiting for the woken thread to finish.
I understand that if any threads are waiting on the condition, they get woken up on Cond.Signal(). But what does "woken up" mean: is the thread put on the ready queue, or does the scheduler make sure that it runs immediately?
And what about the signalling thread: does it go to sleep on the same condition upon signalling, so that some other thread has to wake it up to make it release the lock?
This depends largely on your environment (OS, library, language...) and how the synchronisation primitives are implemented. Since you haven't specified any, I'll just give a general answer.
When putting a thread to sleep, most environments remove it from the scheduler's ready queue, and the thread gives up its remaining CPU time. When woken up, the thread is simply placed back into the ready queue and resumes execution the next time the scheduler selects it.
It's also possible that the thread will do some active waiting (spinning) instead of being removed from the scheduler's ready queue. In this case, the thread resumes execution right away. Note that a spinning thread can still use up its time slice, in which case it has to wait to be rescheduled before it can notice the wake-up. Spinning is a useful strategy if your critical sections are very short and you don't want to pay the scheduling overhead.
A hybrid approach would be to do a small amount of active waiting before removing the thread from the scheduler's ready queue.
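The hybrid approach can be sketched as a bounded try-lock loop followed by an ordinary blocking acquire. This Python version only illustrates the control flow: because of CPython's global interpreter lock, spinning in Python buys little, and the `spins` count is an arbitrary assumption. Real implementations typically do this inside the lock primitive itself.

```python
import threading

def spin_then_block(lock, spins=100):
    """Hybrid acquire: poll a few times, then fall back to a sleeping wait.
    Returns which path obtained the lock (for illustration only)."""
    for _ in range(spins):
        if lock.acquire(blocking=False):   # cheap attempt that never sleeps
            return "spun"
    lock.acquire()                          # give up spinning: sleep until released
    return "blocked"
```

If the lock is free, the first attempt succeeds and the thread never sleeps; if the holder keeps it longer than the spin budget, the caller falls back to being descheduled like an ordinary blocking wait.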
As for the signaling thread, unless your environment explicitly specifies otherwise (I can't think of any reason it would, but you never know), I wouldn't expect a call to signal() to block in a way that requires someone to wake it up. Signal() might have to synchronize internally with other threads calling signal(), but those are implementation details and you shouldn't have to do anything about them.
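A small Python check of this behaviour, using `threading.Condition.notify()` as the analogue of signal() (`signal_demo` is a hypothetical name): `notify()` returns immediately, the signaller keeps running while it still holds the lock, and the woken thread can only proceed after the signaller releases it.

```python
import threading

def signal_demo():
    cond = threading.Condition()
    state = {"ready": False}
    waiter_inside = threading.Event()
    log = []

    def waiter():
        with cond:
            waiter_inside.set()
            while not state["ready"]:
                cond.wait()            # releases the lock while sleeping
            log.append("waiter resumed")

    t = threading.Thread(target=waiter)
    t.start()
    waiter_inside.wait()               # the waiter now holds the lock
    with cond:                         # acquirable only once the waiter is in wait()
        state["ready"] = True
        cond.notify()                  # returns immediately; the signaller does not sleep
        log.append("signaler still running")
    # The woken waiter can reacquire the lock only after the block above exits.
    t.join()
    return log
```

`signal_demo()` returns `["signaler still running", "waiter resumed"]`: waking a thread only makes it runnable, and where the signaller releases the lock decides when the woken thread can actually continue.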

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because another thread (T2) holds it, T1 is put to sleep by the OS and is woken up only when T2 releases the lock and the OS gives T1 the lock. T1 does not really poll for the lock. From this description, it seems that there is no performance benefit to using condition variables: in either case, no polling is involved, and the OS already provides the benefit that the condition-variable paradigm is supposed to provide.
Can you please explain what actually happens?
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
With a condition variable, a consumer that finds the queue empty can simply wait on a condition variable that indicates something has been put into the queue. There is no polling: the thread does nothing until a producer puts something in the queue and then signals the condition that the queue has a new item.
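A minimal Python sketch of that producer/consumer arrangement: one mutex protects the queue's internals, and a condition variable bound to the same mutex signals "not empty". `BlockingQueue` is a hypothetical class; Python's standard `queue.Queue` already provides this behaviour.

```python
import threading
from collections import deque

class BlockingQueue:
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()                      # protects _items
        self._not_empty = threading.Condition(self._lock)  # "something was put in"

    def put(self, item):
        with self._lock:
            self._items.append(item)
            self._not_empty.notify()     # wake one waiting consumer, if any

    def get(self):
        with self._lock:
            while not self._items:       # re-check: another consumer may have won
                self._not_empty.wait()   # sleep with the mutex released, no polling
            return self._items.popleft()
```

A consumer calling `get()` on an empty queue sleeps inside `wait()` with the mutex released, so producers can still take the mutex to call `put()`.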
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable, for which we'll assume for now that it's a singular signal (unleashing no more than one waiting thread) and not a broadcast (unleashing all the waiting threads), the first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex—usually the thread that signaled the condition variable—relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
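The flag-plus-two-queues design can be modeled in a few lines of Python. This is a deterministic, single-threaded model in which nothing actually blocks, and all names are hypothetical; it assumes, as the answer does, that a signaled waiter is moved to the front of the mutex queue.

```python
from collections import deque

class MutexWithCondModel:
    """Single-threaded model of a mutex (flag + queue) with a condition
    variable bolted on as a second queue. Nothing here actually blocks."""

    def __init__(self):
        self.owner = None        # the flag: which thread holds the mutex
        self.mutex_q = deque()   # threads waiting to acquire the mutex
        self.cond_q = deque()    # threads waiting for the condition

    def lock(self, t):
        if self.owner is None:
            self.owner = t
        else:
            self.mutex_q.append(t)            # get in line for the mutex

    def unlock(self, t):
        assert self.owner == t
        self.owner = self.mutex_q.popleft() if self.mutex_q else None

    def wait(self, t):
        self.cond_q.append(t)                 # step out of line into the condition queue
        self.unlock(t)                        # and let the next mutex waiter in

    def signal(self):
        if self.cond_q:                       # release at most one waiter,
            self.mutex_q.appendleft(self.cond_q.popleft())  # to the front of the mutex queue

    def broadcast(self):
        self.mutex_q.extendleft(reversed(self.cond_q))  # all waiters, original order
        self.cond_q.clear()
```

For example: A locks and waits, B locks, C queues for the mutex, B signals, and B unlocks. At that point A owns the mutex and C is still in line, matching the sequence described above.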
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
If you are looking for performance, start reading about "non-blocking / non-locking" thread synchronization algorithms. They are based on atomic operations, which gcc is kind enough to provide; look up gcc atomic operations. Our tests showed that incrementing a global value from multiple threads with atomic operations was orders of magnitude faster than locking with a mutex. Here is some sample code that shows how to add and remove items from a linked list from multiple threads at the same time without locking.
For sleeping and waking threads, signals are much faster than conditions. You use pthread_kill to send the signal, and sigwait to sleep the thread. We tested this too with the same kind of performance benefits. Here is some example code.

Mutex lock: what does "blocking" mean?

I've been reading up on multithreading and shared resources access and one of the many (for me) new concepts is the mutex lock. What I can't seem to find out is what is actually happening to the thread that finds a "critical section" is locked. It says in many places that the thread gets "blocked", but what does that mean? Is it suspended, and will it resume when the lock is lifted? Or will it try again in the next iteration of the "run loop"?
The reason I ask is that I want system-supplied events (mouse, keyboard, etc.), which (apparently) are delivered on the main thread, to be handled in a very specific part of the run loop of my secondary thread. So whatever event is delivered, I queue in my own data structure. Obviously, the data structure needs a mutex lock because it's being modified by both threads. The missing puzzle piece is: what happens when an event gets delivered in a function on the main thread, I want to queue it, but the queue is locked? Will the main thread be suspended, or will it just jump over the locked section and go out of scope (losing the event)?
Blocked means execution gets stuck there; generally, the thread is put to sleep by the system and yields the processor to another thread. When a thread is blocked trying to acquire a mutex, execution resumes when the mutex is released, though the thread might block again if another thread grabs the mutex before it can.
There is generally a try-lock operation that grabs the mutex if possible and returns an error if not. But you are eventually going to have to move the current event into that queue. Also, if you delay moving the events to the thread where they are handled, the application will become unresponsive regardless.
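For the event-forwarding case in the question, a try-lock can be combined with a small backlog owned by the main thread, so that the main thread never blocks and no event is dropped. This Python sketch uses `Lock.acquire(blocking=False)` as the try-lock; `EventForwarder` and its backlog are hypothetical.

```python
import threading
from collections import deque

class EventForwarder:
    """Main-thread side of a hypothetical event hand-off: never blocks,
    never drops an event."""

    def __init__(self):
        self.shared = deque()            # drained by the worker thread
        self.lock = threading.Lock()     # protects self.shared
        self._backlog = []               # touched only by the main thread

    def deliver(self, event):
        self._backlog.append(event)
        if not self.lock.acquire(blocking=False):   # try-lock
            return False                 # busy: keep the event for next time
        try:
            self.shared.extend(self._backlog)       # move everything pending
            self._backlog.clear()
        finally:
            self.lock.release()
        return True
```

As the answer warns, the event still has to reach the queue eventually: if no further event arrives, a real application would also need to flush the backlog on a later pass of its run loop.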
A queue is actually one case where you can get away with not using a mutex. For example, Mac OS X (and possibly also iOS) provides the OSAtomicEnqueue() and OSAtomicDequeue() functions (see man atomic or <libkern/OSAtomic.h>) that exploit processor-specific atomic operations to avoid using a lock.
But, why not just process the events on the main thread as part of the main run loop?
The simplest way to think of it is that the blocked thread is put in a wait ("sleeping") state until the mutex is released by the thread holding it. At that point the operating system will "wake up" one of the threads waiting on the mutex and let it acquire it and continue. It's as if the OS simply puts the blocked thread on a shelf until it has the thing it needs to continue. Until the OS takes the thread off the shelf, it's not doing anything. The exact implementation -- which thread gets to go next, whether they all get woken up or they're queued -- will depend on your OS and what language/framework you are using.
This is a late answer, but it may help with understanding. I am speaking more from an implementation perspective than from theoretical texts.
The word "blocking" is a kind of technical homonym. People may use it to mean sleeping or mere waiting, so the term has to be understood in the context of its usage.
Blocking means waiting - Assume that on an SMP system a thread B wants to acquire a spinlock held by some other thread A. One mechanism is to disable preemption and keep spinning on the processor until B gets the lock. Another, probably more efficient, mechanism is to let other threads use the processor if B does not get the lock in a few easy attempts. In that case we schedule out thread B (as preemption is enabled) and give the processor to some other thread C. Thread B then just waits in the scheduler's queue and comes back when its turn arrives. Note that B is not sleeping; it is just waiting passively instead of busy-waiting and burning processor cycles. On BSD and Solaris systems there are data structures such as turnstiles to implement this situation.
Blocking means sleeping - If thread B had instead made a system call such as read() waiting for data from a network socket, it could not proceed until it got that data. Therefore, some texts casually use the term blocking, as in "... blocked for I/O" or "... in a blocking system call". Actually, thread B is sleeping. There are specific data structures known as sleep queues, much like luxury waiting rooms at airports :-). The thread will be woken up when the OS detects that data is available, much like the waiting room's attendant.
Blocking means just that. It is blocked. It will not proceed until able. You don't say which language you're using, but most languages/libraries have lock objects where you can "attempt" to take the lock and then carry on and do something different depending on whether you succeeded or not.
But in, for example, Java synchronized blocks, your thread will stall until it is able to acquire the monitor (mutex, lock). The java.util.concurrent.locks.Lock interface describes lock objects which have more flexibility in terms of lock acquisition.
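Python's `threading.Lock` offers a comparable timed acquisition through `acquire(timeout=...)`, roughly the analogue of `tryLock(long, TimeUnit)` on Java's `Lock` interface. `update_with_deadline` is a hypothetical helper showing the "attempt, then do something else" pattern.

```python
import threading

def update_with_deadline(lock, action, timeout_s):
    """Run action() under lock, but give up if the lock is not free within timeout_s."""
    if not lock.acquire(timeout=timeout_s):   # blocks for at most timeout_s seconds
        return False                           # caller can do something else instead
    try:
        action()
    finally:
        lock.release()
    return True
```

A caller that gets `False` back has neither run the action nor blocked indefinitely, which is the flexibility a plain synchronized block does not give you.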
