Here in this video at 26:00, there is an implementation of a lock that tries to avoid busy waiting as much as possible by using a wait queue. The code looks like this (pseudocode):
int guard = 0;
int value = FREE;
Acquire()
{
while (test_and_set(guard));
if (value == BUSY) {
release_guard_and_wait();
} else {
value = BUSY;
guard = 0;
}
}
Release()
{
while (test_and_set(guard));
if (!wait_queue.empty())
wake_one();
else
value = FREE;
guard = 0;
}
test_and_set is an atomic operation that returns the old value of guard and sets it to 1.
release_guard_and_wait has to be atomic as well to avoid potential problems:
If the thread waits then releases the guard when it wakes up, no thread will be able to acquire it.
If the thread releases the guard then waits, this scenario might happen:
thread 1 (in Acquire) -> guard = 0;
thread 2 (in Release) -> test_and_set(guard);
thread 2 (in Release) -> wake_one();
thread 1 (in Acquire) -> wait();
thread 2 (in Release) -> guard = 0;
wake_one wakes one thread (takes it from the wait queue and puts it in the ready queue).
My question is: why use guard? Isn't it redundant?
The code without guard may look like this:
int value = 0;
Acquire()
{
while (test_and_set(value))
wait();
}
Release()
{
value = 0;
wake_one();
}
Will these two implementations behave differently under some conditions? Is there any advantage in using the guard?
There are two big problems with your code.
First, your code has a race condition. Consider:
Thread 1 holds the lock, it calls Release.
Thread 2 wants the lock, it calls Acquire.
Thread 1 sets value to zero.
Thread 2 passes the test_and_set.
Thread 1 calls wake_one, it doesn't do anything.
Thread 2 calls wait, it is waiting for a wakeup that already happened.
Oops, deadlock. This is why you need an atomic release_guard_and_wait function.
Second problem:
If two threads call Acquire at the same time, your code will only cause one of them to wait. The other one will do horrible things, for example it will:
Keep a core busy, preventing other cores from reaching their peak speeds on many CPUs with adaptive clock speeds.
Waste power.
Starve another thread running in the same core on CPUs with hyperthreading and similar technologies.
When the spinning thread finally does pass the test_and_set loop, it will take a massive mispredicted branch penalty. So if several threads are waiting, each one will stall just as it gets the lock. Yuck.
On some CPUs, a test_and_set loop will cause inter-core traffic even if the comparison fails. So you may saturate inter-core buses, slowing other innocent threads (and the one holding the lock) to a crawl.
And so on.
I hate to see the test and set loop in the original code (that's only appropriate in toy code, even for very short times) but at least it won't spin for the whole time another thread holds the lock as yours will.
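One common way to soften the inter-core traffic cost mentioned above is a "test and test-and-set" loop: spin on a plain load and retry the atomic write only once the lock looks free. This is a swapped-in technique, not something from the video or the question. The sketch below assumes C++11; the names TtasSpinLock and run_ttas_demo are made up for illustration, and it is still a spinlock, so it remains suitable only for very short critical sections.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical name: a test-and-test-and-set spinlock.
class TtasSpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        for (;;) {
            // Atomic read-modify-write: success means we now own the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Lock is taken: wait on a plain load instead of repeating the
            // write, and offer the CPU to other threads in the meantime.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        locked_.store(false, std::memory_order_release);
    }
};

// Four threads increment a plain int under the lock and return the total.
int run_ttas_demo() {
    TtasSpinLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

Calling run_ttas_demo() from main should return 40000 if mutual exclusion holds.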
"there is an implementation of a lock that avoids busy waiting by using a wait channel" -- I can still see busy waiting, in the form of while (test_and_set(guard));. But the essence of the code is to keep that busy wait short. All your code does is this:
Declare a lock-queue where a process can register itself for a lock.
Add the process to that lock-queue, which is interested in acquiring the lock.
Release one process from the lock-queue, when an already holding process releases the lock.
Acquire()
while (test_and_set(guard)); -- Get the guard for editing the lock-queue.
if (value == BUSY) {release_guard_and_wait();} -- If the lock is already acquired, add yourself to the lock-queue, and release the guard on lock-queue so that other processes may add themselves to the lock-queue. And wait till you are given a call to wake up.
else { value = BUSY; guard = 0;} -- If no process acquired the lock, then acquire by yourself and release the guard on the lock-queue.
Release()
while (test_and_set(guard)); -- Get the guard for editing the lock-queue.
if (!wait_queue.empty()) wake_one(); -- If the lock queue is not empty then wake one process.
else value = FREE; -- If no process is waiting for the lock in the lock-queue, just release the lock.
guard = 0; -- Of course at the end, release the guard on the lock-queue, so other processes can edit the queue.
Now coming to your modified code, you can immediately find that two processes running acquire() and release() may edit the queue at the same instant. Moreover, multiple processes trying to acquire the lock at the same time, may also corrupt the lock-queue and leave it in a broken state.
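The walkthrough of Acquire() and Release() above can be sketched in C++ roughly as follows. This is a user-space approximation under stated assumptions, not the code from the video: a real kernel makes release_guard_and_wait atomic, whereas this sketch substitutes a hypothetical Parker helper whose wake() leaves a permit behind, so a wake-up that arrives before the sleep is not lost. The names Parker, QueueLock and run_queue_lock_demo are invented for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the kernel's sleep/wake-up. wake() stores a permit, so the
// order "wake() first, sleep() second" still lets sleep() return.
struct Parker {
    std::mutex m;
    std::condition_variable cv;
    bool permit = false;
    void sleep() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return permit; });
    }
    void wake() {
        { std::lock_guard<std::mutex> lk(m); permit = true; }
        cv.notify_one();
    }
};

class QueueLock {
    std::atomic_flag guard_ = ATOMIC_FLAG_INIT;    // protects busy_ and waiters_
    bool busy_ = false;                            // "value" in the question
    std::deque<std::shared_ptr<Parker>> waiters_;  // the wait queue

    void take_guard() { while (guard_.test_and_set(std::memory_order_acquire)) {} }
    void drop_guard() { guard_.clear(std::memory_order_release); }
public:
    void acquire() {
        take_guard();
        if (busy_) {
            auto self = std::make_shared<Parker>();
            waiters_.push_back(self);  // register before dropping the guard
            drop_guard();
            self->sleep();             // returns once release() hands over the lock
        } else {
            busy_ = true;
            drop_guard();
        }
    }
    void release() {
        std::shared_ptr<Parker> next;
        take_guard();
        assert(busy_);                 // releasing a lock nobody holds is a bug
        if (!waiters_.empty()) {
            next = waiters_.front();
            waiters_.pop_front();      // busy_ stays true: ownership is handed over
        } else {
            busy_ = false;
        }
        drop_guard();
        if (next) next->wake();
    }
};

// Four threads increment a plain int under the lock and return the total.
int run_queue_lock_demo() {
    QueueLock lock;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                lock.acquire();
                ++counter;
                lock.release();
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

The guard is only held for the few instructions that touch busy_ and the queue, which is why the remaining spin is short regardless of how long the lock itself is held.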
I am going to explain my understanding of this OS construct and appreciate some polite correction.
I understand thread-safety clearly and simply.
If there is some setup where
X: some condition
Y: do something
and
if X
do Y
is atomic, meaning that if at the exact moment in time
doing Y
not X
there is some problem.
By my understanding, the lowest-level solution of this is to use shared objects (mutexes). As an example, in the solution to the "Too Much Milk" Problem
Thread A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if no milk
if no milk | buy milk
buy milk | remove Note B
remove Note A |
Note A and Note B would be the shared objects, i.e. some piece of memory accessible by both threads A and B.
This can be generalized (beyond milk) for the 2-thread case like
Thread A | Thread B
-------------------------------------
leave Note A | leave Note B
while Note B | if no Note A
do nothing | if X
if X | do Y
do Y | remove Note B
remove Note A |
and there is some way to generalize it for the N-thread case (so I'll continue referring to the 2-thread case for simplicity).
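The two-note protocol in the tables above can be tried out with a small sketch. It assumes C++11 atomics with their default sequentially consistent ordering, which is what lets each thread rely on seeing the other's note. The function name too_much_milk_round is made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the two-note protocol and returns how many cartons of
// milk were bought in total.
int too_much_milk_round() {
    std::atomic<bool> note_a{false};
    std::atomic<bool> note_b{false};
    std::atomic<int> milk{0};

    std::thread a([&] {
        note_a = true;                                 // leave Note A
        while (note_b) { std::this_thread::yield(); }  // while Note B: do nothing
        if (milk == 0) ++milk;                         // if no milk: buy milk
        note_a = false;                                // remove Note A
    });
    std::thread b([&] {
        note_b = true;                                 // leave Note B
        if (!note_a) {                                 // if no Note A
            if (milk == 0) ++milk;                     //   if no milk: buy milk
        }
        note_b = false;                                // remove Note B
    });

    a.join();
    b.join();
    assert(!note_a && !note_b);  // both notes are gone at the end
    return milk;
}
```

Thread A is the one that busy-waits here: it keeps polling note_b until Thread B removes its note, which is the spinning the rest of the question asks about.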
Possibly incorrect assumption #1: This is the lowest-level solution known (possible?).
Now one of the deficiencies of this solution is the spinning or busy-wait
while Note B
do nothing
because if the do Y is an expensive task then the thread scheduler will keep switching to Thread A to perform this check, i.e. the thread is still "awake" and using processing power even when we "know" its processing is to perform a check that will fail for some time.
The question then becomes: Is there some way we could make Thread A "sleep", so that it isn't scheduled to run until Note B is gone, and then "wake up"???
The Condition Variable design pattern provides a solution, and it is built on top of mutexes.
Possibly incorrect assumption #2: Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
I could use a logical explanation like only S.O. can provide ;)
Isn't there still some spinning under the hood.
No. That's the whole point of condition variables: It's to avoid the need for spinning.
An operating system scheduler creates a private object to represent each thread, and it keeps these objects in containers which, for purpose of this discussion, we will call queues.
Simplistic explanation:
When a thread calls condition.await(), that invokes a system call. The scheduler handles it by removing the calling thread from whatever CPU it was running on, and by putting its proxy object into a queue. Specifically, it puts it into the queue of threads that are waiting to be notified about that particular condition.
There usually is a separate queue for every different thing that a thread could wait for. If you create a mutex, the OS creates a queue of threads that are waiting to acquire the mutex. If you create a condition variable, the OS creates a queue of threads that are waiting to be notified.
Once the thread's proxy object is in that queue, nothing will wake it up until some other thread notifies the condition variable. That notification also is a system call. The OS handles it (simplest case) by moving all of the threads that were in the condition variable's queue into the global run queue. The run queue holds all of the threads that are waiting for a CPU to run on.
On some future timer tick, the OS will pick the formerly waiting thread from the run queue and set it up on a CPU.
Extra credit:
Bad news! The first thing the thread does after being awakened, while it's still inside the condition.await() call, is try to re-lock the mutex. But there's a chance that the thread that signalled the condition still has the mutex locked. Our victim is going to go right back to sleep again, this time waiting in the queue for the mutex.
A more sophisticated system might be able to optimize the situation by moving the thread directly from the condition variable's queue to the mutex queue without ever needing to wake it up and then put it back to sleep.
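In C++ terms, the wait and notify described above might look like the following sketch. It uses std::condition_variable, where wait() plays the role of condition.await(). The producer releases the mutex before notifying, which is one way to reduce the chance that the woken thread immediately blocks again on the mutex. The function name wait_for_result and the value 42 are arbitrary.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// The calling thread sleeps until the producer publishes a result.
int wait_for_result() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    int result = 0;

    std::thread producer([&] {
        {
            std::lock_guard<std::mutex> lk(m);
            result = 42;
            ready = true;
        }                 // mutex released here, before the notification
        cv.notify_one();  // moves a waiter (if any) back to the run queue
    });

    std::unique_lock<std::mutex> lk(m);
    // No polling loop in user code: the thread is suspended inside wait()
    // and re-locks m before wait() returns.
    cv.wait(lk, [&] { return ready; });
    assert(lk.owns_lock());
    int seen = result;
    lk.unlock();

    producer.join();
    return seen;
}
```

The predicate passed to wait() also covers the case where the producer finishes before the consumer starts waiting: ready is already true, so wait() returns without sleeping.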
Yes, at the lowest hardware level, instructions such as compare-and-set and compare-and-swap are used. They are retried in a loop until the condition is met, and only then does the set (assignment) take effect. This short spin is required each time we put a thread in a queue, be it the queue for a mutex, for a condition, or for a processor.
Then, isn't there still some spinning under the hood? Is the average amount of spinning somehow reduced?
That's a decision for the implementation to make. If spinning works best on the platform, then spinning can be used. But almost no spinning is required.
Typically, there's a lock somewhere at the lowest level of the implementation that protects system state. That lock is only held by any thread for a tiny split second as it manipulates that system state. Typically, you do need to spin while waiting for that inner lock.
A block on a mutex might look like this:
1. Atomically try to acquire the mutex.
2. If that succeeds, stop, you are done. (This is the "fast path".)
3. Acquire the inner lock that no thread holds for more than a few instructions.
4. Mark yourself as waiting for that mutex to be acquired.
5. Atomically release the inner lock and set your thread as not ready-to-run.
Notice that the only place where there is any spinning is step 3, and that is not in the fast path. No spinning is needed after that, because the call in step 5 does not return to this thread until the lock is conveyed to it by the thread that held it.
When a thread releases the lock, it checks the count of threads waiting for the lock. If that's greater than zero, instead of releasing the lock, it acquires the inner lock protecting system state, picks one of the threads recorded as waiting for the lock, conveys the lock to that thread, and tells the scheduler to run that thread. That thread then sees its call in step 5 return, and it now holds the lock.
Again, the only waiting is on that inner lock that is used just to track what thread is waiting for what.
I have stumbled upon the term spinning, referring to a thread while reading this (ROS)
What is the general concept behind spinning a thread?
My intuition would say that a spinning thread is a thread that keeps executing in a multithreading process with a certain frequency, somewhat related to the concept of polling (i.e. keep checking some condition with a certain frequency) but I am not sure at all about it.
Could you give some explanation? The more general the better.
There are a couple of separate concepts here.
In terms of ROS (the link you reference), ros::spin() runs the ROS callback invoker, so that pending events are delivered to your program callbacks via a thread belonging to your program. This sort of call typically does not return; it will wait for new events to be ready, and invoke appropriate callbacks when they occur.
But you also refer to "spinning a thread."
This is a separate topic. It generally relates to a low level programming pattern whereby a thread will repeatedly check for some condition being met without being suspended.
A common way to wait for some condition to be met is to just wait on a condition variable. In this example, the thread will be suspended by the kernel until some other thread calls notify on the condition variable. Upon the notify, the kernel will resume the thread, and the condition will evaluate to true, allowing the thread to continue.
std::mutex m;
std::condition_variable cv;
bool ready = false;
std::unique_lock<std::mutex> lk(m);
cv.wait(lk, []{ return ready; }); /* thread suspended */
Alternatively, a spinning approach would repeatedly check some condition without going to sleep. (Caution: this results in high CPU usage, and there are subtle caveats to implementing it correctly.)
Here is an example of a simple spinlock (although note that spinning threads can be used for other purposes than spinlocks). In the below code, notice that the while loop repeatedly calls test_and_set ... which is just an attempt to set the flag to true; that's the spin part.
// spin until test_and_set returns false, i.e. the flag was clear and is now ours
std::atomic_flag lock = ATOMIC_FLAG_INIT;
while (lock.test_and_set(std::memory_order_acquire)); // acquire lock
/* got the flag .. do work */
lock.clear(std::memory_order_release); // release lock
Spinning is like a while loop without sleeping: your task consumes CPU constantly until the condition is satisfied.
As per multithreading concepts, I have learned so far:
MUTEX: A mutex_t object can be used for managing access to a resource.
BINARY SEMAPHORE: A sem_t object can also be used to manage access to a resource
Difference between the two: the concept of ownership, i.e. in the case of mutex_t, only the thread that locked the mutex_t can unlock it. But in the case of sem_t, there is no concept of ownership, and hence any thread can perform sem_post() on the sem_t object. This is the reason it can be used for event signals.
Now suppose my critical section appears as:
A) Using mutex_t
typedef struct {
int count;
pthread_mutex_t mutex;
}counter;
counter count;
void increment(counter* c)
{
pthread_mutex_lock(&(c->mutex));
(c->count)++;
pthread_mutex_unlock(&(c->mutex));
}
B) Using sem_t
EDIT: (Binary semaphore i.e. initialized to 1)
typedef struct {
int count;
sem_t sem;
}counter;
counter count;
void increment(counter* c)
{
sem_wait(&(c->sem));
(c->count)++;
sem_post(&(c->sem));
}
In case of B) till the time sem is zero no thread can enter the critical section and hence providing access control. But suppose due to some event sem_post() is executed by some other thread then it will allow access to critical section by other threads.
In this case this is actually a buggy situation and not proper access control. Hence the programmer has to be careful when using a binary semaphore for resource access control.
I can conclude that it is always better to use mutex_t for access control and a binary semaphore for event signalling.
Please let me know if my understanding is correct or am I missing something?
Mutexes are very specific in their purpose. Like you said, they can only ever be released by the thread currently holding the mutex, and they only allow single entrancy. A mutex is effectively a semaphore initialized to a count of 1 that also verifies that the thread calling post is the thread that last called wait ('ownership' invariance).
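The ownership check can be observed directly with POSIX threads, at least for mutexes created with the PTHREAD_MUTEX_ERRORCHECK type (a default mutex is not required to detect the error). The following is a minimal sketch assuming a POSIX system; the function name unlock_from_non_owner is made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <thread>

// Returns the error code pthread_mutex_unlock() reports when it is called
// by a thread that does not own the mutex.
int unlock_from_non_owner() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    // Error-checking mutexes verify ownership on unlock.
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, &attr);
    pthread_mutexattr_destroy(&attr);

    int rc = pthread_mutex_lock(&mutex);  // the calling thread is now the owner
    assert(rc == 0);
    (void)rc;

    int other_rc = 0;
    std::thread other([&] {
        // A different thread tries to unlock: this should be refused.
        other_rc = pthread_mutex_unlock(&mutex);
    });
    other.join();

    pthread_mutex_unlock(&mutex);         // the real owner unlocks
    pthread_mutex_destroy(&mutex);
    return other_rc;
}
```

On a POSIX system the returned code is expected to be EPERM. A sem_post() from a non-waiting thread, by contrast, simply succeeds, which is what makes semaphores usable for signalling.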
Your example does not show the initialization conditions for your semaphore. If you wanted to use it in the same way as a mutex, it would have to be initialized to a count of 1.
Semaphores do have a wide range of uses, so I wouldn't call them 'unsafe'. For example, let's say that you have some resource that allows, say, 5 total consumers to be running. So you would protect that resource with a semaphore initialized to 5. As consumers invoke the resource, the semaphore will tick down until it hits zero, at which point it'll block new consumers until running consumers increment the semaphore. This is called a counting semaphore.
A great example of how to use a counting semaphore is a blocking queue - a simple Queue that has an Enqueue and a Dequeue method. Each Enqueue increments the semaphore, and each Dequeue decrements the semaphore. In this way, if the queue is empty, Dequeuers will be blocked.
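A blocking queue along these lines could be sketched as follows. To stay within C++11, the sketch builds a small counting Semaphore class out of a mutex and a condition variable rather than using sem_t. The class and function names (Semaphore, BlockingQueue, run_blocking_queue_demo) are illustrative assumptions, and the queue holds plain ints for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Minimal counting semaphore: post() increments, wait() blocks while zero.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void post() {
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
};

class BlockingQueue {
    std::queue<int> items_;
    std::mutex items_mutex_;   // protects items_
    Semaphore available_{0};   // counts queued items, starts at 0
public:
    void enqueue(int v) {
        { std::lock_guard<std::mutex> lk(items_mutex_); items_.push(v); }
        available_.post();     // one more item available
    }
    int dequeue() {
        available_.wait();     // blocks while the queue is empty
        std::lock_guard<std::mutex> lk(items_mutex_);
        assert(!items_.empty());
        int v = items_.front();
        items_.pop();
        return v;
    }
};

// One producer enqueues 1..100; the caller dequeues and sums them.
int run_blocking_queue_demo() {
    BlockingQueue q;
    std::thread producer([&] {
        for (int i = 1; i <= 100; ++i) q.enqueue(i);
    });
    int sum = 0;
    for (int i = 0; i < 100; ++i) sum += q.dequeue();
    producer.join();
    return sum;
}
```

The semaphore only counts items; a separate mutex still protects the underlying container, since the semaphore by itself does not give mutual exclusion when its count is above 1.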
Another example of how a semaphore could be used would be in a simple signaling situation, as you mention:
Thread A enqueues 10 jobs into a thread pool to run them in parallel. Each job has a semaphore associated with it that is initially set to 0. When the job has been completed by the thread pool, the thread pool posts to the jobs semaphore. Meanwhile, Thread A is waiting on each job's semaphore to find out when they complete.
So we see many uses for a semaphore:
Making mutexes, where the initial count = 1
Protecting limited resources, where the initial count = N, the limit
Counting queued work, where the initial count is = 0 and grows with queued work
Signaling job completion, where the initial count = 0, and the semaphore is used to coordinate two threads.
I would remind you that locking in general is a topic that requires close attention - in systems that have complex interactions, it's sometimes very easy to use too little locking, causing unintended concurrency, or too much locking, causing deadlocks.
If I had threads as below
void thread(){
while() {
lock.acquire();
if(condition not true)
{
Cond.wait()
}
// blah blah
Cond.Signal();
lock.release();
}
}
Well, I guess my main question is whether the signalling thread continues running for a while after Cond.Signal() or immediately gives up the CPU. In some cases I would like it not to release the lock before the woken-up thread finishes execution, and in other cases it may be beneficial to release the lock immediately after signalling, without waiting for the woken thread to finish.
I understand that if there are any threads waiting on the condition, then they get woken up on Cond.Signal(). But what do you mean by woken up: put on the ready queue, or does the scheduler make sure that it runs immediately?
And what about the signalling thread: does it go to sleep on the same condition upon signalling, so that some other thread has to wake it up to make it release the lock?
This is in large part dependent on your environment (OS, library, language...) and how the synchronisation primitives are implemented. Since you haven't specified any I'll just give a general answer.
When putting a thread to sleep, most environments will choose to remove it from the scheduler's ready queue, and the thread will give up its remaining CPU time. When woken up, the thread is simply placed back into the ready queue and will resume execution the next time the scheduler selects it from the queue.
It's also possible that the thread will do some active waiting (spinning) instead of being removed from the scheduler's ready queue. In this case, the thread will resume execution right away. Note that since a thread can still run out of CPU time while spinning, it might have to wait to be rescheduled before waking up. This is a useful strategy if your critical sections are very small and you don't want to pay for the scheduling overheads.
A hybrid approach would be to do a small amount of active waiting before removing the thread from the scheduler's ready queue.
As for the signaling thread, unless specified explicitly by your environment (I can't think of any reason, but you never know), I wouldn't expect a call to signal() to block in a way that requires you to wake it up. Signal() might have to synchronize itself with other threads calling signal(), but those are implementation details and you shouldn't have to do anything about it.
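For one concrete environment, C++ with std::condition_variable, the behaviour can be observed with a small sketch: notify_one() returns to the signaller, which keeps running with the lock held, and the waiter cannot leave wait() until that lock is released. The function name signal_order_demo and the log strings are made up for illustration, and other environments may behave differently.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Records the order in which the signaller and the waiter make progress.
std::vector<std::string> signal_order_demo() {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    std::vector<std::string> log;  // only touched while holding m

    std::thread waiter([&] {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return ready; });
        assert(lk.owns_lock());    // wait() re-acquired m before returning
        log.push_back("waiter resumed");
    });

    {
        std::lock_guard<std::mutex> lk(m);
        ready = true;
        cv.notify_one();           // does not block the signaller
        log.push_back("signaller still running");  // m is still held here
    }                              // m released: now the waiter can proceed

    waiter.join();
    return log;
}
```

Because the waiter needs the mutex to return from wait(), the signaller's entry always lands in the log first in this sketch, whether or not the waiter was already asleep when the notification was sent.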
In Qt, I have a method which contains a mutex lock and unlock. The problem is that when the mutex is unlocked, it sometimes takes a long time before the other thread gets the lock. In other words, it seems the same thread can get the lock back (the method is called in a loop) even though another thread is waiting for it. What can I do about this? One thread is a QThread and the other thread is the main thread.
You can have the thread that just unlocked the mutex relinquish the processor. On POSIX, you do that by calling sched_yield() (pthread_yield() is a non-standard equivalent on some systems), and on Windows by calling Sleep(0).
That said, there is no guarantee that the thread waiting on the lock will be scheduled before your thread wakes up again.
It shouldn't be possible to release a lock and then get it back if some other thread is already waiting on it.
Check that you are actually releasing the lock when you think you do. Check that the waiting thread actually waits (and doesn't spin in a loop with trylock tests and sleeps; I actually did that once and was very puzzled at first :)).
Or if waiting thread really never gets time to even reach locking code, try QThread::yieldCurrentThread(). This will stop current thread and give scheduler a chance to give execution to somebody else. Might cause unnecessary switching depending on tightness of your loop.
If you want to make sure that one thread has priority over the other ones, an option is to use a QReadWriteLock. It's adapted to a typical scenario where n threads are going to read a value in an infinite loop, with only one thread updating it. I think it's the scenario you described.
QReadWriteLock offers two ways to lock: lockForRead() and lockForWrite(). The threads depending on the value will use the former (lockForRead()), while the thread updating the value (typically via the GUI) will use the latter (lockForWrite()) and will have top priority. You won't need to sleep or yield or whatever.
Example code
Let's say you have a QReadWriteLock lock; somewhere.
"Reader" thread
forever {
lock.lockForRead();
if (condition) {
do_stuff();
}
lock.unlock();
}
"Writer" thread
// external input (e.g. user) changes the condition
lock.lockForWrite(); // will block as soon as the reader lock ends
update_condition();
lock.unlock();