Producer consumer using only 1 additional semaphore - multithreading

Traditional solution to producer-consumer
In operating systems, as you can see in the link above for producer-consumer, two semaphores, full and empty, are used. Why is it not possible to do this using only one counting semaphore, fullEmpty?
What I mean is: we have a binary semaphore mutex and another semaphore fullEmpty, which is initially 0 because there are no items in the buffer. So why do we need two semaphores (full, empty)?
The only thing that needs to change is the order of the wait and signal calls, so that the update of fullEmpty happens within the critical section.
Any thoughts or reasons?

The key statement in the description that relates to your question is "We have a buffer of fixed size."
For the sake of answering your question, let's first assume that the buffer can expand to fit as many items as needed, or in other words that the buffer can grow to an unlimited size. In this case, the only synchronization that needs to occur between producers and consumers (apart from locking the mutex to ensure that you don't corrupt items in the critical section) is ensuring that consumers only consume items after they have been produced by a producer. You could solve this problem with just a mutex and one semaphore. Here is some code, which I borrowed and changed from the link you shared:
Producer
do {
    // produce an item
    wait(mutex);
    // place item in buffer
    signal(mutex);
    signal(full);
} while (true);
Consumer
do {
    wait(full);
    wait(mutex);
    // remove item from buffer
    signal(mutex);
    // consume item
} while (true);
As you can see above, the producer is always able to add things to the queue (apart from when the mutex is being held) and doesn't need to wait for consumers to consume anything because the buffer will never fill, even if the consumers don't consume items. On the other hand, consumers can't consume anything until producers have produced items.
To answer your question, you need to focus on the statement, "We have a buffer of fixed size." This changes the problem. Since the buffer is no longer able to grow to an unlimited size, you need to get producers to wait when the buffer is full before they can add more things to the buffer. This is why you need a second semaphore. Not only do consumers need to wait for producers, but now producers need to wait for consumers. You get producers to wait for consumers by getting them to call wait on a semaphore that only consumers call signal on.
You can't do this with only one semaphore because the conditions when a producer has to wait are different from the conditions when a consumer has to wait. Since they should be able to decrement and advance past the semaphore in different conditions, you can't use the same semaphore for both.
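To make the unbounded case concrete, here is a minimal runnable sketch in C with POSIX semaphores. It is just one possible rendering of the pseudocode above, under the stated unlimited-buffer assumption; the linked-list buffer and the function names are illustrative, not from the original post.

#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

typedef struct node { int item; struct node *next; } node_t;

static node_t *head;            /* unbounded linked-list "buffer" */
static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t full;              /* counts produced items; call sem_init(&full, 0, 0) before use */

void producer_put(int item) {
    node_t *n = malloc(sizeof *n);
    n->item = item;
    pthread_mutex_lock(&mutex);     /* wait(mutex) */
    n->next = head;                 /* place item in buffer */
    head = n;
    pthread_mutex_unlock(&mutex);   /* signal(mutex) */
    sem_post(&full);                /* signal(full) */
}

int consumer_get(void) {
    sem_wait(&full);                /* wait(full): block until an item exists */
    pthread_mutex_lock(&mutex);     /* wait(mutex) */
    node_t *n = head;               /* remove item from buffer */
    head = n->next;
    pthread_mutex_unlock(&mutex);   /* signal(mutex) */
    int item = n->item;
    free(n);
    return item;
}

Note that the producer never blocks on anything except the mutex, exactly as described above, while the consumer blocks on full until something has been produced.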

This is because there are two conditions you have to wait for: the queue is empty and the queue is full. But a classic semaphore allows you to wait for only one condition - wait until the semaphore is not 0.
You can solve this problem using a single synchronization object, but such an object needs to be more featureful than a semaphore. A "bounded semaphore" - a semaphore that has a maximum value - should be enough, as it allows you to block waiting on both conditions.
How to get one is another question:
You can build one using a mutex and a condition variable (see the sketch just below).
Windows semaphores already have this functionality.
You can use a futex on Linux (see FUTEX_WAIT, FUTEX_WAKE) or equivalents on other OSes: on FreeBSD use _umtx_op (see UMTX_OP_WAIT, UMTX_OP_WAKE); on Windows 8 and newer use WaitOnAddress with WakeByAddressSingle/WakeByAddressAll.
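For illustration, here is a sketch of such a bounded semaphore built from a pthread mutex and condition variable, as the first option suggests. The type and function names are invented for this example, and one shared condition with broadcast is used for simplicity.

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int count;
    int max;
} bounded_sem_t;

void bsem_init(bounded_sem_t *s, int initial, int max) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->count = initial;
    s->max = max;
}

/* Block until the count is above zero, then decrement ("consume"). */
void bsem_wait(bounded_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->count--;
    pthread_cond_broadcast(&s->cond);   /* wake anyone blocked on "full" */
    pthread_mutex_unlock(&s->lock);
}

/* Block until the count is below the maximum, then increment ("produce"). */
void bsem_signal(bounded_sem_t *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == s->max)
        pthread_cond_wait(&s->cond, &s->lock);
    s->count++;
    pthread_cond_broadcast(&s->cond);   /* wake anyone blocked on "empty" */
    pthread_mutex_unlock(&s->lock);
}

A single object like this lets consumers block on the "empty" condition and producers block on the "full" condition, which is exactly what one plain semaphore cannot express.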
I suggest you get familiar with the futex interface - with it you can build more powerful and more efficient synchronization objects than the regular ones. Today most OSes provide an equivalent interface, and even C++ might introduce something similar in the future (see std::synchronic<T>).
A few notes:
Linux has eventfd, which can act as a semaphore when created with the EFD_SEMAPHORE flag, but its maximum value is 0xfffffffffffffffe and cannot be changed. Maybe some day this syscall will be extended to support a maximum value too.

How to unblock all threads waiting on a semaphore?

I am dealing with a standard producer-consumer problem with a finite array (or finitely many buffers). I tried implementing it using semaphores and have run into a problem. I want the producer to 'produce' only, say, 50 times. After that I want the producer thread to join the main thread. This part is easy, but what I am unable to do is join the consumer threads: they are stuck waiting on the semaphore, which signals that there is no data. How do I solve this problem?
One possible option is to have a flag variable that becomes true when the producer joins main; after that, the main thread would post(semaphore) as many times as there are worker threads. Each worker thread would check the flag after waking up and, if it is true, exit the function.
I think my method is pretty inefficient because of the many semaphore posts. It would be great if I could unblock all threads at once!
Edit: I tried implementing what I described and it doesn't work due to deadlock.
One option is the "poison pill" method. It assumes that you know how many consumer threads exist. Assuming there are N consumers, then after the producer has done its thing, it puts N "poison pills" into the queue. A "poison pill" is simply an object/value that is type-compatible with whatever the producer normally produces, but which is distinguishable from a normal object/value.
When a consumer recognizes that it has eaten a poison pill, it dies. Problem solved.
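Here is a minimal sketch of the poison pill idea in C with POSIX semaphores. The ring buffer, the thread and item counts, and the POISON sentinel value are illustrative assumptions, not taken from the question.

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

#define N_CONSUMERS 4
#define N_ITEMS     50
#define POISON      (-1)    /* sentinel: never produced as real data */
#define CAP         16

static int buf[CAP];
static int head, tail;
static sem_t filled, empty;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void put(int v) {
    sem_wait(&empty);
    pthread_mutex_lock(&lock);
    buf[tail] = v; tail = (tail + 1) % CAP;
    pthread_mutex_unlock(&lock);
    sem_post(&filled);
}

static int get(void) {
    sem_wait(&filled);
    pthread_mutex_lock(&lock);
    int v = buf[head]; head = (head + 1) % CAP;
    pthread_mutex_unlock(&lock);
    sem_post(&empty);
    return v;
}

static void *producer(void *arg) {
    for (int i = 0; i < N_ITEMS; i++)
        put(i);
    for (int i = 0; i < N_CONSUMERS; i++)   /* one pill per consumer */
        put(POISON);
    return NULL;
}

static void *consumer(void *arg) {
    for (;;) {
        int v = get();
        if (v == POISON)                    /* ate the pill: exit cleanly */
            return NULL;
        printf("consumed %d\n", v);
    }
}

int main(void) {
    pthread_t prod, cons[N_CONSUMERS];
    sem_init(&filled, 0, 0);
    sem_init(&empty, 0, CAP);
    pthread_create(&prod, NULL, producer, NULL);
    for (int i = 0; i < N_CONSUMERS; i++)
        pthread_create(&cons[i], NULL, consumer, NULL);
    pthread_join(prod, NULL);
    for (int i = 0; i < N_CONSUMERS; i++)
        pthread_join(cons[i], NULL);        /* all consumers exit; no stuck waits */
    return 0;
}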
I've done producer-consumer structures in C++ on the FreeRTOS operating system only, so keep that in mind; that has been my only experience with multitasking so far, and in that program I used only one producer and one consumer. I've also done multitasking in LabVIEW, but that is a little bit different from what you might have, I think.
I think one option could be to have a queue structure, so that the producer enqueues elements into the queue; if the queue is full of data, you can hopefully implement some kind of queue policy as follows.
The producer can either:
block itself until space is available in the queue,
block itself for a certain time period, and continue elsewhere if the time runs out without succeeding in enqueuing the data, or
immediately go elsewhere.
So that would take care of your enqueuing policy...
The queue readers can have the same three kinds of policies, at least in FreeRTOS.
In general, if you have a binary semaphore, the sender signals it and the receiver waits on it. It is used for synchronization or signalling.
In my opinion you have chosen the wrong approach with the many semaphores.
What you need is a queue structure where the producer inputs stuff...
Then the consumers read from the queue whatever they must...
If the queue is empty, then you need a policy on what the queue reader threads should do.
That policy choice applies to queue readers and semaphore waiters alike: what should they do when the queue is empty, or when they haven't received the semaphore? I would not use semaphores for this kind of problem...
I think the boolean variable idea could work, because you are only writing to that variable in the producer thread. The other threads should then be able to read and poll that boolean variable to see whether the producer is active...
But I think you should provide more details about what you are trying to do, especially with the consumer threads: how many threads of what kind you have, what language you are programming in, etc.

The usage case of counting semaphore

To be clear: I do mostly embedded stuff, i.e. C and some kind of real-time kernel on a microcontroller, but this question should really be platform-independent.
I've read the nice article by Michael Barr, Mutexes and Semaphores Demystified, as well as this related answer on Stack Overflow. I understand clearly what a binary semaphore is for and what a mutex is for. That's great.
But to be honest I never knew, and still can't understand, what the so-called counting semaphore (i.e. a semaphore with max count > 1) is for. In what cases should I use it?
A long time ago, before I read the aforementioned article by Michael Barr, I would say something like: "you can use it when you have, say, a hotel room with a certain number of beds. The number of beds is the maximum count for the semaphore, just like the number of keys for that room."
It probably sounds nice, but I never actually had such a situation in my programming practice (and can't imagine one), and Michael Barr says this approach is just wrong, and he seems right.
Then, after I read the article, I supposed it might be used when I have, say, some kind of FIFO buffer. Assume the buffer's capacity is 10 elements, and we have two tasks: A (the producer) and B (the consumer). Then:
The semaphore's max count should be set to 10;
When A wants to put data into the buffer, it signals the semaphore.
When B wants to get data from the buffer, it waits on the semaphore.
Well, but it doesn't work:
What if A tries to put new data into the FIFO, but there is no room? How would it wait for space: should it call signal before putting the new data, and should signal then be able to block until the count drops below the max? If so, the semaphore would be signaled before the data is actually put into the FIFO, which is wrong.
The semaphore alone is not enough for proper synchronization: the FIFO itself needs to be synchronized as well. And then it produces the classic TOCTTOU problem: there is a window of time during which the semaphore has already been signaled or waited on, but the FIFO hasn't yet been modified.
So, when should I use that beast, the counting semaphore?
The 'classic' example is, indeed, a producer-consumer queue.
An unbounded queue requires one semaphore (to count the queue entries) and a mutex-protected thread-safe queue (or an equivalent lock-free thread-safe queue). The semaphore is initialized to zero. Producers lock the mutex, push an object onto the queue, unlock the mutex and signal the semaphore. Consumers wait on the semaphore, lock the mutex, pop the object and unlock the mutex.
A bounded queue requires two semaphores (one, 'count', to count the entries; the other, 'available', to count the free space) and a mutex-protected thread-safe queue (or an equivalent lock-free thread-safe queue). 'count' is initialized to zero and 'available' to the number of free spaces in an empty queue. Producers wait on 'available', lock the mutex, push an object onto the queue, unlock the mutex and signal 'count'. Consumers wait on 'count', lock the mutex, pop the object, unlock the mutex and signal 'available'.
This is a classic use for semaphores and has been around since forever (well, since Dijkstra, anyway :). It's been tried billions of times, and it works fine for any number of producers/consumers.
There is no TOCTTOU issue, no corner cases, no races.
The 'mutex' functionality may be provided by yet another semaphore, initialized to 1. This allows 'two semaphore' unbounded, and 'three semaphore' bounded, implementations.
I supposed it might be used when I have, say, some kind of FIFO buffer. Assume the buffer's capacity is 10 elements, and we have two tasks: A (the producer) and B (the consumer). Then:
The semaphore's max count should be set to 10;
When A wants to put data into the buffer, it signals the semaphore.
When B wants to get data from the buffer, it waits on the semaphore.
This is not the way semaphores are used in the producer-consumer scenario. The standard solution is to use two counting semaphores, one for the empty slots (initialized to the number of available slots), and another for the filled slots (initialized to 0).
Producers try to allocate empty slots to put items in, so they start with wait-ing on the semaphore assigned to the empty slots. Consumers try to "allocate" (get hold of) filled slots, so they start with wait-ing on the semaphore assigned to the filled slots.
After finishing their work, they both signal the other semaphore since they transform slots from empty to filled and from filled to empty, respectively.
Standard solution scheme:
semaphore mutex = 1;
semaphore filled = 0;
semaphore empty = SIZE;

producer() {
    while (true) {
        item = produceItem();
        wait(empty);
        wait(mutex);
        putItemIntoBuffer(item);
        signal(mutex);
        signal(filled);
    }
}

consumer() {
    while (true) {
        wait(filled);
        wait(mutex);
        item = removeItemFromBuffer();
        signal(mutex);
        signal(empty);
        consumeItem(item);
    }
}
I think counting semaphores serve well in this situation.
Another, maybe simpler, example is using a counting semaphore to avoid deadlock in the dining philosophers scenario. Since deadlock can occur only when all philosophers sit down simultaneously and each picks up their (say) left fork, deadlock can be avoided by not allowing all of them into the dining room at the same time. This can be achieved with a counting semaphore (enter) initialized to one less than the number of philosophers.
The protocol for one philosopher then becomes:
wait(enter)
wait(left_fork)
wait(right_fork)
eat()
signal(left_fork)
signal(right_fork)
signal(enter)
This ensures that not all philosophers can be in the dining room at the same time.
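A possible rendering of this protocol in C with POSIX semaphores (N and the eat() helper are illustrative, not part of the answer above):

#include <pthread.h>
#include <semaphore.h>

#define N 5

static sem_t enter;         /* sem_init(&enter, 0, N - 1) before starting */
static sem_t fork_sem[N];   /* each: sem_init(&fork_sem[i], 0, 1) */

void eat(int i);            /* hypothetical helper */

void *philosopher(void *arg) {
    int i = *(int *)arg;
    for (;;) {
        sem_wait(&enter);                   /* wait(enter): at most N-1 inside */
        sem_wait(&fork_sem[i]);             /* wait(left_fork) */
        sem_wait(&fork_sem[(i + 1) % N]);   /* wait(right_fork) */
        eat(i);
        sem_post(&fork_sem[i]);             /* signal(left_fork) */
        sem_post(&fork_sem[(i + 1) % N]);   /* signal(right_fork) */
        sem_post(&enter);                   /* signal(enter) */
    }
}

With at most N - 1 philosophers holding forks, at least one of them can always acquire both forks, so the circular wait that causes deadlock is broken.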
Some of the more popular use cases of counting semaphores are listed below (a small sketch of the last one follows the list):
Limiting the number of connections in a JDBC connection pool.
Network connection throttling.
Limiting concurrent access to resources such as a disk.
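As a toy illustration of the last use case, a counting semaphore can throttle concurrent access to a resource. MAX_CONN and do_request() here are invented for the example:

#include <semaphore.h>

#define MAX_CONN 10

static sem_t conn_slots;    /* sem_init(&conn_slots, 0, MAX_CONN) at startup */

void do_request(void);      /* hypothetical: use one connection */

void handle_request(void) {
    sem_wait(&conn_slots);  /* block while MAX_CONN requests are in flight */
    do_request();
    sem_post(&conn_slots);  /* free the slot */
}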

Advantages of using condition variables over mutex

I was wondering what is the performance benefit of using condition variables over mutex locks in pthreads.
What I found is : "Without condition variables, the programmer would need to have threads continually polling (possibly in a critical section), to check if the condition is met. This can be very resource consuming since the thread would be continuously busy in this activity. A condition variable is a way to achieve the same goal without polling." (https://computing.llnl.gov/tutorials/pthreads)
But it also seems that mutex calls are blocking (unlike spin-locks). Hence if a thread (T1) fails to get a lock because some other thread (T2) holds it, T1 is put to sleep by the OS and is woken up only when T2 releases the lock and the OS gives T1 the lock. Thread T1 does not really poll to get the lock. From this description, it seems that there is no performance benefit to using condition variables: in either case there is no polling involved, and the OS already seems to provide the benefit that the condition-variable paradigm is supposed to provide.
Can you please explain what actually happens?
A condition variable allows a thread to be signaled when something of interest to that thread occurs.
By itself, a mutex doesn't do this.
If you just need mutual exclusion, then condition variables don't do anything for you. However, if you need to know when something happens, then condition variables can help.
For example, if you have a queue of items to work on, you'll have a mutex to ensure the queue's internals are consistent when accessed by the various producer and consumer threads. However, when the queue is empty, how will a consumer thread know when something is in there for it to work on? Without something like a condition variable it would need to poll the queue, taking and releasing the mutex on each poll (otherwise a producer thread could never put something on the queue).
Using a condition variable lets the consumer, on finding the queue empty, simply wait on the condition variable indicating that the queue has had something put into it. No polling - that thread does nothing until a producer puts something in the queue and then signals the condition that the queue has a new item.
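A minimal sketch of that pattern with pthreads; the queue helpers and item_t are assumptions for the example:

#include <pthread.h>

pthread_mutex_t lock      = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

/* hypothetical queue API */
typedef struct item item_t;
int     queue_is_empty(void);
void    queue_push(item_t *it);
item_t *queue_pop(void);
void    process(item_t *it);

void *consumer(void *arg) {
    for (;;) {
        pthread_mutex_lock(&lock);
        while (queue_is_empty())            /* loop: wakeups can be spurious */
            pthread_cond_wait(&not_empty, &lock);
        item_t *it = queue_pop();
        pthread_mutex_unlock(&lock);
        process(it);                        /* do the work outside the lock */
    }
}

void produce(item_t *it) {
    pthread_mutex_lock(&lock);
    queue_push(it);
    pthread_cond_signal(&not_empty);        /* wake one waiting consumer */
    pthread_mutex_unlock(&lock);
}

The consumer sleeps inside pthread_cond_wait, which atomically releases the mutex while waiting, so it burns no CPU until a producer signals.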
You're looking for too much overlap in two separate but related things: a mutex and a condition variable.
A common implementation approach for a mutex is to use a flag and a queue. The flag indicates whether the mutex is held by anyone (a single-count semaphore would work too), and the queue tracks which threads are in line waiting to acquire the mutex exclusively.
A condition variable is then implemented as another queue bolted onto that mutex. Threads that got in line to wait to acquire the mutex can—usually once they have acquired it—volunteer to get out of the front of the line and get into the condition queue instead. At this point, you have two separate sets of waiters:
Those waiting to acquire the mutex exclusively
Those waiting for the condition variable to be signaled
When a thread holding the mutex exclusively signals the condition variable, for which we'll assume for now that it's a singular signal (unleashing no more than one waiting thread) and not a broadcast (unleashing all the waiting threads), the first thread in the condition variable queue gets shunted back over into the front (usually) of the mutex queue. Once the thread currently holding the mutex—usually the thread that signaled the condition variable—relinquishes the mutex, the next thread in the mutex queue can acquire it. That next thread in line will have been the one that was at the head of the condition variable queue.
There are many complicated details that come into play, but this sketch should give you a feel for the structures and operations in play.
If you are looking for performance, then start reading about "non-blocking / non-locking" thread synchronization algorithms. They are based on atomic operations, which gcc is kind enough to provide. Look up gcc atomic operations. Our tests showed we could increment a global value with multiple threads using atomic operations orders of magnitude faster than by locking with a mutex. Here is some sample code that shows how to add items to and remove items from a linked list from multiple threads at the same time without locking.
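The linked-list sample itself is not reproduced here; as a small stand-in, this sketch shows the flavor of the atomic-increment comparison using GCC's __atomic builtins (thread and iteration counts are arbitrary):

#include <pthread.h>
#include <stdio.h>

static long counter;

static void *worker(void *arg) {
    for (int i = 0; i < 1000000; i++)
        __atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);  /* no mutex taken */
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter);   /* always 4000000, with no lock contention */
    return 0;
}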
For sleeping and waking threads, signals are much faster than condition variables. You use pthread_kill to send the signal and sigwait to put the thread to sleep. We tested this too, with the same kind of performance benefits. Here is some example code.

Optimal sleep time in multiple producer / single consumer model

I'm writing an application that has a multiple producer, single consumer model (multiple threads send messages to a single file writer thread).
Each producer thread contains two queues: one to write into, and one for the consumer to read out of. On every loop, the consumer thread iterates through the producers and, for each one, locks that producer's mutex, swaps the queues, unlocks, and writes out from the queue that the producer is no longer using.
In its loop, the consumer thread sleeps for a designated amount of time after it processes all producer threads. One thing I immediately noticed was that the average time for a producer to write something into the queue and return increased dramatically (by 5x) when I moved from 1 producer thread to 2. As more threads are added, this average time decreases until it bottoms out: there isn't much difference between the time taken with 10 producers vs 15. This is presumably because with more producers to process, there is less contention for any given producer's mutex.
Unfortunately, having < 5 producers is a fairly common scenario for the application and I'd like to optimize the sleep time so that I get reasonable performance regardless of how many producers exist. I've noticed that by increasing the sleep time, I can get better performance for low producer counts, but worse performance for large producer counts.
Has anybody else encountered this, and if so what was your solution? I have tried scaling the sleep time with the number of threads, but it seems somewhat machine specific and pretty trial-and-error.
You could pick the sleep time based on the number of producers, or even make the sleep time adapt based on some dynamic scheme: if the consumer wakes up and has no work, double the sleep time, otherwise halve it, but constrain the sleep time to some minimum and maximum. A sketch of that scheme follows.
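The bounds and the drain_all_queues() helper below are invented for the example:

#include <unistd.h>

#define MIN_SLEEP_US 100
#define MAX_SLEEP_US 100000

int drain_all_queues(void);     /* hypothetical: swap and write out every
                                   producer queue, returning items processed */

void consumer_loop(void) {
    long sleep_us = MIN_SLEEP_US;
    for (;;) {
        if (drain_all_queues() == 0) {
            sleep_us *= 2;                          /* idle: back off */
            if (sleep_us > MAX_SLEEP_US) sleep_us = MAX_SLEEP_US;
        } else {
            sleep_us /= 2;                          /* busy: poll more often */
            if (sleep_us < MIN_SLEEP_US) sleep_us = MIN_SLEEP_US;
        }
        usleep(sleep_us);
    }
}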
Either way you're papering over a more fundamental issue. Sleeping and polling is easy to get right and sometimes is the only approach available, but it has many drawbacks and isn't the "right" way.
You can head in the right direction by adding a semaphore which is incremented whenever a producer adds an item to a queue and decremented when the consumer processes an item in a queue. The consumer will only wake up when there are items to process and will do so immediately.
Polling the queues may still be a problem, though. You could add a new queue that refers to any queue which has items on it. But it rather raises the question as to why you don't have a single queue that the consumer processes rather than a queue per producer. All else being equal that sounds like the best approach.
Instead of sleeping, I would recommend that your consumer block on a condition signaled by the producers. On a POSIX-compliant system, you could make it work with pthread_cond. Create an array of pthread_cond_t, one for each producer, then create an additional one that is shared between them. The producers first signal their individual condition variable, and then the shared one. The consumer waits on the shared condition and then iterates over the elements of the array, performing a pthread_cond_timedwait() on each element (use pthread_get_expiration_np() to get the absolute time for "now"). If the wait returns 0, then that producer has written data. The consumer must reinitialize the condition variables before waiting again.
By using blocking waits, you'll minimize the amount of time the consumer needlessly locks out the producers. You could also make this work with semaphores, as stated in a previous answer. Semaphores have simpler semantics than condition variables, in my opinion, but you'd have to be careful to decrement the shared semaphore once for each producer that was processed on each pass through the consumer loop. Condition variables have the advantage that you can basically use them like boolean semaphores if you reinitialize them after they are signaled.
Try to find an implementation of a blocking queue in the language you are programming in. A single queue will be enough for any number of producers and one consumer.
To me it sounds like you are accidentally introducing some buffering by having the consumer thread be busy somewhere else, either sleeping or doing actual work (the queue acting as the buffer). Maybe doing some simple buffering on the producer side will reduce your contention.
It seems that your system is highly sensitive to lock contention between the producer and consumer, but I'm baffled as to why such a simple swap operation would occupy enough CPU time to show up in your run stats.
Can you show some code?
Edit: maybe you are taking your lock and swapping queues even when there is no work to do?

How to use a queue with two threads -- one for consumer and one for producer

I am working on an application where a lower-level application always invokes a callback RecData(char *buf) when it receives data.
In the callback I create two threads and pass the producer and consumer functions, producer_queue and consumer_queue, to these threads respectively.
My code:
void RecData(char *buf)
{
    CreateThread(NULL, 0, producer_queue, (void *)buf, 0, NULL);
    CreateThread(NULL, 0, consumer_queue, NULL, 0, NULL);
}
The above works when I receive one piece of data at a time. If I receive, say, 5 pieces of data almost at the same time, producer_queue should first put all of them in the queue and only then should consumer_queue start retrieving them; but here, as soon as producer_queue puts the first item in the queue, consumer_queue retrieves it.
What you want to do, I believe, is control access to the queue. You'll want to look at using a mutex to control reading from the queue.
When you receive data, you lock the mutex, then enqueue the data. When you are done queuing the data, you release the lock.
When reading from the queue, you check whether the mutex is locked. If the producer is writing data to the queue, you won't be able to start reading until it has finished writing all of its data and released the lock. Conversely, while you hold the mutex for reading, you prevent the writer thread from writing.
This approach could introduce potential deadlocks: if your writer thread dies before releasing the lock, your reader thread will not be able to continue (then again, the thread dying may just trigger an error state).
I hope this makes sense.
Use the concept of condition variables. The problem you have is the most common one in the multi-threaded programming world. Just using mutexes doesn't help the situation. Always remember that mutexes are for locking and condition variables are for waiting. The latter is always safer and gives you near-certainty about when a thread should start consuming from a shared queue.
Check out the link below for how you can create a condition variable of your own on Windows:
http://www.cs.wustl.edu/~schmidt/win32-cv-1.html
If you are using Windows Vista, the MSDN example below may help you:
http://msdn.microsoft.com/en-us/library/ms686903(VS.85).aspx
In all cases, use the logic shown on Schmidt's website, as it looks more portable (portable across different versions of Windows, at least). Schmidt's implementation gives you the standard POSIX API feel, which is the widely used standard on most modern UNIX/Linux systems.
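On Vista and later you can also use the native CONDITION_VARIABLE API directly, roughly as sketched below. The queue helpers are assumptions for this example; call InitializeCriticalSection and InitializeConditionVariable once at startup.

#include <windows.h>

CRITICAL_SECTION   qLock;       /* InitializeCriticalSection(&qLock) */
CONDITION_VARIABLE qNotEmpty;   /* InitializeConditionVariable(&qNotEmpty) */

/* hypothetical queue API */
int   queue_empty(void);
void  enqueue(char *buf);
char *dequeue(void);

void Produce(char *buf)
{
    EnterCriticalSection(&qLock);
    enqueue(buf);
    LeaveCriticalSection(&qLock);
    WakeConditionVariable(&qNotEmpty);      /* wake one waiting consumer */
}

char *Consume(void)
{
    char *buf;
    EnterCriticalSection(&qLock);
    while (queue_empty())                   /* guard against spurious wakeups */
        SleepConditionVariableCS(&qNotEmpty, &qLock, INFINITE);
    buf = dequeue();
    LeaveCriticalSection(&qLock);
    return buf;
}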
