Producer Consumer synchronization with only 1 semaphore - multithreading

In the Producer-Consumer problem, the standard solution uses 3 semaphores.
However, I was wondering if we could just use 1 semaphore:
semaphore mutex = 1
procedure producer() {
while (true) {
down(mutex)
if (bufferNotFull()) {
item = produceItem()
putItemIntoBuffer(item)
}
up(mutex)
}
}
procedure consumer() {
while (true) {
down(mutex)
if (bufferNotEmpty()) {
item = removeItemFromBuffer()
consumeItem(item)
}
up(mutex)
}
}
Is this solution as good as the standard one??
The standard solution for reference:
semaphore mutex = 1
semaphore fillCount = 0
semaphore emptyCount = BUFFER_SIZE
procedure producer() {
while (true) {
item = produceItem()
down(emptyCount)
down(mutex)
putItemIntoBuffer(item)
up(mutex)
up(fillCount)
}
}
procedure consumer() {
while (true) {
down(fillCount)
down(mutex)
item = removeItemFromBuffer()
consumeItem(item)
up(mutex)
up(emptyCount)
}
}

The "standard solution" doesn't have to use 3 semaphores. Even the wikipedia article you linked has a two-semaphore solution for cases in which there's a single producer and a single consumer:
semaphore fillCount = 0; // items produced
semaphore emptyCount = BUFFER_SIZE; // remaining space
procedure producer()
{
while (true)
{
item = produceItem();
down(emptyCount);
putItemIntoBuffer(item);
up(fillCount);
}
}
procedure consumer()
{
while (true)
{
down(fillCount);
item = removeItemFromBuffer();
up(emptyCount);
consumeItem(item);
}
}
Solutions with a single semaphore aren't great since they give you just a single wait condition when you actually want to - "wait for free space", and "wait for an item". You can't do both with a single semaphore.
Anyway, your solution isn't wrong, it's just very inefficient since at any given moment only a single producer or a single consumer can run. This is essentially single-threaded code, only with locks and context switches between threads. So it's even less efficient that actual single threaded code.
And one more thing - item production and consumption aren't usually part of the critical section. While the item is being consumed or produced, the other thread can run. We need mutual exclusion only when using the buffer.

Related

c++11 lock-free queue with 2 thread

Along with the main thread, i have one more thread that receives data to write them in a file.
std::queue<std::vector<int>> dataQueue;
std::mutex mutex;
void setData(const std::vector<int>& data) {
std::lock_guard<std::mutex> lock(mutex);
dataQueue.push(data);
}
void write(const std::string& fileName) {
std::unique_ptr<std::ostream> ofs = std::unique_ptr<std::ostream>(new zstr::ofstream(fileName));
while (store) {
std::lock_guard<std::mutex> lock(mutex);
while (!dataQueue.empty()) {
std::vector<int>& data= dataQueue.front();
ofs->write(reinterpret_cast<char*>(data.data()), sizeof(data[0])*data.size());
dataQueue.pop();
}
}
}
}
setData is used by the main thread and write is actually the writing thread. I use std::lock_quard to avoid memory conflict but when locking on the writing thread, it slows down the main thread as it has to wait for the Queue to be unlocked. But i guess i can avoid this as the threads never act on the same element of the queue at the same time.
So i would like to do it lock-free but i don't really understand how i should implement it. I mean, how can i do it without locking anything ? moreover, if the writing thread is faster than the main thread, the queue might be empty most of the time, so it should somehow waits for new data instead of looping infinitly to check for non empty queue.
EDIT: I changed simple std::lock_guard by std::cond_variable so that it could wait when the queue is empty. But the main thread can still be blocked as , when cvQeue.wait(.) is resolved, it reacquire the lock. moreover, what if the main thread does cvQueue.notify_one() but the writing thread is not waiting ?
std::queue<std::vector<int>> dataQueue;
std::mutex mutex;
std::condition_variable cvQueue;
void setData(const std::vector<int>& data) {
std::unique_lock<std::mutex> lock(mutex);
dataQueue.push(data);
cvQueue.notify_one();
}
void write(const std::string& fileName) {
std::unique_ptr<std::ostream> ofs = std::unique_ptr<std::ostream>(new zstr::ofstream(fileName));
while (store) {
std::lock_guard<std::mutex> lock(mutex);
while (!dataQueue.empty()) {
std::unique_lock<std::mutex> lock(mutex);
cvQueue.wait(lock);
ofs->write(reinterpret_cast<char*>(data.data()), sizeof(data[0])*data.size());
dataQueue.pop();
}
}
}
}
If you only have two threads, than you could use a lock-free single-producer-single-consumer (SPSC) queue.
A bounded version can be found here: https://github.com/rigtor/SPSCQueue
Dmitry Vyukov presented an unbounded version here: http://www.1024cores.net/home/lock-free-algorithms/queues/unbounded-spsc-queue (You should note though, that this code should be adapted to use atomics.)
Regarding a blocking pop operation - this is something that lock-free data structures do not provide since such an operation is obviously not lock-free. However, it should be relatively straight forward to adapt the linked implementations in such a way, that a push operation notifies a condition variable if the queue was empty before the push.
i guess i have something that met my needs. I did a LockFreeQueue that uses std::atomic. I can thus manage the state of the head/tail of the queue atomically.
template<typename T>
class LockFreeQueue {
public:
void push(const T& newElement) {
fifo.push(newElement);
tail.fetch_add(1);
cvQueue.notify_one();
}
void pop() {
size_t oldTail = tail.load();
size_t oldHead = head.load();
if (oldTail == oldHead) {
return;
}
fifo.pop();
head.store(++oldHead);
}
bool isEmpty() {
return head.load() == tail.load();
}
T& getFront() {
return fifo.front();
}
void waitForNewElements() {
if (tail.load() == head.load()) {
std::mutex m;
std::unique_lock<std::mutex> lock(m);
cvQueue.wait_for(lock, std::chrono::milliseconds(TIMEOUT_VALUE));
}
}
private:
std::queue<T> fifo;
std::atomic<size_t> head = { 0 };
std::atomic<size_t> tail = { 0 };
std::condition_variable cvQueue;
};
LockFreeQueue<std::vector<int>> dataQueue;
std::atomic<bool> store(true);
void setData(const std::vector<int>& data) {
dataQueue.push(data);
// do other things
}
void write(const std::string& fileName) {
std::unique_ptr<std::ostream> ofs = std::unique_ptr<std::ostream>(new zstr::ofstream(fileName));
while (store.load()) {
dataQueue.waitForNewElements();
while (!dataQueue.isEmpty()) {
std::vector<int>& data= dataQueue.getFront();
ofs->write(reinterpret_cast<char*>(data.data()), sizeof(data[0])*data.size());
dataQueue.pop();
}
}
}
}
I still have one lock in waitForNewElements but it is not locking the whole process as it is waiting for things to do. But the big improvement is that the producer can push while the consumer pop. It is only forbidden when LockFreQueue::tail and LockFreeQueue::head are the same. Meaning that the queue is empty and it enters the waiting state.
The thing that i'm not very satisfied at is cvQueue.wait_for(lock, TIMEOUT_VALUE). I wanted to do a simple cvQueue.wait(lock), but the problem is that when it comes to end the thread, I do store.store(false) in the main thread. So if the writing thread is waiting it will never end without a timeout. So, I set a big enough timeout so that most of the time the condition_variable is resolved by the lock, and when the thread ends it is resolved by the timeout.
If you feel that something must be wrong or must be improved, feel free to comment.

Worker thread suspend / resume implementation

In my attempt to add suspend / resume functionality to my Worker [thread] class, I've happened upon an issue that I cannot explain. (C++1y / VS2015)
The issue looks like a deadlock, however I cannot seem to reproduce it once a debugger is attached and a breakpoint is set before a certain point (see #1) - so it looks like it's a timing issue.
The fix that I could find (#2) doesn't make a lot of sense to me because it requires to hold on to a mutex longer and where client code might attempt to acquire other mutexes, which I understand to actually increase the chance of a deadlock.
But it does fix the issue.
The Worker loop:
Job* job;
while (true)
{
{
std::unique_lock<std::mutex> lock(m_jobsMutex);
m_workSemaphore.Wait(lock);
if (m_jobs.empty() && m_finishing)
{
break;
}
// Take the next job
ASSERT(!m_jobs.empty());
job = m_jobs.front();
m_jobs.pop_front();
}
bool done = false;
bool wasSuspended = false;
do
{
// #2
{ // Removing this extra scoping seemingly fixes the issue BUT
// incurs us holding on to m_suspendMutex while the job is Process()ing,
// which might 1, be lengthy, 2, acquire other locks.
std::unique_lock<std::mutex> lock(m_suspendMutex);
if (m_isSuspended && !wasSuspended)
{
job->Suspend();
}
wasSuspended = m_isSuspended;
m_suspendCv.wait(lock, [this] {
return !m_isSuspended;
});
if (wasSuspended && !m_isSuspended)
{
job->Resume();
}
wasSuspended = m_isSuspended;
}
done = job->Process();
}
while (!done);
}
Suspend / Resume is just:
void Worker::Suspend()
{
std::unique_lock<std::mutex> lock(m_suspendMutex);
ASSERT(!m_isSuspended);
m_isSuspended = true;
}
void Worker::Resume()
{
{
std::unique_lock<std::mutex> lock(m_suspendMutex);
ASSERT(m_isSuspended);
m_isSuspended = false;
}
m_suspendCv.notify_one(); // notify_all() doesn't work either.
}
The (Visual Studio) test:
struct Job: Worker::Job
{
int durationMs = 25;
int chunks = 40;
int executed = 0;
bool Process()
{
auto now = std::chrono::system_clock::now();
auto until = now + std::chrono::milliseconds(durationMs);
while (std::chrono::system_clock::now() < until)
{ /* busy, busy */
}
++executed;
return executed < chunks;
}
void Suspend() { /* nothing here */ }
void Resume() { /* nothing here */ }
};
auto worker = std::make_unique<Worker>();
Job j;
worker->Enqueue(j);
std::this_thread::sleep_for(std::chrono::milliseconds(j.durationMs)); // Wait at least one chunk.
worker->Suspend();
Assert::IsTrue(j.executed < j.chunks); // We've suspended before we finished.
const int testExec = j.executed;
std::this_thread::sleep_for(std::chrono::milliseconds(j.durationMs * 4));
Assert::IsTrue(j.executed == testExec); // We haven't moved on.
// #1
worker->Resume(); // Breaking before this call means that I won't see the issue.
worker->Finalize();
Assert::IsTrue(j.executed == j.chunks); // Now we've finished.
What am I missing / doing wrong? Why does the Process()ing of the job have to be guarded by the suspend mutex?
EDIT: Resume() should not have been holding on to the mutex at the time of notification; that's fixed -- the issue persists.
Of course the Process()ing of the job does not have to be guarded by the suspend mutex.
The access of j.executed - for the asserts as well as for the incrementing - however does need to be synchronized (either by making it an std::atomic<int> or by guarding it with a mutex etc.).
It's still not clear why the issue manifested the way it did (since I'm not writing to the variable on the main thread) -- might be a case of undefined behaviour propagating backwards in time.

How to search through next available thread to do computation

I am doing multithreading in C++. This may be something very standard but I can't seem to find it anywhere or know any key terms to search for it online.
I want to do some sort of computation many times but with multiple threads. For each iteration of computation, I want to find the next available thread that has finished its previous computation to do the next iteration. I don't want to cycle through the threads in order since the next thread to be called may not have finished its work yet.
E.g.
Suppose I have a vector of int and I want to sum up the total with 5 threads. I have the to-be-updated total sum stored somewhere and the count for which element I am currently up to. Each thread looks at the count to see the next position and then takes that vector value and adds it to the total sum so far. Then it goes back to look for the count to do the next iteration. So for each iteration, the count increments then looks for the next available thread (maybe one already waiting for count; or maybe they are all busy still working) to do the next iteration. We do not increase the number of threads but I want to be able to somehow search through all the 5 threads for the first one that finish to do the next computation.
How would I go about coding this. Every way I know of involves doing a loop through the threads such that I can't check for the next available one which may be out of order.
Use semafor (or mutex, always mix up those two) on a global variable telling you what is next. The semafor will lock the other threads out as long as you access the variable making that threads access clear.
So, assuming you have an Array of X elements. And a global called nextfree witch is initalized to 0, then a psudo code would look like this:
while (1)
{
<lock semafor INT>
if (nextfree>=X)
{
<release semnafor INT>
<exit and terminate thread>
}
<Get the data based on "nextfree">
nextfree++;
<release semafor INT>
<do your stuff withe the chunk you got>
}
The point here is that each thread will be alone and have exlusive access to the data struct within the semafor lock and therefore can access the next available regardless of what the others doing. (The other threads will have to wait in line if they are done while another thread working on getting next data chunk. When you release only ONE that stands in queue will get access. The rest will have to wait.)
There are some things to be ware of. Semafor's might lock your system if you manage to exit in the wrong position (Withour releasing it) or create a deadlock.
This is a thread pool:
template<class T>
struct threaded_queue {
using lock = std::unique_lock<std::mutex>;
void push_back( T t ) {
{
lock l(m);
data.push_back(std::move(t));
}
cv.notify_one();
}
boost::optional<T> pop_front() {
lock l(m);
cv.wait(l, [this]{ return abort || !data.empty(); } );
if (abort) return {};
auto r = std::move(data.back());
data.pop_back();
return std::move(r);
}
void terminate() {
{
lock l(m);
abort = true;
data.clear();
}
cv.notify_all();
}
~threaded_queue()
{
terminate();
}
private:
std::mutex m;
std::deque<T> data;
std::condition_variable cv;
bool abort = false;
};
struct thread_pool {
thread_pool( std::size_t n = 1 ) { start_thread(n); }
thread_pool( thread_pool&& ) = delete;
thread_pool& operator=( thread_pool&& ) = delete;
~thread_pool() = default; // or `{ terminate(); }` if you want to abandon some tasks
template<class F, class R=std::result_of_t<F&()>>
std::future<R> queue_task( F task ) {
std::packaged_task<R()> p(std::move(task));
auto r = p.get_future();
tasks.push_back( std::move(p) );
return r;
}
template<class F, class R=std::result_of_t<F&()>>
std::future<R> run_task( F task ) {
if (threads_active() >= total_threads()) {
start_thread();
}
return queue_task( std::move(task) );
}
void terminate() {
tasks.terminate();
}
std::size_t threads_active() const {
return active;
}
std::size_t total_threads() const {
return threads.size();
}
void clear_threads() {
terminate();
threads.clear();
}
void start_thread( std::size_t n = 1 ) {
while(n-->0) {
threads.push_back(
std::async( std::launch::async,
[this]{
while(auto task = tasks.pop_front()) {
++active;
try{
(*task)();
} catch(...) {
--active;
throw;
}
--active;
}
}
)
);
}
}
private:
std::vector<std::future<void>> threads;
threaded_queue<std::packaged_task<void()>> tasks;
std::atomic<std::size_t> active;
};
You give it how many threads either at construction or via start_thread.
You then queue_task. This returns a std::future that tells you when the task is completed.
As threads finish a task, they go to the threaded_queue and look for more.
When a threaded_queue is destroyed, it aborts all data in it.
When a thread_pool is destroyed, it aborts all future tasks, then waits for all of the outstanding tasks to finish.
Live example.

C++/Cli synchronizing threads writing to file

I am trying to synchronize threads writing data to a text file in a class by using Monitor, but in my code it seems that the else statement is never evaluated, is this the correct use of monitor for thread synchronization?
void Bank::updatefile()
{
Thread^ current = Thread::CurrentThread;
bool open = false;
current->Sleep(1000);
while (!open)
{
if (Monitor::TryEnter(current))
{
String^ fileName = "accountdata.txt";
StreamWriter^ sw = gcnew StreamWriter(fileName);
for (int x = 0; x < 19; x++)
sw->WriteLine(accountData[x]);
sw->Close();
Monitor::Pulse;
Monitor::Exit(current);
current->Sleep(500);
open = true;
}
else
{
Monitor::Wait(current);
current->Sleep(500);
}
}
}
You're passing an object to Monitor::TryEnter that is specific to the thread in which it is executing (i.e. Thread^ current = Thread::CurrentThread;). No other threads are using the same object (they're using the one for their own thread). So there's never a collision or locking conflict.
Try creating some generic object that's shared among the threads, something higher up in the Bank class. Then use that for your TryEnter call.
Your use of Monitor is partially correct. The object you're using for locking is not correct.
Monitor
Monitor::Pulse is unnecessary here. Just Exit the monitor, and the next thread will be able to grab the lock.
Monitor::Wait is incorrect here: Wait is supposed to be used when the thread has the object already locked. Here, you have the object not locked yet.
In general, Pulse and Wait are rarely used. For locking for exclusive access to a shared resource, Enter, TryEnter, and Exit are all you need.
Here's how your use of Monitor should be written:
Object^ lockObj = ...;
bool done = false;
while(!done)
{
if(Monitor::TryEnter(lockObj, 500)) // wait 500 millis for the lock.
{
try
{
// do work
done = true;
}
finally
{
Monitor::Exit(lockObj);
}
}
else
{
// Check some other exit condition?
}
}
or if the else is empty, you can simplify it like this:
Object^ lockObj = ...;
Monitor::Enter(lockObj); // Wait forever for the lock.
try
{
// do work
}
finally
{
Monitor::Exit(lockObj);
}
There is a class that Microsoft provides that makes this all easier: msclr::lock. This class, used without the ^, makes use of the destructor to release the lock, without a try-finally block.
#include <msclr\lock.h>
bool done = false;
while(!done)
{
msclr::lock lock(lockObj, lock_later);
if (lock.try_acquire(500)) // wait 500 millis for the lock to be available.
{
// Do work
done = true;
}
} // <-- Monitor::Exit is called by lock class when it goes out of scope.
{
msclr::lock lock(lockObj); // wait forever for the lock to be available.
// Do work
} // <-- Monitor::Exit is called by lock class when it goes out of scope.
The object to lock on
Thread::CurrentThread is going to return a different object on each thread. Therefore, each thread attempts to lock on a different object, and that's why all of them succeed. Instead, have one object, created before you spawn your threads, that is used for locking.
Also, instead of opening & closing the file on each thread, it would be more efficient to open it once, before the threads are spawned, and then just use that one StreamWriter from each of the threads. This also gives you a obvious object to lock on: You can pass the StreamWriter itself to Monitor::Enter or msclr::lock.

Producer Consumer - using semaphores in child Processes

I had implemented the Bounded buffer(Buffer size 5) problem using three semaphores, two counting (with count MAX 5) and one binary semaphore for critical section.
The producer and consumer processes were separate and were sharing the buffer.
Then I have moved on to try the same problem, this time with One Parent process that sets up the shared memory( Buffer) and two child processes which act like Producer and consumer.
I almost copied whatever Implemented in the earlier code into the new one (Producer goes into the ret ==0 and i==0 block and Consumer goes into ret ==0 and i==1 block..Here i is the count of the child processes);
However my process blocks. The pseudo implementation of the code is as follows:
Please suggest if the steps are correct. I think i may be going wrong with the sharing of semaphores and their values. The shared memory anyways gets shared implicitly between the parent and both child processes.
struct shma{readindex,writeindex,buf_max,char buf[5],used_count};
main()
{
struct shma* shm;
shmid = shmget();
shm = shmat(shmid);
init_shma(0,0,5,0);
while(i++<2)
{
ret = fork();
if(ret > 0 )
continue;
if(ret ==0 )
{
if(i==0)
{
char value;
sembuf[3]; semun u1; values[] = {5,1,0}; // Sem Numbers 1,2,3
semid = semget(3 semaphores);
semctl(SETALL,values);
while(1)
{
getValuefromuser(&value);
decrement(1);
decrement(2); // Critical section
*copy value to shared memory*
increment(2);
increment(3); // used count
}
}
if(i==1)
{
char value;
sembuf[3]; semun u1; values[] = {5,1,0}; // Sem Numbers 1,2,3
semid = semget(3 semaphores);
semctl(SETALL,values);
while(1)
{
decrement(3); // Used Count
decrement(2); // Critical Section
read and print(&value); // From Shared Memory
increment(2);
increment(1); // free slots
}
}
}
}//while
Cleanup Code.
}//main
Should I get semaphore ids in both child processes..or is there something else missing.
The pseudo code implementation would be something like this. Get the semaphore ID in the child process using same key, either using ftok or hardcoded key, then obtain the current value of semaphore then perform appropriate operations.
struct shma{readindex,writeindex,buf_max,char buf[5],used_count};
main()
{
struct shma* shm;
shmid = shmget();
shm = shmat(shmid);
init_shma(0,0,5,0);
sembuf[3]; semun u1; values[] = {5,1,0}; // Sem Numbers 1,2,3
semid = semget(3 semaphores);
semctl(SETALL,values);
while(i++<2)
{
ret = fork();
if(ret > 0 )
continue;
if(ret ==0 )
{
if(i==0)
{
char value;
sembuf[3]; semun u1; values[];
semid = semget(3 semaphores);
while(1)
{
getValuefromuser(&value);
decrement(1);
decrement(2); // Critical section
*copy value to shared memory*
increment(2);
increment(3); // used count
}
}
if(i==1)
{
char value;
sembuf[3]; semun u1; values[];
while(1)
{
getValuefromuser(&value);
decrement(3); // Used Count
decrement(2); // Critical Section
read and print(&value); // From Shared Memory
increment(2);
increment(1); // free slots
}
}
}
}//while
Cleanup Code.
}//main
I ultimately found a problem with my understanding.
I have to create and set up a semaphore or three semaphores in the parent process and then obtain their value in the respective producer and consumer child processes then use them accordingly.
I was earlier, creating semaphores in both producer and consumer.
silly me.!!

Resources