Why are two pthreads in sync even without mutex? - multithreading

I was trying to redo an example of threads. Here are the two functions that I am running from main one after another. They are the typical increment and decrement functions.
void* increment(void *arg)
{
int incr_step = *(int*) arg;
free(arg);
unsigned long int i;
for(i=0; i<5;i++) {
//pthread_mutex_lock(&lock);
counter = counter + incr_step;
//pthread_mutex_unlock(&lock);
printf("Thread ID %lu --> counter = %d\n", pthread_self(), counter);
sleep(1);
}
return NULL;
}
void* decrement(void *arg)
{
int decr_step = *(int*)arg;
free(arg);
unsigned long int i;
for(i=0; i<5;i++) {
//pthread_mutex_lock(&lock);
counter = counter - decr_step;
//pthread_mutex_unlock(&lock);
printf("Thread ID %lu--> counter = %d\n", pthread_self(), counter);
sleep(1);
}
return NULL;
}
In main I just create two pthreads, run these two functions in them, and of course join both. I have a global variable counter, which is initially 5, and I am testing with an increment value of 3 and a decrement value of 2. So if my threads were synchronized, the final value of counter would be 10 (the increment of 3 happens five times, giving 5 + 5*3 = 20, and the decrement of 2 happens five times, giving 20 - 5*2 = 10).
However, I have commented out the mutex statements, so I expect the final value of counter to differ from 10, but I keep getting 10. Why?

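Accessing a shared variable from several threads without a synchronization mechanism such as a mutex is a data race, and its outcome is non-deterministic.
It is only by chance that you see the same value as you would with the mutex in place. In this program the odds are heavily in your favor: each thread spends almost all of its time in sleep(1) and updates counter only five times, so the two read-modify-write sequences very rarely overlap.
No choice of initial conditions guarantees that the race will not happen. Without synchronized access to the shared variable, a lost update is always possible.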
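For comparison, here is a minimal sketch of the same increment/decrement pattern with the mutex enabled and without the sleep(1), so the two threads really do compete for counter. The names add_step, step_args and run_counter_demo are made up for this illustration and are not from the question's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static int counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical argument bundle: signed step and number of repetitions. */
struct step_args {
    int step;
    int iterations;
};

/* Adds args->step to the shared counter args->iterations times.
 * The read-modify-write of counter happens only while the mutex is held. */
static void *add_step(void *arg)
{
    const struct step_args *a = arg;
    for (int i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(&lock);
        counter = counter + a->step;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

/* Hypothetical driver: runs one incrementing and one decrementing thread
 * and returns the final counter value. */
int run_counter_demo(int start, int incr_step, int decr_step, int iterations)
{
    struct step_args up = { incr_step, iterations };
    struct step_args down = { -decr_step, iterations };
    pthread_t t1, t2;

    assert(iterations >= 0);
    counter = start;
    if (pthread_create(&t1, NULL, add_step, &up) != 0)
        abort();
    if (pthread_create(&t2, NULL, add_step, &down) != 0)
        abort();
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return counter;
}
```

With the lock in place, run_counter_demo(5, 3, 2, 5) returns 10 on every run, whatever the scheduling. If you remove the lock/unlock pair and raise the iteration count, the result is no longer guaranteed.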

Related

Create a function that will block until it was called by more than n/2 threads (pseudocode)

There are n threads. I'm trying to implement a function (pseudo code) which will directly block if it's called by a thread. Every thread will be blocked and the function will stop blocking threads if it was called by more than n/2 threads. If more than n/2 threads called the function, the function will no longer block other threads and will immediately return instead.
I did it like this but I'm not sure if I did the last part correctly where the function will immediately return if more than n/2 threads called it? :S
(Pseudocode is highly appreciated because then I have a better chance to understand it! :) )
int n = total amount of threads
sem waiter = 0
sem mutex = 1
int counter = 0
function void barrier()
int x
P(mutex)
if counter > n / 2 then
V(mutex)
for x = 0; x <= n / 2; x++;
V(waiter)
end for
end if
else
counter++
V(mutex)
P(waiter)
end else
end function
What you describe is a non-resetting barrier. Pthreads has a barrier implementation, but it is of the resetting variety.
To implement what you're after with pthreads, you will want a mutex plus a condition variable, and a shared counter. A thread entering the function locks the mutex and checks the counter. If not enough other threads have yet arrived then it waits on the CV, otherwise it broadcasts to it to wake all the waiting threads. If you wish, you can make it just the thread that tips the scale that broadcasts. Example:
struct my_barrier {
pthread_mutex_t barrier_mutex;
pthread_cond_t barrier_cv;
int threads_to_await;
};
void barrier(struct my_barrier *b) {
pthread_mutex_lock(&b->barrier_mutex);
if (b->threads_to_await > 0) {
if (--b->threads_to_await == 0) {
pthread_cond_broadcast(&b->barrier_cv);
} else {
do {
pthread_cond_wait(&b->barrier_cv, &b->barrier_mutex);
} while (b->threads_to_await);
}
}
pthread_mutex_unlock(&b->barrier_mutex);
}
Update: pseudocode
Or since a pseudocode representation is important to you, here's the same thing in a pseudocode language similar to the one used in the question:
int n = total amount of threads
mutex m
condition_variable cv
int to_wait_for = n / 2
function void barrier()
lock(m)
if to_wait_for == 1 then
to_wait_for = 0
broadcast(cv)
else if to_wait_for > 1 then
to_wait_for = to_wait_for - 1
wait(cv)
end if
unlock(m)
end function
That's slightly higher-level than your pseudocode, in that it does not assume that the mutex is implemented as a semaphore. (And with pthreads, which you tagged, you would need a pthreads mutex, not a semaphore, to go with a pthreads condition variable.) It also omits the details of the real C code that deal with spurious wakeups from waiting on the condition variable and with initializing the mutex and cv. Also, it presents the variables as if they were all globals. Such a function can be implemented that way in practice, but it is poor form.
Note also that it assumes pthreads semantics for the condition variable: waiting on the cv temporarily releases the mutex, allowing other threads to lock it, but a thread that waits on the cv reacquires the mutex before it proceeds past the wait.
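To fill in the parts the C example leaves out (initialization and a way to try it), here is a sketch of the same non-resetting barrier with an init function and a small driver. The names threshold_barrier, threshold_barrier_init, threshold_barrier_wait and run_barrier_demo are invented for this illustration, and the threshold n / 2 + 1 is my reading of "more than n/2 threads".

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct threshold_barrier {
    pthread_mutex_t mutex;
    pthread_cond_t cv;
    int threads_to_await;   /* arrivals still needed before the barrier opens */
};

static void threshold_barrier_init(struct threshold_barrier *b, int threshold)
{
    pthread_mutex_init(&b->mutex, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->threads_to_await = threshold;
}

static void threshold_barrier_wait(struct threshold_barrier *b)
{
    pthread_mutex_lock(&b->mutex);
    if (b->threads_to_await > 0) {
        if (--b->threads_to_await == 0) {
            /* This thread tips the scale: wake everyone who is waiting. */
            pthread_cond_broadcast(&b->cv);
        } else {
            /* Re-check the predicate after every wakeup (spurious wakeups). */
            while (b->threads_to_await > 0)
                pthread_cond_wait(&b->cv, &b->mutex);
        }
    }
    /* threads_to_await == 0: the barrier is open, late callers fall through. */
    pthread_mutex_unlock(&b->mutex);
}

static void *barrier_worker(void *arg)
{
    threshold_barrier_wait(arg);
    return NULL;
}

/* Hypothetical driver: starts n threads that all call the barrier and
 * returns how many of them came back. */
int run_barrier_demo(int n)
{
    enum { MAX_DEMO_THREADS = 64 };
    pthread_t tids[MAX_DEMO_THREADS];
    struct threshold_barrier b;
    int returned = 0;

    assert(n >= 1 && n <= MAX_DEMO_THREADS);
    threshold_barrier_init(&b, n / 2 + 1);
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[i], NULL, barrier_worker, &b) != 0)
            abort();
    for (int i = 0; i < n; i++)
        if (pthread_join(tids[i], NULL) == 0)
            returned++;
    pthread_cond_destroy(&b.cv);
    pthread_mutex_destroy(&b.mutex);
    return returned;
}
```

If the barrier were wrong, the driver would hang in pthread_join rather than return a smaller number, so treat it as a smoke test only.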
A few assumptions I am making within my answer:
P(...) is analogous to sem_wait(...)
V(...) is analogous to sem_post(...)
the barrier cannot be reset
I'm not sure if I did the last part correctly where the function will immediately return if more than n/2 threads called it
The pseudocode should work fine for the most part, but the early return/exit conditions could be significantly improved upon.
Some concerns (but nothing major):
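The first time the condition counter > n / 2 is met, the waiter semaphore is signaled (i.e. V(...)) (n / 2) + 1 times (x runs from 0 to n / 2 inclusive). That happens to equal the value of counter at that moment, so the number of wakeups is right, but only because the loop bound and the threshold coincide; looping up to counter states the intent directly.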
Every subsequent invocation after counter > n / 2 is first met will also signal (i.e. V(...)) the waiter semaphore another (n / 2) + 1 times. Instead, it should early return and not re-signal.
These can be resolved with a few minor tweaks.
int n = total count of threads
sem mutex = 1;
sem waiter = 0;
int counter = 0;
bool released = FALSE;
function void barrier() {
P(mutex)
// instead of the `released` flag, could be replaced with the condition `counter > n / 2 + 1`
if released then
// ensure the mutex is released prior to returning
V(mutex)
return
end if
if counter > n / 2 then
// more than n/2 threads have tried to wait, mark barrier as released
released = TRUE
// mutex can be released at this point, as any thread acquiring `mutex` afterwards will see that `released` is TRUE and return early
V(mutex)
// release all blocked threads; counter is guaranteed to never be incremented again
int x
for x = 0; x < counter; x++
V(waiter)
end for
else
counter++
V(mutex)
P(waiter)
end else
}
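For anyone who wants to try the semaphore approach in real code, here is a sketch with POSIX unnamed semaphores (sem_init works on Linux; some platforms, macOS among them, do not support it). The names sem_barrier, sem_barrier_init, sem_barrier_wait and run_sem_barrier_demo are made up for the example. One deliberate difference from the pseudocode above: the arriving thread counts itself, so the barrier opens on call number n / 2 + 1. That is my assumption about what "more than n/2 threads called" should mean.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdlib.h>

struct sem_barrier {
    sem_t mutex;      /* binary semaphore protecting counter and released */
    sem_t waiter;     /* blocked threads sleep here */
    int counter;      /* number of threads currently blocked */
    int threshold;    /* the call with this number opens the barrier */
    bool released;
};

/* P retries if a signal handler interrupts sem_wait. */
static void P(sem_t *s) { while (sem_wait(s) == -1 && errno == EINTR) { } }
static void V(sem_t *s) { sem_post(s); }

static void sem_barrier_init(struct sem_barrier *b, int n)
{
    sem_init(&b->mutex, 0, 1);
    sem_init(&b->waiter, 0, 0);
    b->counter = 0;
    b->threshold = n / 2 + 1;
    b->released = false;
}

static void sem_barrier_wait(struct sem_barrier *b)
{
    P(&b->mutex);
    if (b->released) {              /* barrier already open: return at once */
        V(&b->mutex);
        return;
    }
    if (b->counter + 1 >= b->threshold) {
        int blocked = b->counter;   /* fixed from now on: released is set */
        b->released = true;
        V(&b->mutex);
        for (int x = 0; x < blocked; x++)
            V(&b->waiter);          /* one wakeup per blocked thread */
        return;
    }
    b->counter++;
    V(&b->mutex);
    P(&b->waiter);
}

static void *sem_barrier_worker(void *arg)
{
    sem_barrier_wait(arg);
    return NULL;
}

/* Hypothetical driver: returns how many of the n threads got through. */
int run_sem_barrier_demo(int n)
{
    enum { MAX_DEMO_THREADS = 64 };
    pthread_t tids[MAX_DEMO_THREADS];
    struct sem_barrier b;
    int returned = 0;

    assert(n >= 1 && n <= MAX_DEMO_THREADS);
    sem_barrier_init(&b, n);
    for (int i = 0; i < n; i++)
        if (pthread_create(&tids[i], NULL, sem_barrier_worker, &b) != 0)
            abort();
    for (int i = 0; i < n; i++)
        if (pthread_join(tids[i], NULL) == 0)
            returned++;
    sem_destroy(&b.waiter);
    sem_destroy(&b.mutex);
    return returned;
}
```

A thread that has incremented counter but not yet reached P(waiter) is still safe, because a semaphore remembers the V that was posted for it.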

Do I need a QMutex for variable that is accessed by single statement?

In this document, a QMutex is used to protect "number" from being modified by multiple threads at same time.
I have a code in which a thread is instructed to do different work according to a flag set by another thread.
//In thread1
if(flag)
dowork1;
else
dowork2;
//In thread2
void setflag(bool f)
{
flag=f;
}
I want to know if a QMutex is needed to protect flag, i.e.,
//In thread1
mutex.lock();
if(flag)
{
mutex.unlock();
dowork1;
}
else
{
mutex.unlock();
dowork2;
}
//In thread2
void setflag(bool f)
{
mutex.lock();
flag=f;
mutex.unlock();
}
The code is different from the document in that flag is accessed(read/written) by single statement in both threads, and only one thread modifies the value of flag.
PS:
I always see the example in multi-thread programming tutorials that one thread does "count++", the other thread does "count--", and the tutorials say you should use a Mutex to protect the variable "count". I cannot get the point of using a mutex. Does it mean the execution of single statement "count++" or "count--" can be interrupted in the middle and produce unexpected result? What unexpected results can be gotten?
Does it mean the execution of single statement "count++" or "count--"
can be interrupted in the middle and produce unexpected result? What
unexpected results can be gotten?
Just answering to this part: Yes, the execution can be interrupted in the middle of a statement.
Let's imagine a simple case:
class A {
void foo(){
++a;
}
int a = 0;
};
The single statement ++a is translated in assembly to
mov eax, DWORD PTR [rdi]
add eax, 1
mov DWORD PTR [rdi], eax
which can be seen as
eax = a;
eax += 1;
a = eax;
If foo() is called on the same instance of A in 2 different threads (be it on a single core, or multiple cores) you cannot predict what will be the result of the program.
It can behave nicely:
thread 1 > eax = a // eax in thread 1 is equal to 0
thread 1 > eax += 1 // eax in thread 1 is equal to 1
thread 1 > a = eax // a is set to 1
thread 2 > eax = a // eax in thread 2 is equal to 1
thread 2 > eax += 1 // eax in thread 2 is equal to 2
thread 2 > a = eax // a is set to 2
or not:
thread 1 > eax = a // eax in thread 1 is equal to 0
thread 2 > eax = a // eax in thread 2 is equal to 0
thread 2 > eax += 1 // eax in thread 2 is equal to 1
thread 2 > a = eax // a is set to 1
thread 1 > eax += 1 // eax in thread 1 is equal to 1
thread 1 > a = eax // a is set to 1
In a well defined program, N calls to foo() should result in a == N.
But calling foo() on the same instance of A from multiple threads creates undefined behavior. There is no way to know the value of a after N calls to foo().
It will depend on how you compiled your program, what optimization flags were used, which compiler was used, what the load of your CPU was, the number of cores of your CPU,...
NB
class A {
public:
bool check() const { return a == b; }
int get_a() const { return a; }
int get_b() const { return b; }
void foo(){
++a;
++b;
}
private:
int a = 0;
int b = 0;
};
Now we have a class that, for an external observer, keeps a and b equal at all time.
The optimizer could optimize this class into:
class A {
public:
bool check() const { return true; }
int get_a() const { return a; }
int get_b() const { return b; }
void foo(){
++a;
++b;
}
private:
int a = 0;
int b = 0;
};
because it does not change the observable behavior of the program.
However, if you invoke undefined behavior by calling foo() on the same instance of A from multiple threads, you could end up with a == 3, b == 2 and check() still returning true. Your code has lost its meaning: the program is not doing what it is supposed to do and could be doing just about anything.
From here you can imagine more complex cases, like if A manages network connections, you can end up sending the data for client #10 to client #6. If your program is running in a factory, you can end up activating the wrong tool.
If you want the definition of undefined behavior you can look here : https://en.cppreference.com/w/cpp/language/ub
and in the C++ standard
For a better understanding of UB you can look for CppCon talks on the topic.
For any standard object (including bool) that is accessed from multiple threads, where at least one of the threads may modify the object's state, you need to protect access to that object using a mutex, otherwise you will invoke undefined behavior.
As a practical matter, for a bool that undefined behavior probably won't come in the form of a crash, but more likely in the form of thread B sometimes not "seeing" changes made to the bool by thread A, due to caching and/or optimization issues (e.g. the optimizer "knows" that the bool can't change during a function call, so it doesn't bother checking it more than once)
If you don't want to guard your accesses with a mutex, the other option is to change flag from a bool to a std::atomic<bool>; the std::atomic<bool> type has exactly the semantics you are looking for, i.e. it can be read and/or written from any thread without invoking undefined behavior.
Look here for an explanation: Do I have to use atomic<bool> for "exit" bool variable?
To synchronize access to flag you can make it a std::atomic<bool>.
Or you can use a QReadWriteLock together with a QReadLocker and a QWriteLocker. Compared to using a QMutex this gives you the advantage that you do not need to care about the call to QMutex::unlock() if you use exceptions or early return statements.
Alternatively you can use a QMutexLocker if the QReadWriteLock does not match your use case.
QReadWriteLock lock;
...
//In thread1
{
QReadLocker readLocker(&lock);
if(flag)
dowork1;
else
dowork2;
}
...
//In thread2
void setflag(bool f)
{
QWriteLocker writeLocker(&lock);
flag=f;
}
Keeping your program expressing its intent (ie. accessing shared vars under locks) is a big win for program maintenance and clarity. You need to have some pretty good reasons to abandon that clarity for obscure approaches like the atomics and devising consistent race conditions.
Good reasons include having measured that your program spends too much time toggling the mutex. In any decent implementation, the difference between a non-contested mutex and an atomic is minute: the mutex lock and unlock typically employ an optimistic compare-and-swap and return quickly. If your vendor doesn't provide a decent implementation, you might bring that up with them.
In your example, dowork1 and dowork2 are invoked with the mutex locked; so the mutex isn't just protecting flag, but also serializing these functions. If that is just an artifact of how you posed the question, then race conditions (variants of atomics travesty) are less scary.
In your PS (dup of comment above):
Yes, count++ is best thought of as:
mov $_count, %r1
ld (%r1), %r0
add $1, %r0, %r2
st %r2,(%r1)
Even on machines with a native atomic inc instruction (x86, 68k, 370, dinosaurs), the compiler might not use it consistently.
So, if two threads do count--; and count++; at close to the same time, the result could be -1, 0, 1. (ignoring the language weenies that say your house might burn down).
barriers:
if CPU0 executes:
store $1 to b
store $2 to c
and CPU1 executes:
load barrier -- discard speculatively read values.
load b to r0
load c to r1
Then CPU1 could read r0,r1 as: (0,0), (1,0), (1,2), (0,2).
This is because the observable order of the memory writes is weak; the processor may make them visible in an arbitrary fashion.
So, we change CPU0 to execute:
store $1 to b
store barrier -- stop storing until all previous stores are visible
store $2 to c
Then, if CPU1 saw that r1 (c) was 2, then r0 (b) has to be 1. The store barrier enforces that.
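In portable code you would not write the barriers by hand. As a rough sketch, the store barrier corresponds to a release store and the load side to an acquire load in C11 atomics. The variable names b and c follow the example above; cpu0_writer, cpu1_reader and run_publish_demo are invented for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

static int b;          /* plain data, published through c */
static atomic_int c;   /* acts as the "ready" marker */

static void *cpu0_writer(void *arg)
{
    (void)arg;
    b = 1;
    /* Release store: the write to b is visible before c == 2 can be seen. */
    atomic_store_explicit(&c, 2, memory_order_release);
    return NULL;
}

static void *cpu1_reader(void *arg)
{
    int *seen_b = arg;
    /* Acquire load: once c == 2 is observed, the earlier write to b is too. */
    while (atomic_load_explicit(&c, memory_order_acquire) != 2)
        ;   /* spin until the writer has published */
    *seen_b = b;
    return NULL;
}

/* Hypothetical driver: returns the value of b the reader saw after c == 2. */
int run_publish_demo(void)
{
    pthread_t writer, reader;
    int seen_b = -1;

    b = 0;
    atomic_store(&c, 0);
    if (pthread_create(&reader, NULL, cpu1_reader, &seen_b) != 0)
        abort();
    if (pthread_create(&writer, NULL, cpu0_writer, NULL) != 0)
        abort();
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);
    assert(seen_b != -1);
    return seen_b;
}
```

Here the reader only looks at b after it has seen c == 2, so the (0,2) outcome from the list above is ruled out and run_publish_demo() returns 1.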
For me, it seems handier to use a mutex here.
In general, not using a mutex when sharing references could lead to problems.
The only downside of using a mutex here seems to be a slight decrease in performance, because your threads have to wait for each other.
What kind of errors could happen?
As somebody said in the comments, it is a different situation if you share a fundamental data type (e.g. int, bool, float) or an object reference. I added a Qt code example that highlights 2 possible problems when NOT using a mutex. Problem #3 is a fundamental one and is described in detail by Benjamin T in his nice answer.
main.cpp
#include <QCoreApplication>
#include <QThread>
#include <QtDebug>
#include <QTimer>
#include "countingthread.h"
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
int amountThread = 3;
int counter = 0;
QString *s = new QString("foo");
QMutex *mutex = new QMutex();
//we construct a lot of thread
QList<CountingThread*> threadList;
//we create all threads
for(int i=0;i<amountThread;i++)
{
CountingThread *t = new CountingThread();
#ifdef TEST_ATOMIC_VAR_SHARE
t->addCounterdRef(&counter);
#endif
#ifdef TEST_OBJECT_VAR_SHARE
t->addStringRef(s);
//we add a mutex, which is shared to read read write
//just used with TEST_OBJECT_SHARE_FIX define uncommented
t->addMutexRef(mutex);
#endif
//t->moveToThread(t);
threadList.append(t);
}
//we start all with low prio, otherwise we produce something like a fork bomb
for(int i=0;i<amountThread;i++)
threadList.at(i)->start(QThread::Priority::LowPriority);
return a.exec();
}
countingthread.h
#ifndef COUNTINGTHREAD_H
#define COUNTINGTHREAD_H
#include <QThread>
#include <QtDebug>
#include <QTimer>
#include <QMutex>
//atomic var is shared
//#define TEST_ATOMIC_VAR_SHARE
//more complex object var is shared
#define TEST_OBJECT_VAR_SHARE
// we add the fix
#define TEST_OBJECT_SHARE_FIX
class CountingThread : public QThread
{
Q_OBJECT
int *m_counter;
QString *m_string;
QMutex *m_locker;
public :
void addCounterdRef(int *r);
void addStringRef(QString *s);
void addMutexRef(QMutex *m);
void run() override;
};
#endif // COUNTINGTHREAD_H
countingthread.cpp
#include "countingthread.h"
void CountingThread::run()
{
//forever
while(1)
{
#ifdef TEST_ATOMIC_VAR_SHARE
//first use of counter
int counterUse1Copy= (*m_counter);
//some other operations, here sleep 10 ms
this->msleep(10);
//we will retry to use a second time
int counterUse2Copy= (*m_counter);
if(counterUse1Copy != counterUse2Copy)
qDebug()<<this->thread()->currentThreadId()<<" problem #1 found, counter not like we expect";
//we increment afterwards our counter
(*m_counter) +=1; //this works for fundamental types, like float, int, ...
#endif
#ifdef TEST_OBJECT_VAR_SHARE
#ifdef TEST_OBJECT_SHARE_FIX
m_locker->lock();
#endif
m_string->replace("#","-");
//this will crash here !!, with problem #2,
//segmentation fault, is not handle by try catch
m_string->append("foomaster");
m_string->append("#");
if(m_string->length()>10000)
qDebug()<<this->thread()->currentThreadId()<<" string is: " << m_string;
#ifdef TEST_OBJECT_SHARE_FIX
m_locker->unlock();
#endif
#endif
}//end forever
}
void CountingThread::addCounterdRef(int *r)
{
m_counter = r;
qDebug()<<this->thread()->currentThreadId()<<" add counter with value: " << *m_counter << " and address : "<< m_counter ;
}
void CountingThread::addStringRef(QString *s)
{
m_string = s;
qDebug()<<this->thread()->currentThreadId()<<" add string with value: " << *m_string << " and address : "<< m_string ;
}
void CountingThread::addMutexRef(QMutex *m)
{
m_locker = m;
}
If you follow the code, you can perform 2 tests.
If you uncomment TEST_ATOMIC_VAR_SHARE and comment out TEST_OBJECT_VAR_SHARE in countingthread.h, you will see
problem #1: if you use your variable multiple times in your thread, it can be changed in the background by another thread. Contrary to my expectation, there was no app crash or weird exception in my build environment when using an int counter.
If you uncomment TEST_OBJECT_VAR_SHARE and comment out TEST_OBJECT_SHARE_FIX and TEST_ATOMIC_VAR_SHARE in countingthread.h, you will see
problem #2: you get a segmentation fault, which cannot be handled via try/catch. This happens because multiple threads call modifying string functions on the same object.
If you uncomment TEST_OBJECT_SHARE_FIX too, you see the right handling via a mutex.
problem #3: see the answer from Benjamin T
What is Mutex:
I really like the chicken explanation which vallabh suggested.
I also found a good explanation here

Threads issue in c language using pthread library

I declare a global variable and initialize it to 0.
In main() I create two threads. The first thread function increments the global variable as many times as its argument says, using a for loop, while the second decrements the global variable the same number of times.
When I pass 1000 as the argument the program works fine, but when I pass 100000 the global variable should be zero at the end and it is not.
I also call the join function for both threads, but that doesn't help.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <pthread.h>
int globVar =0;
void *incFunct(void* val){
int n = (int)(intptr_t)val; /* the count travels by value inside the pointer */
for (int i=0; i<n; i++)
globVar++; /* unsynchronized on purpose: this is the race being asked about */
pthread_exit(NULL);
}
void *decFunct(void* val){
int n = (int)(intptr_t)val;
for (int i=0; i<n; i++)
globVar--; /* unsynchronized on purpose */
pthread_exit(NULL);
}
int main()
{
pthread_t tid[2];
int val = 1000000;
printf("Initial value of Global variable : %d \n", globVar);
pthread_create(&tid[0], NULL, &incFunct, (void*)(intptr_t)val);
pthread_create(&tid[1], NULL, &decFunct, (void*)(intptr_t)val);
pthread_join(tid[0], NULL);
pthread_join(tid[1], NULL);
printf("Final Value of Global Var : %d \n", globVar);
return 0;
}
Yeah, you can't do that. Reasonably, you could end up with globVar having any value between -1000000 and +1000000; unreasonably, you might have invited the compiler to burn down your home (ask Google about undefined behaviour).
You need to synchronize the operations of the two threads. One such synchronization is with a pthread_mutex_t; and you would acquire the lock (pthread_mutex_lock()) before operating on globVar, and release the lock (pthread_mutex_unlock()) after updating globVar.
For this particularly silly case, atomics might be more appropriate if your compiler happens to support them (/usr/include/stdatomic.h).
One thing that might happen is that the inc thread and the dec thread don't see consistent values for globVar. If you increment a variable you think has a value of 592, and, at the same time, I decrement what I think is the same variable but with a value of 311 — who wins? What happens when it's all over?
Without memory synchronization, you can't predict what will happen when multiple threads update the same memory location. You might have problems with cache coherency, variable tearing, and even reordered operations. Mutexes or C11 atomic variables are two ways to avoid these problems.
(As an aside, I suspect you don't see this problem with one thousand iterations because the first thread finishes well before the second even looks at globVar, and your implementation happens to update memory for that latter thread's consistency.)
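As a sketch of the atomics route mentioned above, here is the same increment/decrement pair using C11 <stdatomic.h>. The names glob_var, inc_funct, dec_funct and run_inc_dec are adapted for this illustration; the loop count is passed inside the pointer argument as in the question.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

static atomic_long glob_var;

/* Increments glob_var n times; each increment is one atomic operation. */
static void *inc_funct(void *val)
{
    long n = (long)(intptr_t)val;
    for (long i = 0; i < n; i++)
        atomic_fetch_add(&glob_var, 1);
    return NULL;
}

/* Decrements glob_var n times; each decrement is one atomic operation. */
static void *dec_funct(void *val)
{
    long n = (long)(intptr_t)val;
    for (long i = 0; i < n; i++)
        atomic_fetch_sub(&glob_var, 1);
    return NULL;
}

/* Hypothetical driver: runs both threads and returns the final value. */
long run_inc_dec(long n)
{
    pthread_t tid[2];

    assert(n >= 0);
    atomic_store(&glob_var, 0);
    if (pthread_create(&tid[0], NULL, inc_funct, (void *)(intptr_t)n) != 0)
        abort();
    if (pthread_create(&tid[1], NULL, dec_funct, (void *)(intptr_t)n) != 0)
        abort();
    pthread_join(tid[0], NULL);
    pthread_join(tid[1], NULL);
    return atomic_load(&glob_var);
}
```

Because no increment or decrement can be lost, run_inc_dec(1000000) returns 0 regardless of how the two threads interleave. A pthread_mutex_t around globVar++ and globVar-- gives the same guarantee at a somewhat higher cost per operation.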

Multithreading

I have just started learning multi-threading. I have written a simple application. The application creates three threads. Two threads write and one thread reads. The writer threads write to separate location in a global array. The writer thread after incrementing the value in the array notifies the reader. The reader thread then decrements that value in the array and waits again for the writer threads to update their corresponding value in the array. The code for the application is pasted below.
What I see is that the writer(Producer) threads get more time slice than the reader(Consumer) thread. I think I am doing something wrong. If the output of the application is redirected to a file, then it can be observed that there are more consecutive messages from the Producers and the messages from the Consumer occur infrequently. What I was expecting was that, when a Producer updates its data, the Consumer immediately processes it i.e. after every Producer message there should be a Consumer message printed.
Thanks and regards,
~Plug
#include <stdio.h>
#include <pthread.h>
const long g_lProducerCount = 2; /*Number of Producers*/
long g_lProducerIds[2]; /*Producer IDs = 0, 1...*/
long g_lDataArray[2]; /*Data[0] for Producer 0, Data[1] for Producer 1...*/
/*Producer ID that updated the Data. -1 = No update*/
long g_lChangedProducerId = -1;
pthread_cond_t g_CondVar = PTHREAD_COND_INITIALIZER;
pthread_mutex_t g_Mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_t g_iThreadIds[3]; /*3 = 2 Producers + 1 Consumer*/
unsigned char g_bExit = 0; /*Exit application? 0 = No*/
void* Producer(void *pvData)
{
long lProducerId = *(long*)pvData; /*ID of this Producer*/
while(0 == g_bExit) {
pthread_mutex_lock(&g_Mutex);
/*Tell the Consumer whose Data is updated*/
g_lChangedProducerId = lProducerId;
/*Update the Data i.e. Increment*/
++g_lDataArray[lProducerId];
printf("Producer: Data[%ld] = %ld\n",
lProducerId, g_lDataArray[lProducerId]);
pthread_cond_signal(&g_CondVar);
pthread_mutex_unlock(&g_Mutex);
}
pthread_exit(NULL);
}
void* Consumer(void *pvData)
{
while(0 == g_bExit) {
pthread_mutex_lock(&g_Mutex);
/*Wait until one of the Producers updates its Data*/
while(-1 == g_lChangedProducerId) {
pthread_cond_wait(&g_CondVar, &g_Mutex);
}
/*Revert the update done by the Producer*/
--g_lDataArray[g_lChangedProducerId];
printf("Consumer: Data[%ld] = %ld\n",
g_lChangedProducerId, g_lDataArray[g_lChangedProducerId]);
g_lChangedProducerId = -1; /*Reset for next update*/
pthread_mutex_unlock(&g_Mutex);
}
pthread_exit(NULL);
}
void CreateProducers()
{
long i;
pthread_attr_t attr;
pthread_attr_init(&attr);
for(i = 0; i < g_lProducerCount; ++i) {
g_lProducerIds[i] = i;
pthread_create(&g_iThreadIds[i + 1], &attr,
Producer, &g_lProducerIds[i]);
}
pthread_attr_destroy(&attr);
}
void CreateConsumer()
{
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_create(&g_iThreadIds[0], &attr, Consumer, NULL);
pthread_attr_destroy(&attr);
}
void WaitCompletion()
{
long i;
for(i = 0; i < g_lProducerCount + 1; ++i) {
pthread_join(g_iThreadIds[i], NULL);
}
}
int main()
{
CreateProducers();
CreateConsumer();
getchar();
g_bExit = 1;
WaitCompletion();
return 0;
}
You would have to clarify what is it exactly that you want to achieve. For now the producers only increment an integer and the consumer decrements the value. This is not a very useful activity ;) I understand that this is only a test app, but still it is not clear enough what's the purpose of this processing, what are the constraints and so on.
The producers produce some 'items'. The outcome of this production is represented as an integer value. 0 means no items, 1 means there is a pending item, that consumer can take. Is that right? Now, is it possible for the producer to produce several items before any of them gets consumed (incrementing the array cell to a value higher than 1)? Or does he have to wait for the last item to be consumed before the next one can be put into the storage? Is the storage limited or unlimited? If it is limited then is the limit shared among all the producers or is it defined per producer?
What I was expecting was that, when a Producer updates its data,
the Consumer immediately processes it i.e. after every Producer
message there should be a Consumer message printed.
Though it's not really clear what you want to achieve I will hold on to that quote and assume the following: there is a limit of 1 item per producer and the producer has to wait for the consumer to empty the storage before a new item can be put in the cell i.e. the only allowed values in the g_lDataArray are 0 and 1.
To allow maximum concurrency between threads you will need a condition variable/mutex pair for each cell of g_lDataArray (for each producer). You will also need a queue of updates, that is, a list of producers that have submitted their work, and a condition variable/mutex pair to guard it. This queue replaces g_lChangedProducerId, which can only hold one value at a time.
Every time a producer wants to put an item into the storage it has to acquire the respective lock and check whether the storage is empty (g_lDataArray[lProducerId] == 0), waiting on the condition variable if it is not. It then increments the cell, releases the held lock, acquires the consumer lock, adds its id to the update queue, notifies the consumer, and releases the consumer lock. Of course, if the producer performed real computations to produce a real item, that work should be done outside the scope of any lock, before the attempt to put the item in the storage.
In pseudo code this looks like this:
// do some computations
item = compute();
lock (mutexes[producerId]) {
while (storage[producerId] != 0)
wait(condVars[producerId]);
storage[producerId] = item;
}
lock (consumerMutex) {
queue.push(producerId);
signal(consumerCondVar);
}
The consumer should act as follows: acquire its lock, check whether there are any pending updates to process and wait on the condition variable if not, take one update out of the queue (that is, the number of the updating producer), acquire the lock of the producer whose update is going to be processed, decrement the cell, notify the producer, release the producer's lock, release its own lock, and finally process the update.
lock (consumerMutex) {
while (queue.isEmpty())
wait(consumerCondVar);
producerId = queue.pop();
lock (mutexes[producerId]) {
item = storage[producerId];
storage[producerId] = 0;
signal(condVars[producerId]);
}
}
//process the update
process(item);
Hope this answer is what you needed.
The problem may be that all producers change g_lChangedProducerId, so the value written by one producer may be overwritten by another producer before the consumer sees it.
This means that the consumer effectively doesn't see that the first producer has produced some output.
When a producer has produced and released the mutex, the thread that runs next may be another producer (ProThread) or the consumer (ConThread).
If a producer runs next, it produces again, so the consumer does not consume immediately after the data is produced.
That's what you don't want to see.
All you need is to make sure that, once a producer has produced, no producer can continue until the consumer has consumed.
Here's one kind of solution for this:
void* Producer(void *pvData)
{
........
//wait until the consumer has consumed the previous number
while(-1!=g_lChangedProducerId)
pthread_cond_wait(&g_CondVar,&g_Mutex);
//here to inform the consumer it produced the data
g_lChangedProducerId = lProducerId;
........
}
void* Consumer(void *pvData)
{
g_lChangedProducerId = -1;
//wake up the producer once the data is consumed
pthread_cond_signal(&g_CondVar);
pthread_mutex_unlock(&g_Mutex);
}
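Filling in the elided parts, the handshake could look like the sketch below. It is bounded to a fixed number of rounds so that it terminates, and it uses pthread_cond_broadcast instead of pthread_cond_signal: with two producers and the consumer sharing one condition variable, a single signal might wake a thread whose condition is still false. That swap, and the names producer_args, consumed and run_handshake_demo, are my additions and not part of the answer above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum { PRODUCER_COUNT = 2 };

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static long data[PRODUCER_COUNT];
static long changed_producer = -1;   /* -1 = no pending update */
static int consumed;                 /* updates handled by the consumer */

struct producer_args {
    long id;
    int rounds;
};

static void *producer(void *arg)
{
    const struct producer_args *a = arg;
    for (int k = 0; k < a->rounds; k++) {
        pthread_mutex_lock(&mutex);
        /* Wait until the consumer has handled the previous update. */
        while (changed_producer != -1)
            pthread_cond_wait(&cond, &mutex);
        changed_producer = a->id;
        ++data[a->id];
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

static void *consumer(void *arg)
{
    int total = *(int *)arg;
    for (int k = 0; k < total; k++) {
        pthread_mutex_lock(&mutex);
        /* Wait until some producer has published an update. */
        while (changed_producer == -1)
            pthread_cond_wait(&cond, &mutex);
        --data[changed_producer];
        ++consumed;
        changed_producer = -1;
        pthread_cond_broadcast(&cond);   /* let a producer continue */
        pthread_mutex_unlock(&mutex);
    }
    return NULL;
}

/* Hypothetical driver: returns the number of consumed updates,
 * or -1 if some cell did not return to 0. */
int run_handshake_demo(int rounds)
{
    struct producer_args args[PRODUCER_COUNT];
    pthread_t producers[PRODUCER_COUNT], consumer_tid;
    int total = PRODUCER_COUNT * rounds;

    assert(rounds >= 0);
    consumed = 0;
    changed_producer = -1;
    if (pthread_create(&consumer_tid, NULL, consumer, &total) != 0)
        abort();
    for (long i = 0; i < PRODUCER_COUNT; i++) {
        data[i] = 0;
        args[i].id = i;
        args[i].rounds = rounds;
        if (pthread_create(&producers[i], NULL, producer, &args[i]) != 0)
            abort();
    }
    for (int i = 0; i < PRODUCER_COUNT; i++)
        pthread_join(producers[i], NULL);
    pthread_join(consumer_tid, NULL);
    for (int i = 0; i < PRODUCER_COUNT; i++)
        if (data[i] != 0)
            return -1;
    return consumed;
}
```

With this handshake every produced update is followed by exactly one consume before any producer can produce again, which is the ordering the question expected.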

Can I assign a per-thread index, using pthreads?

I'm optimizing some instrumentation for my project (Linux,ICC,pthreads), and would like some feedback on this technique to assign a unique index to a thread, so I can use it to index into an array of per-thread data.
The old technique uses a std::map based on pthread id, but I'd like to avoid locks and a map lookup if possible (it is creating a significant amount of overhead).
Here is my new technique:
static PerThreadInfo info[MAX_THREADS]; // shared, each index is per thread
// Allow each thread a unique sequential index, used for indexing into per
// thread data.
1:static size_t GetThreadIndex()
2:{
3: static size_t threadCount = 0;
4: __thread static size_t myThreadIndex = threadCount++;
5: return myThreadIndex;
6:}
later in the code:
// add some info per thread, so it can be aggregated globally
info[ GetThreadIndex() ] = MyNewInfo();
So:
1) It looks like line 4 could be a race condition if two threads were created at exactly the same time. If so, how can I avoid this (preferably without locks)? I can't see how an atomic increment would help here.
2) Is there a better way to create a per-thread index somehow? Maybe by pre-generating the TLS index on thread creation somehow?
1) An atomic increment would help here actually, as the possible race is two threads reading and assigning the same ID to themselves, so making sure the increment (read number, add 1, store number) happens atomically fixes that race condition. On Intel a "lock; inc" would do the trick, or whatever your platform offers (like InterlockedIncrement() for Windows for example).
2) Well, you could actually make the whole info thread-local ("__thread static PerThreadInfo info;"), provided your only aim is to be able to access the data per-thread easily and under a common name. If you actually want it to be a globally accessible array, then saving the index as you do using TLS is a very straightforward and efficient way to do this. You could also pre-compute the indexes and pass them along as arguments at thread creation, as Kromey noted in his post.
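A portable way to write point 1) is a C11 atomic fetch-and-add combined with a thread-local cache. This is only a sketch: NO_INDEX, get_thread_index and run_index_demo are names I made up, and it assumes a compiler with _Thread_local and <stdatomic.h>. Plain C does not allow a thread-local with a run-time initializer, so the sketch uses a sentinel instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NO_INDEX ((size_t)-1)

static atomic_size_t thread_count;                       /* next free index */
static _Thread_local size_t my_thread_index = NO_INDEX;  /* per-thread cache */

/* Returns a unique sequential index for the calling thread.
 * Only the first call in each thread touches the shared counter. */
static size_t get_thread_index(void)
{
    if (my_thread_index == NO_INDEX)
        my_thread_index = atomic_fetch_add(&thread_count, 1);
    return my_thread_index;
}

static void *index_worker(void *arg)
{
    size_t *out = arg;
    *out = get_thread_index();
    return NULL;
}

/* Hypothetical driver: returns 1 if n threads received exactly the
 * indexes 0 .. n-1 with no duplicates, 0 otherwise. */
int run_index_demo(size_t n)
{
    enum { MAX_DEMO_THREADS = 32 };
    pthread_t tids[MAX_DEMO_THREADS];
    size_t got[MAX_DEMO_THREADS];
    int seen[MAX_DEMO_THREADS] = { 0 };

    assert(n <= MAX_DEMO_THREADS);
    atomic_store(&thread_count, 0);
    for (size_t i = 0; i < n; i++)
        if (pthread_create(&tids[i], NULL, index_worker, &got[i]) != 0)
            abort();
    for (size_t i = 0; i < n; i++)
        pthread_join(tids[i], NULL);
    for (size_t i = 0; i < n; i++)
        if (got[i] >= n || seen[got[i]]++)
            return 0;
    return 1;
}
```

atomic_fetch_add returns the old value and increments in one indivisible step, so two threads can never read the same index. After the first call, each thread only reads its own thread-local copy, with no lock and no map lookup.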
Why so averse to using locks? Solving race conditions is exactly what they're designed for...
At any rate, you can use the 4th argument of pthread_create() to pass an argument to your threads' start routine; in this way, the main thread can generate an incrementing counter as it launches the threads and pass the current value to each thread as it is created, giving each thread its unique index.
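A minimal sketch of that idea, with each thread receiving a pointer to its own slot of a per-thread array through the 4th argument. The struct per_thread_info layout, the value computed in worker and the name run_indexed_threads are placeholders for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

enum { MAX_THREADS = 16 };

/* Placeholder for the real per-thread data. */
struct per_thread_info {
    size_t index;   /* assigned by the launching thread */
    long value;     /* written only by the owning thread */
};

static struct per_thread_info info[MAX_THREADS];

static void *worker(void *arg)
{
    struct per_thread_info *mine = arg;   /* this thread's own slot */
    mine->value = (long)mine->index * 10;
    return NULL;
}

/* Hypothetical driver: launches n threads with indexes 0 .. n-1 and
 * returns the sum of the values they stored. */
long run_indexed_threads(size_t n)
{
    pthread_t tids[MAX_THREADS];
    long sum = 0;

    assert(n <= MAX_THREADS);
    for (size_t i = 0; i < n; i++) {
        info[i].index = i;
        info[i].value = 0;
        /* 4th argument: pointer to the slot that belongs to thread i */
        if (pthread_create(&tids[i], NULL, worker, &info[i]) != 0)
            abort();
    }
    for (size_t i = 0; i < n; i++) {
        pthread_join(tids[i], NULL);
        sum += info[i].value;
    }
    return sum;
}
```

Since the index is fixed before the thread starts, no shared counter is touched at run time and no synchronization is needed beyond the join.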
I know you tagged this [pthreads], but you also mentioned the "old technique" of using std::map. This leads me to believe that you're programming in C++. In C++11 you have std::thread, and you can pass out unique indexes (id's) to your threads at thread creation time through an ordinary function parameter.
Below is an example HelloWorld that creates N threads, assigning each an index of 0 through N-1. Each thread does nothing but say "hi" and give its index:
#include <iostream>
#include <thread>
#include <mutex>
#include <vector>
inline void sub_print() {}
template <class A0, class ...Args>
void
sub_print(const A0& a0, const Args& ...args)
{
std::cout << a0;
sub_print(args...);
}
std::mutex&
cout_mut()
{
static std::mutex m;
return m;
}
template <class ...Args>
void
print(const Args& ...args)
{
std::lock_guard<std::mutex> _(cout_mut());
sub_print(args...);
}
void f(int id)
{
print("This is thread ", id, "\n");
}
int main()
{
const int N = 10;
std::vector<std::thread> threads;
for (int i = 0; i < N; ++i)
threads.push_back(std::thread(f, i));
for (auto i = threads.begin(), e = threads.end(); i != e; ++i)
i->join();
}
My output:
This is thread 0
This is thread 1
This is thread 4
This is thread 3
This is thread 5
This is thread 7
This is thread 6
This is thread 2
This is thread 9
This is thread 8
