Memory coherence with respect to C++ initializers - Linux

If I set the value of a variable in one thread and read it in another, I protect it with a lock to ensure that the second thread reads the value most recently set by the first:
Thread 1:
lock();
x=3;
unlock();
Thread 2:
lock();
<use the value of x>
unlock();
So far, so good. However, suppose I have a C++ object that sets the value of x in an initializer:
theClass::theClass() : x(3) ...
theClass theInstance;
Then, I spawn a thread that uses theInstance. Is there any guarantee that the newly spawned thread will see the proper value of x? Or is it necessary to place a lock around the declaration of theInstance? I am interested primarily in C++ on Linux.

Prior to C++11, the C++ standard had nothing to say about multiple threads of execution and so made no guarantees of anything.
C++11 introduced a memory model that defines under what circumstances memory written on one thread is guaranteed to become visible to another thread.
Construction of an object is not inherently synchronized across threads. In your particular case, though, you say you first construct the object and then 'spawn a thread'. If you spawn the thread by constructing a std::thread object, and you do so after constructing some object x on the same thread, then you are guaranteed to see the proper value of x on the newly spawned thread, because the completion of the thread constructor synchronizes-with the beginning of your thread function.
Synchronizes-with is a specific term used in defining the C++ memory model, and it is worth understanding exactly what it means if you want to reason about more complex synchronization; for the case you outline, though, things 'just work' without any additional synchronization.
This all assumes you're using std::thread. If you're using platform threading APIs directly, the C++ standard has nothing to say about what happens, but in practice you can assume it will work without a lock on any platform I know of.
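For concreteness, here is a minimal sketch of that construct-then-spawn pattern using the names from your question (useInstance is just an illustrative thread function, not part of your code):
#include <functional>
#include <thread>

struct theClass
{
    theClass() : x(3) {}
    int x;
};

void useInstance(theClass& inst)
{
    // Guaranteed to read 3: theInstance was fully constructed before the
    // std::thread constructor ran, and the completion of that constructor
    // synchronizes-with the start of this function.
    int v = inst.x;
    (void)v;
}

int main()
{
    theClass theInstance;                               // constructed first
    std::thread t(useInstance, std::ref(theInstance));  // spawned after
    t.join();
}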

You seem to have a misconception about locks:
If I set the value of a variable in one thread and read it in another,
I protect it with a lock to ensure that the second thread reads the
value most recently set by the first.
This is incorrect. Locks are used to prevent data races. Locks do not schedule the instructions of Thread 1 to happen before the instructions of Thread 2. With your lock in place, Thread 2 can still run before Thread 1 and read the value of x before Thread 1 changes the value of x.
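To sketch the point (assuming C++11; the names here are mine):
#include <mutex>
#include <thread>

int x = 0;
std::mutex m;

int main()
{
    std::thread t1([]{ std::lock_guard<std::mutex> l(m); x = 3; });
    std::thread t2([]{ std::lock_guard<std::mutex> l(m); int v = x; (void)v; });
    // v may be 0 or 3: the mutex prevents a data race, but it does not
    // force t1's critical section to run before t2's.
    t1.join();
    t2.join();
}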
As for your question:
If your initialization of theInstance happens-before the initialization/start of a certain thread A, then thread A is guaranteed to see the proper value of x.
Example
#include <thread>
#include <functional>
#include <assert.h>

struct C
{
    C(int x) : x_{ x } {}
    int x_;
};

void f(C const& c)
{
    assert(c.x_ == 42);
}

int main()
{
    C c{ 42 };                       // A
    std::thread t{ f, std::ref(c) }; // B
    t.join();
}
In the same thread: A is sequenced-before B, therefore A happens-before B. The assert in thread t will thus never fire.
If your initialization of 'theInstance' inter-thread happens-before its usage by a certain thread A, then thread A is guaranteed to see the proper value of x.
Example
#include <thread>
#include <atomic>
#include <functional>
#include <assert.h>

struct C
{
    int x_;
};

std::atomic<bool> is_init;

void f0(C& c)
{
    c.x_ = 37;           // B
    is_init.store(true); // C
}

void f1(C const& c)
{
    while (!is_init.load()); // D
    assert(c.x_ == 37);      // E
}

int main()
{
    is_init.store(false); // A
    C c;
    std::thread t0{ f0, std::ref(c) };
    std::thread t1{ f1, std::ref(c) };
    t0.join();
    t1.join();
}
The inter-thread happens-before relationship occurs between t0 and t1. As before, A happens-before the creation of threads t0 and t1.
The assignment c.x_ = 37 (B) is sequenced-before the store to the is_init flag (C). The loop in f1 is the source of the inter-thread happens-before relationship: f1 only proceeds once it has loaded true from is_init, so the store (C) synchronizes-with the load (D), and D is sequenced-before the assert (E). Since these relationships are transitive, B inter-thread happens-before E. Thus, the assert will never fire in f1.

First of all, your example above doesn't warrant any locks; all you need to do is declare your variable atomic. No locks, no worries.
Second, your question doesn't quite make sense as stated: you cannot use your object (an instance of the class) before it is constructed, and construction happens within a single thread, so there is no need to lock anything done in the class constructor. You simply cannot access a not-yet-constructed object from multiple threads; it is impossible.
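A minimal sketch of the atomic approach, assuming C++11:
#include <atomic>
#include <thread>

std::atomic<int> x{0};

int main()
{
    std::thread writer([]{ x.store(3); });
    std::thread reader([]{ int v = x.load(); (void)v; });
    // v may be 0 or 3 depending on scheduling, but the accesses
    // themselves are race-free: no locks needed.
    writer.join();
    reader.join();
}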

Related

Where can PTHRED_MUTEX_ADAPTIVE_NP be specified and how does it work?

I found that there's a macro called PTHRED_MUTEX_ADAPTIVE_NP which is somehow given as a value to a mutex so that the mutex does adaptive spinning, meaning that it spins for roughly as long as an immediate wakeup through the kernel would take. But how do I apply this configuration macro to a thread?
Also, I've developed an improved shared readers-writer lock (it needs only one atomic operation in the best case, in contrast to the three operations of the Wikipedia solution) with relative writer priority (further readers are stalled when there's a writer, while the readers before it are allowed to proceed), which could also make use of adaptive spinning: how is the number of spin cycles calculated?
I found that there's a macro called PTHRED_MUTEX_ADAPTIVE_NP
Some pthreads implementations provide a macro PTHREAD_MUTEX_ADAPTIVE_NP (note the spelling) that is one of the possible values of the kind_np mutex attribute, but neither that attribute nor the macro is standard. It looks like at least BSD and AIX have them, or at least did at one time, but this is not something you should be using in new code.
But how do I utilize this configuration-macro to a thread ?
You don't. Even if you are using a pthreads implementation that supports it, this is the value of a mutex attribute, not a thread attribute. You obtain a mutex with that attribute value by explicitly requesting it when you initialize the mutex. It would look something like this:
#include <pthread.h>

pthread_mutexattr_t attr;
pthread_mutex_t mutex;
int rval;

// Return-value checks omitted for brevity and clarity
rval = pthread_mutexattr_init(&attr);
rval = pthread_mutexattr_setkind_np(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
rval = pthread_mutex_init(&mutex, &attr);
There are other mutex attributes that you can set in analogous ways, which is one of the reasons I wrote this answer. Although you should not be using the kind_np attribute, you can follow this general model for other mutex attributes. There are also thread attributes, which work similarly.
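For example, following the same model for a standard attribute, here is a sketch that requests a recursive mutex via pthread_mutexattr_settype (the helper function is mine, purely illustrative):
#include <pthread.h>

int make_recursive_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    // Return-value checks again omitted for brevity
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    int rc = pthread_mutex_init(mutex, &attr);
    pthread_mutexattr_destroy(&attr); // the mutex keeps its own copy of the attributes
    return rc;
}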
I found the relevant code in glibc. This is the 'adaptive' mutex-locking path of pthread_mutex_lock in glibc 2.31:
else if (__builtin_expect (PTHREAD_MUTEX_TYPE (mutex)
                           == PTHREAD_MUTEX_ADAPTIVE_NP, 1))
  {
    if (! __is_smp)
      goto simple;

    if (LLL_MUTEX_TRYLOCK (mutex) != 0)
      {
        int cnt = 0;
        int max_cnt = MIN (max_adaptive_count (),
                           mutex->__data.__spins * 2 + 10);
        do
          {
            if (cnt++ >= max_cnt)
              {
                LLL_MUTEX_LOCK (mutex);
                break;
              }
            atomic_spin_nop ();
          }
        while (LLL_MUTEX_TRYLOCK (mutex) != 0);

        mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8;
      }
    assert (mutex->__data.__owner == 0);
  }
So the thread first spins, with the spin limit being twice the mutex's stored spin estimate plus 10, capped at max_adaptive_count() (configurable via a glibc tunable); after the lock is acquired, an eighth of the difference between the actual number of spins and the stored estimate is added to the estimate, so the spin count adapts as an exponential moving average.
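Here is a sketch of that estimator in isolation (the function name is mine, not glibc's):
// glibc keeps a per-mutex spin estimate and nudges it an eighth of the
// way toward the number of spins actually needed this time:
static int update_spin_estimate(int spins, int actual)
{
    return spins + (actual - spins) / 8;
}

// E.g. starting from an estimate of 0, acquiring the lock after 80 spins
// moves the estimate to 10, so the next attempt spins at most
// 10 * 2 + 10 = 30 times (subject to the max_adaptive_count() cap).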

Accessing an atomic member of a class held by a shared_ptr

I'm trying to create a small class that will allow me to facilitate communication between two threads.
Those threads will most probably outlive the context in which the above-mentioned class was created, as they are queued onto a thread pool.
What I have tried so far (on coliru as well):
#include <atomic>
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>
using namespace std;
using namespace std::chrono_literals;

class A
{
public:
    A(int maxVal) : maxValue(maxVal) {}
    bool IsOverMax() const { return cur >= maxValue; }
    void Increase() { cur++; }
private:
    const int maxValue;
    atomic_int cur{ 0 };
};
possible usage:
void checking(const shared_ptr<A> counter)
{
    while (!counter->IsOverMax())
    {
        cout << "Working\n"; // do work
        std::this_thread::sleep_for(10ms);
    }
}

void counting(shared_ptr<A> counter)
{
    while (!counter->IsOverMax())
    {
        cout << "Counting\n";
        counter->Increase(); // does this fall under `...uses a non-const member function of shared_ptr then a data race will occur`? http://en.cppreference.com/w/cpp/memory/shared_ptr/atomic
        std::this_thread::sleep_for(9ms);
    }
}
int main()
{
    unique_ptr<thread> t1Ptr;
    unique_ptr<thread> t2Ptr;
    {
        auto aPtr = make_shared<A>(100);         // This might be out of scope before t1 and t2 end
        t1Ptr.reset(new thread(checking, aPtr)); // To symbolize that t1, t2 will outlive the scope in which aPtr was originally created
        t2Ptr.reset(new thread(counting, aPtr));
    }
    t2Ptr->join();
    t1Ptr->join();
    //cout << aPtr->IsOverMax();
}
The reason I'm concerned is that the documentation says that:
If multiple threads of execution access the same std::shared_ptr object without synchronization and any of those accesses uses a non-const member function of shared_ptr then a data race will occur unless all such access is performed through these functions, which are overloads of the corresponding atomic access functions (std::atomic_load, std::atomic_store, etc.)
So Increase is a non-const function; do the copies of aPtr count as the same std::shared_ptr in this context or not?
Is this code thread-safe?
Would this be OK for a non atomic object (say using an std::mutex to lock around reads and writes to a regular int)?
In any case why?
So Increase is a non-const function; do the copies of aPtr count as the same std::shared_ptr in this context or not?
At std::thread creation, aPtr is passed by value. Therefore, two things are guaranteed:
You don't introduce a data race, since each thread gets its own instance of shared_ptr (although they manage the same object A). The documentation you are referring to describes a scenario whereby multiple threads operate on the same shared_ptr instance; in that case, only const member functions can be called (see below), or synchronization is required.
The shared_ptr reference count is incremented before aPtr goes out of scope in main.
So yes, this is a correct way to use shared_ptr.
Is this code thread-safe?
Your code does not introduce a data race, either with access to the shared_ptr instances or with access to the managed object A.
This means that there are no conflicting, non-atomic, read and write operations to the same memory location performed by multiple threads.
However, keep in mind that, in checking(), the call to IsOverMax() is separated from the actual work that follows
(Increase() could be called by the second thread after IsOverMax() but before 'do work'). Therefore, you could 'do work' while cur has gone over its maximum.
Whether or not that is a problem depends on your specification, but it is called a race condition which is not necessarily a programming error (unlike a data race which causes undefined behavior).
Would this be OK for a non atomic object (say using an std::mutex to lock around reads and writes to a regular int)?
cur can be a regular int (non-atomic) if you protect it with a std::mutex. The mutex must be locked for both write and read access in order to prevent a data race.
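A minimal sketch of that mutex-based variant of your class (the mutable mutex is what lets the const reader take the lock):
#include <mutex>

class A
{
public:
    A(int maxVal) : maxValue(maxVal) {}
    bool IsOverMax() const
    {
        std::lock_guard<std::mutex> lock(m); // reads need the lock too
        return cur >= maxValue;
    }
    void Increase()
    {
        std::lock_guard<std::mutex> lock(m);
        ++cur;
    }
private:
    const int maxValue;
    int cur = 0;          // plain int, protected by m
    mutable std::mutex m;
};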
One remark on calling const member functions on objects shared by multiple threads: the use of const alone does not guarantee that no data race is introduced.
In this case, the guarantee applies to shared_ptr const member functions because the documentation says so.
Whether that guarantee extends to all const member functions in the Standard Library is harder to pin down in the standard itself, although [res.on.data.races] (C++11 §17.6.5.9) requires library functions not to modify objects accessible by other threads except via their non-const arguments, including this.
That documentation is talking about the member functions of shared_ptr, not the member functions of your class. Copies of shared_ptr objects are different objects.
I believe the code is thread safe, because the only changing variable written and read on different threads is cur, and that variable is atomic.
If cur was not atomic and access to it in both Increase() and IsOverMax() was protected by locking a std::mutex, that code would also be thread safe.

about race condition of weak_ptr

1.
I posted the question About thread-safety of weak_ptr several days ago, and now I have another related question.
If I do something like the following, will it introduce a race condition on g_w, as in the earlier example? (My platform is MS VS2013.)
#include <memory>
#include <thread>

std::weak_ptr<int> g_w;

void f3()
{
    std::shared_ptr<int> l_s3 = g_w.lock(); // 2. here will read g_w
    if (l_s3)
    {
        // .....
    }
}

void f4() // f4 runs in the main thread
{
    std::shared_ptr<int> p_s = std::make_shared<int>(1);
    g_w = p_s;
    std::thread th(f3); // f3 runs in the other thread
    th.detach();
    // 1. p_s destroy will modify g_w (write g_w)
}
2. As I understand it, std::shared_ptr/weak_ptr derive from std::tr1::shared_ptr/weak_ptr, which in turn derive from boost::shared_ptr/weak_ptr. Are there any differences in the implementations, especially with respect to thread safety?
The completed construction of a std::thread synchronizes with the invocation of the specified function in the thread being created; i.e., everything that happens in f4 before the construction of std::thread th is guaranteed to be visible to the new thread when it starts executing f3. In particular, the write to g_w in f4 (g_w = p_s;) will be visible to the new thread executing f3.
The statement in your comment, // 1. p_s destroy will modify g_w (write g_w), is incorrect. Destruction of p_s does not access g_w in any way. In most implementations it does modify a common control block that is used to track all shared and weak references to the pointee. Any such modifications to objects internal to the standard library implementation are the library's problem to make thread-safe, not yours, per C++11 §17.6.5.9/7: "Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races."
Assuming no concurrent modifications to g_w somewhere else in the program, and no other threads executing f3, there is no data race in this program on g_w.
@Casey
Firstly, here is my complete code:
int main()
{
    f4();
    getchar();
    return 0;
}
And I found some code in my Visual Studio 2013.

implement a semaphore

It appears that GLib provides mutexes and conditions as thread-synchronization primitives, but what about generic semaphores (in the sense that they support the original P and V operations)? Am I correct in understanding a GCond as roughly equivalent to a binary semaphore, with g_cond_wait being equivalent to P and g_cond_signal being equivalent to V? But what about semaphores not restricted to a maximum value of 1?
I thought of something like this:
struct semaphore {
    int n;
    GMutex sem_lock;
    GCond sem_cond;
};
Where the P operation would now look something like this:
void semaphore_P (struct semaphore *sem)
{
    g_mutex_lock(&sem->sem_lock);
    while (sem->n == 0)
        g_cond_wait(&sem->sem_cond, &sem->sem_lock);
    --sem->n;
    g_mutex_unlock(&sem->sem_lock);
}
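And I suppose the matching V operation would look something like this:
void semaphore_V (struct semaphore *sem)
{
    g_mutex_lock(&sem->sem_lock);
    ++sem->n;
    g_cond_signal(&sem->sem_cond);
    g_mutex_unlock(&sem->sem_lock);
}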
Is there a simpler way to get at the functionality of pthreads' sem_wait and sem_post from within glib?
An asynchronous queue can be used as a semaphore:
initialization: GAsyncQueue *queue = g_async_queue_new();
the V operation: g_async_queue_push(queue, GINT_TO_POINTER(1));
the P operation: g_async_queue_pop(queue);
The size of the queue serves as the counter of the semaphore.
The second parameter to g_async_queue_push may be any pointer except for NULL.
However, if you want to use the semaphore for some consumer/producer task, then sending in a pointer to some data will be useful.
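Put together, a minimal sketch (the sem_P/sem_V wrapper names are illustrative, not part of GLib):
#include <glib.h>

GAsyncQueue *queue; // initialize once with g_async_queue_new()

void sem_V(void)
{
    g_async_queue_push(queue, GINT_TO_POINTER(1)); // any non-NULL pointer
}

void sem_P(void)
{
    (void)g_async_queue_pop(queue); // blocks while the queue is empty
}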
In some cases, a thread pool may fit even better.

Can I assign a per-thread index, using pthreads?

I'm optimizing some instrumentation for my project (Linux, ICC, pthreads), and would like some feedback on this technique for assigning a unique index to a thread, so I can use it to index into an array of per-thread data.
The old technique uses a std::map based on pthread id, but I'd like to avoid locks and a map lookup if possible (it is creating a significant amount of overhead).
Here is my new technique:
static PerThreadInfo info[MAX_THREADS]; // shared, each index is per thread
// Allow each thread a unique sequential index, used for indexing into per
// thread data.
1:static size_t GetThreadIndex()
2:{
3: static size_t threadCount = 0;
4: __thread static size_t myThreadIndex = threadCount++;
5: return myThreadIndex;
6:}
later in the code:
// add some info per thread, so it can be aggregated globally
info[ GetThreadIndex() ] = MyNewInfo();
So:
1) It looks like line 4 could be a race condition if two threads were created at exactly the same time. If so, how can I avoid this (preferably without locks)? I can't see how an atomic increment would help here.
2) Is there a better way to create a per-thread index somehow? Maybe by pre-generating the TLS index on thread creation somehow?
1) An atomic increment would actually help here: the possible race is two threads reading threadCount and assigning the same ID to themselves, so making the increment a single atomic read-modify-write (read number, add 1, store number, and hand back the old value) fixes that race. On x86 a lock xadd (fetch-and-add) would do the trick, or whatever your platform offers (InterlockedIncrement() on Windows, for example).
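A sketch of that fix in C++11 terms (std::atomic and thread_local standing in for the plain static and __thread):
#include <atomic>
#include <cstddef>

static std::atomic<std::size_t> threadCount{0};

static std::size_t GetThreadIndex()
{
    // fetch_add is a single atomic read-modify-write, so no two threads
    // can ever observe and keep the same value.
    static thread_local std::size_t myThreadIndex = threadCount.fetch_add(1);
    return myThreadIndex;
}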
2) Well, you could actually make the whole info thread-local ("__thread static PerThreadInfo info;"), provided your only aim is to be able to access the data per-thread easily and under a common name. If you actually want it to be a globally accessible array, then saving the index as you do using TLS is a very straightforward and efficient way to do this. You could also pre-compute the indexes and pass them along as arguments at thread creation, as Kromey noted in his post.
Why so averse to using locks? Solving race conditions is exactly what they're designed for...
At any rate, you can use the 4th argument of pthread_create() to pass an argument to your threads' start routine; in this way, you could use your master process to generate an incrementing counter as it launches the threads, and pass this counter into each thread as it is created, giving you your unique index for each thread.
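A minimal sketch of that approach (error checks omitted; the index is smuggled through the void* argument, which is fine for small integers):
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 4

static void *worker(void *arg)
{
    size_t myIndex = (size_t)arg; // unique index handed out at creation
    printf("thread %zu started\n", myIndex);
    return NULL;
}

int main(void)
{
    pthread_t threads[MAX_THREADS];
    for (size_t i = 0; i < MAX_THREADS; ++i)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (size_t i = 0; i < MAX_THREADS; ++i)
        pthread_join(threads[i], NULL);
    return 0;
}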
I know you tagged this [pthreads], but you also mentioned the "old technique" of using std::map. This leads me to believe that you're programming in C++. In C++11 you have std::thread, and you can pass out unique indexes (IDs) to your threads at thread creation time through an ordinary function parameter.
Below is an example HelloWorld that creates N threads, assigning each an index of 0 through N-1. Each thread does nothing but say "hi" and give its index:
#include <iostream>
#include <thread>
#include <mutex>
#include <vector>

inline void sub_print() {}

template <class A0, class ...Args>
void
sub_print(const A0& a0, const Args& ...args)
{
    std::cout << a0;
    sub_print(args...);
}

std::mutex&
cout_mut()
{
    static std::mutex m;
    return m;
}

template <class ...Args>
void
print(const Args& ...args)
{
    std::lock_guard<std::mutex> _(cout_mut());
    sub_print(args...);
}

void f(int id)
{
    print("This is thread ", id, "\n");
}

int main()
{
    const int N = 10;
    std::vector<std::thread> threads;
    for (int i = 0; i < N; ++i)
        threads.push_back(std::thread(f, i));
    for (auto i = threads.begin(), e = threads.end(); i != e; ++i)
        i->join();
}
My output:
This is thread 0
This is thread 1
This is thread 4
This is thread 3
This is thread 5
This is thread 7
This is thread 6
This is thread 2
This is thread 9
This is thread 8
