volatile keyword for objects in C++ - multithreading

I have a thread-safe counter object (a class that uses std::atomic load() and store()) as one of my class members. Thread 1 increments the counter and Thread 2 reads the counter.
Usually, primitive types (int etc.) that are shared by different threads are declared volatile to prevent certain compiler optimizations. Do I have to declare this thread-safe counter object, which is shared by 2 different threads, as volatile?
Could someone provide more insight into this?

No. There is no need if the object is declared atomic.
A C or C++ compiler may not reorder reads and writes to volatile memory locations, nor may it omit a read or write to a volatile memory location.
By using std::atomic, you already get what volatile was intended to achieve here (and stronger guarantees besides, such as memory ordering), so there is no need to also declare it volatile.

Take a look at the volatile (C++) MSDN article.
You don't have to, because:
"The volatile keyword in C++11 ISO Standard code is to be used only for hardware access; do not use it for inter-thread communication. For inter-thread communication, use mechanisms such as std::atomic from the C++ Standard Template Library."

Related

Will mutex protection fail because of register promotion?

In an article about the C++11 memory model, the author shows an example to argue that "a threads library will not work in C++03":
for (...) {
    ...
    if (mt) pthread_mutex_lock(...);
    x = ... x ...;
    if (mt) pthread_mutex_unlock(...);
}
// This should have no data race, but if a "clever" compiler
// uses a technique called "register promotion", the code becomes:
r = x;
for (...) {
    ...
    if (mt) {
        x = r; pthread_mutex_lock(...); r = x;
    }
    r = ... r ...;
    if (mt) {
        x = r; pthread_mutex_unlock(...); r = x;
    }
}
x = r;
There are three questions:
1. Does this promotion break mutex protection only in C++03? What about the C language?
2. Do C++03 thread libraries therefore not work?
3. Could any other optimization cause the same problem?
If this is a wrong example and thread libraries do work, then what about Hans Boehm's paper "Threads Cannot Be Implemented as a Library"?
The POSIX functions pthread_mutex_lock and pthread_mutex_unlock are memory barriers: the compiler and/or CPU cannot reorder loads and stores across them. Otherwise the mutexes would be useless. That article is probably inaccurate.
See POSIX 4.12 Memory Synchronization:
Applications shall ensure that access to any memory location by more than one thread of control (threads or processes) is restricted such that no thread of control can read or modify a memory location while another thread of control may be modifying it. Such access is restricted using functions that synchronize thread execution and also synchronize memory with respect to other threads. The following functions synchronize memory with respect to other threads: [see the list on the website]
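Put differently: as long as every access to the shared variable is bracketed by the lock and unlock calls, the pattern from the question is safe. A minimal sketch (the variable names are illustrative):

#include <pthread.h>

pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
int x; // shared; only ever accessed while mtx is held

void update(void)
{
    pthread_mutex_lock(&mtx);   // acquires the lock and synchronizes memory
    x = x + 1;                  // no other thread can touch x in here
    pthread_mutex_unlock(&mtx); // releases the lock and synchronizes memory
}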
For single-threaded code, the state of the abstract machine is not directly observable: objects that aren't volatile are not guaranteed to have any particular state when you pause the only thread with a signal and observe it via ptrace or the equivalent. The only requirement is that the program execution has the same observable behavior as one possible execution of the abstract machine.
The observables are the interactions with the external world: basically, input/output on streams and actions on volatile objects.
A compiler for single-threaded code can generate code that performs operations on global variables or other objects that happen to be shared between threads, as long as the single-threaded semantics are respected. This is obviously the case if a global variable is changed in such a way that it eventually gets back its original value.
For example, a compiler might emit code that increments and then decrements a variable, at least in some rare cases; the goal would be to emit simple code, at the cost of the occasional few unneeded operations.
Such changes to shared variables, which don't exist in the abstract machine, would obviously break multithreaded code that concurrently performs a real operation: the source code has no race condition on the accesses of the shared variable, which are properly serialized, but the generated code introduces a race that breaks the program.
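As a hypothetical illustration of such an invented write (all names are made up, and a conforming C++11 compiler is no longer allowed to do this):

int shared_flag; // in the source, written only when enabled is true

void process(bool enabled) {
    if (enabled)
        shared_flag = 1; // the only store in the abstract machine
}

// A single-thread-only compiler could legally rewrite the body as:
//
//     int tmp = shared_flag;            // invented load
//     shared_flag = 1;                  // unconditional invented store
//     if (!enabled) shared_flag = tmp;  // restore the original value
//
// Single-threaded behaviour is identical, but the invented store and the
// restore race with any other thread that reads shared_flag concurrently.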

Shared variables in OpenMP

I have a very basic question (maybe stupid) regarding shared variables in OpenMP. Consider the following code:
#include <omp.h>
#include <stdio.h>

int main()
{
    int numthreads;
    #pragma omp parallel default(none) shared(numthreads)
    {
        numthreads = omp_get_num_threads();
        printf("%d\n", numthreads);
    }
    return 0;
}
Now, the value of numthreads is the same for all threads. Is there a possibility that, since various threads are writing the same value to the same variable, the value might get garbled/mangled? Or is this operation on a primitive datatype guaranteed to be atomic?
As per the standard, this is not safe:
A single access to a variable may be implemented with multiple load or store instructions, and
hence is not guaranteed to be atomic with respect to other accesses to the same variable.
[...]
If multiple threads write without synchronization to the same memory unit, including cases due to
atomicity considerations as described above, then a data race occurs. [...] If a data race occurs then the result of the program is unspecified.
I strongly recommend reading 1.4.1 Structure of the OpenMP Memory Model. While it's not the easiest read, it's very specific and quite clear. By far better than I could describe it here.
Two things need to be considered about shared variables in OpenMP: atomicity of access and the temporary view of memory.
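One safe variant is to let exactly one thread perform the write: the implicit barrier at the end of the single construct (which also flushes memory) then makes the value visible to every thread before any of them reads it. A minimal sketch under that assumption:

#include <omp.h>
#include <stdio.h>

int main()
{
    int numthreads;
    #pragma omp parallel default(none) shared(numthreads)
    {
        #pragma omp single
        numthreads = omp_get_num_threads(); // only one thread writes
        // implicit barrier here: every thread now sees the value
        printf("%d\n", numthreads);
    }
    return 0;
}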

Atomic Operation in Linux Kernel

typedef struct { int counter; } atomic_t;
What does atomic_t mean? How does the compiler treat it? Historically, counter has been declared volatile, which implied it's a CPU register, right?
The reason it is declared as a struct like that is so that the programmer using it is forced (gently reminded, rather) to use the access functions to manipulate it. For example, aval = 27 would not compile. Neither would aval++.
The volatile keyword has always meant the opposite of a CPU register: it means a value that has to be read from and written to memory directly.
If counter was historically volatile it was wrong because volatile has never been good enough on its own to ensure proper atomic updates. I believe that the current atomic manipulator functions use a cast through a volatile pointer combined with the appropriate write barrier functions, and machine code for some operations that the compiler cannot do properly.
atomic_t indicates it's an atomic type. The compiler treats it as a typedef'd struct. I don't know what history says, but volatile is usually used to suppress certain compiler optimizations, and it doesn't imply a CPU register.
Well, as its name implies, all of its operations are atomic, i.e. done at once and not interruptible part-way through. atomic_t has helper functions (like atomic_inc, atomic_dec, atomic_or and many more) for manipulating atomic data. During manipulation of an atomic type, the helpers usually insert a bus lock (e.g. the x86 LOCK prefix) so that they cannot be interleaved, which makes the whole thing atomic.
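For illustration, a sketch of typical kernel-style usage, assuming the Linux atomic API (exposed via <linux/atomic.h> in modern kernels); release_resources() is a made-up placeholder:

#include <linux/atomic.h>

static atomic_t refcount = ATOMIC_INIT(1);

void take_ref(void)
{
    atomic_inc(&refcount);              // atomic increment
}

void drop_ref(void)
{
    if (atomic_dec_and_test(&refcount)) // atomic decrement, test for zero
        release_resources();            // hypothetical cleanup function
}

Note that a direct refcount.counter++ would compile but would not be atomic; the struct wrapper nudges you toward atomic_read()/atomic_set() and the other helpers instead.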

lookup tables in C++ 11 with multithreading

I have 2 similar situations in multithreaded C++11 software:
an array that I'm using as a lookup table inside a method declaration
an array that I'm using as a lookup table declared outside a method and that is used by several different methods, by reference or through pointers.
Now, if we forget for a minute about these LUTs and just consider C++11 and a multithreaded approach for a generic method, the most appropriate qualifier for such methods in terms of storage duration is probably thread_local.
This way, if I feed a method foo() that is thread_local to 3 threads, I basically end up having 3 instances of foo(), one for each thread. This move "solves" the problem of foo() being shared and accessed between 3 different threads, avoiding cache misses, but I basically get 3 possibly different behaviours for my foo(); for example, if foo() implements a PRNG and I provide a seed that is time-dependent with a really high resolution, I will probably get 3 different results, one per thread, and a real mess in terms of consistency.
But let's say that I'm fine with how thread_local works: how can I express the fact that I need a LUT kept always ready and cached for my methods?
I have read something about relaxed (and less relaxed) memory models, but in C++11 I have never seen a keyword or a practical technique that can force the caching of an array/LUT.
I'm on x86 or ARM.
Basically, I probably need something that is the opposite of volatile.
If the LUTs are read-only, so that you can share them without locks, you should just use one of them (i.e. declare them static).
Threads do not have their own caches. But even if they did (cores typically have their own L1 cache, and you may be able to pin a thread to a core), there would be no problem with two different threads caching different parts of the same memory structure.
"Thread-local storage" does not mean that the memory is somehow physically tied to the thread. Rather, it's a way to let the same name refer to a different object in each thread. In no way does it restrict the ability of any thread to access the object, if given its address.
The CPU cache is not programmable. It uses its own internal logic to determine which memory regions to cache. Typically it will cache the memory that either has just been accessed by the CPU, or its prediction logic determines will shortly be accessed by the CPU. In a multiprocessor system, each CPU may have its own cache, or different CPUs may share a cache. If there are multiple caches, a memory region may be cached in more than one simultaneously.
If all threads must see the same values in the look-up tables, then a single table would be best. This could be achieved with a variable with static storage duration. If the data can be modified then you would probably also need a std::mutex to protect accesses to the table and avoid data races. Read-only data can be shared without additional synchronization; in this case it is best to declare it const to make the read-only nature explicit and avoid accidental modifications.
void foo() {
    static const int lut[] = {...};
}
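If the table can be modified, a minimal sketch of the mutex-protected variant mentioned above (the table size and function names are illustrative):

#include <cstddef>
#include <mutex>

static int table[256];        // shared, mutable LUT
static std::mutex table_mutex;

int read_entry(std::size_t i) {
    std::lock_guard<std::mutex> lock(table_mutex); // serialize every access
    return table[i];
}

void write_entry(std::size_t i, int v) {
    std::lock_guard<std::mutex> lock(table_mutex);
    table[i] = v;
}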
You use thread_local where each thread must have its own copy of the data, usually because each copy will be modified independently. For example, you may choose to use thread_local for your random-number generator, so that each thread has its own RNG which is independent of the other threads, and does not require synchronization.
void bar() {
    thread_local RandomNumberGenerator rng; // one instance per thread
    auto val = rng.nextRandomNumber();      // use the current thread's instance
}
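As a concrete variant of this using only standard <random> facilities (a sketch; RandomNumberGenerator above is a placeholder type):

#include <random>

int random_int(int lo, int hi) {
    thread_local std::mt19937 rng{std::random_device{}()}; // seeded once per thread
    std::uniform_int_distribution<int> dist(lo, hi);
    return dist(rng); // no synchronization needed: each thread has its own rng
}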

Are incrementers / decrementers (var++, var--) etc thread safe?

Inspired by this question: In Complexity Analysis why is ++ considered to be 2 operations?
Take the following pseudocode:
class test
{
    int _counter;

    void Increment()
    {
        _counter++;
    }
};
Would this be considered thread safe on an x86 architecture? Furthermore, are the INC/DEC assembly instructions thread safe?
No, incrementing is not thread-safe. Neither are the INC and DEC instructions. They all require a load and a store, and a thread running on another CPU could do its own load or store on the same memory location interleaved between those operations.
Some languages have built-in support for thread synchronization, but it's usually something you have to ask for, not something you get automatically on every variable. Those that don't have built-in support usually have access to a library that provides similar functionality.
In a word, no.
You can use something like InterlockedIncrement() depending on your platform. On .NET you can use the Interlocked class methods (Interlocked.Increment() for example).
As Rob Kennedy mentioned, even if the operation is implemented in terms of a single INC instruction, as far as the memory is concerned a read/increment/write sequence of steps is performed. There is the opportunity on a multi-processor system for corruption.
There's also the volatile issue, which would be a necessary part of making the operation thread-safe; however, marking the variable volatile is not sufficient on its own to make it thread-safe. Use the interlocked support the platform provides.
This is true in general, and on x86/x64 platforms certainly.
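For completeness, a minimal C++11 sketch of a correct thread-safe increment, using std::atomic rather than the platform-specific interlocked APIs mentioned above:

#include <atomic>

class test {
    std::atomic<int> _counter{0};
public:
    void Increment() {
        // A single atomic read-modify-write. On x86 this typically compiles
        // to a LOCK-prefixed instruction, which is atomic across cores;
        // a plain INC is not.
        _counter.fetch_add(1);
    }
};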
