nonatomic operations on std::atomic - multithreading

I have a huge array of pointers that I need to use atomically. C++11 provides the std::atomic class template and related functions for this purpose, and in general it works fine for me. But during the initialization and cleanup stages I do not need atomicity: it is known that only one thread will operate on the data. This would be easy if atomic pointers were implemented as plain volatile variables, as they often were before C++11, but here is my problem: std::atomic forces me to use only atomic operations.
Is there a way of nonatomic use of std::atomic?
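One common answer (a sketch, not a true non-atomic access) is to keep using the std::atomic interface but with std::memory_order_relaxed during the single-threaded phases; on most mainstream architectures a relaxed load or store compiles to a plain load or store, so the atomicity costs next to nothing there. The array and names below are illustrative, not from the question:

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical array of atomic pointers, standing in for the "huge array".
constexpr std::size_t kCount = 1024;
std::atomic<int*> g_ptrs[kCount];

// Single-threaded initialization: relaxed stores avoid the fences / locked
// instructions that sequentially consistent operations can emit, while
// still going through the std::atomic interface.
void init_single_threaded(std::vector<int>& storage) {
    storage.resize(kCount);
    for (std::size_t i = 0; i < kCount; ++i)
        g_ptrs[i].store(&storage[i], std::memory_order_relaxed);
}

// Single-threaded cleanup/inspection: relaxed loads are likewise plain
// loads on common hardware.
int* get_unsynchronized(std::size_t i) {
    return g_ptrs[i].load(std::memory_order_relaxed);
}
```

Note that starting the worker threads after initialization gives you the needed happens-before edge anyway (thread creation synchronizes-with the start of the new thread), so relaxed ordering is sufficient for this phase.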

Related

magic statics: similar constructs, interesting non-obvious uses?

C++11 introduced threadsafe local static initialization, aka "magic statics": Is local static variable initialization thread-safe in C++11?
In particular, the spec says:
If control enters the declaration concurrently while the variable is
being initialized, the concurrent execution shall wait for completion
of the initialization.
So there's an implicit mutex lock here. This is very interesting, and seems like an anomaly; that is, I don't know of any other implicit mutexes built into C++ (i.e. mutex semantics without any use of constructs like std::mutex). Are there any others, or is this unique in the spec?
I'm also curious whether magic static's implicit mutex (or other implicit mutexes, if there are any) can be leveraged to implement other synchronization primitives. For example, I see that they can be used to implement std::call_once, since this:
std::call_once(onceflag, some_function);
can be expressed as this:
static int dummy = (some_function(), 0);
Note, however, that the magic static version is more limited than std::call_once, since with std::call_once you could re-initialize onceflag and so use the code multiple times per program execution, whereas with magic statics, you really only get to use it once per program execution.
That's the only somewhat non-obvious use of magic statics that I can think of.
Is it possible to use magic static's implicit mutex to implement other synchronization primitives, e.g. a general std::mutex, or other useful things?
Initialization of block-scope static variables is the only place where the language requires synchronization. Several library functions require synchronization, but aren't directly synchronization functions (e.g. atexit).
Since the synchronization on the initialization of a local static is a one-time affair, it would be hard, if not impossible, to implement a general purpose synchronization mechanism on top of it, since every time you needed a synchronization point you would need to be initializing a different local static object.
Though they can be used in place of call_once in some circumstances, they can't be used as a general replacement for that, since a given once_flag object may be used from many places.
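The call_once equivalence sketched above can be made concrete. In this self-contained example (the names are illustrative), several threads race into the function, yet the comma-expression initializer of the block-scope static runs exactly once, guarded by the compiler-generated one-time initialization lock:

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> g_init_count{0};

void some_function() { g_init_count.fetch_add(1); }

// The magic-static idiom: the initializer runs some_function() exactly
// once, even if run_once() is entered concurrently by many threads.
void run_once() {
    static int dummy = (some_function(), 0);
    (void)dummy;
}

// Spawn n_threads threads that all call run_once(), then report how many
// times some_function() actually ran.
int call_from_many_threads(int n_threads) {
    std::vector<std::thread> ts;
    for (int i = 0; i < n_threads; ++i)
        ts.emplace_back(run_once);
    for (auto& t : ts) t.join();
    return g_init_count.load();
}
```

As the answer notes, this only buys you a one-shot: unlike a std::once_flag, the static cannot be reset and reused later in the program.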

In Java 7 ConcurrentHashMap, why segment lock is required while writing? Why can't we use Unsafe again to keep things non-blocking?

While going through the internal implementation of Java 7's ConcurrentHashMap, I noticed that to set a new Segment we use the Unsafe class, which performs an ordered write and uses compare-and-swap, thus supporting non-blocking behaviour.
My doubt is: why doesn't ConcurrentHashMap use compare-and-swap or something similar to keep reads and writes to the singly linked list non-blocking, instead of acquiring a lock on the segment and then writing to the singly linked list?
The methods in the Unsafe class are generally unsafe to use. They are only used when the software engineer writing the libraries has figured out that they can be used because of other constraints, such as knowledge of what might simultaneously be accessing the data, or knowledge about how primitives are implemented on various processor architectures.
In this case, some engineer at Oracle has determined that the Unsafe class can be used for the Segment manipulation, but not for the singly linked list.

What constructs are not possible using Ponylang's lock-free model?

Ponylang is a new language that is lock-free and data-race-free. My impression is that to accomplish this, Ponylang starts from the rule "if two threads can see the same object, then a write must prohibit any other operation by another thread", and uses a type system to enforce the various special cases. For example, there is a reference capability that says "no other thread can see this object", one that says "this reference is read-only", and various others. Admittedly my understanding of this is quite poor, and Ponylang's documentation is short on examples.
My question is: are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all? Also, are there such operations that are not translatable into efficient constructs in ponylang?
[...] are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all?
The whole point of reference capabilities in Pony is to prevent you from doing things that are possible, and even trivial, in other languages, like sharing a list between two threads and adding elements to it concurrently. So yes, in languages like Java you can share data between threads in ways that are impossible in Pony.
Also, are there such operations that are not translatable into efficient constructs in ponylang?
If you're asking whether lock-based languages can be more efficient than Pony in some situations, then I think so. You can always create a situation that benefits from N threads and 1 lock and is worse under the actor model, which forces you to pass information around in messages.
The point is not to see the actor model as superior in all cases. It's a different model of concurrency, and problems are solved differently. For example, to compute N values and accumulate the results in a list:
In a thread-model you would
create a thread pool,
create thread-safe list,
create N tasks sharing the list, and
wait for N tasks to finish.
In an actor-model you would
create an actor A waiting for N values,
create N actors B sharing the actor A, and
wait for A to produce a list.
Obviously, each task would add a value to the list, and each actor B would send its value to actor A. Depending on how messages are passed between actors, it can be slower to send N values than to lock N times. Typically it will be slower, but on the other hand you will never get a list with an unexpected size.
I believe Pony can do anything that a shared-everything-plus-locks model can do. With just iso objects and consume, it is basically a pure message-passing system, which can do anything that a lock-based system can, in the same way that Mach 3 can do anything Linux can.
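The thread-model recipe above can be sketched in C++ (a minimal version under simplifying assumptions: one thread per task instead of a real pool, and a mutex-protected std::vector as the "thread-safe list"):

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Compute n values concurrently and accumulate them in a shared,
// lock-protected list -- the "N threads and 1 lock" shape described above.
std::vector<int> compute_and_accumulate(int n) {
    std::vector<int> results;
    std::mutex m;
    std::vector<std::thread> workers;
    for (int i = 0; i < n; ++i) {
        workers.emplace_back([i, &results, &m] {
            int value = i * i;  // stand-in for the real computation
            std::lock_guard<std::mutex> lock(m);
            results.push_back(value);  // one lock acquisition per value
        });
    }
    for (auto& t : workers) t.join();
    return results;  // after join(), the size is always exactly n
}
```

The actor-model version replaces the N lock acquisitions with N messages sent to the accumulating actor; which one is faster depends on the cost of message passing versus contention on the lock.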

C++ Threads writing to different parts of array of vector

I have an std::array<std::vector, NUM_THREADS> and I basically want each thread to go get some data, and store it in its own std::vector, and also to read from its vector.
Is this safe? Or am I going to have to use a mutex or something?
The rule regarding data-races is that if every memory location is either accessed by no more than one thread at a time, or is only read (by any number of threads, but no writes), you don't need atomicity. Otherwise, you need either atomicity or synchronization (such as mutual-exclusion).
If every thread is only writing to and reading from its own vector, this is safe. If two threads are writing to the same vector elements without synchronization, or if they're both writing to the vector object itself (e.g., appending or truncating it), you're pretty much clobbered; that's two simultaneous writes. If two threads are each writing to elements of their own vectors but reading from both vectors, it's more complicated, but in general I would expect it to be unsafe. There are very specific arrangements where it may be safe/legal, but they will be very brittle and likely hard to maintain, so it's probably better to re-architect to avoid it.
As an example of a usage like this that would be legal (but again, brittle and hard to keep safe during code maintenance): none of the vectors change size (a reallocation is a write to the vector object itself, which would preclude any reads of the vector or its elements by other threads), and each thread avoids reading any element of a vector that is written by another thread. For example, with two threads, one reads from and writes to the even elements of the vectors and the other reads from and writes to the odd elements.
The above example is very artificial and probably not all that useful for real access patterns that might be desired. Other examples I could think of would probably also be artificial and unhelpful. And it's very easy to do some simple operation that would destroy the whole guarantee. In particular, if any thread performs push_back() on their own vector, any threads that may be concurrently reading the vector are almost guaranteed to result in undefined behavior. (You might be able to align the stars using reserve() very carefully and make code that is legal, but I certainly wouldn't attempt it myself.)
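The safe shape described at the start of the answer can be sketched as follows (illustrative names; NUM_THREADS and the element type int are arbitrary choices, not from the question). Each thread writes only to its own vector, and the main thread reads the results only after join(), which provides the necessary synchronization:

```cpp
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t NUM_THREADS = 4;

// Each thread owns exactly one slot of the array, so no memory location
// is ever accessed by two threads at once: no data race, no mutex needed.
std::array<std::vector<int>, NUM_THREADS> fill_per_thread(int items_each) {
    std::array<std::vector<int>, NUM_THREADS> data;
    std::array<std::thread, NUM_THREADS> threads;
    for (std::size_t t = 0; t < NUM_THREADS; ++t) {
        threads[t] = std::thread([t, items_each, &data] {
            for (int i = 0; i < items_each; ++i)
                data[t].push_back(static_cast<int>(t) * 1000 + i);
        });
    }
    // join() happens-before the caller's reads, so after this loop the
    // main thread may safely read every vector, including ones that
    // reallocated during push_back().
    for (auto& th : threads) th.join();
    return data;
}
```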

"Wait-free" data in Haskell

I've been led to believe that the GHC implementation of TVars is lock-free, but not wait-free. Are there any implementations that are wait-free (e.g. a package on Hackage)?
Wait-freedom is a term from distributed computing. An algorithm is wait-free if a thread (or distributed node) is able to terminate correctly even if all input from other threads is delayed/lost at any time.
If you care about consistency, then you cannot guarantee wait-freedom (assuming that you always want to terminate correctly, i.e. guarantee availability). This follows from the CAP theorem [1], since wait-freedom essentially implies partition-tolerance.
[1] http://en.wikipedia.org/wiki/CAP_theorem
Your question "Are there any implementations that are wait-free?" is a bit incomplete. STM (and thus TVar) is rather complex and has support built into the compiler - you can't build it properly with Haskell primitives.
If you're looking for a data container that allows mutation and can be non-blocking, then you want IORef or MVar (though MVar operations can block when no value is available).
