I understand that mutual exclusion algorithms such as Dekker's algorithm and Peterson's algorithm require an underlying sequentially consistent memory model to work (or the use of memory barriers), but it's not clear to me whether they also require atomic loads and stores. The Wikipedia entry for Peterson's algorithm says this:
The algorithm satisfies the three essential criteria to solve the critical section problem, provided that changes to the variables turn, flag[0], and flag[1] propagate immediately and atomically.
I'm not clear on the quoted bit above and whether it means the algorithm requires atomic loads and stores to work. Looking at Peterson's algorithm, I don't see why atomic loads and stores would be required. I also don't see any mention of an atomicity requirement in the Wikipedia entry for Dekker's algorithm. So does Peterson's algorithm require atomic loads and stores, and does this requirement extend to all mutual exclusion algorithms?
Yes, it does require them to be atomic. Dekker's algorithm does too.
Suppose turn and flag are not atomic. Then P0 can set turn and flag[0], yet there is a window of time during which P1 still observes them as unset and can enter its critical section. Symmetrically, P0, having already announced itself, can enter its critical section too, violating mutual exclusion.
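To make the role of each load and store concrete, here is a structural sketch of Peterson's lock in Haskell (my own illustration, not from the question). Every read and write of flag and turn below must be an atomic, sequentially consistent memory operation for the algorithm to be correct; plain readIORef/writeIORef on GHC do not guarantee that ordering, so treat this as executable pseudocode rather than a working lock:

    import Data.IORef
    import Control.Monad (when)

    data Peterson = Peterson
      { flag :: (IORef Bool, IORef Bool)  -- flag: does thread i want in?
      , turn :: IORef Int                 -- whose turn it is to yield
      }

    side :: Int -> (a, a) -> a
    side 0 (a, _) = a
    side _ (_, b) = b

    lock :: Peterson -> Int -> IO ()
    lock p me = do
      let other = 1 - me
      writeIORef (side me (flag p)) True   -- store: announce intent
      writeIORef (turn p) other            -- store: give priority away
      let spin = do
            f <- readIORef (side other (flag p))  -- load flag[other]
            t <- readIORef (turn p)               -- load turn
            when (f && t == other) spin           -- busy-wait
      spin

    unlock :: Peterson -> Int -> IO ()
    unlock p me = writeIORef (side me (flag p)) False

If the two stores in lock are not atomic (or not immediately visible), the other thread's two loads can both miss them, the spin condition is False for both threads at once, and both enter the critical section.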
There's a useful pattern in imperative programming: a doubly-linked list coupled with a hash table for constant-time lookup into the linked list.
One application of this pattern is an LRU cache. The head of the doubly-linked list contains the least recently used entry in the cache and the tail contains the most recently used entry. The keys of the hash table are the keys of the cache entries, and the values are pointers to the corresponding nodes in the linked list. When an entry is queried, the hash table points to its node in the linked list; the node is then unlinked from its current position and moved to the tail of the list, making it the most recently used entry. For eviction, we simply remove entries from the head of the list, as they are the least recently used. Both lookup and eviction take constant time.
I can think of implementing this in Haskell using two TreeMaps, and I know the time complexity would be O(log n). But I'm a little uncomfortable, as the constant factor seems high. Specifically, to perform a lookup, I first need to check whether the entry exists and save its value, then delete it from the LRU map and re-insert it with a new key. Each lookup therefore costs three root-to-node traversals.
Is there a better way of doing this in Haskell?
As the comments indicate, mutable vectors are perfectly acceptable when required. However, I think there's an issue with the way you've stated the question: unless the idea is to duplicate the imperative code "as closely as possible" (without mutable structures), why bother with two tree maps? A single priority search queue (see the pqueue or PSQueue packages) would be an appropriate structure while maintaining purity. It efficiently supports both priorities (for eviction) and lookup by key (for queries against the cache).
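For concreteness, here is a minimal sketch of that idea using Data.OrdPSQ from the psqueues package (a close relative of PSQueue; the names LRU, lookupLRU, and insertLRU are my own). The key plays the hash-table role, and a monotonically increasing logical clock is the priority, so deleteMin evicts the least recently used entry:

    import qualified Data.OrdPSQ as PSQ

    data LRU k v = LRU
      { clock    :: !Int                    -- next logical timestamp
      , capacity :: !Int
      , queue    :: !(PSQ.OrdPSQ k Int v)   -- key -> (priority, value)
      }

    emptyLRU :: Int -> LRU k v
    emptyLRU cap = LRU 0 cap PSQ.empty

    -- A hit re-inserts the key with a fresh timestamp, making it
    -- the most recently used entry.
    lookupLRU :: Ord k => k -> LRU k v -> (Maybe v, LRU k v)
    lookupLRU k lru@(LRU c cap q) =
      case PSQ.lookup k q of
        Nothing     -> (Nothing, lru)
        Just (_, v) -> (Just v, LRU (c + 1) cap (PSQ.insert k c v q))

    -- Insert a new entry, evicting the minimum-priority (least
    -- recently used) entry if the cache is over capacity.
    insertLRU :: Ord k => k -> v -> LRU k v -> LRU k v
    insertLRU k v (LRU c cap q) = LRU (c + 1) cap (trim (PSQ.insert k c v q))
      where
        trim q' | PSQ.size q' > cap = PSQ.deleteMin q'
                | otherwise         = q'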
On a related note, some structures support e.g. Data.Map's alterF, which effectively provides you with a continuation that lets you "do something else" depending on the Maybe value at a key while "remembering" where you are, so you avoid paying the full cost of re-traversing the structure to subsequently modify it at that key. See also the at lens.
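As a small illustration (my own example): with the pair functor, alterF performs a lookup and an update in a single root-to-node traversal, returning the old value alongside the updated map:

    import qualified Data.Map.Strict as M

    -- One traversal: learn the previous value at k and install v.
    lookupAndInsert :: Ord k => k -> v -> M.Map k v -> (Maybe v, M.Map k v)
    lookupAndInsert k v = M.alterF (\old -> (old, Just v)) k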
Which mechanism does it use from the following?
https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
Open addressing with quadratic probing (reference: source code).
Note 1: Not everything that acts like an associative array is actually implemented as a hash table under the hood. In particular, small/dense arrays like [3, 1, 4, 1.5] are backed by an actual array (similar to a C array) for fast index-based access.
Note 2: The answer to this question may or may not change over time if/when the team experiments with alternative implementations. For example, open addressing requires relatively low load factors in order to provide quick accesses; it would be interesting to find an implementation that's more memory efficient (without being slower).
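For illustration, here is what one common variant of quadratic probing looks like (a sketch, not V8's actual code): probe offsets grow by triangular numbers, which is guaranteed to visit every slot when the table capacity is a power of two:

    -- Probe sequence for a hash value h in a table of the given
    -- capacity: h, h+1, h+3, h+6, h+10, ... (mod capacity).
    probeSequence :: Int -> Int -> [Int]
    probeSequence capacity h =
      [ (h + i * (i + 1) `div` 2) `mod` capacity | i <- [0 ..] ]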
Pony is a new language that is lock-free and data-race-free. My impression is that to accomplish this, Pony starts from the rule "if two threads can see the same object, then writes must prohibit any other operation by another thread" and uses a type system to enforce the various special cases. For example, there's a type descriptor that says "no other thread can see this object", one that says "this reference is read-only", and various others. Admittedly my understanding of this is quite poor, and Pony's documentation is short on examples.
My question is: are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all? Also, are there such operations that are not translatable into efficient constructs in ponylang?
[...] are there operations possible with a lock-based language that aren't translatable into ponylang's type-based system at all?
The whole point of reference capabilities in Pony is to prevent you from doing things that are possible, and even trivial, in other languages, like sharing a list between two threads and adding elements to it concurrently. So yes, in languages like Java you can share data between threads in ways that are impossible in Pony.
Also, are there such operations that are not translatable into efficient constructs in ponylang?
If you're asking whether lock-based languages can be more efficient than Pony in some situations, then I think so. You can always construct a situation that benefits from N threads and one lock and performs worse under the actor model, which forces you to pass information around in messages.
The point is not to see the actor model as superior in all cases. It's a different model of concurrency, and problems are solved differently. For example, to compute N values and accumulate the results in a list:
In a thread-model you would
create a thread pool,
create a thread-safe list,
create N tasks sharing the list, and
wait for N tasks to finish.
In an actor-model you would
create an actor A waiting for N values,
create N actors B sharing the actor A, and
wait for A to produce a list.
Obviously, each task would add a value to the list, and each actor B would send its value to actor A. Depending on how messages are passed between actors, sending N values can be slower than locking N times. Typically it will be slower, but on the other hand you will never get a list with an unexpected size.
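Here is a small sketch of the two shapes in Haskell (my own illustration; an MVar plays the role of the lock, a Chan plays actor A's mailbox):

    import Control.Concurrent
    import Control.Monad

    -- Thread model: N tasks push into one shared, lock-protected list.
    threadModel :: Int -> IO [Int]
    threadModel n = do
      resultVar <- newMVar []                 -- the "thread-safe list"
      done      <- newEmptyMVar
      forM_ [1 .. n] $ \i -> forkIO $ do
        let v = i * i                         -- compute a value
        modifyMVar_ resultVar (pure . (v :))  -- lock, append, unlock
        putMVar done ()
      replicateM_ n (takeMVar done)           -- wait for the N tasks
      readMVar resultVar

    -- Actor model: N workers send messages to a single accumulator.
    actorModel :: Int -> IO [Int]
    actorModel n = do
      inbox <- newChan                        -- actor A's mailbox
      forM_ [1 .. n] $ \i -> forkIO $ writeChan inbox (i * i)
      replicateM n (readChan inbox)           -- A collects exactly N values

In both cases the result has exactly n elements; the difference is whether coordination happens by locking a shared structure or by sequencing messages through one mailbox.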
I believe it can do anything that a shared-everything-plus-locks model can do. With just iso objects and consume, it is basically a pure message-passing system, which can do anything that a lock-based system does, in the same sense that Mach 3 can do anything Linux can.
I have an std::array<std::vector, NUM_THREADS> and I basically want each thread to fetch some data, store it in its own std::vector, and also read from its own vector.
Is this safe? Or am I going to have to use a mutex or something?
The rule regarding data races is that if every memory location is either accessed by no more than one thread at a time, or is only read (by any number of threads, but with no writes), you don't need atomicity. Otherwise, you need either atomicity or synchronization (such as mutual exclusion).
If every thread only writes to and reads from its own vector, this is safe. If two threads write to the same vector elements without synchronization, or both write to the vector object itself (e.g., appending or truncating it), you have a data race: two simultaneous writes. If two threads each write to elements of their own vectors but read from both vectors, it's more complicated, and in general I would expect it to be unsafe. There are very specific arrangements where it may be safe and legal, but they are brittle and hard to maintain, so it's probably better to re-architect to avoid them.
An example of a usage like this that would be legal (but again, brittle and hard to keep safe during maintenance) is one where none of the vectors change size (a reallocation is a write to the vector object itself, which would preclude any reads of the vector or its elements by other threads) and each thread avoids reading any element that is written by another thread (for example, two threads, one reading from and writing to the even elements of the vectors and the other the odd elements).
The above example is very artificial and probably not that useful for real access patterns. Other examples I could think of would probably be similarly artificial and unhelpful, and it's very easy to perform some simple operation that destroys the whole guarantee. In particular, if any thread performs push_back() on its own vector, any thread concurrently reading that vector is almost guaranteed to hit undefined behavior. (You might be able to align the stars using reserve() very carefully and produce legal code, but I certainly wouldn't attempt it myself.)
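The safe pattern described above (each thread owns a disjoint region whose size never changes) is not specific to C++. Here is a sketch of the same ownership discipline in Haskell, assuming the vector package: each thread writes only to its own slice of one pre-sized mutable vector, so no element is ever touched by two threads:

    import Control.Concurrent
    import Control.Monad
    import qualified Data.Vector.Unboxed as V
    import qualified Data.Vector.Unboxed.Mutable as VM

    main :: IO ()
    main = do
      let nThreads  = 4 :: Int
          perThread = 10
      buf  <- VM.new (nThreads * perThread)  -- fixed size: no reallocation
      done <- newEmptyMVar
      forM_ [0 .. nThreads - 1] $ \t -> forkIO $ do
        -- This thread's private region; slices never overlap.
        let mine = VM.slice (t * perThread) perThread buf
        forM_ [0 .. perThread - 1] $ \i -> VM.write mine i (t * 100 + i)
        putMVar done ()
      replicateM_ nThreads (takeMVar done)   -- wait before reading
      V.freeze buf >>= print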
I've been led to believe that the GHC implementation of TVars is lock-free, but not wait-free. Are there any implementations that are wait-free (e.g. a package on Hackage)?
Wait-freedom is a term from concurrent and distributed computing. An algorithm is wait-free if every thread (or node) is guaranteed to complete its operation in a finite number of its own steps, regardless of how other threads are delayed, or even if their input is lost entirely.
If you care about consistency, then you cannot guarantee wait-freedom (assuming that you always want to terminate correctly, i.e. guarantee availability). This follows from the CAP theorem [1], since wait-freedom essentially implies partition-tolerance.
[1] http://en.wikipedia.org/wiki/CAP_theorem
Your question "Are there any implementations that are wait-free?" is a bit incomplete. STM (and thus TVar) is rather complex and has support built into the compiler - you can't build it properly with Haskell primitives.
If you're looking for a mutable data container with non-blocking operations, then you want IORef (e.g., atomicModifyIORef') or MVar (though MVar operations can block when no value is available).
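For example (a sketch; in GHC, atomicModifyIORef' is implemented with a compare-and-swap retry loop, so it is lock-free but still not wait-free, since a thread can in principle retry indefinitely under contention):

    import Data.IORef

    -- Non-blocking counter increment: returns the new value.
    incr :: IORef Int -> IO Int
    incr ref = atomicModifyIORef' ref (\n -> (n + 1, n + 1))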