What is the analogue of ConcurrentHashMap in Haskell? - multithreading

update: please bear in mind, I've just started learning Haskell
Let's say we're writing an application with the following general functionality:
when starting, it gathers some data from an external source;
this data is a set of complex structures containing lists, arrays, ints, strings, etc.;
when running, the application serves a web API (servlets) that provides access to the data.
Now, if the application were written in Java, we could use a static ConcurrentHashMap object to store the data (as instances of Java classes). During startup the app would fill the map, and then the servlets could access it to provide an API to clients.
If the application were written in Erlang, we could use ETS/DETS for storing the data (as native Erlang structures).
Now the question: what is the proper Haskell way of implementing such a design?
It shouldn't be a DB; it should be some sort of lightweight in-memory something that can store complex structures (native Haskell structures) and that can be accessed from different threads (the servlets, speaking in Java-world terms). In Haskell there are no static global vars as in Java and no ETS/OTP as in Erlang, so how do you do it the right way (without using external solutions like Redis)?
Thanks
update: another important part of the question: since Haskell doesn't (?) have 'global static' variables, what would be the right way to implement this 'globally accessible' data-keeping object (say it is "stm-containers")? Should I initialize it somewhere in the 'main' function and then just pass it to every REST API handler? Or is there some other, more correct way?

It's not clear from your question whether the client API will provide ways of mutating the data.
If not (i.e., the API will only be about querying), then any immutable data structure will suffice, since one beauty of immutable data is that it can be accessed safely from multiple threads with the certainty that it cannot change. There is no need for the overhead of locks or other strategies for dealing with concurrency. You simply construct the immutable data during initialisation and then just query it. For this, consider a package like "unordered-containers".
If your API will also be mutating the data, then you will need mutable data structures that are optimised for concurrency. "stm-containers" is one package that provides those.
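On the second part of your question: yes, the usual approach is to build the structure in main and pass it (explicitly, or wrapped in a reader-style environment) to every handler; there is no global mutable state to reach for. A minimal sketch of the query-only case, assuming "unordered-containers"; the names AppData, loadData and lookupUser are made up for illustration:

import qualified Data.HashMap.Strict as HM  -- from unordered-containers

-- Hypothetical structure holding everything gathered at startup
newtype AppData = AppData { users :: HM.HashMap Int String }

-- Stand-in for gathering data from the external source
loadData :: IO AppData
loadData = pure (AppData (HM.fromList [(1, "alice"), (2, "bob")]))

-- A handler simply receives the environment as an argument
lookupUser :: AppData -> Int -> Maybe String
lookupUser env k = HM.lookup k (users env)

main :: IO ()
main = do
    env <- loadData
    -- hand env to every request handler / thread; since it is immutable,
    -- no locks are needed
    print (lookupUser env 1)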

First off, I'm going to assume you mean it needs to be available to multiple threads, not multiple processes. (The difference being that threads share memory, processes do not.) If that assumption is wrong, much of your question doesn't make sense.
So, the first important point: Haskell has mutable data structures. They can easily be shared between threads. Here's a small example:
import Control.Concurrent
import Control.Monad

main :: IO ()
main = do
    v <- newMVar 0 :: IO (MVar Int)
    forkIO . forever $ do
        x <- takeMVar v
        putMVar v $! x + 1
    forM_ [1..10] $ \_ -> do
        x <- readMVar v
        threadDelay 100
        print x
Note the use of ($!) when putting the value in the MVar. MVars don't enforce that their contents are evaluated. There's some subtlety in making sure everything works properly. You will get lots of space leaks until you understand Haskell's evaluation model. That's part of why this sort of thing is usually done in a library that handles all those details.
Given this, the first pass approach is to just store a map of some sort in an MVar. Unless it's under a lot of contention, that actually has pretty good performance properties.
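A minimal sketch of that first-pass approach, assuming Data.Map.Strict from the containers package; Store, newStore, insertKey and lookupKey are made-up names:

import Control.Concurrent.MVar
import qualified Data.Map.Strict as Map

type Store k v = MVar (Map.Map k v)

newStore :: IO (Store k v)
newStore = newMVar Map.empty

-- modifyMVar_ takes the map, applies the update, and puts the result back;
-- ($!) forces the new map so thunks don't pile up inside the MVar
insertKey :: Ord k => k -> v -> Store k v -> IO ()
insertKey k v store = modifyMVar_ store (\m -> return $! Map.insert k v m)

lookupKey :: Ord k => k -> Store k v -> IO (Maybe v)
lookupKey k store = Map.lookup k <$> readMVar store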
When it is under contention, you have a good fallback secondary approach, especially when using a hash map. That's striping. Instead of storing one map in one MVar, use N maps in N MVars. The first step in a lookup is using the hash to determine which of the N MVars to look in.
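A sketch of the striping idea, assuming the hashable, unordered-containers and vector packages; Striped, newStriped, bucketFor, insertStriped and lookupStriped are invented names:

import Control.Concurrent.MVar
import Data.Hashable (Hashable, hash)
import qualified Data.HashMap.Strict as HM
import qualified Data.Vector as V

-- N independent maps, each behind its own MVar
type Striped k v = V.Vector (MVar (HM.HashMap k v))

newStriped :: Int -> IO (Striped k v)
newStriped n = V.replicateM n (newMVar HM.empty)

-- Hash the key to decide which stripe (and hence which MVar) to touch
bucketFor :: Hashable k => Striped k v -> k -> MVar (HM.HashMap k v)
bucketFor stripes k = stripes V.! (hash k `mod` V.length stripes)

insertStriped :: (Eq k, Hashable k) => k -> v -> Striped k v -> IO ()
insertStriped k v stripes =
    modifyMVar_ (bucketFor stripes k) (\m -> return $! HM.insert k v m)

lookupStriped :: (Eq k, Hashable k) => k -> Striped k v -> IO (Maybe v)
lookupStriped k stripes = HM.lookup k <$> readMVar (bucketFor stripes k)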
There are fancy lock-free algorithms, which could be implemented using finer-grained mutable values. But in general, they are a lot of engineering effort for a few percent improvement in performance that doesn't really matter in most use cases.

Related

Are purely functional data structures always lock-free?

I've seen this claimed in several places, including on SO: https://stackoverflow.com/a/20467457/243238, https://stackoverflow.com/a/4400389/243238. I get the point that locks are not needed to modify the data but you end up with multiple versions of it after concurrent modifications. That doesn't seem very useful in practice. I've tried to describe this with a simple scenario below:
Let's say we have 2 threads A and B. They are both modifying a purely functional dictionary D. They don't need locks at this point because the data is immutable so they output new dictionaries DA and DB. Now my question is how do you reconcile DA and DB so that later operations can see a single view of the data?
EDIT: The answers are suggesting to use a merge function over DA and DB. I don't see how that solves the problem since there could be another thread C which runs concurrently with the merge operation. The claim is that purely functional data structures are lock-free but using a merge function sounds more like eventual consistency which is a different thing.
Good observation. But this just means that parallelism in the presence of a "global state" is difficult or impossible, regardless of the programming language.
In impure languages, this is just less obvious. You may think that you can get away with locking, synchronization and all the highly complex code around it.
Whereas in pure languages, you have n pure state transformers and cannot help but realize that if you apply them in parallel to some initial state S0, you end up with just as many states S1, S2, S3, ..., Sn.
Now, purity just guarantees that the parallel transformations of S0 do not interfere with each other in any way. It doesn't solve your program design issues, such as what that state means, whether it is too coarse- or too fine-grained, and whether you can compute an end result from the various transformed states.
The point of purely functional data structures is atomicity. In your scenario, DA and DB are each guaranteed to be internally consistent; how to merge them into a single consistent value depends on your application (functional programming will not solve ALL your problems for you). This is still more than what you get in the imperative paradigm, where DA and DB depend on how the threads happen to be interleaved.
In general there are a lot of ways to do it. In practice, you may want to ensure that either DA or DB is canonical. If that cannot be done, then there should be a merge process:
Thread A/B:

    da  <- readSharedMemory at D
    da' <- modifyDict da
    mergeSharedMemory da'
where mergeSharedMemory is capable of combining two dictionaries with a similar history in a consistent fashion. Generally, these kinds of structures are well-studied in the database literature as CRDTs—"convergent replicated data types" and "commutative replicated data types". The INRIA Paper has a great deal of detail about this.
Purely functional data structures are the easiest way to make lock-free mutable data structures. Given an arbitrary purely functional data structure T, you can define a lock-free mutable data structure newtype U = U (IORef T). Given an arbitrary function f :: T -> (T, a), you can write
mf :: U -> IO a
mf (U ref) = atomicModifyIORef' ref f
Why, then, is there a whole literature of fancy techniques for making lock-free data structures? Because not all data structures can be implemented efficiently enough in a purely functional way. Any operation that takes too long will slow all the threads down and lead to a backlog of updates.
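As a concrete instance of that pattern (a sketch assuming Data.Map.Strict from containers; Counters, newCounters and bump are invented names):

import Data.IORef
import qualified Data.Map.Strict as Map

-- A shared, lock-free map of counters built from a purely functional Map
type Counters = IORef (Map.Map String Int)

newCounters :: IO Counters
newCounters = newIORef Map.empty

-- atomicModifyIORef' installs the new map atomically and returns the new count;
-- readers never block, and no thread holds a lock while another waits
bump :: String -> Counters -> IO Int
bump key ref = atomicModifyIORef' ref $ \m ->
    let m' = Map.insertWith (+) key 1 m
    in (m', m' Map.! key)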
You create a third function taking DA and DB that produces DAB.

How do I serialize or save to a file a Thunk?

In Haskell, you can have infinite lists because they are not computed in full; thunks are used instead. I am wondering if there is a way to serialize, or otherwise save to a file, a piece of data's thunk. For example, let us say you have the list [0..]. Then you do some processing on it (I am mostly interested in tail and (:), but it should support filter and map as well). Here is an example of the sort of thing I am looking for.
serial :: SerialThunk a => a -> SerThunk
serialized = serial ([0..] :: [Int])
main = writeToFile "foo.txt" serialized
And
deserial :: SerialThunk a => SerThunk -> a
main = do
    deserialized <- readFromFile "foo.txt" :: IO [Int]
    print $ take 10 deserialized
No. There is no way to serialize a thunk in Haskell. Once code is compiled it is typically represented as assembly (for example, this is what GHC does) and there is no way to recover a serializable description of the function, let alone the function and environment that you'd like to make a thunk.
Yes. You could build custom solutions, such as describing and serializing a Haskell expression. Deserialization and execution could happen by way of interpretation (ex. Using the hint package).
Maybe. Someone (you?) could make a compiler, or modify an existing compiler, to maintain more information in a platform-agnostic manner so that things could be serialized without the user manually leveraging hint. I imagine this is an area under exploration by the Cloud Haskell (aka distributed-haskell) developers.
Why? I have also wanted an ability to serialize functions so that I could pass closures around in a flexible manner. Most of the time, though, that flexibility isn't actually needed and instead people want to pass certain types of computations that can be easily expressed as a custom data type and interpretation function.
packman: "Evaluation-orthogonal serialisation of Haskell data, as a library" (thanks to a reddit link) -- is exactly what we have been looking for!
...this serialisation is orthogonal to evaluation: the argument is
serialised in its current state of evaluation, it might be
entirely unevaluated (a thunk) or only partially evaluated (containing
thunks).
...The library enables sending and receiving data between different nodes
of a distributed Haskell system. This is where the code originated:
the Eden runtime system.
...Apart from this obvious application, the functionality can be used to
optimise programs by memoisation (across different program runs), and
to checkpoint program execution in selected places. Both uses are
exemplified in the slide set linked above.
...Another limitation is that serialised data can only be used by the
very same binary. This is however common for many approaches to
distributed programming using functional languages.
...
Cloud Haskell supports serialization of function closures. http://www.haskell.org/haskellwiki/Cloud_Haskell
Apart from the work in Cloud Haskell and HdpH on "closures", and apart from the answers stating that thunks are not analyzable at runtime, I've found that:
:sprint in GHCi seems to have access to the internal thunk representation. Perhaps GHCi works with some special, non-optimized code. So in principle one could use this representation and the implementation of :sprint if one wanted to serialize thunks, couldn't one?
http://hackage.haskell.org/package/ghc-heap-view-0.5.3/docs/GHC-HeapView.html -- "With this module, you can investigate the heap representation of Haskell values, i.e. to investigate sharing and lazy evaluation."
I'd be very curious to know what kind of working solutions for serializing closures can be made out of this stuff...

Haskell: Lists, Arrays, Vectors, Sequences

I'm learning Haskell and read a couple of articles regarding performance differences of Haskell lists and (insert your language)'s arrays.
Being a learner I obviously just use lists without even thinking about performance difference.
I recently started investigating and found numerous data structure libraries available in Haskell.
Can someone please explain the difference between Lists, Arrays, Vectors, Sequences without going very deep in computer science theory of data structures?
Also, are there some common patterns where you would use one data structure instead of another?
Are there any other forms of data structures that I am missing and might be useful?
Lists Rock
By far the most friendly data structure for sequential data in Haskell is the List
data [a] = a:[a] | []
Lists give you ϴ(1) cons and pattern matching. The standard library, and for that matter the Prelude, is full of useful list functions that should litter your code (foldr, map, filter). Lists are persistent, a.k.a. purely functional, which is very nice. Haskell lists aren't really "lists" because they are coinductive (other languages call these streams), so things like
ones :: [Integer]
ones = 1:ones
twos = map (+1) ones
tenTwos = take 10 twos
work wonderfully. Infinite data structures rock.
Lists in Haskell provide an interface much like iterators in imperative languages (because of laziness). So, it makes sense that they are widely used.
On the other hand
The first problem with lists is that indexing into them (!!) takes ϴ(k) time, which is annoying. Also, appends (++) can be slow, but Haskell's lazy evaluation model means that these can be treated as fully amortized, if they happen at all.
The second problem with lists is that they have poor data locality. Real processors incur high constants when objects in memory are not laid out next to each other. So, in C++, std::vector has a faster "snoc" (putting objects at the end) than any pure linked-list data structure I know of, although it is not a persistent data structure and so is less friendly than Haskell's lists.
The third problem with lists is that they have poor space efficiency. Bunches of extra pointers push up your storage (by a constant factor).
Sequences Are Functional
Data.Sequence is internally based on finger trees (I know, you don't want to know this), which means that it has some nice properties:
Purely functional. Data.Sequence is a fully persistent data structure.
Darn fast access to the beginning and end of the sequence: ϴ(1) (amortized) to get the first or last element, or to append two sequences. At the things lists are fastest at, Data.Sequence is at most a constant factor slower.
ϴ(log n) access to the middle of the sequence. This includes inserting values to make new sequences.
High quality API
On the other hand, Data.Sequence doesn't do much for the data locality problem, and it only works for finite collections (it is less lazy than lists).
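A quick sketch of what that API looks like in practice (using Data.Sequence from the containers package):

import qualified Data.Sequence as Seq
import Data.Sequence (Seq, (<|), (|>))

example :: Seq Int
example = (0 <| Seq.fromList [1, 2, 3]) |> 4   -- O(1) cons and snoc at either end

updated :: Seq Int
updated = Seq.update 2 99 example              -- O(log n) update in the middle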
Arrays are not for the faint of heart
Arrays are one of the most important data structures in CS, but they don't fit very well with the lazy, pure functional world. Arrays provide ϴ(1) access to the middle of the collection and exceptionally good data locality/constant factors. But, since they don't fit very well into Haskell, they are a pain to use. There are actually a multitude of different array types in the current standard library. These include fully persistent arrays, mutable arrays for the IO monad, mutable arrays for the ST monad, and unboxed versions of the above. For more, check out the Haskell wiki.
Vector is a "better" Array
The Data.Vector package provides all of the array goodness in a higher-level and cleaner API. Unless you really know what you are doing, you should use these if you need array-like performance. Of course, some caveats still apply: mutable array-like data structures just don't play nice in pure lazy languages. Still, sometimes you want that O(1) performance, and Data.Vector gives it to you in a usable package.
You have other options
If you just want lists with the ability to efficiently insert at the end, you can use a difference list. The best example of lists screwing up performance tends to come from [Char], which the Prelude has aliased as String. Char lists are convenient, but tend to run on the order of 20 times slower than C strings, so feel free to use Data.Text or the very fast Data.ByteString. I'm sure there are other sequence-oriented libraries I'm not thinking of right now.
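To illustrate the difference-list trick (a minimal hand-rolled sketch, not the actual API of the dlist package; DList, emptyD, snocD and toListD are invented names):

-- A difference list represents a list as a function that prepends that list
newtype DList a = DList ([a] -> [a])

emptyD :: DList a
emptyD = DList id

-- Appending at the end is O(1): we just compose functions
snocD :: DList a -> a -> DList a
snocD (DList f) x = DList (f . (x:))

-- Converting back to an ordinary list is O(n) overall
toListD :: DList a -> [a]
toListD (DList f) = f []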
Conclusion
90+% of the time I need a sequential collection in Haskell, lists are the right data structure. Lists are like iterators: functions that consume lists can easily be used with any of these other data structures via the toList functions they come with. In a better world the Prelude would be fully parametric as to what container type it uses, but currently [] litters the standard library. So, using lists (almost) everywhere is definitely okay.
You can get fully parametric versions of most of the list functions (and it is noble to use them):
Prelude.map ---> Prelude.fmap (works for every Functor)
Prelude.foldr/foldl/etc ---> Data.Foldable.foldr/foldl/etc
Prelude.sequence ---> Data.Traversable.sequence
etc
In fact, Data.Traversable defines an API that is more or less universal across anything "list-like".
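For example, one fold written against Foldable works over all of these containers unchanged (a small sketch; total and demo are made-up names):

import qualified Data.Sequence as Seq
import qualified Data.Vector as V

-- One function, written against Foldable, works for lists, Seq and Vector alike
total :: Foldable t => t Int -> Int
total = foldr (+) 0

demo :: (Int, Int, Int)
demo = (total [1, 2, 3], total (Seq.fromList [1, 2, 3]), total (V.fromList [1, 2, 3]))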
Still, although you can be good and write only fully parametric code, most of us are not and use lists all over the place. If you are learning, I strongly suggest you do too.
EDIT: Based on comments I realize I never explained when to use Data.Vector vs Data.Sequence. Arrays and Vectors provide extremely fast indexing and slicing operations, but are fundamentally transient (imperative) data structures. Purely functional data structures like Data.Sequence and [] let you efficiently produce new values from old values, as if you had modified the old values.
newList oldList = 7 : drop 5 oldList
doesn't modify the old list, and it doesn't have to copy it. So even if oldList is incredibly long, this "modification" will be very fast. Similarly,
newSequence newValue oldSequence = Sequence.update 3000 newValue oldSequence
will produce a new sequence with newValue in place of its 3000th element. Again, it doesn't destroy the old sequence, it just creates a new one. But it does this very efficiently, taking O(log(min(k, n-k))) where n is the length of the sequence and k is the index you modify.
You can't easily do this with Vectors and Arrays. They can be modified, but that is real imperative modification, and so it can't be done in regular Haskell code. That means operations in the Vector package that make modifications, like snoc and cons, have to copy the entire vector and so take O(n) time. The only exception is that you can use the mutable version (Data.Vector.Mutable) inside the ST monad (or IO) and do all your modifications just like you would in an imperative language. When you are done, you "freeze" your vector to turn it into the immutable structure you want to use with pure code.
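A small sketch of that workflow (buildSquares is an invented example name):

import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- Fill a vector with in-place writes inside ST, then freeze it for pure code
buildSquares :: Int -> V.Vector Int
buildSquares n = runST $ do
    mv <- MV.new n
    mapM_ (\i -> MV.write mv i (i * i)) [0 .. n - 1]
    V.freeze mv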
My feeling is that you should default to using Data.Sequence if a list is not appropriate. Use Data.Vector only if your usage pattern doesn't involve making many modifications, or if you need extremely high performance within the ST/IO monads.
If all this talk of the ST monad is leaving you confused: all the more reason to stick to pure fast and beautiful Data.Sequence.

Are new vectors created even if the old ones aren't used anymore?

This question is about the Data.Vector package.
Given that I'll never use the old value of a certain cell once the cell is updated, will the update operation always create a new vector reflecting the update, or will it be done as an in-place update?
Note: I know about Data.Vector.Mutable
No, but something even better can happen.
Data.Vector is built using "stream fusion". This means that if the sequence of operations that you are performing to build up and then tear down the vector can be fused away, then the Vector itself will never even be constructed and your code will turn into an optimized loop.
Fusion works by turning code that would build vectors into code that builds up and tears down streams and then puts the streams into a form that the compiler can see to perform optimizations.
So code that looks like
foo :: Int
foo = sum as
  where
    as, bs, cs, ds, es :: Vector Int
    as = map (*100) bs
    bs = take 10 cs
    cs = zipWith (+) (generate 1000 id) ds
    ds = cons 1 $ cons 3 $ map (+2) es
    es = replicate 24000 0
despite appearing to build up and tear down quite a few very large vectors can fuse all the way down to an inner loop that only calculates and adds 10 numbers.
Doing what you proposed is tricky, because it requires that you know that no references to a term exist anywhere else, which imposes a cost on any attempt to copy a reference into an environment. Moreover, it interacts rather poorly with laziness. You need to attach little affine addenda to the thunk you conspicuously didn't evaluate yet. But to do this in a multithreaded environment is scarily race prone and hard to get right.
Well, how exactly should the compiler see that "the old vector is not used anywhere"? Say we have a function that changes a vector:
changeIt :: Vector Int -> Int -> Vector Int
changeIt vec n = vec // [(0,n)]
Just from this definition, the compiler cannot assume that vec represents the only reference to the vector in question. We would have to annotate the function so it can only be used in this way - which Haskell doesn't support (but Clean does, as far as I know).
So what can we do in Haskell? Let us say we have another silly function:
changeItTwice vec n = changeIt (changeIt vec n) (n+1)
Now GHC could inline changeIt, and indeed "see" that no reference to the intermediate structure escapes. But typically, you would use this information to not produce that intermediate data structure, instead directly generating the end result!
This is a pretty common optimization (for lists, there is fusion, for example) - and I think it plays pretty much exactly the role you have in mind: Limiting the number of times a data structure needs to be copied. Whether or not this approach is more flexible than in-place-updates is up for debate, but you can definitely recover a lot of performance without having to break abstraction by annotating uniqueness properties.
(However, I think that Vector currently does not, in fact, perform this specific optimization. Might need a few more optimizer rules...)
IMHO this is certainly impossible, as the GHC garbage collector may wreak havoc if you mutate an object behind its back (even if it is not used anymore). That is because the object may have been moved into an older generation, and mutation could introduce pointers to a younger generation. If the younger generation then gets garbage collected, the object may move and thus the pointer may become invalid.
AFAIK, all mutable objects in Haskell are located on a special heap that is treated differently by the GC, so that such problems can't occur.
Not necessarily. Data.Vector uses stream fusion, so depending on your use the vector may not be created at all and the program may compile to an efficient constant space loop.
This mostly applies to operations that transform the entire vector rather than just updating a single cell, though.

Lock-free programming in Haskell

Does anyone know if it is possible to do lock-free programming in Haskell? I'm interested both in the question of whether the appropriate low-level primitives are available, and (if they are) on any information on what works in terms of using these to build working larger-scale systems in the pure functional context. (I've never done lock-free programming in a pure functional context before.) For instance, as I understand it the Control.Concurrent.Chan channels are built on top of MVars, which (as I understand things) use locks---could one in principle build versions of the Chan primitive which are lock free internally? How much performance gain might one hope to get?
I should also say that I'm familiar with the existence of TVars, but don't understand their internal implementation. I've been given to understand that they are mostly lock-free, but I'm not sure if they're entirely lock-free. So any information on the internal implementation of TVars would also be helpful!
(This thread provides some discussion, but I wonder if there's anything more up to date/more comprehensive.)
Not only does an MVar use locks, it is a lock abstraction. And, as I recall, individual STM primitives are optimistic, but there are locks used in various places in the STM implementation. Just remember the handy rhyme: "If it can block, then beware of locks".
For real lock-free programming you want to use IORefs directly, along with atomicModifyIORef.
Edit: regarding black holes, as I recall the implementation is lock free, but I can't vouch for the details. The mechanism is described in "Runtime Support for Multicore Haskell": http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel/multicore-ghc.pdf
But that implementation underwent some tweaks, I think, as described in Simon Marlow's 2010 Haskell Implementors Workshop talk "Scheduling Lazy Evaluation on Multicore": http://haskell.org/haskellwiki/HaskellImplementorsWorkshop/2010. The slides are unfortunately offline, but the video should still work.
Lock-free programming is trivial in Haskell. The easiest way to have a shared piece of data that needs to be modified by many threads is to start with any normal Haskell type (list, Map, Maybe, whatever you need) and place it in an IORef. Once you've done this, you can use atomicModifyIORef to perform modifications in place, which are guaranteed to take next to no time.
import Data.IORef

type MyDataStructure = [Int]
type ConcMyData = IORef MyDataStructure

main = do
    sharedData <- newIORef []
    ...
    atomicModifyIORef sharedData (\xs -> (1:xs, ()))
The reason this works is that a pointer to the thunk that will eventually evaluate the result inside the IORef is stored, and whenever a thread reads from the IORef, it gets that thunk and evaluates as much of the structure as it needs. Since all threads could read this same thunk, it will only be evaluated once (and if it is evaluated more than once, it's guaranteed to always end up with the same result, so concurrent evaluations are OK). I believe this is correct; I'm happy to be corrected, though.
The take-home message from this is that this sort of abstraction is only easily implemented in a pure language, where the values of things never change (except, of course, when they do, with types like IORef, MVar, and the STM types). The copy-on-write nature of Haskell's data structures means that modified structures can share a lot of data with the original structure, while only allocating whatever is new to the structure.
I don't think I've done a very good job explaining how this works, but I'll come back tomorrow and clarify my answer.
For more information, see the slides for the talk Multicore programming in Haskell by Simon Marlow of Microsoft Research (and one of the main GHC implementors).
Look into stm, specifically its TChan type.

Resources