How can I emulate pointers in Haskell?

I am trying to implement Dijkstra's algorithm in Haskell. I have already implemented a binary heap using a tree. In the algorithm, the keys of the current vertex's neighbours have to be updated in the heap. How can I emulate a pointer to a value in the heap in Haskell? How can I have fast access to elements in the heap while the heap is changing after each operation?

Check out the modules Data.IORef and Data.STRef, which give you access to mutable references. Use IORefs if you also need to perform IO, and STRefs if you don't.
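For illustration, a minimal sketch of both kinds of reference (counterIO and counterST are made-up names):

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef')
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- IORef: a mutable cell you can use wherever you are already in IO
counterIO :: IO Int
counterIO = do
  ref <- newIORef (0 :: Int)
  modifyIORef' ref (+ 1)
  readIORef ref

-- STRef: the same idea, but runST lets the result escape into pure code
counterST :: Int
counterST = runST $ do
  ref <- newSTRef (0 :: Int)
  modifySTRef' ref (+ 1)
  readSTRef ref
```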
However, my suspicion is that you're probably doing it wrong. It is entirely possible to implement Dijkstra's algorithm without mutable state (although you need to be a little careful, as you can easily cause the asymptotic running time to blow up if you are continually recomputing function evaluations that could be cached).

You are probably looking for Data.Vector.Mutable from the vector package, which you might want to combine with the ST or IO monad.
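A minimal sketch of that combination, assuming an unboxed vector of tentative distances (initDistances is a made-up name):

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- Build a distance table mutably in ST, then freeze it for pure consumers.
initDistances :: Int -> Int -> V.Vector Double
initDistances n source = runST $ do
  dist <- MV.replicate n (1 / 0)  -- all vertices start at infinity
  MV.write dist source 0          -- the source has distance 0
  V.unsafeFreeze dist             -- O(1); safe because dist is not reused
```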

You can use Data.FingerTree.PSQueue which supports an adjust operation to update the heap. In this case you don't need pointers to the values in the heap because you update them through their keys.
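For example, a sketch of the decrease-key step written against the common PSQ interface (adjust has this shape in both Data.PSQueue and Data.FingerTree.PSQueue; relax is a made-up name):

```haskell
import Data.PSQueue (PSQ)
import qualified Data.PSQueue as PSQ

-- Relax edge (u, v) with weight w: lower v's priority (its tentative
-- distance) if the path through u is shorter. No pointer is needed:
-- the vertex key v addresses the entry in the queue.
relax :: Double -> (Int, Double) -> PSQ Int Double -> PSQ Int Double
relax du (v, w) = PSQ.adjust (min (du + w)) v
```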

Related

In Haskell, how can I freeze/thaw a mutable vector of mutable vectors in O(1)?

I'm using MVectors from the vector library to represent an adjacency list for a graph. This question requires good knowledge of the vector package.
I want some of my graph functions to take as input a mutable graph, because they might change it. And some other functions should take as input an immutable graph because they are not meant to change it.
Until now each graph function I have is taking a mutable graph as input. This is not nice.
I like how unsafeFreeze from https://hackage.haskell.org/package/vector-0.12.1.2/docs/Data-Vector-Generic.html lets you get an immutable version of a vector in O(1).
I believe that under the hood it really does nothing else than casting the type.
Essentially I want to be able to "unsafe"-freeze any complex mutable structure frequently and at zero cost, so that I can pass it to functions that require an immutable argument, just like in any common programming language. Despite the "unsafe" prefix, I feel this is not unsafe practice as long as the frozen structure is discarded right after its use as an immutable argument. Garbage collection should not happen, because the mutable version is not discarded, and the frozen version should point to the same memory representation as the mutable version.
What I'm having trouble with is freezing a graph seen as a mutable vector of mutable vectors. The only way I have figured out so far is to generate a new immutable version of the main vector with unsafeFreeze called on each mutable subvector. But that is O(n) in the length of the main vector, which is not acceptable.
To clarify, unsafe-freezing the main vector alone is not enough, because that gets me a frozen vector of mutable vectors. In theory there should be a way to freeze everything at no cost, because the internal memory representation should not change at all. It should be nothing but a cast.
Is there a way I can freeze a mutable vector of mutable vectors in (very cheap) O(1) ?
I am aware that this question is very specialized to the vector package. If you have alternative package suggestions for mutable vectors, I'm happy to hear them.
Please don't answer "Use immutable structures. Period.". My opinion is that graphs, immutability, and performance, often don't go well together.
You can't. Sorry. Freezing is a one-array-at-a-time operation, period. What you might be able to do is build yourself an API that represents immutable structures using mutable structures. Depending on how you're using these, you may encounter some performance degradation in GC.
Another option is to use both unsafeFreeze and unsafeThaw. See, for example, the way HashMaps are constructed from lists in unordered-containers. You have to be careful with this, because the type system won't be much help in remembering what is going to be mutated and what won't be.
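A sketch of the round trip (bumpHead is a made-up name; the comment carries the caveat):

```haskell
import Control.Monad.ST (runST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV

-- O(1) thaw and freeze. The burden is on you: after unsafeThaw,
-- the original v must never be used again, and nothing enforces that.
bumpHead :: V.Vector Int -> V.Vector Int
bumpHead v = runST $ do
  mv <- V.unsafeThaw v
  MV.modify mv (+ 1) 0
  V.unsafeFreeze mv
```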

Suitable Haskell type for large, frequently changing sequence of floats

I have to pick a type for a sequence of floats with 16K elements. The values will be updated frequently, potentially many times a second.
I've read the wiki page on arrays. Here are the conclusions I've drawn so far. (Please correct me if any of them are mistaken.)
IArrays would be unacceptably slow in this case, because they'd be copied on every change. With 16K floats in the array, that's 64KB of memory copied each time.
IOArrays could do the trick, as they can be modified without copying all the data. In my particular use case, doing all updates in the IO monad isn't a problem at all. But they're boxed, which means extra overhead, and that could add up with 16K elements.
IOUArrays seem like the perfect fit. Like IOArrays, they don't require a full copy on each change. But unlike IOArrays, they're unboxed, meaning they're basically the Haskell equivalent of a C array of floats. I realize they're strict. But I don't see that being an issue, because my application would never need to access anything less than the entire array.
Am I right to look to IOUArrays for this?
Also, suppose I later want to read or write the array from multiple threads. Will I have backed myself into a corner with IOUArrays? Or is the choice of IOUArrays totally orthogonal to the problem of concurrency? (I'm not yet familiar with the concurrency primitives in Haskell and how they interact with the IO monad.)
A good rule of thumb is that you should almost always use the vector library instead of arrays. In this case, you can use mutable vectors from the Data.Vector.Mutable module.
The key operations you'll want are read and write which let you mutably read from and write to the mutable vector.
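A minimal sketch in IO (the size and names are illustrative):

```haskell
import qualified Data.Vector.Unboxed.Mutable as MV

main :: IO ()
main = do
  samples <- MV.replicate 16384 (0 :: Float)  -- 16K unboxed floats
  MV.write samples 0 1.5                      -- O(1) in-place write
  x <- MV.read samples 0                      -- O(1) read
  print x
```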
You'll want to benchmark, of course (criterion is good for this); you might also be interested in browsing some benchmarks I did, e.g. here (if that link works for you; it's broken for me).
The vector library is a nice interface (crazy understatement) over GHC's more primitive array types, which you can get at more directly through the primitive package. The same goes for the types in the standard array package; for instance, an IOUArray is essentially a MutableByteArray#.
Unboxed mutable arrays are usually going to be the fastest, but you should compare them in your application to IOArray or the vector equivalent.
My advice would be:
if you probably don't need concurrency, first try a mutable unboxed Vector, as Gabriel suggests
if you know you will want concurrent updates (and feel a little brave), then first try a MutableArray and do atomic updates with the functions from the atomic-primops library. If you want fine-grained locking, this is your best choice. Of course, concurrent reads will work fine on whatever array you choose.
It should also be theoretically possible to do concurrent updates on a MutableByteArray (the equivalent of an IOUArray) with those atomic-primops functions, since a Float should always fit into a word (I think), but you'd have to do some research (or bug Ryan).
Also be aware of potential memory-reordering issues when doing concurrency with the atomic-primops stuff, and convince yourself with lots of tests; this is somewhat uncharted territory.

How can I obtain constant time access (like in an array) in a data structure in Haskell?

I'll get straight to it - is there a way to have a dynamically sized constant-time access data-structure in Haskell, much like an array in any other imperative language?
I'm sure there is a module somewhere that does this for us magically, but I'm hoping for a general explanation of how one would do this in a functional manner :)
As far as I'm aware, Map uses a balanced binary tree representation, so it has O(log n) access time, and lists of course have O(n) access time.
Additionally, if we made it so that it was immutable, it would be pure, right?
Any ideas how I could go about this (beyond something like Array = Array { one :: Int, two :: Int, three :: Int ... } in Template Haskell or the like)?
If your key is isomorphic to Int then you can use IntMap as most of its operations are O(min(n,W)), where n is the number of elements and W is the number of bits in Int (usually 32 or 64), which means that as the collection gets large the cost of each individual operation converges to a constant.
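For example:

```haskell
import qualified Data.IntMap.Strict as IM

-- Each operation is O(min(n, W)); with W fixed at 64 this behaves
-- like constant time as the map grows.
m0, m1 :: IM.IntMap String
m0 = IM.fromList [(0, "zero"), (1, "one")]
m1 = IM.insert 2 "two" m0

two :: Maybe String
two = IM.lookup 2 m1  -- Just "two"
```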
"a dynamically sized constant-time access data-structure in Haskell"
Data.Array
Data.Vector
etc etc.
For associative structures you can choose between:
Log-N tree and trie structures
Hash tables
Hash array mapped tries
With various different log-complexities and constant factors.
All of these are on hackage.
In addition to the other good answers, it might be useful to say that:
When restricted to algebraic data types and purity, all dynamically sized data structures must have at least logarithmic worst-case access time.
Personally, I like to call this the price of purity.
Haskell offers you three main ways around this:
Change the problem: Use hashes or prefix trees.
For constant-time reads use pure Arrays or the more recent Vectors; they are not ADTs and need compiler support / hidden IO inside. Constant-time writes are not possible, since purity forbids modifying the original data structure.
For constant-time writes use the IO or ST monad, preferring ST when you can to avoid externally visible side effects; a small sketch follows below. These monads are implemented with compiler support.
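A small sketch of that third option, using Data.Array.ST (buildSquares is a made-up name): constant-time writes inside ST, with a pure unboxed array as the result.

```haskell
import Data.Array.ST (newArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray)

-- Mutate freely inside ST; runSTUArray freezes the result, so no
-- side effect is visible from the outside.
buildSquares :: Int -> UArray Int Int
buildSquares n = runSTUArray $ do
  arr <- newArray (0, n - 1) 0
  mapM_ (\i -> writeArray arr i (i * i)) [0 .. n - 1]
  return arr
```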
It's true that you can't have constant time access arrays in Haskell without compiler/runtime magic.
However, this isn't (just) because Haskell is functional. Arrays in Java and C# also require runtime magic. In Rust you might be able to implement them in unsafe code, but not in safe Rust.
The truth is, any language that doesn't allow you to allocate memory of dynamic size, or that doesn't allow you to use pointers, is going to require runtime magic to implement arrays.
That excludes any safe language, whether object oriented, or functional.
The only difference between Haskell and e.g. Java with respect to arrays is that arrays are far less useful in Haskell than in Java; in Java, arrays are so core to everything we do that we don't even notice they're magic.
There is one way, though, in which Haskell requires more magic for arrays than e.g. Java.
With Java you can initialise an empty array (which requires magic) and then fill it up with values (which doesn't).
With Haskell this would obviously go against immutability, so any array has to be initialised with its values. The compiler magic doesn't just stretch to giving you an empty chunk of memory to index into; it also has to give you a way to fill the array with values. Creation and initialisation of the array therefore have to be a single step, handled entirely by the compiler.
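You can see that single step in the pure APIs; for example, with the vector package:

```haskell
import qualified Data.Vector.Unboxed as V

-- Creation and initialisation are one step; pure code never sees
-- an uninitialised array.
squares :: V.Vector Int
squares = V.generate 10 (\i -> i * i)
```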

Is it possible to make fast big circular buffer arrays for stream recording in Haskell?

I'm considering converting a C# app to Haskell as my first "real" Haskell project. However I want to make sure it's a project that makes sense. The app collects data packets from ~15 serial streams that come at around 1 kHz, loads those values into the corresponding circular buffers on my "context" object, each with ~25000 elements, and then at 60 Hz sends those arrays out to OpenGL for waveform display. (Thus it has to be stored as an array, or at least converted to an array every 16 ms). There are also about 70 fields on my context object that I only maintain the current (latest) value, not the stream waveform.
There are several aspects of this project that map well to Haskell, but the thing I worry about is performance. If, for each new data point in any of the streams, I have to clone the entire context object with its 70 fields and 15 25000-element arrays, there are obviously going to be performance issues.
Would I get around this by putting everything in the IO-monad? But then that seems to somewhat defeat the purpose of using Haskell, right? Also all my code in C# is event-driven; is there an idiom for that in Haskell? It seems like adding a listener creates a "side effect" and I'm not sure how exactly that would be done.
Look at this link, under the section "The ST monad":
http://book.realworldhaskell.org/read/advanced-library-design-building-a-bloom-filter.html
Back in the section called "Modifying array elements", we mentioned that modifying an immutable array is prohibitively expensive, as it requires copying the entire array. Using a UArray does not change this, so what can we do to reduce the cost to bearable levels?
In an imperative language, we would simply modify the elements of the array in place; this will be our approach in Haskell, too.
Haskell provides a special monad, named ST, which lets us work safely with mutable state. Compared to the State monad, it has some powerful added capabilities.
We can thaw an immutable array to give a mutable array; modify the mutable array in place; and freeze a new immutable array when we are done.
...
The IO monad also provides these capabilities. The major difference between the two is that the ST monad is intentionally designed so that we can escape from it back into pure Haskell code.
So it should be possible to modify in place, and it won't defeat the purpose of using Haskell after all.
Yes, you would probably want to use the IO monad for mutable data. I don't believe the ST monad is a good fit for this problem space because the data updates are interleaved with actual IO actions (reading input streams). As you would need to perform the IO within ST by using unsafeIOToST, I find it preferable to just use IO directly. The other approach with ST is to continually thaw and freeze an array; this is messy because you need to guarantee that old copies of the data are never used.
Although evidence shows that a pure solution (in the form of Data.Sequence.Seq) is often faster than using mutable data, given your requirement that data be pushed out to OpenGL, you'll possibly get better performance from working with an array directly. I would use the functions from Data.Vector.Storable.Mutable (from the vector package), as you then have access to the ForeignPtr for export.
You can look at arrows (Yampa) for one very common approach to event-driven code. Another area is functional reactive programming (FRP). There are starting to be some reasonably mature libraries in this domain, such as Netwire or reactive-banana. I don't know if they'd provide adequate performance for your requirements, though; I've mostly used them for GUI-type programming.
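A rough sketch of that approach (Ring and its operations are made-up names, not part of the vector API):

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.Ptr (Ptr)
import qualified Data.Vector.Storable.Mutable as SMV

-- A circular buffer over a storable mutable vector.
data Ring = Ring { ringBuf :: SMV.IOVector Float, ringPos :: IORef Int }

newRing :: Int -> IO Ring
newRing n = Ring <$> SMV.replicate n 0 <*> newIORef 0

pushSample :: Ring -> Float -> IO ()
pushSample (Ring buf pos) x = do
  i <- readIORef pos
  SMV.write buf i x
  writeIORef pos ((i + 1) `mod` SMV.length buf)

-- Expose the underlying pointer, e.g. for an OpenGL upload.
withRingPtr :: Ring -> (Ptr Float -> IO a) -> IO a
withRingPtr (Ring buf _) = SMV.unsafeWith buf
```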

Haskell: Pure data structure that efficiently models a block of memory?

I'm interested in writing a virtual machine that works with a block of memory. I'd like to model the block of memory (say 1MB) with a pure data structure that's still efficient for reads and writes anywhere within the block. I'm not very interested in a mutable structure. Does such a structure exist?
The vector package offers immutable (and mutable) boxed and unboxed vectors, with all the time complexities you'd expect. The mutable ones are usable from both IO and ST, and you can have an unboxed (storable) vector of any instance of Storable. It's a lot nicer than the standard array modules.
However, since you mentioned efficient immutable updates, I would suggest using a Map-like data structure; perhaps a HashMap from unordered-containers. It might even be worthwhile to have a map with small unboxed vectors at the leaves, to avoid some tree overhead.
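A sketch of that leaf-chunking idea, using IntMap rather than HashMap for brevity (chunkSize, readByte, and writeByte are made-up names):

```haskell
import qualified Data.IntMap.Strict as IM
import qualified Data.Vector.Unboxed as VU
import Data.Word (Word8)

chunkSize :: Int
chunkSize = 64

-- The block of memory as a map from chunk index to a small unboxed chunk.
type Memory = IM.IntMap (VU.Vector Word8)

readByte :: Memory -> Int -> Word8
readByte mem addr =
  case IM.lookup (addr `div` chunkSize) mem of
    Just chunk -> chunk VU.! (addr `mod` chunkSize)
    Nothing    -> 0  -- unwritten memory reads as zero

-- A write copies one 64-byte chunk, not the whole block.
writeByte :: Memory -> Int -> Word8 -> Memory
writeByte mem addr b = IM.insert ci (chunk VU.// [(off, b)]) mem
  where
    ci    = addr `div` chunkSize
    off   = addr `mod` chunkSize
    chunk = IM.findWithDefault (VU.replicate chunkSize 0) ci mem
```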
Depending on your use-case, you might also be interested in the standard Data.Sequence, which has O(1) access to the beginning and end, and pretty good access times into the middle of the sequence.
Is Data.Array.ST good enough for you?
