When I see Haskell's Vector, that's not a physics, linear algebra vector, is it? Vectors in Java (as I recall) is an array one can dynamically add to. But that's not usable as a science vector, is it? How could I have a science vector data structure in Haskell?
I assume you are referring to Data.Vector, and yes, you're right. It's not the vector the mathematical parts of us know. For a "legit" vector and any other linear algebra on free vector spaces there is the linear package.
Data.Vector is pretty much a memory array (very optimized, very good API).
Data.Vector really is a misleading name, and I assume it's directly or indirectly inspired from C++'s standard library vector class template which is also misleading.
From here, in C++'s standard library...
It's called a vector because Alex Stepanov, the designer of the
Standard Template Library, was looking for a name to distinguish it
from built-in arrays. He admits now that he made a mistake, because
mathematics already uses the term 'vector' for a fixed-length sequence
of numbers.
...
Alex's lesson: be very careful every time you name something.
I guess you want hmatrix, a Haskell interface to LAPACK and BLAS.
Related
Using Haskell, I am doing exercises on HackerRank in order to familiarize myself with the language. For the particular problem I am currently doing, I will have to do a matrix multiply. Unlike in Python where I could just use Numpy, I've checked on Ideone and it seems Haskell does not have any linear algebra packages plugged in, so I am going to do it by hand. If I was doing this problem in F# I would just use a plain array, but in Haskell I am not sure as it has various array classes. I am looking for some advice on what I should be looking into here as I have a total of three days experience in the language so far.
I am also wondering whether tuples are stack or heap allocated in Haskell as I might have to use them to encode (index,value) positions.
To answer my own question, the goto class in Haskell for plain arrays would be the Data.Vector.Unboxed. Haskell has a distinction between boxed and unboxed arrays, and even though I knew that it still somehow surprised me that in a vector of vectors, the outer vector would have to be a boxed type.
Also regarding tuples, per documentation for efficiency a vector of tuples will get compiled as tuple of vectors which definitely means that the elements will get allocated to a contiguous area on the heap.
I'll get straight to it - is there a way to have a dynamically sized constant-time access data-structure in Haskell, much like an array in any other imperative language?
I'm sure there is a module somewhere that does this for us magically, but I'm hoping for a general explanation of how one would do this in a functional manner :)
As far as I'm aware, Map uses a binary tree representation so it has O(log(n)) access time, and lists of course have O(n) access time.
Additionally, if we made it so that it was immutable, it would be pure, right?
Any ideas how I could go about this (beyond something like Array = Array { one :: Int, two :: Int, three :: Int ...} in template Haskell or the like)?
If your key is isomorphic to Int then you can use IntMap as most of its operations are O(min(n,W)), where n is the number of elements and W is the number of bits in Int (usually 32 or 64), which means that as the collection gets large the cost of each individual operation converges to a constant.
a dynamically sized constant-time access data-structure in Haskell,
Data.Array
Data.Vector
etc etc.
For associative structures you can choose between:
Log-N tree and trie structures
Hash tables
Mixed hash mapped tries
With various different log-complexities and constant factors.
All of these are on hackage.
In addition to the other good answers, it might be useful to say that:
When restricted to Algebraic Data Types and purity, all dynamically
sized data structure must have at least logarithmic worst-case access
time.
Personally, I like to call this the price of purity.
Haskell offers you three main ways around this:
Change the problem: Use hashes or prefix trees.
For constant-time reads use pure Arrays or the more recent Vectors; they are not ADTs and need compiler support / hidden IO inside. Constant-time writes are not possible since purity forbids the original data structure to be modified.
For constant-time writes use the IO or ST monad, preferring ST when you can to avoid externally visible side effects. These monads are implemented in the compiler.
It's true that you can't have constant time access arrays in Haskell without compiler/runtime magic.
However, this isn't (just) because Haskell is functional. Arrays in Java and C# also require runtime magic. In Rust you might be able to implement them in unsafe code, but not in safe Rust.
The truth is any language that doesn't allow you to allocate memory of dynamic size, or that doesn't allow you to use pointers is going to require runtime magic to implement arrays.
That excludes any safe language, whether object oriented, or functional.
The only difference between Haskell and eg. Java with respect to Arrays, is that arrays are far less useful in Haskell than in Java, but in Java arrays are so core to everything we do that we don't even notice that they're magic.
There is one way though that Haskell requires more magic for arrays than eg. Java.
With Java you can initialise an empty array (which requires magic) and then fill it up with values (which doesn't).
With Haskell this would obviously go against immutability. So any array would have to be initialised with its values. Thus the compiler magic doesn't just stretch to giving you an empty chunk of memory to index into. It also requires giving you a way to initialise the array with values. So creation and initialisation of the array has to be a single step, entirely handled by the compiler.
A while ago, I ran across an article on FingerTrees (See Also an accompanying Stack Overflow Question) and filed the idea away. I have finally found a reason to make use of them.
My problem is that the Data.FingerTree package seems to have a little bit rot around the edges. Moreover, Data.Sequence in the Containers package which makes use of the data structure re-implements a (possibly better) version, but doesn't export it.
As theoretically useful as this structure seems to be, it doesn't seem to get a lot of actual use or attention. Have people found that FingerTrees are not useful as a practical matter, or is this a case not enough attention?
further explanation:
I'm interested in building a data structure holding text that has good concatenation properties. Think about building an HTML document from assorted fragments. Most pre-built solutions use bytestrings, but I really want something that deals with Unicode text properly. My plan at the moment is to layer Data.Text fragments into a FingerTree.
I would also like to borrow the trick from Data.Vector of taking slices without copying using (offset,length) manipulation. Data.Text.Text has this built in to the data type, but only uses it for efficient uncons and unsnoc opperations. In FingerTree this information could very easily becomes the v or annotation of the tree.
To answer your question about finger trees in particular, I think the problem is that they have relatively high constant costs compared to arrays, and are more complex than other ways of achieving efficient concatenation. A Builder has a more efficient interface for just appending chunks, and they're usually readily available (see the links in #informatikr's answer). Suppose that Data.Text.Lazy is implemented with a linked list of chunks, and you're creating a Data.Text.Lazy from a builder. Unless you have a lot of chunks (probably more than 50), or are accessing data near the end of the list repeatedly, the high constant cost of a finger tree probably isn't worth it.
The Data.Sequence implementation is specialized for performance reasons, and isn't as general as the full interface provided by the fingertree package. That's why it isn't exported; it's not really possible to use it for anything other than a Sequence.
I also suspect that many programmers are at a loss as to how to actually use the monoidal annotation, as it's behind a fairly significant abstraction barrier. So many people wouldn't use it because they don't see how it can be useful compared to other data types.
I didn't really get it until I read Chung-chieh Shan's blog series on word numbers (part2, part3, part4). That's proof that the idea can definitely be used in practical code.
In your case, if you need to both inspect partial results and have efficient appends, using a fingertree may be better than a builder. Depending on the builder's implementation, you may end up doing a lot of repeated work as you convert to Text, add more stuff to the builder, convert to Text again, etc. It would depend on your usage pattern though.
You might be interested in my splaytree package, which provides splay trees with monoidal annotations, and several different structures build upon them. Other than the splay tree itself, the Set and RangeSet modules have more-or-less complete API's, the Sequence module is mostly a skeleton I used for testing. It's not a "batteries included" solution to what you're looking for (again, #informatikr's answer provides those), but if you want to experiment with monoidal annotations it may be more useful than Data.FingerTree. Be aware that a splay tree can get unbalanced if you traverse all the elements in sequence (or continually snoc onto the end, or similar), but if appends and lookups are interleaved performance can be excellent.
In addition to John Lato's answer, I'll add some specific details about the performance of finger trees, since I spent some time looking at that in the past.
The broad summary is:
Data.Sequence has great constant factors and asymptotics: it is almost as fast as [] when accessing the front of the list (where both data structures have O(1) asymptotics), and much faster elsewhere in the list (where Data.Sequence's logarithmic asymptotics trounce []'s linear asymptotics).
Data.FingerTree has the same asymptotics as Data.Sequence, but is about an order of magnitude slower.
Just like lists, finger trees have high per-element memory overheads, so they should be combined with chunking for better memory and cache use. Indeed, a few packages do this (yi, trifecta, rope). If Data.FingerTree could be brought close to Data.Sequence in performance, I would hope to see a Data.Text.Sequence type, which implemented a finger tree of Data.Text values. Such a type would lose the streaming behaviour of Data.Text.Lazy, but benefit from improved random access and concatenation performance. (Similarly, I would want to see Data.ByteString.Sequence and Data.Vector.Sequence.)
The obstacle to implementing these now is that no efficient and generic implementation of finger trees exists (see below where I discuss this further). To produce efficient implementations of Data.Text.Sequence one would have to completely reimplement finger trees, specialised to Text - just as Data.Text.Lazy completely reimplements lists, specialised to Text. Unfortunately, finger trees are much more complex than lists (especially concatenation!), so this is a considerable amount of work.
So as I see it the answer is:
specialised finger trees are great, but a lot of work to implement
chunked finger trees (e.g. Data.Text.Sequence) would be great, but at present the poor performance of Data.FingerTree means they are not a viable alternative to chunked lists in the common case
builders and chunked lists achieve many of the benefits of chunked finger trees, and so they suffice for the common case
in the uncommon case where builders and chunked lists don't suffice, we grit our teeth and put up with the poor constant factors of chunked finger trees (e.g. in yi and trifecta).
Obstacles to an efficient and generic finger tree
Much of the performance gap between Data.Sequence and Data.FingerTree is due to two optimisations in Data.Sequence:
The measure type is specialised to Int, so measure manipulations will compile down to efficient integer arithmetic rather
The measure type is unpacked into the Deep constructor, which saves pointer dereferences in the inner loops of the tree operations.
It is possible to apply these optimisations in the general case of Data.FingerTree by using data families for generic unpacking and by exploiting GHC's inliner and specialiser - see my fingertree-unboxed package, which brings generic finger tree performance almost up to that of Data.Sequence. Unfortunately, these techniques have some significant problems:
data families for generic unpacking is unpleasant for the user, because they have to define lots of instances. There is no clear solution to this problem.
finger trees use polymorphic recursion, which GHC's specialiser doesn't handle well (1, 2). This means that, to get sufficient specialisation on the measure type, we need lots of INLINE pragmas, which causes GHC to generate huge amounts of code.
Due to these problems, I never released the package on Hackage.
Ignoring your Finger Tree question and only responding to your further explanation: did you look into Data.Text.Lazy.Builder or, specifically for building HTML, blaze-html?
Both allow fast concatenation. For slicing, if that is important for solving your problem, they might not have ideal performance.
I'm writing a small tool for generating php checks from javascript code, and I would like to know if anyone knows of a standard way of transforming functional code into imperative code?
I found this paper: Defunctionalization at Work it explains defunctionalization pretty well.
Lambdalifting and defunctionalization somewhat answered the question, but what about datastructures, we are still parsing lists as if they are all linkedlists. Would there be a way of transforming the linkedlists of functional languages into other high-level datastructures like c++ vectors or java arraylists?
Here are a few additions to the list of #Artyom:
you can convert tail recursion into loops and assignments
linear types can be used to introduce assignments, e.g. y = f x can be replaced with x := f x if x is linear and has the same type as y
at least two kinds of defunctionalization are possible: Reynolds-type defunctionalization when you replace a high-order application with a switch full of first-order applications, and inlining (however, recursive functions is not always possible to inline)
Perhaps you are interested in removing some language elements (such as higher-order functions), right?
For eliminating HOFs from a program, there are techniques such as defunctionalization. For removing closures, you can use lambda-lifting (aka closure conversion). Is this something you are interested in?
I think you need to provide a concrete example of code you have, and the target code you intend to produce, so that others may propose solutions.
Added:
Would there be a way of transforming the linkedlists of functional languages into other high-level datastructures like c++ vectors or java arraylists?
Yes. Linked lists are represented with pointers in C++ (a structure "node" with two fields: one for the "payload", another for the "next" pointer; empty list is then represented as a NULL pointer, but sometimes people prefer to use special "sentinel values"). Note that, if the code in the source language does not rely on the representation of singly linked lists (in the source language implementation), you can also implement the "cons"/"nil" operations using a vector in the target language (not sure if this suits your needs, though). The idea here is to give an alternative implementations for the familiar operations.
No, there is not.
The reason is that there is no such concrete and well defined thing like functional code or imperative code.
Such transformations exist only for concrete instances of your abstraction: for example, there are transformations from Haskell code to LLVM bytecode, F# code to CLI bytecode or Frege code to Java code.
(I doubt if there is one from Javascript to PHP.)
Depends on what you need. The usual answer is "there is no such tool", because the result will not be usable. However look at this from this standpoint:
The set of Assembler instructions in a computer defines an imperative machine. Hence the compiler needs to do such a translation. However I assume you do not want to have assembler code but something more readable.
Usually these kinds of heavy program transformations are done manually, if one is interested in the result, or automatically if the result will never be looked at by a human.
I am starting to doubt if my plan of getting into Haskell and functional programming by using Haskell for my next course on algorithms is a good one.
To get some Haskell lines under my belt I started trying to implement some simple algos. First: Gale-Shapley for the Stable Marriage Problem. Having not yet gotten into monads, all that mutable state looks daunting, so instead I used the characterization of stable matchings as fixed-points of a mapping on the lattice of semi-matchings. It was fun, but its no longer Gale-Shapley and the complexity isn't nice (those chains in the lattice can get pretty long apparently :)
Next up I have the algorithm for Closest Pair of points in the plane, but am stuck on getting the usual O(n*log n) complexity because I can't work out how to get a set-like data structure with O(1) checking for membership.
So my question is: Can one in general implement most algorithms eg. Dijkstra, Ford-Fulkerson (Gale-Shapley !?) getting the complexities from procedural implementations if one gets a better command of Haskell and functional programming in general ?
This probably can't be answered in general. A lot of standard algorithms are designed around mutability, and translations exist in some cases, not in others. Sometimes alternate algorithms exist that give equivalent performance characteristics, sometimes you really do need mutability.
A good place to start, if you want understanding of how to approach algorithms in this setting, is Chris Okasaki's book Purely Functional Data Structures. The book is an expanded version of his thesis, which is available online in PDF format.
If you want help with specific algorithms, such as the O(1) membership checking (which is actually misleading--there's no such thing, such data structures usually have something like O(k) where k is the size of elements being stored) you'd be better off asking that as a specific, single question instead of a very general question like this.
Since you have the ST monad in Haskell you can do anything with mutable state at the same speed of an imperative language. To the outside it can have a non-monadic interface.
See for instance Launchbury and Peyton-Jones: "Lazy functional state threads"
http://portal.acm.org/citation.cfm?id=178246
Existence proof for implementing algorithms with mutable data structures. Just recurse over an IO record. In this case, a Game record that holds the relevant variables.