Effective way to build a (Lazy) Map in Haskell

I know a few different ways to build a Map in Haskell:
1. build it from a list using fromList
2. build it from a sorted list using fromAscList
3. use the fact that Map is a Monoid (or a Semigroup) and mconcat singletons
I understand that the amortized complexity of #1 is O(n*log(n)) whereas #2 is O(n).
I guess #3 should be roughly equivalent to #1 and might be subject to fusion.
The amortized part is important, because Haskell is lazy by default: even though a lookup in a Map is O(log(n)), in practice it can be interleaved with the construction of the Map itself, which is O(n*log(n)), making the lookup effectively O(n*log(n)) (especially if you are building the map each time you need it). This can also happen if you use a hardcoded Map.
For example, am I right to think that lookup 'b' (fromList [('a', 1), ('b', 2)]) is actually equivalent to just doing the lookup in the list, without using an intermediate Map?
So is there a difference between #1 and #3, or between sorting the list and then calling fromList?
Update
Also, if I need a map to be computed only once, do I need to make sure GHC doesn't inline it, so that it is shared between functions?
Use case
I realized that the question might be a bit blurry; it in fact corresponds to different use cases I encountered recently.
The first one corresponds to a "static join". I have an app which manages items, and each item code can be split into a style and a variation (for example 'T-Shirt-Red' => ('T-Shirt', 'Red')). The split is based on rules (and regexps) and is quite slow. To avoid recomputing the rules all the time, this is done once and stored in a db table. I have a few pure functions which need to be able to split an item code, so I pass them a function Text -> (Text, Text). The function is actually lookup partially applied to a Map. The code is similar to this:
getSplitter :: Handler (Text -> Maybe (Text, Text))
getSplitter = do
  sku'style'vars <- runDB $ rawSql "SELECT sku, style, var FROM style_cache" []  -- load the split table
  let sku'map = fromList [ (sku, (style, var))
                         | (sku, style, var) <- sku'style'vars
                         ]
  return $ flip lookup sku'map
This one can easily be sped up by sorting the items by sku and using fromDistinctAscList (which is actually faster than fromAscList). However, I still have some issues with how to cache it between different requests.
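For illustration, a minimal sketch of that variant, assuming the rows come back from SQL already sorted by sku (ORDER BY sku) and with sku unique, which fromDistinctAscList requires; buildSkuMap is a made-up helper name:

import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- O(n) construction: the keys must be strictly ascending (sorted and distinct).
buildSkuMap :: [(Text, Text, Text)] -> Map.Map Text (Text, Text)
buildSkuMap rows =
  Map.fromDistinctAscList [ (sku, (style, var)) | (sku, style, var) <- rows ]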
The second case is a manual join between two tables. I usually do something along the lines of:
do
  sku'infos <- selectList [] []   -- load item info
  let skuInfo = fromList sku'infos
  orderLines <- selectList [] []  -- load orders
  -- find the info corresponding to each order item
  return $ map (\o -> (o, lookup (orderSku o) skuInfo)) orderLines
There again I can sort sku'infos in SQL and use fromDistinctAscList.
A third case is fetching miscellaneous info related to an item category from different tables.
For example, I want to be able to compare the sales (sales table) and the purchases (purchases table) by category.
In pure SQL I would do something along the lines of:
SELECT style, sum(sales.amount), sum(purchase.amount)
FROM style_cache
LEFT JOIN sales USING(sku)
LEFT JOIN purchases USING(sku)
GROUP by style
However, this is a simplified example; in practice the aggregation is much more complicated and has to be done in Haskell, as does the join. To do so I'm loading each table separately (grouping what I can in SQL), returning a Map Style SalesInfo, a Map Style PurchaseInfo, etc., and merging them. The tables are quite big, and I realize I end up loading everything into memory, whereas I could probably be much more efficient by "zipping" things manually, but I'm not sure how.
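For the third case, one way to "zip" the per-style maps without going through intermediate lists is the merge API from containers >= 0.5.9; a hedged sketch, where Style, SalesInfo and PurchaseInfo are stand-in types and compareByStyle a made-up name:

import qualified Data.Map.Strict as Map
import Data.Map.Merge.Strict (merge, mapMissing, zipWithMatched)

type Style = String
type SalesInfo = Double
type PurchaseInfo = Double

-- Full outer join of the two per-style maps, done in a single pass
-- over both trees.
compareByStyle :: Map.Map Style SalesInfo
               -> Map.Map Style PurchaseInfo
               -> Map.Map Style (Maybe SalesInfo, Maybe PurchaseInfo)
compareByStyle =
  merge (mapMissing     $ \_ s   -> (Just s, Nothing))
        (mapMissing     $ \_ p   -> (Nothing, Just p))
        (zipWithMatched $ \_ s p -> (Just s, Just p))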

I'm not sure I understood the entire motivation behind this question, but I'll make a few comments:
Map is spine-strict -- which means the tree structure of a Map and the keys themselves are forced (at least far enough to do all the requisite comparisons) on every Map operation. So I would expect Data.Map.lookup k (fromList xs) to take O(n*log(n)) comparisons (n the length of xs) whereas I would expect Prelude.lookup k xs to take O(n) comparisons (actually just equality checks, but usually that's pretty much the same complexity as a comparison).
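A quick sketch of what that spine-strictness means in practice (the demo function is made up):

import qualified Data.Map as Map

-- Building the map forces every key, but no value is forced until it is
-- looked up, so the undefined values below are harmless:
demo :: Maybe Int
demo = Map.lookup 'b' (Map.fromList [('a', undefined), ('b', 2), ('c', undefined)])
-- Just 2; putting undefined in a *key* instead would make this diverge.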
If fromAscList . sort is reliably faster than fromList, this is a performance bug in Data.Map and the library should just be changed to define fromList = fromAscList . sort. I would be very surprised if this were the case. People have spent a fair bit of time optimizing containers, so I wouldn't expect to see any fruit hanging as low as that.
Yes, inlining breaks sharing.
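For example, a top-level map can be pinned with a NOINLINE pragma so GHC keeps one shared copy instead of rebuilding it at every use site (a sketch; the table itself is made up):

import qualified Data.Map as Map

table :: Map.Map Char Int
table = Map.fromList [('a', 1), ('b', 2)]
{-# NOINLINE table #-}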

Related

"List" with efficient insert/delete

I am looking for a "List" with log(N) insert/delete at index i. I put the word "List" in quotes, because I don't mean it to be an actual Haskell List, but just any ordered container with a bunch of objects (in fact, it probably needs some sort of tree internally). I am surprised that I haven't found a great solution yet....
This is the best solution I have so far: use Data.Sequence with these two functions.
import Data.Sequence ((><), (<|))
import qualified Data.Sequence as Seq

doInsert position value xs = before >< (value <| after)
  where (before, after) = Seq.splitAt position xs

doDelete position xs = before >< Seq.drop 1 after
  where (before, after) = Seq.splitAt position xs
While these are technically all log(N) functions, it feels like I am doing a bunch of extra stuff for each insert/delete.... In other words, this scales like K*log(N) for a K that is larger than it should be. Plus, since I have to define this stuff myself, I feel like I am using Sequence for something it wasn't designed for.
Is there a better way?
This is a related question (although it only deals with Sequences, I would happily use anything else): Why doesn't Data.Sequence have `insert' or `insertBy', and how do I efficiently implement them?
And yes, this question is related to this other one I posted a few days ago: Massive number of XML edits
There are structures such as catenable seqs, catenable queues, etc. that can give you O(1) join. However, all such structures I know of get away with this by giving you O(i) split. If you want split and join both to be as optimal as possible, I think a fingertree (aka Sequence) is your best bet. The entire point of the way Sequence is structured is to ensure that splitting, joining, etc. involve an asymptotically not-terrible amount of juggling, among other things.
If you could get away with an IntMap which started sparse and got denser, and was occasionally "rebuilt" when things got too dense, then that might give you better performance -- but that really depends on your patterns of use.
If you work carefully with the predefined Haskell lists, there should be no problem with them (for example when concatenating two lists).
If you'd like a list implementation with efficient insertion and deletion, an AVL tree or any kind of balanced binary tree would work.
For example, store in an AVL tree a tuple (Int, a), where Int is the index in the list and a the element.
The operations would then be logarithmic on average for insertion and deletion.
I hope this answers your question.

Haskell: `Map (a,b) c` versus `Map a (Map b c)`?

Thinking of maps as representations of finite functions, a map of two or more variables can be given either in curried or uncurried form; that is, the types Map (a,b) c and Map a (Map b c) are isomorphic, or something close to it.
What practical considerations are there — efficiency, etc — for choosing between the two representations?
The Ord instance of tuples uses lexicographic order, so Map (a, b) c is going to sort by a first anyway, so the overall order will be the same. Regarding practical considerations:
Because Data.Map is a binary search tree, splitting at a key is comparable to a lookup, so getting a submap for a given a in the uncurried form won't be significantly more expensive than in the curried form.
The curried form may produce a less balanced tree overall, for the obvious reason of having multiple trees instead of just one.
The curried form will have a bit of extra overhead to store the nested maps.
The nested maps of the curried form representing "partial applications" can be shared if some a values produce the same result.
Similarly, "partial application" of the curried form gives you the existing inner map, while the uncurried form must construct a new map.
So the uncurried form is clearly better in general, but the curried form may be better if you expect to do "partial application" often and would benefit from sharing of Map b c values.
Note that some care will be necessary to ensure you actually benefit from that potential sharing; you'll need to explicitly define any shared inner maps and reuse the single value when constructing the full map.
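A minimal sketch of that explicit sharing (all names are made up):

import qualified Data.Map as Map

-- Define the inner map once so both outer keys point at a single copy:
sharedInner :: Map.Map String Int
sharedInner = Map.fromList [("x", 1), ("y", 2)]

outer :: Map.Map String (Map.Map String Int)
outer = Map.fromList [("a", sharedInner), ("b", sharedInner)]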
Edit: Tikhon Jelvis points out in the comments that the memory overhead of the tuple constructors--which I did not think to account for--is not at all negligible. There is certainly some overhead to the curried form, but that overhead is proportional to how many distinct a values there are. The tuple constructor overhead in the uncurried form, on the other hand, is proportional to the total number of keys.
So if, on average, for any given value of a there are three or more distinct keys using it you'll probably save memory using the curried version. The concerns about unbalanced trees still apply, of course. The more I think about it, the more I suspect the curried form is unequivocally better except perhaps if your keys are very sparse and unevenly distributed.
Note that because arity of definitions does matter to GHC, the same care is required when defining functions if you want subexpressions to be shared; this is one reason you sometimes see functions defined in a style like this:
foo x = go
  where
    z = expensiveComputation x
    go y = doStuff y z
Tuples are lazy in both elements, so the tuple version introduces a little extra laziness. Whether this is good or bad strongly depends on your usage. (In particular, comparisons may force the tuple elements, but only if there are lots of duplicate a values.)
Beyond that, I think it's going to depend on how many duplicates you have. If a is almost always different whenever b is, you're going to have a lot of small trees, so the tuple version might be better. On the other hand, if the opposite is true, the non-tuple version may save you a little time (not constantly recomparing a once you've found the appropriate subtree and you're looking for b).
I'm reminded of tries, and how they store common prefixes once. The non-tuple version seems to be a bit like that. A trie can be more efficient than a BST if there's lots of common prefixes, and less efficient if there aren't.
But the bottom line: benchmark it!! ;-)
Apart from the efficiency aspects, there's also a pragmatic side to this question: what do you want to do with this structure?
Do you, for instance, want to be able to store an empty map for a given value of type a? If so, then the curried version might be more practical!
Here's a simple example: let's say we want to store String-valued properties of persons - say the value of some fields on that person's stackoverflow profile page.
import Data.Map (Map, fromList)

type Person = String
type Property = String

curriedMap :: Map Person (Map Property String)
curriedMap = fromList [
    ("yatima2975", fromList [("location","Utrecht"),("age","37")]),
    ("PLL", fromList []) ]

uncurriedMap :: Map (Person,Property) String
uncurriedMap = fromList [
    (("yatima2975","location"), "Utrecht"),
    (("yatima2975","age"), "37") ]
With the uncurried version, there is no nice way to record the fact that user "PLL" is known to the system but hasn't filled in any information. A person/property pair ("PLL", undefined) is going to cause runtime crashes, since Map is strict in the keys.
You could change the type of uncurriedMap to Map (Person,Property) (Maybe String) and store Nothings in there, and that might very well be the best solution in this case; but where there's an unknown/varying number of properties (e.g. depending on the kind of Person) that will also run into difficulties.
So, I guess it also depends on whether you need a query function like this:
data QueryResult = PersonUnknown | PropertyUnknownForPerson | Value String
query :: Person -> Property -> Map (Person, Property) String -> QueryResult
This is hard to write (if not impossible) in the uncurried version, but easy in the curried version.
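For comparison, a sketch of the same query against the curried representation, which makes it easy (Person, Property and QueryResult as defined above):

import qualified Data.Map as Map

query :: Person -> Property -> Map.Map Person (Map.Map Property String) -> QueryResult
query person prop m =
  case Map.lookup person m of
    Nothing    -> PersonUnknown
    Just props -> maybe PropertyUnknownForPerson Value (Map.lookup prop props)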

Multiple lookup structures for same data: Memory duplication?

Suppose I have data on a bunch of people, and I want to be able to look them up in different ways. Maybe there's some kind of data structure (like a binary tree) that facilitates lookup by name. And maybe there's another (like a list) that's by order of creation. And perhaps many more.
In many languages, you would have each person allocated exactly once on the heap. Each data structure would contain pointers to that memory. Thus, you're not allocating a new set of people every time you add a new way to look them up.
How about in Haskell? Is there any way to avoid memory duplication when different data structures need to index the same data?
I feel sure there's a deeper, more knowledgeable answer to this question, but for the time being...
Since in a pure functional programming language data is immutable, there's no need to do anything other than copy the pointer instead of copying its target.
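For the scenario in the question, that might look like the following sketch (types and names are made up):

import qualified Data.Map as Map

data Person = Person { name :: String, age :: Int }

people :: [Person]
people = [Person "Ada" 36, Person "Alan" 41]

-- Two lookup structures over the same data: both hold pointers to the
-- same Person values, so nothing is copied.
byName :: Map.Map String Person
byName = Map.fromList [ (name p, p) | p <- people ]

byCreation :: [Person]
byCreation = people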
As a quick and very dirty example, I fired up the ghci interpreter:
Prelude> let x = replicate 10000 'm' in all (==x) $ replicate 10000 x
True
(1.61 secs, 0 bytes)
I admit that these stats are unreliable, but what it's not doing is allocating memory for all 10000 copies of a list 10000 characters long.
Summary:
The way to avoid memory duplication is to
(a) use Haskell
(b) avoid pointlessly reconstructing your data.
How can I pointlessly reconstruct my data?
A very simple and pointless example:
pointlessly_reconstruct_list :: [a] -> [a]
pointlessly_reconstruct_list [] = []
pointlessly_reconstruct_list (x:xs) = x:xs
This kind of thing causes a duplicate of the list structure.
Have you got any examples that are a little less pointless but still simple?
Interestingly, if you do xs ++ ys you essentially reconstruct xs in order to place ys at the end of it (replacing []), so the list structure of xs is nearly copied wholesale. However, there's no need to replicate the actual data, and there certainly only needs to be one copy of ys.
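For reference, this is essentially how the Prelude defines (++), renamed here so it compiles alongside the Prelude: the spine of xs is rebuilt cell by cell, while the elements and the whole of ys are shared.

append :: [a] -> [a] -> [a]
append []     ys = ys
append (x:xs) ys = x : append xs ys  -- allocates new (:) cells; x and ys are shared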

Benefit of avoiding multiple list traversals

I've seen many examples in functional languages about processing a list and constructing a function to do something with its elements after receiving some additional value (usually not present at the time the function was generated), such as:
Calculating the difference between each element and the average
(the last 2 examples under "Lazy Evaluation")
Staging a list append in strict functional languages such as ML/OCaml, to avoid traversing the first list more than once
(the section titled "Staging")
Comparing a list to another with foldr (i.e. generating a function to compare another list to the first)
listEq a b = foldr comb null a b
  where comb x frec []     = False
        comb x frec (e:es) = x == e && frec es

cmp1To10 = listEq [1..10]
In all these examples, the authors generally remark the benefit of traversing the original list only once. But I can't keep myself from thinking "sure, instead of traversing a list of N elements, you are traversing a chain of N evaluations, so what?". I know there must be some benefit to it, could someone explain it please?
Edit: Thanks to both for the answers. Unfortunately, that's not what I wanted to know. I'll try to clarify my question, so it's not confused with the (more common) one about creating intermediate lists (which I already read about in various places). Also thanks for correcting my post formatting.
I'm interested in the cases where you construct a function to be applied to a list, where you don't yet have the necessary value to evaluate the result (be it a list or not). Then you can't avoid generating references to each list element (even if the list structure is not referenced anymore). And you have the same memory accesses as before, but you don't have to deconstruct the list (pattern matching).
For example, see the "staging" chapter in the mentioned ML book. I've tried it in ML and Racket, more specifically the staged version of "append" which traverses the first list and returns a function to insert the second list at the tail, without traversing the first list many times. Surprisingly for me, it was much faster even considering it still had to copy the list structure, as the last pointer was different in each case.
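The staged append can be rendered in Haskell as well (a sketch; in Haskell it amounts to a difference list):

-- Traverse xs once, producing a function that grafts any ys onto its tail.
stagedAppend :: [a] -> ([a] -> [a])
stagedAppend = foldr (\x k ys -> x : k ys) id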
The following is a variant of map which after applied to a list, it should be faster when changing the function. As Haskell is not strict, I would have to force the evaluation of listMap [1..100000] in cachedList (or maybe not, as after the first application it should still be in memory).
listMap = foldr comb (const [])
  where comb x rest = \f -> f x : rest f

cachedList = listMap [1..100000]
doubles = cachedList (2*)
squares = cachedList (\x -> x*x)
-- print doubles and squares
-- ...
I know in Haskell it doesn't make a difference (please correct me if I'm wrong) using comb x rest f = ... vs comb x rest = \f -> ..., but I chose this version to emphasize the idea.
Update: after some simple tests, I couldn't find any difference in execution times in Haskell. The question then is only about strict languages such as Scheme (at least the Racket implementation, where I tested it) and ML.
Executing a few extra arithmetic instructions in your loop body is cheaper than executing a few extra memory fetches, basically.
Traversals mean doing lots of memory access, so the less you do, the better. Fusion of traversals reduces memory traffic, and increases the straight line compute load, so you get better performance.
Concretely, consider this program to compute some math on a list:
go :: [Int] -> [Int]
go = map (+2) . map (^3)
Clearly, we design it with two traversals of the list. Between the first and the second traversal, a result is stored in an intermediate data structure. However, it is a lazy structure, so only costs O(1) memory.
Now, the Haskell compiler immediately fuses the two loops into:
go = map ((+2) . (^3))
Why is that? After all, both are O(n) complexity, right?
The difference is in the constant factors.
Considering this abstraction: for each step of the first pipeline we do:
i <- read memory -- cost M
j = i ^ 3 -- cost A
write memory j -- cost M
k <- read memory -- cost M
l = k + 2 -- cost A
write memory l -- cost M
so we pay 4 memory accesses, and 2 arithmetic operations.
For the fused result we have:
i <- read memory -- cost M
j = (i ^ 3) + 2 -- cost 2A
write memory j -- cost M
where A and M are the constant factors for doing math on the ALU and memory access.
There are other constant factors as well: two loop branches instead of one.
So unless memory access is free (it is not, by a long shot) then the second version is always faster.
Note that compilers that operate on immutable sequences can implement array fusion, the transformation that does this for you. GHC is such a compiler.
There is another very important reason. If you traverse a list only once, and you have no other reference to it, the GC can release the memory claimed by the list elements as you traverse them. Moreover, if the list is generated lazily, you always have only a constant memory consumption. For example
import Data.List

main = do
  let xs  = [1..10000000]
      sum = foldl' (+) 0 xs
      len = foldl' (\l _ -> l + 1) 0 xs
  print (sum / len)
computes sum, but needs to keep the reference to xs, and the memory it occupies cannot be released because it is needed to compute len later. (Or vice versa.) So the program consumes a considerable amount of memory; the larger xs is, the more memory it needs.
However, if we traverse the list only once, it is created lazily and the elements can be GC'd immediately, so no matter how big the list is, the program only needs O(1) memory.
{-# LANGUAGE BangPatterns #-}
import Data.List

main = do
  let xs = [1..10000000]
      (sum, len) = foldl' (\(!s, !l) x -> (s + x, l + 1)) (0, 0) xs
  print (sum / len)
Sorry in advance for a chatty-style answer.
That's probably obvious, but if we're talking about the performance, you should always verify hypotheses by measuring.
A couple of years ago I was thinking about the operational semantics of GHC, the STG machine. And I asked myself the same question — surely the famous "one-traversal" algorithms are not that great? It only looks like one traversal on the surface, but under the hood you also have this chain-of-thunks structure which is usually quite similar to the original list.
I wrote a few versions (varying in strictness) of the famous RepMin problem — given a tree filled with numbers, generate the tree of the same shape, but replace every number with the minimum of all the numbers. If my memory is right (remember — always verify stuff yourself!), the naive two-traversal algorithm performed much faster than various clever one-traversal algorithms.
I also shared my observations with Simon Marlow (we were both at an FP summer school during that time), and he said that they use this approach in GHC. But not to improve performance, as you might have thought. Instead, he said, for a big AST (such as Haskell's one) writing down all the constructors takes much space (in terms of lines of code), and so they just reduce the amount of code by writing down just one (syntactic) traversal.
Personally I avoid this trick because if you make a mistake, you get a loop which is a very unpleasant thing to debug.
So the answer to your question is: partial compilation. Done ahead of time, it makes it so that there's no need to traverse the list to get to the individual elements - all the references are found in advance and stored inside the pre-compiled function.
As to your concern about the need for that function to be traversed too, it would be true in interpreted languages. But compilation eliminates this problem.
In the presence of laziness this coding trick may lead to the opposite results. Given full equations, the GHC Haskell compiler is able to perform all kinds of optimizations which essentially eliminate the lists completely and turn the code into the equivalent of loops. This happens when we compile the code with, e.g., the -O2 switch.
Writing out the partial equations may prevent this compiler optimization and force the actual creation of functions, with a drastic slowdown of the resulting code. I tried your cachedList code and saw a 0.01s execution time turn into 0.20s (I don't remember right now the exact test I did).

O(1) circular buffer in haskell?

I'm working on a small concept project in Haskell which requires a circular buffer. I've managed to create a buffer using arrays which has O(1) rotation, but of course requires O(N) for insertion/deletion. I've found an implementation using lists which appears to take O(1) for insertion and deletion, but since it maintains a left and right list, crossing a certain border when rotating will take O(N) time. In an imperative language, I could implement a doubly linked circular buffer with O(1) insertion, deletion, and rotation. I'm thinking this isn't possible in a purely functional language like Haskell, but I'd love to know if I'm wrong.
If you can deal with amortized O(1) operations, you could probably use either Data.Sequence from the containers package, or Data.Dequeue from the dequeue package. The former uses finger trees, while the latter uses the "Banker's Dequeue" from Okasaki's Purely Functional Data Structures (a prior version online here).
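For instance, a minimal sketch of the amortized-O(1) rotation on top of Data.Sequence (rotate is a made-up helper):

import Data.Sequence (Seq, ViewL(..), viewl, (|>))

-- Move the front element to the back; every operation involved is
-- O(1) amortized on a finger tree.
rotate :: Seq a -> Seq a
rotate s = case viewl s of
  EmptyL  -> s
  x :< xs -> xs |> x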
The ST monad allows you to describe and execute imperative algorithms in Haskell. You can use STRefs for the mutable pointers of your doubly linked list.
Self-contained algorithms described using ST are executed using runST. Different runST executions may not share ST data structures (STRef, STArray, ..).
If the algorithm is not "self contained" and the data structure is required to be maintained with IO operations performed in between its uses, you can use stToIO to access it in the IO monad.
Regarding whether this is purely functional or not - I guess it's not?
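As a concrete illustration of that STRef approach, a hedged sketch of a doubly linked circular node (all names are made up, not from any library):

import Control.Monad.ST
import Data.STRef

data Node s a = Node
  { value :: a
  , prev  :: STRef s (Node s a)
  , next  :: STRef s (Node s a)
  }

-- A one-element circular buffer: the node points at itself both ways.
singletonNode :: a -> ST s (Node s a)
singletonNode x = do
  p <- newSTRef undefined
  n <- newSTRef undefined
  let node = Node x p n
  writeSTRef p node
  writeSTRef n node
  return node

-- O(1) insertion after a given node.
insertAfter :: Node s a -> a -> ST s (Node s a)
insertAfter node x = do
  nxt <- readSTRef (next node)
  new <- Node x <$> newSTRef node <*> newSTRef nxt
  writeSTRef (next node) new
  writeSTRef (prev nxt)  new
  return new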
It sounds like you might need something a bit more complicated than this (since you mentioned doubly-linked lists), but maybe this will help. This function acts like map over a mutable cyclic list:
mapOnCycling f = concat . tail . iterate (map f)
Use like:
*Main> (+1) `mapOnCycling` [3,2,1]
[4,3,2,5,4,3,6,5,4,7,6,5,8,7,6,9,8,7,10,9...]
And here's one that acts like mapAccumL:
mapAccumLOnCycling f acc xs =
  let (acc', xs') = mapAccumL f acc xs
  in xs' ++ mapAccumLOnCycling f acc' xs'
Anyway, if you care to elaborate even more on what exactly your data structure needs to be able to "do" I would be really interested in hearing about it.
EDIT: as camccann mentioned, you can use Data.Sequence for this, which according to the docs should give you O(1) time complexity (is there such a thing as O(1) amortized time?) for viewing or adding elements at both the left and right ends of the sequence, as well as modifying the ends along the way. Whether this will have the performance you need, I'm not sure.
You can treat the "current location" as the left end of the Sequence. Here we shuttle back and forth along a sequence, producing an infinite list of values. Sorry if it doesn't compile, I don't have GHC at the moment:
{-# LANGUAGE ViewPatterns #-}
import Data.Sequence

shuttle (viewl -> a :< as) = a : shuttle (rotate (a + 1 <| as))
  where
    rotate | even a    = rotateForward
           | otherwise = rotateBack
    rotateBack    (viewr -> as' :> a') = a' <| as'
    rotateForward (viewl -> a' :< as') = as' |> a'
