I have a list of key-value pairs and I want to count how many times each key occurs and what values it occurs with, but when I try, I get a stack overflow. Here's a simplified version of the code I'm running:
import Data.Array
add (n, vals) val = n `seq` vals `seq` (n+1,val:vals)
histo = accumArray add (0,[]) (0,9) [(0, n) | n <- [0..5000000]]
main = print histo
When I compile this with 'ghc -O' and run it, I get "Stack space overflow: current size 8388608 bytes."
I think I know what's going on: accumArray has the same properties as foldl, and so I need a strict version of accumArray. Unfortunately, the only one I've found is in Data.Array.Unboxed, which doesn't work for an array of lists.
The documentation says that when the accumulating function is strict, then accumArray should be too, but I can't get this to work, and the discussion here claims that the documentation is wrong (at least for GHC).
Is there a strict version of accumArray other than the one in Data.Array.Unboxed? Or is there a better way to do what I want?
Well, strict doesn't necessarily mean that no thunks are created, it just means that if an argument is bottom, the result is bottom too. But accumArray is not that strict, it just writes bottoms to the array if they occur. It can't really do anything else, since it must allow for non-strict functions that could produce defined values from intermediate bottoms. And the strictness analyser can't rewrite it so that the accumulation function is evaluated to WHNF on each write if it is strict, because that would change the semantics of the programme in a rather drastic way (an array containing some bottoms vs. bottom).
That said, I agree that there's an unfortunate lack of strict and eager functions in several areas.
For your problem, you can use a larger stack (+RTS -K128M didn't suffice here, but 256M did), or you can use
import Data.Array.Base (unsafeRead, unsafeWrite)
import Data.Array.ST
import GHC.Arr

strictAccumArray :: Ix i => (e -> a -> e) -> e -> (i,i) -> [(i,a)] -> Array i e
strictAccumArray fun ini (l,u) ies = case iar of
    -- Rewrap the 0-based result array with the requested bounds.
    Array _ _ m barr -> Array l u m barr
  where
    iar = runSTArray $ do
        let n = safeRangeSize (l,u)
            -- Translate each index to its 0-based Int offset up front.
            stuff = [(safeIndex (l,u) n i, e) | (i, e) <- ies]
        arr <- newArray (0,n-1) ini
        let go ((i,v):ivs) = do
                old <- unsafeRead arr i
                -- ($!) forces the new value to WHNF before it is stored.
                unsafeWrite arr i $! fun old v
                go ivs
            go [] = return arr
        go stuff
With a strict write, the thunks are kept small, so there's no stack overflow. But beware, the lists take a lot of space, so if your list is too long, you may get a heap exhaustion.
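If depending on GHC.Arr internals is a concern, roughly the same strict accumulation can be sketched with only the public Data.Array.ST API. This is an alternative under stated assumptions, not a library function: strictAccumArray is still a hand-written helper, readArray/writeArray are bounds-checked (so slower than unsafeRead/unsafeWrite), and the input is shrunk to 100000 elements so it runs quickly.

```haskell
import Control.Monad (forM_)
import Data.Array (Array, Ix, (!))
import Data.Array.ST (newArray, readArray, runSTArray, writeArray)

-- Like accumArray, but each accumulated value is forced to WHNF
-- (via $!) before it is written back, so no chain of thunks builds up.
strictAccumArray :: Ix i => (e -> a -> e) -> e -> (i, i) -> [(i, a)] -> Array i e
strictAccumArray f ini bnds ies = runSTArray $ do
  arr <- newArray bnds ini
  forM_ ies $ \(i, v) -> do
    old <- readArray arr i
    writeArray arr i $! f old v
  return arr

-- The accumulating function from the question.
add :: (Int, [Int]) -> Int -> (Int, [Int])
add (n, vals) val = n `seq` vals `seq` (n + 1, val : vals)

main :: IO ()
main = do
  let histo = strictAccumArray add (0, []) (0 :: Int, 9) [(0, n) | n <- [0 .. 100000]]
  print (fst (histo ! 0))
```

Because add itself forces the old count and list, forcing the pair to WHNF on each write is enough to keep every stored thunk shallow.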
Another option would be to use a Data.Map (or Data.IntMap, if the version of containers is 0.4.1.0 or later) instead of an array, since that comes with insertWith', which evaluates the result of the combining function to WHNF before storing it. The code could, for example, be:
import qualified Data.Map as M -- or Data.IntMap
import Data.List (foldl')

histo :: M.Map Int (Int,[Int]) -- M.IntMap (Int,[Int])
histo = foldl' upd M.empty [(0,n) | n <- [0 .. 15000000]]
  where
    upd mp (i,n) = M.insertWith' add i (1,[n]) mp
    add (j,val:_) (k,vals) = k `seq` vals `seq` (k+j,val:vals)
    add _ pr = pr -- to avoid non-exhaustive pattern warning
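In newer versions of containers, insertWith' was deprecated and has since been removed; the strict modules Data.Map.Strict and Data.IntMap.Strict fill the same role, because their insertWith evaluates the combined value to WHNF before storing it. A sketch of the same histogram under that assumption, with the input shortened so it runs quickly:

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

histo :: M.Map Int (Int, [Int])
histo = foldl' upd M.empty [(0, n) | n <- [0 .. 100000]]
  where
    -- Data.Map.Strict.insertWith forces (add new old) to WHNF,
    -- playing the role of the old insertWith'.
    upd mp (i, n) = M.insertWith add i (1, [n]) mp
    -- add new old: force the old count and list so no thunk chain builds up.
    add (j, val:_) (k, vals) = k `seq` vals `seq` (k + j, val : vals)
    add _ pr = pr

main :: IO ()
main = print (fmap fst (M.lookup 0 histo))
```

WHNF only evaluates the outer pair, which is why the seq calls inside add are still needed.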
Disadvantages of using a Map are:
- the combining function must have type a -> a -> a, so it needs to be a bit more complicated in your case;
- an update is O(log size) instead of O(1), so for large histograms it will be considerably slower;
- Maps and IntMaps have some book-keeping overhead, so they use more space than an array. But if the list of updates is large compared to the number of indices, as in this case where the size of the values grows with each update, the difference is negligible: the overhead is k words per key, independent of the size of the values.
The two code snippets below are taken from the concurrency chapter of the Real World Haskell (RWH) book:
import Control.Parallel (pseq)
import System.Random (StdGen, randoms)

force :: [a] -> ()
force xs = go xs `pseq` ()
  where go (_:xs) = go xs
        go [] = 1

randomInts :: Int -> StdGen -> [Int]
randomInts k g = let result = take k (randoms g)
                 in force result `seq` result
randomInts generates a list of random numbers for testing the performance of a parallel sorting algorithm. The book mentions that this code avoids a potential problem. This is what it says:
Invisible data dependencies.
When we generate the list of random numbers, simply printing the length of the list would not perform enough evaluation. This would evaluate the spine of the list, but not its elements. The actual random numbers would not be evaluated until the sort compares them. This can have serious consequences for performance. The value of a random number depends on the value of the preceding random number in the list, but we have scattered the list elements randomly among our processor cores. If we did not evaluate the list elements prior to sorting, we would suffer a terrible “ping pong” effect: not only would evaluation bounce from one core to another, performance would suffer. Try snipping out the application of force from the body of main above: you should find that the parallel code can easily end up three times slower than the non-parallel code.
So they are saying that by using the force function they avoid the ping-pong problem. But when they explain the force function itself, they describe it like this:
Notice that we don't care what's in the list; we walk down its spine to the end, then use pseq once. There is clearly no magic involved here: we are just using our usual understanding of Haskell's evaluation model. And because we will be using force on the left hand side of par or pseq, we don't need to return a meaningful value.
Judging from the definition of force and the explanation above, the list elements themselves are not evaluated. So how does randomInts actually avoid the ping-pong effect? Is this an error in the book, or am I misunderstanding something?
randomInts doesn't actually seem to suffer from the ping-pong effect. The function force is not only traversing the entire spine of the list, but also evaluating the elements of the list.
import Control.Parallel (par, pseq)

force :: [a] -> ()
force xs = go xs `pseq` ()
  where go (_:xs) = go xs
        go [] = 1
In ghci:
ghci > let a = [1..10] :: [Int]
ghci > :sprint a
a = _
ghci > force a
()
ghci > :sprint a
a = [1,2,3,4,5,6,7,8,9,10]
So the force function fully evaluates the list, saving it from the ping-pong effect.
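Note, though, that force itself only walks the spine; whether the elements end up evaluated depends on how the list was produced. The [Int] enumeration in the ghci session above happens to build already-evaluated elements, whereas a list produced by map holds unevaluated thunks. The sketch below makes the difference visible. It uses seq instead of pseq so that it only needs base, and forceElems is a hypothetical helper added for comparison.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- Spine-only force, as in the RWH code (seq instead of pseq, so only base
-- is needed). It never looks at the elements.
force :: [a] -> ()
force xs = go xs `seq` ()
  where
    go :: [b] -> Int
    go (_:ys) = go ys
    go []     = 1

-- Hypothetical helper: forces every element to WHNF as well as the spine.
forceElems :: [a] -> ()
forceElems = foldr seq ()

main :: IO ()
main = do
  -- Every element is a thunk that throws if it is ever evaluated.
  let xs = map (\_ -> error "element evaluated") [1 .. 10 :: Int] :: [Int]
  -- Walks the spine only, so the error thunks are never forced.
  print (force xs)
  r <- try (evaluate (forceElems xs)) :: IO (Either ErrorCall ())
  putStrLn (either (const "forceElems evaluated an element") show r)
```

Running it should print () for force, followed by the message from the forceElems branch.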
I'm trying to parse large log files in Haskell. I'm using System.IO.Streams, but it seems to use a lot of memory when I fold over the input. Here are two (ugly) examples:
First, load 1M Ints into memory as a list:
let l = foldl (\aux p -> p:aux) [] [1..1000000]
return (sum l)
Memory consumption is fine: the Ints take 3 MB and the list needs 6 MB:
[Figure: memory consumption of building a list of 1M Ints]
Then try the same with a stream of ByteStrings. We need an ugly back-and-forth conversion, but I don't think it makes any difference:
let s = Streams.fromList $ map (B.pack . show) [1..1000000]
l <- s >>=
       Streams.map bsToInt >>=
       Streams.fold (\aux p -> p:aux) []
return (sum l)
[Figure: memory consumption of building a list of Ints from a stream]
Why does it need more memory? And it's even worse if I read from a file: then it needs 90 MB:
result <- withFileAsInput file load
putStrLn $ "loaded " ++ show result
  where load is = do
          l <- Streams.lines is >>=
                 Streams.map bsToInt >>=
                 Streams.fold (\aux p -> p:aux) []
          return (sum l)
My guess is that Streams.fold has some issue, because the library's built-in countInput doesn't use it. Any ideas?
EDIT
After investigating, I reduced the question to this: why does this code need an extra 50 MB?
do
  let l = map (Builder.toLazyByteString . intDec) [1..1000000]
  let l2 = map (fst . fromJust . B.readInt) l
  return (foldl' (\aux p -> p:aux) [] l2)
Without the conversions it only needs 30 MB; with the conversions, 90 MB.
In your first example, the foldl (\aux p -> p:aux) [] is redundant: it constructs a list with the same elements as the list it takes as an argument, just in reverse order. Without the redundancy, the example is equivalent to sum [1..1000000] or foldl (+) 0 [1..1000000]. Also, it would be better to use the strict left fold foldl' to avoid accumulating reducible expressions on the heap. See Foldr Foldl Foldl' on the Haskell wiki.
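A tiny check of both points (rebuild and strictSum are names made up for this sketch): the fold from the question is just reverse, and a strict left fold sums the numbers without building an intermediate list.

```haskell
import Data.List (foldl')

-- The fold from the question: it prepends each element, i.e. it reverses.
rebuild :: [Int] -> [Int]
rebuild = foldl (\aux p -> p : aux) []

-- Strict left fold: the accumulator is forced at each step,
-- so no chain of (+) thunks builds up on the heap.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = do
  print (rebuild [1, 2, 3])         -- [3,2,1]
  print (strictSum [1 .. 1000000])  -- 500000500000
```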
In your last example, you are using System.IO.Streams.Combinators.fold to build a list of all the integers read from the file, and then summing the list as in your first example.
The problem is that, because of the sequencing of file read operations imposed by the IO monad, all the data in the file has been read before you start summing the list, and it is lurking on the heap, possibly still unconverted (as thunks holding on to the original ByteStrings) and taking even more memory.
The solution is to perform the actual sum inside the fold as each new element arrives; that way you never need the full list in memory, only the current element (being able to do this while performing I/O is one of the aims of streaming libraries). The fold provided by io-streams is also strict, analogous to foldl', so you don't accumulate reducible expressions on the heap either.
Try something like System.IO.Streams.Combinators.fold (+) 0.
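The same "sum as you go" idea can be sketched without io-streams, using lazy ByteStrings from the bytestring package that ships with GHC. This swaps in a different streaming technique from the one in the question, and skipping unparsable lines is an assumption, since bsToInt isn't shown.

```haskell
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')

-- Sum one integer per line, keeping only a running total.
-- Lines that don't start with an integer are skipped (an assumption).
sumInts :: BL.ByteString -> Int
sumInts = foldl' step 0 . BL.lines
  where
    step acc line = case BL.readInt line of
      Just (n, _) -> acc + n
      Nothing     -> acc

main :: IO ()
main = do
  -- Build the input in memory so the sketch is self-contained.
  let input = BL.unlines (map (BL.pack . show) [1 .. 1000000 :: Int])
  print (sumInts input)  -- 500000500000
```

To read from a file instead, something like BL.readFile path >>= print . sumInts reads the file lazily in chunks, so only the running total and the current chunk need to stay live.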
So the problem was the lazy creation of ByteStrings, not the iterator.
See: Why creating and disposing temporal ByteStrings eats up my memory in Haskell?
In an attempt to learn Haskell, I have come across a situation in which I wish to do a fold over a list, but my accumulator is a Maybe. The function I'm folding with, however, takes the "extracted" value from the Maybe, and if one step fails they all fail. I have a solution I find kludgy, but knowing as little Haskell as I do, I believe there should be a better way. Say we have the following toy problem: we want to sum a list, but fours for some reason are bad, so if we attempt to sum in a four at any time we want to return Nothing. My current solution is as follows:
import Data.Maybe (fromJust, isNothing)

explodingFourSum :: [Int] -> Maybe Int
explodingFourSum numberList =
  foldl explodingFourMonAdd (Just 0) numberList
  where explodingFourMonAdd =
          (\x y -> if isNothing x
                     then Nothing
                     else explodingFourAdd (fromJust x) y)

explodingFourAdd :: Int -> Int -> Maybe Int
explodingFourAdd _ 4 = Nothing
explodingFourAdd x y = Just (x + y)
So basically, is there a way to clean up, or eliminate, the lambda in explodingFourMonAdd using some kind of monadic fold? Or by somehow currying in the >>= operator, so that the fold behaves like a list of functions chained by >>=?
I think you can use foldM:
import Control.Monad (foldM)
explodingFourSum numberList = foldM explodingFourAdd 0 numberList
This lets you get rid of the extra lambda and the (Just 0) at the beginning.
BTW, check out Hoogle to search for functions whose names you don't remember.
So basically, is there a way to clean up, or eliminate, the lambda in the explodingFourMonAdd using some kind of Monad fold?
Yep. In Control.Monad there's the foldM function, which is exactly what you want here. So you can replace your call to foldl with foldM explodingFourAdd 0 numberList.
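Put together as a complete program, the foldM version might look like this sketch, reusing explodingFourAdd from the question:

```haskell
import Control.Monad (foldM)

-- The step function from the question: a 4 makes the whole sum fail.
explodingFourAdd :: Int -> Int -> Maybe Int
explodingFourAdd _ 4 = Nothing
explodingFourAdd x y = Just (x + y)

-- foldM threads the accumulator through the Maybe monad:
-- the first Nothing short-circuits the rest of the fold.
explodingFourSum :: [Int] -> Maybe Int
explodingFourSum = foldM explodingFourAdd 0

main :: IO ()
main = do
  print (explodingFourSum [1, 2, 3])  -- Just 6
  print (explodingFourSum [1, 4, 3])  -- Nothing
```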
You can exploit the fact that Maybe is a monad. The function sequence :: [m a] -> m [a] has the following effect if m is Maybe: if every element in the list is Just x for some x, the result is Just the list of all those xs. Otherwise, the result is Nothing.
So you first decide, for each element, whether it is a failure. For instance, take your example:
foursToNothing :: [Int] -> [Maybe Int]
foursToNothing = map go where
  go 4 = Nothing
  go x = Just x
Then you run sequence and fmap the fold:
import Data.List (foldl')
explodingFourSum = fmap (foldl' (+) 0) . sequence . foursToNothing
Of course you have to adapt this to your specific case.
Here's another possibility not mentioned by other people. You can separately check for fours and do the sum:
import Control.Monad
explodingFourSum xs = guard (all (/=4) xs) >> return (sum xs)
That's the entire source. This solution is beautiful in a lot of ways: it reuses a lot of already-written code, and it nicely expresses the two important facts about the function (whereas the other solutions posted here mix those two facts up together).
Of course, there is at least one good reason not to use this implementation, as well. The other solutions mentioned here traverse the input list only once; this interacts nicely with the garbage collector, allowing only small portions of the list to be in memory at any given time. This solution, on the other hand, traverses xs twice, which will prevent the garbage collector from collecting the list during the first pass.
You can solve your toy example that way, too:
import Data.Traversable
explodingFour 4 = Nothing
explodingFour x = Just x
explodingFourSum = fmap sum . traverse explodingFour
Of course, this works only because one value is enough to know when the calculation fails. If the failure condition depends on both values x and y in explodingFourAdd, you need to use foldM.
BTW: A fancy way to write explodingFour would be
import Control.Monad
explodingFour x = mfilter (/=4) (Just x)
This trick works for explodingFourAdd as well, but is less readable:
explodingFourAdd x y = Just (x+) `ap` mfilter (/=4) (Just y)
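A self-contained sketch combining the traverse version with the mfilter tricks above, checked against foldM for the two-argument step:

```haskell
import Control.Monad (ap, foldM, mfilter)

-- mfilter keeps the Just only if the predicate holds.
explodingFour :: Int -> Maybe Int
explodingFour x = mfilter (/= 4) (Just x)

-- Failure depends on each element alone, so traverse is enough.
explodingFourSum :: [Int] -> Maybe Int
explodingFourSum = fmap sum . traverse explodingFour

-- The ap-based step function, used with foldM as in the other answers.
explodingFourAdd :: Int -> Int -> Maybe Int
explodingFourAdd x y = Just (x +) `ap` mfilter (/= 4) (Just y)

main :: IO ()
main = do
  print (explodingFourSum [1, 2, 3])                 -- Just 6
  print (explodingFourSum [4, 5])                    -- Nothing
  print (foldM explodingFourAdd 0 [1, 2, 3 :: Int])  -- Just 6
```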
I have the following code:
{-# NOINLINE i2i #-}
i2i :: Int -> Integer
i2i x = toInteger x
main = print $ i2i 2
Running GHC with -ddump-simpl flag gives:
[Arity 1
NoCafRefs
Str: DmdType U(L)]
Main.i2i = GHC.Real.toInteger1
It seems that the conversion from Int to Integer is lazy. Why is that? Is there a case where I can have the following?
toInteger (_|_ :: Int) /= _|_
Edit: the question has more to do with GHC's strictness analyzer than with laziness per se. This code was derived from exploring the standard mean function:
--mean :: Integer -> Integer -> [Integer] -> Double
mean :: Integer -> Int -> [Integer] -> Double
mean acc n [] = fromIntegral acc / fromIntegral n
mean acc n (x:xs) = mean (acc + x) (n + 1) xs
main = print $ mean 0 0 [1..1000000]
This code runs in O(N) space. When I switch to the commented-out signature on the first line, space consumption drops to O(1). It seems to come down to the fromIntegral call, which in turn comes down to toInteger. The strictness analyzer somehow cannot infer that the conversion is strict, which seems strange to me.
Response to your edit: the dangers of O(N) space leaks for accumulating parameters are well known, at least to Haskell programmers. What ought to be well known but isn't is that no matter what the language, you should never trust to the optimizer to provide asymptotic guarantees for the space and time behavior of your programs. I don't understand the implications of simple optimizers I've written myself, let alone something hairy like GHC's front end, what with a strictness analyzer, inliner, and all the rest.
As to your two questions,
Why doesn't GHC's strictness analyzer optimize this particular code, when it does optimize very similar code?
Who knows?? (Maybe Simon PJ knows, maybe not.) If you care about performance, you shouldn't be relying on the strictness analyzer. Which brings us to the second, implied question:
How can I avoid O(N) space costs on this function and on every other function that uses accumulating parameters?
By putting strictness annotations on the accumulating parameters that force them to be evaluated at each tail-recursive call.
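For example, here is the question's mean with both accumulators forced by bang patterns (the BangPatterns extension; writing acc `seq` n `seq` mean (acc + x) (n + 1) xs in the recursive case would work too). This is a sketch of the annotation approach, not a claim about what the strictness analyser does:

```haskell
{-# LANGUAGE BangPatterns #-}

-- acc and n are evaluated on every call, so no chain of (+) thunks
-- builds up, whatever the strictness analyser manages to infer.
mean :: Integer -> Int -> [Integer] -> Double
mean !acc !n []     = fromIntegral acc / fromIntegral n
mean !acc !n (x:xs) = mean (acc + x) (n + 1) xs

main :: IO ()
main = print (mean 0 0 [1 .. 1000000])  -- 500000.5
```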
I think you're looking at this the wrong way. Consider the following silly fragment of code:
let x = [undefined]
let y = map toInteger x
If we evaluate
y == []
we get False, whereas if we evaluate
head y
we get an undefined exception. There's no reason that applying map or comparing y with [] should diverge just because the only element of x is undefined. That's the essence of non-strictness.
As an exercise in Haskell, I'm trying to implement heapsort. The heap is usually implemented as an array in imperative languages, but this would be hugely inefficient in purely functional languages. So I've looked at binary heaps, but everything I found so far describes them from an imperative viewpoint and the algorithms presented are hard to translate to a functional setting. How to efficiently implement a heap in a purely functional language such as Haskell?
Edit: By efficient I mean it should still be in O(n*log n), but it doesn't have to beat a C program. Also, I'd like to use purely functional programming. What else would be the point of doing it in Haskell?
There are a number of Haskell heap implementations in an appendix to Okasaki's Purely Functional Data Structures. (The source code can be downloaded at the link. The book itself is well worth reading.) None of them are binary heaps, per se, but the "leftist" heap is very similar. It has O(log n) insertion, removal, and merge operations. There are also more complicated data structures like skew heaps, binomial heaps, and splay heaps which have better performance.
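To give a flavour of the leftist heap, here is a minimal sketch written from its usual description (each node stores the length of its right spine, and merge only walks right spines). It is not copied from Okasaki's source code, and the names are illustrative.

```haskell
import Data.List (unfoldr)

-- E is the empty heap; T stores the rank (right-spine length), the root
-- element, and the left and right subtrees.
data Heap a = E | T Int a (Heap a) (Heap a)

rank :: Heap a -> Int
rank E           = 0
rank (T r _ _ _) = r

-- Put the higher-rank child on the left to keep the right spine short.
makeT :: a -> Heap a -> Heap a -> Heap a
makeT x a b
  | rank a >= rank b = T (rank b + 1) x a b
  | otherwise        = T (rank a + 1) x b a

merge :: Ord a => Heap a -> Heap a -> Heap a
merge h E = h
merge E h = h
merge h1@(T _ x a1 b1) h2@(T _ y a2 b2)
  | x <= y    = makeT x a1 (merge b1 h2)
  | otherwise = makeT y a2 (merge h1 b2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (T 1 x E E)

-- The minimum and the remaining heap, or Nothing if the heap is empty.
deleteMin :: Ord a => Heap a -> Maybe (a, Heap a)
deleteMin E           = Nothing
deleteMin (T _ x a b) = Just (x, merge a b)

heapSort :: Ord a => [a] -> [a]
heapSort = unfoldr deleteMin . foldr insert E

main :: IO ()
main = print (heapSort [5, 3, 8, 1, 9, 2 :: Int])  -- [1,2,3,5,8,9]
```

Since the right spine has O(log n) nodes, each merge, insert, and deleteMin is O(log n), which gives the O(n log n) sort the question asks for.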
Jon Fairbairn posted a functional heapsort to the Haskell Cafe mailing list back in 1997:
http://www.mail-archive.com/haskell@haskell.org/msg01788.html
I reproduce it below, reformatted to fit this space. I've also slightly simplified the code of merge_heap.
I'm surprised treefold isn't in the standard prelude since it's so useful. Translated from the version I wrote in Ponder in October 1992 -- Jon Fairbairn
module Treefold where
-- treefold (*) z [a,b,c,d,e,f] = (((a*b)*(c*d))*(e*f))
treefold f zero [] = zero
treefold f zero [x] = x
treefold f zero (a:b:l) = treefold f zero (f a b : pairfold l)
where
pairfold (x:y:rest) = f x y : pairfold rest
pairfold l = l -- here l will have fewer than 2 elements
module Heapsort where
import Treefold

data Heap a = Nil | Node a [Heap a]

heapify x = Node x []

heapsort :: Ord a => [a] -> [a]
heapsort = flatten_heap . merge_heaps . map heapify
  where
    merge_heaps :: Ord a => [Heap a] -> Heap a
    merge_heaps = treefold merge_heap Nil

    flatten_heap Nil = []
    flatten_heap (Node x heaps) = x:flatten_heap (merge_heaps heaps)

    merge_heap heap Nil = heap
    merge_heap node_a@(Node a heaps_a) node_b@(Node b heaps_b)
      | a < b = Node a (node_b: heaps_a)
      | otherwise = Node b (node_a: heaps_b)
You could also use the ST monad, which allows you to write imperative code but expose a purely functional interface safely.
As an exercise in Haskell, I implemented an imperative heapsort with the ST Monad.
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Monad (forM, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.MArray (newListArray, readArray, writeArray)
import Data.Array.ST (STArray)
import Data.STRef (newSTRef, readSTRef, writeSTRef)

heapSort :: forall a. Ord a => [a] -> [a]
heapSort list = runST $ do
  let n = length list
  heap <- newListArray (1, n) list :: ST s (STArray s Int a)
  heapSizeRef <- newSTRef n
  let
    heapifyDown pos = do
      val <- readArray heap pos
      heapSize <- readSTRef heapSizeRef
      let children = filter (<= heapSize) [pos*2, pos*2+1]
      childrenVals <- forM children $ \i -> do
        childVal <- readArray heap i
        return (childVal, i)
      let (minChildVal, minChildIdx) = minimum childrenVals
      if null children || val < minChildVal
        then return ()
        else do
          writeArray heap pos minChildVal
          writeArray heap minChildIdx val
          heapifyDown minChildIdx
    lastParent = n `div` 2
  forM_ [lastParent,lastParent-1..1] heapifyDown
  forM [n,n-1..1] $ \i -> do
    top <- readArray heap 1
    val <- readArray heap i
    writeArray heap 1 val
    writeSTRef heapSizeRef (i-1)
    heapifyDown 1
    return top
BTW, I dispute the idea that if it's not purely functional there is no point in doing it in Haskell. I think my toy implementation is much nicer than what one would achieve in C++ with templates, passing stuff around to the inner functions.
And here is a Fibonacci Heap in Haskell:
https://github.com/liuxinyu95/AlgoXY/blob/algoxy/datastruct/heap/other-heaps/src/FibonacciHeap.hs
Here is a PDF describing some other k-ary heaps based on Okasaki's work:
https://github.com/downloads/liuxinyu95/AlgoXY/kheap-en.pdf
Just like efficient Quicksort implementations in Haskell, you need to use monads (state transformers) to do things in place.
Arrays in Haskell aren't as hugely inefficient as you might think, but typical practice in Haskell would probably be to implement this using ordinary data types, like this:
data Heap a = Empty | Heap a (Heap a) (Heap a)
fromList :: Ord a => [a] -> Heap a
toSortedList :: Ord a => Heap a -> [a]
heapSort = toSortedList . fromList
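One possible way to fill in those signatures, as a sketch: treat the tree as a skew heap, whose merge swaps the children on every step and gives amortized O(log n) insertion and deletion. The choice of a skew heap and the helper names merge and insert are assumptions, not part of the answer.

```haskell
import Data.List (foldl')

data Heap a = Empty | Heap a (Heap a) (Heap a)

-- Skew-heap merge: keep the smaller root, merge the other heap into its
-- right child, then swap the children (amortized balancing).
merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Heap x l1 r1) h2@(Heap y l2 r2)
  | x <= y    = Heap x (merge r1 h2) l1
  | otherwise = Heap y (merge r2 h1) l2

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Heap x Empty Empty)

fromList :: Ord a => [a] -> Heap a
fromList = foldl' (flip insert) Empty

-- Repeatedly take the root and merge its two subtrees.
toSortedList :: Ord a => Heap a -> [a]
toSortedList Empty        = []
toSortedList (Heap x l r) = x : toSortedList (merge l r)

heapSort :: Ord a => [a] -> [a]
heapSort = toSortedList . fromList

main :: IO ()
main = print (heapSort [5, 3, 8, 1, 9, 2 :: Int])  -- [1,2,3,5,8,9]
```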
If I were solving this problem, I might start by stuffing the list elements into an array, making it easier to index them for heap creation.
import Data.Array

fromList xs = heapify 0 where
  size = length xs
  elems = listArray (0, size - 1) xs :: Array Int a
  heapify n = ...
If you're using a binary max heap, you might want to keep track of the size of the heap as you remove elements so you can find the bottom right element in O(log N) time. You could also take a look at other types of heaps that aren't typically implemented using arrays, like binomial heaps and Fibonacci heaps.
A final note on array performance: in Haskell there's a tradeoff between using static arrays and using mutable arrays. With static arrays, you have to create new copies of the arrays when you change the elements. With mutable arrays, the garbage collector has a hard time keeping different generations of objects separated. Try implementing the heapsort using an STArray and see how you like it.
I tried to port the standard binary heap to a functional setting. The idea is described in an article: A Functional Approach to Standard Binary Heaps. All the source code listings in the article are in Scala, but they should be easy to port to any other functional language.
Here is a page containing an ML version of HeapSort. It's quite detailed and should provide a good starting point.
http://flint.cs.yale.edu/cs428/coq/doc/Reference-Manual021.html