STArray and stack overflow

I am struggling to understand why the following attempts to find a minimum element in STArray lead to stack space overflow when compiled with ghc (7.4.1, regardless of -O level), but work fine in ghci:
import Control.Monad
import Control.Monad.ST
import Control.Applicative
import Data.Array.ST
n = 1000 :: Int
minElem = runST $ do
  arr <- newArray ((1,1),(n,n)) 0 :: ST s (STArray s (Int,Int) Int)
  let ixs = [(i,j) | i <- [1..n], j <- [1..n]]
  forM_ ixs $ \(i,j) -> writeArray arr (i,j) (i*j `mod` 7927)
  -- readArray arr (34,56) -- this works OK
  -- findMin1 arr -- stackoverflows when compiled
  findMin2 arr -- stackoverflows when compiled
findMin1 arr = do
  es <- getElems arr
  return $ minimum es
findMin2 arr = do
  e11 <- readArray arr (1,1)
  foldM (\m ij -> min m <$> readArray arr ij) e11 ixs
  where ixs = [(i,j) | i <- [1..n], j <- [1..n]]
main = print minElem
Note: switching to STUArray or ST.Lazy doesn't seem to have any effect.
The main question though: What would be the proper way to implement such "fold-like" operation over big STArray while inside ST?

That's probably a result of getElems being a bad idea. In this case an array is a bad idea altogether:
minimum (zipWith (\x y -> (x, y, mod (x*y) 7927)) [1..1000] [1..1000])
This one gives you the answer right away: (1, 1, 1).
If you want to use an array anyway I recommend converting the array to an Array or UArray first and then using elems or assocs on that one. This has no additional cost, if you do it using runSTArray or runSTUArray.
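As a rough sketch of that suggestion: fill the array inside ST, freeze it with runSTUArray, and fold over elems strictly. The names table and minElem are made up for this illustration, and foldl' is used rather than minimum so the result does not depend on how strict minimum is in a given GHC version.

```haskell
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems)
import Data.List (foldl')

-- Hypothetical helper: fill the table inside ST, then freeze it.
-- runSTUArray hands back the same storage as an immutable UArray.
table :: Int -> UArray (Int, Int) Int
table n = runSTUArray $ do
  arr <- newArray ((1, 1), (n, n)) 0
  mapM_ (\(i, j) -> writeArray arr (i, j) (i * j `mod` 7927))
        [(i, j) | i <- [1 .. n], j <- [1 .. n]]
  return arr

-- Strict left fold over the elements; an empty table yields maxBound.
minElem :: Int -> Int
minElem n = foldl' min maxBound (elems (table n))
```

Calling minElem 1000 from main should then run in constant stack, since the accumulator is forced at every step.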

The big problem in findMin1 is getElems:
getElems :: (MArray a e m, Ix i) => a i e -> m [e]
getElems marr = do
  (_l, _u) <- getBounds marr -- Hmm, why is that there?
  n <- getNumElements marr
  sequence [unsafeRead marr i | i <- [0 .. n - 1]]
Using sequence on a long list is a common cause for stack overflows in monads whose (>>=) isn't lazy enough, since
sequence ms = foldr k (return []) ms
  where
    k m m' = do { x <- m; xs <- m'; return (x:xs) }
has to build a thunk of size proportional to the length of the list. getElems would work with Control.Monad.ST.Lazy, but then the filling of the array with
forM_ ixs $ \(i,j) -> writeArray arr (i,j) (i*j `mod` 7927)
creates a huge thunk that overflows the stack. For the strict ST variant, you need to replace getElems with something that builds the list without sequence or - much better - compute the minimum without creating a list of elements at all. For the lazy ST variant, you need to ensure that the filling of the array doesn't build a huge thunk e.g. by forcing the result of the writeArray calls
let fill i j
      | i > n = return ()
      | j > n = fill (i+1) 1
      | otherwise = do
          () <- writeArray arr (i,j) $ (i*j `mod` 7927)
          fill i (j+1)
() <- fill 1 1
The problem in findMin2 is that
foldM (\m ij -> min m <$> readArray arr ij) e11 ixs
is lazy in m, so it builds a huge thunk to compute the minimum. You can easily fix that by using seq (or a bang-pattern) to make it strict in m.
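A minimal sketch of that fix, keeping the foldM structure of findMin2 but forcing the running minimum with seq at every step. The names findMinStrict and minElemStrict are invented here, and the sketch assumes n >= 1 so that index (1,1) exists.

```haskell
import Control.Monad (foldM, forM_)
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)

-- Same shape as findMin2, but each step evaluates the new minimum
-- before returning it, so no chain of `min` thunks is built.
findMinStrict :: Int -> STUArray s (Int, Int) Int -> ST s Int
findMinStrict n arr = do
  e11 <- readArray arr (1, 1)
  foldM step e11 [(i, j) | i <- [1 .. n], j <- [1 .. n]]
  where
    step m ij = do
      e <- readArray arr ij
      let m' = min m e
      m' `seq` return m'

-- Hypothetical driver mirroring the question's setup (assumes n >= 1).
minElemStrict :: Int -> Int
minElemStrict n = runST $ do
  arr <- newArray ((1, 1), (n, n)) 0
  forM_ [(i, j) | i <- [1 .. n], j <- [1 .. n]] $ \(i, j) ->
    writeArray arr (i, j) (i * j `mod` 7927)
  findMinStrict n arr
```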
The main question though: What would be the proper way to implement such "fold-like" operation over big STArray while inside ST?
Usually, you'll use the strict ST variant (and for types like Int, you should almost certainly use STUArrays instead of STArrays). Then the most important rule is that your functions be strict enough. The structure of findMin2 is okay, the implementation is just too lazy.
If performance matters, you will often have to avoid the generic higher order functions like foldM and write your own loops to avoid allocating lists and control strictness exactly as the problem at hand requires.
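One possible shape for such a hand-written loop is sketched below. It walks the index space directly instead of allocating an index list, and forces the accumulator on each iteration. minLoop and minOfGrid are illustrative names, not library functions, and an empty array is assumed to yield maxBound.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, getBounds, newListArray, readArray)

-- Explicit loop over a 2D STUArray: no intermediate list, strict accumulator.
minLoop :: STUArray s (Int, Int) Int -> ST s Int
minLoop arr = do
  ((iLo, jLo), (iHi, jHi)) <- getBounds arr
  let go i j acc
        | i > iHi   = return acc
        | j > jHi   = go (i + 1) jLo acc
        | otherwise = do
            e <- readArray arr (i, j)
            let acc' = min acc e
            acc' `seq` go i (j + 1) acc'
  go iLo jLo maxBound

-- Hypothetical wrapper for trying the loop on a small row-major grid.
minOfGrid :: Int -> Int -> [Int] -> Int
minOfGrid rows cols xs = runST $ do
  arr <- newListArray ((1, 1), (rows, cols)) xs
  minLoop arr
```

For example, minOfGrid 2 3 [5,3,9,7,2,8] evaluates to 2.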

The problem is that minimum is a non-strict fold, so it is causing a thunk buildup. Use (foldl' min).

Related

Which recursive calls are allowed in Haskell array creation?

As a short exercise in using Haskell arrays I wanted to implement a function giving the first n (odd) prime numbers. The code below (compiled with GHC 7.10.3) produces a loop error at runtime. "A Gentle Introduction to Haskell" uses recursive calls in array creation to compute Fibonacci numbers (https://www.haskell.org/tutorial/arrays.html, 13.2, code below for reference), which works just fine. My question is:
Where is the difference between the two ways of recursive creation? Which recursive calls are generally allowed when creating arrays?
My code:
import Data.Array.Unboxed
main = putStrLn $ show $ (primes 500)!500 --arbitrary example
primes :: Int -> UArray Int Int
primes n = a
  where
    a = array (1,n) $ primelist 1 [3,5..]
    primelist i (m:ms) =
      if all (not . divides m) [ a!j | j <- [1..(i-1)]]
        then (i ,m) : primelist (succ i) ms
        else primelist i ms
    divides m k = m `mod` k == 0
Code from "A Gentle Introduction to Haskell":
fibs :: Int -> Array Int Int
fibs n = a where a = array (0,n) ([(0, 1), (1, 1)] ++
[(i, a!(i-2) + a!(i-1)) | i <- [2..n]])
Thanks in advance for any answers!

Update: I think I finally understood what's going on. array is lazy on the list elements, but is unnecessarily strict on its spine!
This causes a <<loop>> exception, for instance
test :: Array Int Int
test = array (1,2) ((1,1) : if test!1 == 1 then [(2,2)] else [(2,100)])
unlike
test :: Array Int Int
test = array (1,2) ((1,1) : [(2, if test!1 == 1 then 2 else 100)])
So, recursion works as long as it only affects the values.
A working version:
import Data.Array
main :: IO ()
main = do
  putStrLn $ show $ (primes 500)!500 --arbitrary example
-- A spine-lazy version of array
-- Assumes the list carries indices lo..hi
arraySpineLazy :: (Int, Int) -> [(Int, a)] -> Array Int a
arraySpineLazy (lo,hi) xs = array (lo,hi) $ go lo xs
  where
    go i _ | i > hi = []
    go i ~((_,e):ys) = (i, e) : go (succ i) ys
primes :: Int -> Array Int Int
primes n = a
  where
    a :: Array Int Int
    a = arraySpineLazy (1,n) $ primelist 1 (2: [3,5..])
    primelist :: Int -> [Int] -> [(Int, Int)]
    primelist i _ | i > n = []
    primelist _ [] = [] -- remove warnings
    primelist i (m:ms) =
      if all (not . divides m) [ a!j | j <- [1..(i-1)]]
        then (i ,m) : primelist (succ i) ms
        else primelist i ms
    divides m k = m `mod` k == 0
Arguably, we should write a lazier variant of listArray instead, since our array variant discards the first component of each pair.
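A sketch of what such a lazier listArray variant might look like. listArraySpineLazy is a made-up name; the idea is the same as arraySpineLazy, except that the helper produces bare elements, so primelist no longer needs to emit index pairs.

```haskell
import Data.Array

-- Hypothetical helper: like listArray, but the spine handed to listArray
-- is generated from the bounds alone. The input list is only inspected
-- when an element of the array is demanded.
listArraySpineLazy :: (Int, Int) -> [a] -> Array Int a
listArraySpineLazy (lo, hi) xs = listArray (lo, hi) (go lo xs)
  where
    go i _ | i > hi = []
    go i ~(e:ys)    = e : go (succ i) ys

primes :: Int -> Array Int Int
primes n = a
  where
    a = listArraySpineLazy (1, n) (primelist 1 (2 : [3, 5 ..]))

    -- i is the index the next prime will occupy; the first i-1 primes
    -- are read back from the array that is being defined.
    primelist :: Int -> [Int] -> [Int]
    primelist i _ | i > n = []
    primelist _ []        = []
    primelist i (m:ms)
      | all (not . divides m) [a ! j | j <- [1 .. i - 1]] = m : primelist (succ i) ms
      | otherwise                                         = primelist i ms

    divides m k = m `mod` k == 0
```

The lazy pattern ~(e:ys) is what keeps the spine independent of primelist; with an ordinary pattern the construction would force the list and loop again.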
This is a strictness issue: you can't generate unboxed arrays recursively, only boxed (regular) ones, since only boxed ones have a lazy semantics.
Forget arrays, and consider the following recursive pair definition
let (x,y) = (0,x)
This defines x=0 ; y=0, recursively. However, for the recursion to work, it is necessary that the pair is lazy. Otherwise, it generates an infinite recursion, much as the following would do:
let p = case p of (x,y) -> (0,x)
Above, p evaluates itself before it can expose the (,) pair constructor, so an infinite loop arises. By comparison,
let p = (0, case p of (x,y) -> x)
would work, since p produces the (,) before calling itself. Note however that this relies on the constructor (,) not evaluating the components before returning -- it has to be lazy, and return immediately leaving the components to be evaluated later.
Operationally, a pair is constructed having inside two thunks: two pointers to code, which will evaluate the result later on. Hence the pair is not really a pair of integers, but a pair of indirections-to-integer. This is called "boxing", and is needed to achieve laziness, even if it carries a little computational cost.
By definition, unboxed data structures, like unboxed arrays, avoid boxing, so they are strict, not lazy, and they can not support the same recursion approaches.
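To make the contrast concrete, here is a small sketch of a self-referential boxed array. halvings is an invented example, not taken from the question: each entry refers to an earlier entry of the array being defined, which works because the elements of an Array are stored as thunks.

```haskell
import Data.Array (Array, listArray, (!))

-- halvings n ! i counts how often i can be halved (integer division)
-- before reaching 1. Entry i is defined in terms of entry (i `div` 2)
-- of the same array.
halvings :: Int -> Array Int Int
halvings n = a
  where
    a = listArray (1, n) (map entry [1 .. n])
    entry 1 = 0
    entry i = 1 + a ! (i `div` 2)
```

Changing the result type to UArray Int Int (from Data.Array.Unboxed) would make listArray evaluate every element while the array is still under construction, so entry would read a before it exists and the program would be expected to fail with <<loop>> or not terminate.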

Building up a list in Haskell

What I am wanting to do is create a list of random integers, with no duplicates. As a first step, I have a function which makes a list of n random samples. How does one write this in a more Haskell idiomatic way, where an empty list does not need to be passed in to start the list off? I am sure I am missing something basic and fundamental.
-- make a list of random integers.
-- takes a size, and an empty list.
-- returns a list of that length of random numbers.
f :: Int -> [Int] -> IO [Int]
f l xs | length xs >= l = return (xs)
f l xs = do
  r <- randomRIO (1, 40) :: IO Int
  f l $ r : xs
Usage:
*Main> f 6 []
[10,27,33,35,31,28]
Ultimately this function will have filtering to check for duplicate insertions, but that is a separate question. Although this may look like homework, it is not, but part of my own attempt to come to grips with the State monad as used for random number generation, and finding I am stuck at a much earlier spot.

Well, you can operate on the output of the recursive call:
f :: Int -> IO [Int]
f 0 = return []
f n = do
  r <- randomRIO (1, 40)
  xs <- f (n-1)
  return $ r : xs
Note however that it's important that the operation you perform on the result is fast. In this case r : xs is constant time. However if you replace the last line with (say):
return $ xs ++ [r]
this would change the complexity of the function from linear to quadratic because every ++ call will have to scan all the sequence of previously generated numbers before appending the new one.
However you could simply do:
f n = sequence $ replicate n (randomRIO (1, 40))
replicate creates a [IO Int] list of length n made of randomRIO actions and sequence takes an [IO a] and turns it into an IO [a] by executing all the actions in order and collecting the results.
Even simpler, you could use replicateM which is already the function you want:
import Control.Monad(replicateM)
f n = replicateM n (randomRIO (1, 40))
or in point-free style:
f :: Int -> IO [Int]
f = flip replicateM $ randomRIO (1, 40)
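The earlier point about r : xs versus xs ++ [r] can be seen without any randomness. The two functions below are illustrative only (the names are made up): both build the first n squares with an accumulator, one appending at the end and one consing at the front and reversing once.

```haskell
-- Appends each new element at the end. Every (++) re-walks the
-- accumulator, so building n elements costs O(n^2) overall.
squaresAppend :: Int -> [Int]
squaresAppend n = go 1 []
  where
    go i acc
      | i > n     = acc
      | otherwise = go (i + 1) (acc ++ [i * i])

-- Conses onto the front (O(1) per step) and reverses once at the
-- end, for O(n) overall.
squaresCons :: Int -> [Int]
squaresCons n = reverse (go 1 [])
  where
    go i acc
      | i > n     = acc
      | otherwise = go (i + 1) (i * i : acc)
```

Both return the same list; only the cost differs as n grows.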

This uses a Set to keep track of numbers already generated:
import System.Random
import qualified Data.Set as Set
generateUniqueRandoms :: (Int, Int) -> Int -> IO [Int]
generateUniqueRandoms range@(low, high) n =
  -- caps the count at high - low, one fewer than the range holds
  let maxN = min (high - low) n
  in
    go maxN Set.empty
  where
    go 0 _ = return []
    go n s = do
      r <- getUniqueRandom s
      xs <- go (n-1) (Set.insert r s)
      return $ r : xs
    getUniqueRandom s = do
      r <- randomRIO range
      if (Set.member r s) then getUniqueRandom s
        else return r
Here is some sample output:
Main> generateUniqueRandoms (1, 40) 23
[29,22,2,17,5,8,24,27,10,16,6,3,14,37,25,34,30,28,7,31,15,20,36]
Main> generateUniqueRandoms (1, 40) 1000
[33,35,24,16,13,1,26,7,14,11,15,2,4,30,28,6,32,25,38,22,17,12,20,5,18,40,36,39,27,9,37,31,21,29,8,34,10,23,3]
Main> generateUniqueRandoms (1, 40) 0
[]
However, it is worth noting that if n is close to the width of the range, it'd be much more efficient to shuffle a list of all numbers in the range and take the first n of that.
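A sketch of that alternative, under some loud assumptions: it runs a partial Fisher-Yates shuffle on an STUArray and stops after n picks. To stay self-contained it does not use System.Random. Instead it takes the random-index function as a parameter (Pick is a made-up type standing in for randomR), and toyPick is a toy linear congruential generator meant only for trying the code, not for real use.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newListArray, readArray, writeArray)

-- Stand-in for randomR: given inclusive bounds and a generator state,
-- return a value within the bounds and the next state.
type Pick g = (Int, Int) -> g -> (Int, g)

-- Toy generator for experiments only; assumes a non-negative seed.
toyPick :: Pick Int
toyPick (a, b) s = (a + (s' `div` 65536) `mod` (b - a + 1), s')
  where
    s' = (s * 1103515245 + 12345) `mod` 2147483648

-- Draw up to k distinct values from [lo..hi]: swap a random element of
-- the unfixed suffix into position i, for i = 0 .. k-1.
drawDistinct :: Pick g -> (Int, Int) -> Int -> g -> ([Int], g)
drawDistinct pick (lo, hi) k g0 = runST $ do
  let size = hi - lo + 1
      k'   = max 0 (min k size)
  arr <- newListArray (0, size - 1) [lo .. hi] :: ST s (STUArray s Int Int)
  let loop i g acc
        | i >= k'   = return (reverse acc, g)
        | otherwise = do
            let (j, g') = pick (i, size - 1) g
            a <- readArray arr i
            b <- readArray arr j
            writeArray arr i b
            writeArray arr j a
            loop (i + 1) g' (b : acc)
  loop 0 g0 []
```

With the random package available, passing randomR as the pick argument and a StdGen as the state should work unchanged, since randomR has the same shape once its types are fixed to Int.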

Random numbers without duplicates

Simulating a lottery of 6 numbers chosen from 40, I want to create a list of numbers in Haskell using the system random generator but eliminate duplicates, which often arise.
If I have the following:
import Control.Monad (forM)
import System.Random
main :: IO ()
main = do
  rs <- forM [1..6] $ \_x -> randomRIO (1, 40) :: (IO Int)
  print rs
this is halfway. But how do I filter out duplicates? It seems to me I need a while loop of some sort to construct a list filtering elements that are already in the list until the list is the required size. If I can generate an infinite list of random numbers and filter it inside the IO monad I am sure that would work, but I do not know how to approach this. It seems while loops are generally deprecated in Haskell, so I am uncertain of the true Haskeller's way here. Is this a legitimate use case for a while loop, and if so, how does one do that?

The function you are looking for is nub from Data.List, which filters out duplicates.
import Data.List
import System.Random
main = do
  g <- newStdGen
  print . take 6 . nub $ (randomRs (1,40) g :: [Int])

If you don't mind using a library, then install the random-shuffle package and use it like this:
import System.Random.Shuffle
import Control.Monad.Random
main1 = do
  perm <- evalRandIO $ shuffleM [1..10]
  print perm
If you want to see how to implement a naive Fisher-Yates shuffle using lists in Haskell, have a look at this code:
shuffle2 xs g = go [] g (length xs) xs
  where
    go perm g n avail
      | n == 0 = (perm,g)
      | otherwise = let (i, g') = randomR (0,n-1) g
                        a = avail !! i
                        -- can also use splitAt to define avail':
                        avail' = take i avail ++ drop (i+1) avail
                    in go (a:perm) g' (n-1) avail'
main = do
  perm <- evalRandIO $ liftRand $ shuffle2 [1..10]
  print perm
The parameters to the go helper function are:
perm - the constructed permutation so far
g - the current generator value
n - the length of the available items
avail - the available items - i.e. items not yet selected to be part of the permutation
go simply adds a random element from avail to the permutation being constructed and recursively calls itself with the new avail list and new generator.
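To draw only k random elements from xs, give go a separate counter of remaining picks and stop when it reaches zero. n must still start at length xs, because it bounds the random index into avail:
shuffle2 xs k g = go [] g k (length xs) xs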
...
You could also use a temporary array (in the ST or IO monad) to implement a Fisher-Yates type algorithm. The shuffleM function in random-shuffle uses a completely different approach, which you might find interesting.
Update: Here is an example of using an ST-array in a F-Y style algorithm:
import Control.Monad.Random
import Data.Array.ST
import Control.Monad
import Control.Monad.ST (runST, ST)
shuffle3 :: RandomGen g => Int -> g -> ([Int], g)
shuffle3 n g0 = runST $ do
  arr <- newListArray (1,n) [1..n] :: ST s (STUArray s Int Int)
  -- pick j from the not-yet-fixed suffix i..n; drawing from 1..n
  -- at every step would bias the permutation
  let step g i = do let (j,g') = randomR (i,n) g
                    -- swap i and j
                    a <- readArray arr i
                    b <- readArray arr j
                    writeArray arr j a
                    writeArray arr i b
                    return g'
  g' <- foldM step g0 [1..n]
  perm <- getElems arr
  return (perm, g')
main = do
  perm <- evalRandIO $ liftRand $ shuffle3 20
  print perm

I've used the Fisher Yates Shuffle in C++ with a decent random number generator to great success. This approach is very efficient if you are willing to allocate an array for holding numbers 1 to 40.

Going the strict IO way requires breaking down nub and moving the duplicate check into the tail recursion.
import System.Random
randsf :: (Eq a, Num a, Random a) => [a] -> IO [a]
randsf rs
  | length rs >= 6 = return rs
  | otherwise = do
      r <- randomRIO (1,40)
      if elem r rs
        then randsf rs
        else randsf (r:rs)
main = do
  rs <- randsf [] :: IO [Int]
  print rs
If you know what you are doing, unsafeInterleaveIO from System.IO.Unsafe can be handy: it lets you generate lazy lists from IO. Functions like getContents work this way.
import Control.Monad
import System.Random
import System.IO.Unsafe
import Data.List
rands :: (Eq a, Num a, Random a) => IO [a]
rands = do
  r <- randomRIO (1,40)
  unsafeInterleaveIO $ liftM (r:) rands
main = do
  rs <- rands :: IO [Int]
  print . take 6 $ nub rs

You commented:
The goal is to learn how to build a list monadically using filtering. It's a raw newbie question
Maybe you should change the question title then! Anyways, this is quite a common task. I usually define a combinator with a general monadic type that does what I want, give it a descriptive name (which I didn't quite succeed in here :-) and then use it, like below
import Control.Monad
import System.Random
-- | 'myUntil': accumulate a list with unique action results until the list
-- satisfies a test
myUntil :: (Monad m, Eq a) => ([a] -> Bool) -> m a -> m [a]
myUntil test action = myUntil' test [] action where
  myUntil' test resultSoFar action = do
    if test resultSoFar then
      return resultSoFar
    else do
      x <- action
      let newResults = if x `elem` resultSoFar then resultSoFar
                       else resultSoFar ++ [x] -- x:resultSoFar
      myUntil' test newResults action
main :: IO ()
main = do
  let enough xs = length xs == 6
      drawNumber = randomRIO (0, 40::Int)
  numbers <- myUntil enough drawNumber
  print numbers
NB: this is not the optimal way to get your 6 distinct numbers, but meant as an example how to, in general, build a list monadically using a filter that works on the entire list
It is in essence the same as Vektorweg's longest answer, but uses a combinator with a much more general type (which is the way I like to do it, which may be more useful for you, given your comment at the top of this answer)

Haskell: how to operate the string type in a do block

I want to make a function that first divides a list l into two lists m and n, then creates two threads to find the longest palindrome in each of the two lists. My code is:
import Control.Concurrent (forkIO)
import System.Environment (getArgs)
import Data.List
import Data.Ord
main = do
  l <- getArgs
  forkIO $ putStrLn $ show $ longestPalindr $ mList l
  forkIO $ putStrLn $ show $ longestPalindr $ nList l
longestPalindr x =
  snd $ last $ sort $
  map (\l -> (length l, l)) $
  map head $ group $ sort $
  filter (\y -> y == reverse y) $
  concatMap inits $ tails x
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
Now I can compile it, but the result is a [ ]. When I just run the longestPalindr and mList , I get the right result. I thought the logic here is right. So what is the problem?

The question title may need to be changed, as this is no longer about type errors.
The functionality of the program can be fixed by simply mapping longestPalindr across the two halves of the list. In your code, you are finding the longest palindrome across [[Char]], so the result length is usually just 1.
I've given a simple example of par and pseq. This just suggests to the compiler that it may be smart to evaluate left and right independently. It doesn't guarantee parallel evaluation, but rather leaves it up to the compiler to decide.
Consult Parallel Haskell on the wiki to understand sparks, compile with the -threaded flag, then run it with +RTS -N2. Add -sstderr for runtime statistics, and see if there is any benefit to sparking here. I would expect negative returns until you start to feed it longer lists.
For further reading on functional parallelism, take a look at Control.Parallel.Strategies. Manually wrangling threads in Haskell is only really needed in nondeterministic scenarios.
import Control.Parallel (par, pseq)
import System.Environment (getArgs)
import Data.List
import Data.Ord
import Data.Function (on)
main = do
  l <- getArgs
  let left = map longestPalindr (mList l)
      right = map longestPalindr (nList l)
  left `par` right `pseq` print $ longest (left ++ right)
longestPalindr x = longest pals
  where pals = nub $ filter (\y -> y == reverse y) substrings
        substrings = concatMap inits $ tails x
longest = maximumBy (compare `on` length)
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l

For reference, please read the Parallel chapter from Simon Marlow's book.
http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
As others have stated, using par from the Eval monad seems to be the correct approach here.
Here is a simplified view of your problem. You can test it out by compiling with -threaded and running with +RTS -N2 -RTS, and then you can use ThreadScope to profile your performance.
import Control.Parallel.Strategies
import Data.List (maximumBy, subsequences)
import Data.Ord
isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs
-- * note while subsequences is correct, it is asymptotically
-- inefficient due to nested foldr calls
getLongestPalindrome :: Ord a => [a] -> Int
getLongestPalindrome = length . maximum' . filter isPalindrome . subsequences
  where maximum' :: Ord a => [[a]] -> [a]
        maximum' = maximumBy $ comparing length
--- Do it in parallel, in a monad
-- rpar rpar seems to fit your case, according to Simon Marlow's book
-- http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
main :: IO ()
main = do
  let shorter = [2,3,4,5,4,3,2]
      longer = [1,2,3,4,5,4,3,2,1]
      result = runEval $ do
        a <- rpar $ getLongestPalindrome shorter
        b <- rpar $ getLongestPalindrome longer
        if a > b -- 'a > b' will always be false in this case
          then return (a,"shorter")
          else return (b,"longer")
  print result
-- This will print the length of the longest palindrome along w/ the list name
-- Don't forget to compile w/ -threaded and use ThreadScope to check
-- performance and evaluation

How do you do an in-place quicksort in Haskell

Could somebody provide an in-place quicksort haskell function? I.e. it returns a new sorted list, but the input list is copied to a mutable array or something.
I want to see how to do this, because I have a performance critical program where i need to simulate races and tally scores. If I use immutable data structures for this, each race will take O(log(numRaces) + numRunners) time, whereas if I use mutable arrays etc, each race will take O(log(numRaces)) time.
oh by the way i didn't actually need to do quicksort, i just wanted an example to see how to use mutable arrays effectively

Here's a version, just to prove you can convert code from Wikipedia almost exactly into Haskell ;)
import Control.Monad.ST
import Data.Array.ST
import Data.Foldable
import Control.Monad
-- wiki-copied code starts here
partition arr left right pivotIndex = do
  pivotValue <- readArray arr pivotIndex
  swap arr pivotIndex right
  storeIndex <- foreachWith [left..right-1] left (\i storeIndex -> do
    val <- readArray arr i
    if (val <= pivotValue)
      then do
        swap arr i storeIndex
        return (storeIndex + 1)
      else do
        return storeIndex )
  swap arr storeIndex right
  return storeIndex
qsort arr left right = when (right > left) $ do
  let pivotIndex = left + ((right-left) `div` 2)
  newPivot <- partition arr left right pivotIndex
  qsort arr left (newPivot - 1)
  qsort arr (newPivot + 1) right
-- wrapper to sort a list as an array
sortList xs = runST $ do
  let lastIndex = length xs - 1
  arr <- newListArray (0,lastIndex) xs :: ST s (STUArray s Int Int)
  qsort arr 0 lastIndex
  newXs <- getElems arr
  return newXs
-- test example
main = print $ sortList [212498,127,5981,2749812,74879,126,4,51,2412]
-- helpers
swap arr left right = do
  leftVal <- readArray arr left
  rightVal <- readArray arr right
  writeArray arr left rightVal
  writeArray arr right leftVal
-- foreachWith takes a list, and a value that can be modified by the function, and
-- it returns the modified value after mapping the function over the list.
foreachWith xs v f = foldlM (flip f) v xs

See 'sort' in the vector-algorithms package: http://hackage.haskell.org/packages/archive/vector-algorithms/0.4/doc/html/src/Data-Vector-Algorithms-Intro.html#sort

Check out this code, it has a quicksort version that uses arrays from the IO module. You can adapt it to your needs. Keep in mind though that imperative programming in Haskell can give you headaches if you are not careful (my programs usually suffered from huge memory use and 90% of the time spent in garbage collection).

syntactically i like this one best:
main :: IO ()
main = do print $ qs "qwertzuiopasdfghjklyxcvbnm"
qs :: Ord a => [a] -> [a]
qs [] = []
qs (x:xs) = qs lt ++ [x] ++ qs gt
  where
    lt = [y | y <- xs, y <= x]
    gt = [y | y <- xs, y > x]
