Why does my program use so much memory? - Haskell

For just a 25 MB file the memory usage is constant at 792 MB! I thought it had to do with my use of lists, but moving certain parts of the code to vectors (the arrays the FFT is applied to, for example) didn't change how much memory is used at all!
{-# LANGUAGE OverloadedStrings,BangPatterns #-}
import qualified Data.Attoparsec.Char8 as Ap
import Data.Attoparsec
import Control.Monad
import Control.Applicative
--import Control.DeepSeq (force)
import System.IO
import System.Environment
import Data.List (zipWith4,unzip4,zip4,foldl')
import Data.Bits
import Data.Complex
import Data.String (fromString)
import Data.ByteString.Internal
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as Bl
import qualified Data.Vector.Unboxed as Vu
import qualified Statistics.Transform as St
{-
I run a test on a collection of data from a file
[(1,t),(2,t),(3,t),(4,t),(5,t)]
- - -
| - - -
| | - - -
| | |
[y++t, n, y++t]
To do that, I use splitN to create a list of lists
[[(1,t),(2,t),(3,t)],[(2,t),(3,t),(4,t)],[(3,t),(4,t),(5,t)]]
map a series of functions to determine a value for each inner collection,
and return when an event happened.
-}
data FourD b a = FourD a a a b
instance Functor (FourD c) where
    fmap f (FourD x y z d) = FourD (f x) (f y) (f z) d
mgrav_per_bit = [ 18, 36, 71, 143, 286, 571, 1142 ]
-- Converting raw data to mg
aToG :: Int -> Double
aToG a = fromIntegral . sign $ uresult
    where
        twocomp = if a > 128
                    then 256 - a
                    else a
        uresult = sum $ zipWith (*) mgrav_per_bit (map (fromEnum . testBit twocomp) [0..7])
        sign = if a > 128
                 then negate
                 else id
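-- (Illustration, mine, not in the original:) aToG 5 == 89.0, since bits 0 and 2
-- of 5 are set (18 + 71); aToG 251 == -89.0, since 251 > 128 is treated as the
-- two's complement of 5 and the result is negated.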
-- Data is (int,int,int,time)
-- Converted to (St.CD^3, ByteString) here instead of mapping afterwards.
parseAcc :: Parser (FourD B.ByteString St.CD)
parseAcc = do Ap.char '('
              x <- fmap ((:+0) . aToG) Ap.decimal
              Ap.char ','
              y <- fmap ((:+0) . aToG) Ap.decimal
              Ap.char ','
              z <- fmap ((:+0) . aToG) Ap.decimal
              Ap.char ','
              time <- takeTill (== 41) -- 41 is the ASCII code of ')'
              Ap.char ')'
              return $! FourD x y z time
-- Applies parseAcc to many lines; fails at the end of file (need to add a newline)
parseFile = many $ parseAcc <* (Ap.endOfInput <|> Ap.endOfLine)
readExpr input = case parse parseFile input of
    Done b val -> val
    Partial p  -> undefined
    Fail a b c -> undefined
unType (FourD x y d z) = (x ,y ,d ,z)
-- Breaks a list of FourD into smaller lists, applies f and g to those lists,
-- then filters the result based on whether an event happened or not
amap :: (Num c, Ord c) => ([a] -> [c]) -> ([d] -> [ByteString]) -> [FourD d a] -> [Bl.ByteString]
amap f g = (uncurry4 (zipWith4 (filterAcc))) . map4 f g . unzip4 . map (unType)
    where map4 f g (a,b,c,d) = (f a, f b, f c, g d)
          uncurry4 f (a,b,c,d) = f a b c d
-- Before, I had map filterAcc outside amap; I tried to fuse everything to eliminate intermediates
-- An event is detected if x > 50
filterAcc x y z t = if x > 50
                      then (Bl.pack . B.unpack) $ "yes: " `B.append` t
                      else ""
-- Split [St.CD] into [Vector St.CD], apply the fft to each, and compress each to a single value.
-- Core of the application
fftAcross :: [St.CD] -> [Int]
fftAcross = map (floor . noiseEnergy . St.fft) . splitN 32
-- how the value is determined (sum of all magnitudes but the first one)
noiseEnergy :: (RealFloat a, Vu.Unbox a) => Vu.Vector (Complex a) -> a
noiseEnergy x = (Vu.foldl' (\b a-> b+(magnitude a)) 0 (Vu.drop 1 x))/32
-- How the values are split into Vector St.CD: if length > 32, take 32; otherwise I'm done
splitN :: Vu.Unbox a => Int -> [a] -> [Vu.Vector a]
splitN n x = helper x
    where
        helper x = if atLeast n x
                     then (Vu.take n (Vu.fromList x)) : (helper (drop 1 x))
                     else []
-- Replacing a counter compared against length x (calculated once) with the atLeast test reduced the constantly high memory usage.
-- This is replicated so the behaviour of splitN happens on the time part of FourD; splitN itself can't be reused since ByteString has no Unbox instance
splitN2 n x = helper x
    where
        helper x = if atLeast n x
                     then (head x) : (helper (drop 1 x))
                     else []
atLeast :: Int -> [a] -> Bool
atLeast 0 _ = True
atLeast _ [] = False
atLeast n (_:ys) = atLeast (n-1) ys
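-- (Illustration, mine, not in the original:) atLeast inspects at most n conses,
-- so it runs in constant space on a lazily produced list, whereas length forces
-- and holds the whole spine: atLeast 3 [1..] returns True immediately, while
-- length [1..] >= 3 never returns.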
main = do
    filename <- liftM head getArgs
    filehandle <- openFile "results.txt" WriteMode
    contents <- liftM readExpr $ B.readFile filename
    Bl.hPutStr (filehandle) . Bl.unlines . splitAndApplyAndFilter $ contents
    where
        splitAndApplyAndFilter = amap fftAcross (splitN2 32)
Edit: after some refactoring (fusing some maps, reducing lengths) I managed to get this working at ~400 MB with a 25 MB input file. Still, on a 100 MB file it takes 1.5 GB.
The program is intended to determine if a certain event happened at a point in time. For that it requires a collection of values (I'm using 32 at the moment), runs an FFT on them, sums the results and sees if the total passes a threshold. If yes, it prints the time to a file.
http://db.tt/fT8kXPKz for a 25 MB test file

I found the solution thanks to a topic on Reddit about the same problem:
Parsing with Haskell and Attoparsec
The great majority of my problem was caused by the fact that attoparsec is strict and Haskell data are rather large (so a 100 MB text file can actually take much more memory at run time).
The other half was that profiling doubles the memory use, and I didn't account for that.
After changing the parser to be lazy, my program uses 120 MB in place of 800 MB (with a 116 MB input), so: success!
In case this interests someone, here is the relevant piece of the code change:
readExpr input = case parse (parseAcc <* (Ap.endOfLine <* Ap.endOfInput <|> Ap.endOfLine)) input of
    Done b val -> val : readExpr b
    Partial e  -> []
    Fail _ _ c -> error c
The full code:
{-# LANGUAGE OverloadedStrings,BangPatterns #-}
import qualified Data.Attoparsec.Char8 as Ap
import Data.Attoparsec
import Control.Monad
import Control.Applicative
--import Control.DeepSeq (force)
import System.IO
import System.Environment
import Data.List (zipWith4,unzip4,zip4,foldl')
import Data.Bits
import Data.Complex
import Data.String (fromString)
import Data.ByteString.Internal
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as Bl
import qualified Data.Vector.Unboxed as Vu
import qualified Statistics.Transform as St
{-
I run a test on a collection of data from a file
[(1,t),(2,t),(3,t),(4,t),(5,t)]
- - -
| - - -
| | - - -
| | |
[y++t, n, y++t]
To do that, I use splitN to create a list of lists
[[(1,t),(2,t),(3,t)],[(2,t),(3,t),(4,t)],[(3,t),(4,t),(5,t)]]
map a series of functions to determine a value for each inner collection,
and return when an event happened.
-}
data FourD b a = FourD a a a b
instance Functor (FourD c) where
    fmap f (FourD x y z d) = FourD (f x) (f y) (f z) d
mgrav_per_bit = [ 18, 36, 71, 143, 286, 571, 1142 ]
-- Converting raw data to mg
aToG :: Int -> Double
aToG a = fromIntegral . sign $ uresult
    where
        twocomp
            | a > 128   = 256 - a
            | otherwise = a
        uresult = sum $ zipWith (*) mgrav_per_bit (map (fromEnum . testBit twocomp) [0..7])
        sign
            | a > 128   = negate
            | otherwise = id
-- Data is (int,int,int,time)
-- Converted to (St.CD^3, ByteString) here instead of mapping afterwards.
parseAcc :: Parser (FourD B.ByteString St.CD)
parseAcc = do Ap.char '('
              x <- fmap ((:+0) . aToG) Ap.decimal -- Parse, transform to mg, convert to complex
              Ap.char ','
              y <- fmap ((:+0) . aToG) Ap.decimal
              Ap.char ','
              z <- fmap ((:+0) . aToG) Ap.decimal
              Ap.char ','
              time <- takeTill (== 41) -- 41 is the ASCII code of ')'
              Ap.char ')'
              return $! FourD x y z time
-- Applies parseAcc to many lines; fails at the end of file (need to add a newline)
parseFile = many $ parseAcc <* (Ap.endOfInput <|> Ap.endOfLine)
readExpr input = case parse (parseAcc <* (Ap.endOfLine <* Ap.endOfInput <|> Ap.endOfLine)) input of
    Done b val -> val : readExpr b
    Partial e  -> []
    Fail _ _ c -> error c
unType (FourD x y d z) = (x ,y ,d ,z)
-- Breaks a list of FourD into smaller lists, applies f and g to those lists,
-- then filters the result based on whether an event happened or not
amap :: (Num c, Ord c) => ([a] -> [c]) -> ([d] -> [ByteString]) -> [FourD d a] -> [ByteString]
amap f g = (uncurry4 (zipWith4 (filterAcc))) . map4 f g . unzip4 . map (unType)
    where map4 f g (a,b,c,d) = (f a, f b, f c, g d)
          uncurry4 f (a,b,c,d) = f a b c d
-- Before, I had map filterAcc outside amap; I tried to fuse everything to eliminate intermediates
-- An event is detected if x > 50
filterAcc x y z t
    | x > 50    = t
    | otherwise = ""
-- Split [St.CD] into [Vector St.CD], apply the fft to each, and compress each to a single value.
-- Core of the application
fftAcross :: [St.CD] -> [Int]
fftAcross = map (floor . noiseEnergy . St.fft) . splitN 32
-- how the value is determined (sum of all magnitudes but the first one)
noiseEnergy :: (RealFloat a, Vu.Unbox a) => Vu.Vector (Complex a) -> a
noiseEnergy x = (Vu.foldl' (\b a-> b+(magnitude a)) 0 (Vu.drop 1 x))/32
-- How the values are split into Vector St.CD: if length > 32, take 32; otherwise I'm done
splitN :: Vu.Unbox a => Int -> [a] -> [Vu.Vector a]
splitN n x = helper x
    where
        helper x
            | atLeast n x = (Vu.take n (Vu.fromList x)) : (helper (drop 1 x))
            | otherwise   = []
-- Replacing a counter compared against length x (calculated once) with the atLeast test reduced the constantly high memory usage.
-- This is replicated so the behaviour of splitN happens on the time part of FourD; splitN itself can't be reused since ByteString has no Unbox instance
splitN2 n x = helper x
    where
        helper x
            | atLeast n x = (head x) : (helper (drop 1 x))
            | otherwise   = []
atLeast :: Int -> [a] -> Bool
atLeast 0 _ = True
atLeast _ [] = False
atLeast n (_:ys) = atLeast (n-1) ys
intervalFinder :: [ByteString] -> [B.ByteString]
intervalFinder x = helper x ""
    where
        helper (x:xs) ""
            | x /= ""   = ("Start Time: " `B.append` x `B.append` "\n") : (helper xs x)
            | otherwise = helper xs ""
        helper (x:xs) y
            | x == ""   = ("End Time: " `B.append` y `B.append` "\n\n") : (helper xs "")
            | otherwise = helper xs x
        helper _ _ = []
main = do
    filename <- liftM head getArgs
    filehandle <- openFile "results.txt" WriteMode
    contents <- liftM readExpr $ B.readFile filename
    Bl.hPutStr (filehandle) . Bl.fromChunks . intervalFinder . splitAndApplyAndFilter $ contents
    hClose filehandle
    where
        splitAndApplyAndFilter = amap fftAcross (splitN2 32)
--contents <- liftM ((map (readExpr)) . B.lines) $ B.readFile filename
{- *Main> let g = liftM ((amap fftAcross (splitN2 32)) . readExpr) $ B.readFile "testpattern2.txt"
-}
-- B.hPutStrLn (filehandle) . B.unlines . map (B.pack . show) . amap (map (floor . quare) . (filter (/=[])) . map ((drop 1) . (map (/32)) . fft) . splitN 32) . map (fmap (fromIntegral . aToG)) . map readExpr $ contents

Related

How can I optimize parallel sorting to improve temporal performance?

I have an algorithm for parallel sorting a list of a given length:
import Control.Parallel (par, pseq)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.Environment (getArgs)
import System.Random (StdGen, getStdGen, randoms)
parSort :: (Ord a) => [a] -> [a]
parSort (x:xs) = force greater `par` (force lesser `pseq`
                                      (lesser ++ x:greater))
    where lesser  = parSort [y | y <- xs, y < x]
          greater = parSort [y | y <- xs, y >= x]
parSort _ = []
sort :: (Ord a) => [a] -> [a]
sort (x:xs) = lesser ++ x:greater
    where lesser  = sort [y | y <- xs, y < x]
          greater = sort [y | y <- xs, y >= x]
sort _ = []
parSort2 :: (Ord a) => Int -> [a] -> [a]
parSort2 d list@(x:xs)
    | d <= 0    = sort list
    | otherwise = force greater `par` (force lesser `pseq`
                                       (lesser ++ x:greater))
    where lesser  = parSort2 d' [y | y <- xs, y < x]
          greater = parSort2 d' [y | y <- xs, y >= x]
          d' = d - 1
parSort2 _ _ = []
force :: [a] -> ()
force xs = go xs `pseq` ()
    where go (_:xs) = go xs
          go []     = 1
randomInts :: Int -> StdGen -> [Int]
randomInts k g = let result = take k (randoms g)
                 in force result `seq` result
testFunction = parSort
main = do
    args <- getArgs
    let count | null args = 500000
              | otherwise = read (head args)
    input <- randomInts count `fmap` getStdGen
    start <- getCurrentTime
    let sorted = testFunction input
    putStrLn $ "Sort list N = " ++ show (length sorted)
    end <- getCurrentTime
    putStrLn $ show (end `diffUTCTime` start)
I want the time to perform parallel sorting on 2, 3 and 4 processor cores to be less than on 1 core.
At the moment, I cannot achieve this result.
Here are my program launches:
1. SortList +RTS -N1 -RTS 10000000
time = 41.2 s
2. SortList +RTS -N3 -RTS 10000000
time = 39.55 s
3. SortList +RTS -N4 -RTS 10000000
time = 54.2 s
What can I do?
Update 1:
testFunction = parSort2 60
Here's one idea you can play around with, using Data.Map. For simplicity and performance, I assume substitutivity for the element type, so we can count occurrences rather than storing lists of elements. I'm confident that you can get better results using some fancy array algorithm, but this is simple and (essentially) functional.
When writing a parallel algorithm, we want to minimize the amount of work that must be done sequentially. When sorting a list, there's one thing that we really can't avoid doing sequentially: splitting up the list into pieces for multiple threads to work on. We'd like to get that done with as little effort as possible, and then try to work mostly in parallel from then on.
Let's start with a simple sequential algorithm.
{-# language BangPatterns, TupleSections #-}
import qualified Data.Map.Strict as M
import Data.Map (Map)
import Data.List
import Control.Parallel.Strategies
type Bag a = Map a Int
ssort :: Ord a => [a] -> [a]
ssort xs =
    let m = M.fromListWith (+) $ (,1) <$> xs
    in concat [replicate c x | (x,c) <- M.toList m]
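For instance (my example, not from the original answer), the bag built for "banana" is fromList [('a',3),('b',1),('n',2)], so:
ssort "banana" == "aaabnn"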
How can we parallelize this? First, let's break up the list into pieces. There are various ways to do this, none of them great. Assuming a small number of capabilities, I think it's reasonable to let each of them walk the list itself. Feel free to experiment with other approaches.
-- | Every Nth element, including the first
everyNth :: Int -> [a] -> [a]
everyNth n | n <= 0 = error "What you doing?"
everyNth n = go 0 where
    go !_ []       = []
    go 0  (x : xs) = x : go (n - 1) xs
    go k  (_ : xs) = go (k - 1) xs
-- | Divide up a list into N pieces fairly. Walking each list in the
-- result will walk the original list.
splatter :: Int -> [a] -> [[a]]
splatter n = map (everyNth n) . take n . tails
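As a quick sanity check (my example, not from the original answer), splatter deals the input out round-robin, so piece i holds elements i, i+n, i+2n, and so on:
splatter 3 [1..10] == [[1,4,7,10],[2,5,8],[3,6,9]]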
Now that we have pieces of list, we spark threads to convert them to bags.
parMakeBags :: Ord a => [[a]] -> Eval [Bag a]
parMakeBags xs =
    traverse (rpar . M.fromListWith (+)) $ map (,1) <$> xs
Now we can repeatedly merge pairs of bags until we have just one.
parMergeBags_ :: Ord a => [Bag a] -> Eval (Bag a)
parMergeBags_ []  = pure M.empty
parMergeBags_ [t] = pure t
parMergeBags_ q   = parMergeBags_ =<< go q where
    go []         = pure []
    go [t]        = pure [t]
    go (t1:t2:ts) = (:) <$> rpar (M.unionWith (+) t1 t2) <*> go ts
But ... there's a problem. In each round of merges, we use only half as many capabilities as we did in the previous one, and perform the final merge with just one capability. Ouch! To fix this, we'll need to parallelize unionWith. Fortunately, this is easy!
import Data.Map.Internal (Map (..), splitLookup, link)
parUnionWith
    :: Ord k
    => (v -> v -> v)
    -> Int -- Number of threads to spark
    -> Map k v
    -> Map k v
    -> Eval (Map k v)
parUnionWith f n t1 t2 | n <= 1 = rseq $ M.unionWith f t1 t2
parUnionWith _ !_ Tip t2 = rseq t2
parUnionWith _ !_ t1 Tip = rseq t1
parUnionWith f n (Bin _ k1 x1 l1 r1) t2 = case splitLookup k1 t2 of
    (l2, mb, r2) -> do
        l1l2 <- parEval $ parUnionWith f (n `quot` 2) l1 l2
        r1r2 <- parUnionWith f (n `quot` 2) r1 r2
        case mb of
            Nothing -> rseq $ link k1 x1 l1l2 r1r2
            Just x2 -> rseq $ link k1 fx1x2 l1l2 r1r2
                where !fx1x2 = f x1 x2
Now we can fully parallelize bag merging:
-- Uses the given number of capabilities per merge, initially,
-- doubling for each round.
parMergeBags :: Ord a => Int -> [Bag a] -> Eval (Bag a)
parMergeBags !_ []  = pure M.empty
parMergeBags !_ [t] = pure t
parMergeBags n q = parMergeBags (n * 2) =<< go q where
    go []         = pure []
    go [t]        = pure [t]
    go (t1:t2:ts) = (:) <$> parEval (parUnionWith (+) n t1 t2) <*> go ts
We can then implement a parallel merge like this:
parMerge :: Ord a => [[a]] -> Eval [a]
parMerge xs = do
    bags <- parMakeBags xs
    -- Why 2 and not one? We only have half as many
    -- pairs as we have lists (capabilities we want to use),
    -- so we double up.
    m <- parMergeBags 2 bags
    pure $ concat [replicate c x | (x,c) <- M.toList m]
Putting the pieces together,
parSort :: Ord a => Int -> [a] -> Eval [a]
parSort n = parMerge . splatter n
pSort :: Ord a => Int -> [a] -> [a]
pSort n = runEval . parMerge . splatter n
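A usage sketch (mine, not from the answer), reusing randomInts and getStdGen from the question; run with +RTS -N4 to match the split factor:
main :: IO ()
main = do
    input <- randomInts 1000000 `fmap` getStdGen
    print . last $ pSort 4 input -- last forces the whole sorted list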
There's just one sequential piece remaining that we can parallelize: converting the final bag to a list. Is it worth parallelizing? I'm pretty sure that in practice it is not. But let's do it anyway, just for fun! To avoid considerable extra complexity, I'll assume that there aren't large numbers of equal elements; repeated elements in the result will lead to some work (thunks) remaining in the result list.
We'll need a basic partial list spine forcer:
-- | Force the first n conses of a list
walkList :: Int -> [a] -> ()
walkList n _ | n <= 0 = ()
walkList _ [] = ()
walkList n (_:xs) = walkList (n - 1) xs
And now we can convert the bag to a list in parallel chunks without paying for concatenation:
-- | Use up to the given number of threads to convert a bag
-- to a list, appending the final list argument.
parToListPlus :: Int -> Bag k -> [k] -> Eval [k]
parToListPlus n m lst | n <= 1 = do
    rseq (walkList (M.size m) res)
    pure res
    -- Note: the concat and ++ should fuse away when compiling with
    -- optimization.
    where res = concat [replicate c x | (x,c) <- M.toList m] ++ lst
parToListPlus _ Tip lst = pure lst
parToListPlus n (Bin _ x c l r) lst = do
    r' <- parEval $ parToListPlus (n `quot` 2) r lst
    res <- parToListPlus (n `quot` 2) l $ replicate c x ++ r'
    rseq r' -- make sure the right side is finished
    pure res
And then we modify the merger accordingly:
parMerge :: Ord a => Int -> [[a]] -> Eval [a]
parMerge n xs = do
    bags <- parMakeBags xs
    m <- parMergeBags 2 bags
    parToListPlus n m []
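With this n-aware parMerge, pSort threads the capability count into the final conversion as well (my sketch, mirroring the earlier definition):
pSort :: Ord a => Int -> [a] -> [a]
pSort n = runEval . parMerge n . splatter n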

Composing arbitrarily many maps in Haskell

How is it possible to compose n maps in Haskell?
I've tried doing it recursively:
composeMap 0 f = (\x -> x)
composeMap n f = (.) f (composeMap (n-1) f)
And iteratively:
composeMap' n k f g =
    if n == k then g
    else composeMap' n (k+1) f (f . g)
composeMap n f = composeMap' n 0 f (\x -> x)
But to no avail. Haskell thinks I am constructing an infinite type.
This is obviously false as the function defined is finite for any
n >= 0.
Any suggestions?
Some have posted solutions treating f as having the following type signature:
f :: a -> a
However, I want this to work for f s.t. f is polymorphic in the following way:
f :: a -> a'
f :: a' -> a''
In particular, I want a function that works for the function map, with possible type signatures:
map :: (a -> b) -> [a] -> [b]
map (polymorphic) :: ([a] -> [b]) -> [[a]] -> [[b]]
The function compiles perfectly fine, but Haskell infers the following type signature, which is not what I want:
composeMap'' :: Int -> (b -> b) -> b -> b
I've even tried wrapping map in a monad, but Haskell still thinks I'm constructing an infinite type:
composeMap n f = foldl (>>=) f (replicate n (\x -> return (map x)))
Edit:
I got what I wanted with the following Template Haskell code. Pretty sweet.
This is for declaring the composed map functions:
composeMap :: Int -> Q Dec
composeMap n
    | n >= 1    = funD name [cl]
    | otherwise = fail "composeMap: argument n may not be <= 0"
    where
        name       = mkName $ "map" ++ show n
        composeAll = foldl1 (\fs f -> [| $fs . $f |])
        funcs      = replicate n [| map |]
        composedF  = composeAll funcs
        cl         = clause [] (normalB composedF) []
This is for inlining the composed map. It is more flexible:
composeMap :: Int -> Q Exp
composeMap n = do
    f <- newName "f"
    maps <- composedF
    return $ LamE [(VarP f)] (AppE maps (VarE f))
    where
        composeAll = foldl1 (\fs f -> [| $fs . $f |])
        funcs      = replicate n [| map |]
        composedF  = composeAll funcs
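A usage sketch (mine; the module name is hypothetical, and since both variants share the name composeMap they would need distinct names or modules). Template Haskell staging requires the definitions to live in a different module from the call site, and a top-level declaration splice expects Q [Dec], so the Q Dec variant gets wrapped in a singleton list:
{-# LANGUAGE TemplateHaskell #-}
import ComposeMap (composeMap) -- hypothetical module holding the Q Dec variant
$(fmap (:[]) (composeMap 3)) -- declares map3 = map . map . map
tripled :: [[[Int]]]
tripled = map3 (+1) [[[1,2],[3]]] -- [[[2,3],[4]]]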
Also, the guys who put the question on hold didn't even understand the question in the first place...
I am afraid I am missing something. Your first implementation compiles and works fine for me (ghc 8.0.2).
Your second implementation failed to compile because you forgot the ' in the else clause. Here is my complete source file:
composeMap1 0 f = (\x -> x)
composeMap1 n f = (.) f (composeMap1 (n-1) f)
composeMap2' n k f g =
    if n == k then g
    else composeMap2' n (k+1) f (f . g)
composeMap2 n f = composeMap2' n 0 f (\x -> x)
And some tests
λ: :l question.hs
[1 of 1] Compiling Main ( question.hs, interpreted )
Ok, modules loaded: Main.
λ: doubleQuote = composeMap1 2 (\x -> "'" ++ x ++ "'")
λ: doubleQuote "something"
"''something''"
λ: doubleQuote = composeMap2 2 (\x -> "'" ++ x ++ "'")
λ: doubleQuote "something"
"''something''"
λ: plusThree = composeMap1 3 (+1)
λ: plusThree 10
13
λ: plusThree = composeMap2 3 (+1)
λ: plusThree 10
13

Elegant implementation of n-dimensional matrix multiplication using lists?

List functions allow us to implement arbitrarily-dimensional vector math quite elegantly. For example:
import Control.Monad (join) -- for len below: join dot x == dot x x
on = (.) . (.)
add = zipWith (+)
sub = zipWith (-)
mul = zipWith (*)
dist = len `on` sub
dot = sum `on` mul
len = sqrt . join dot
And so on.
main = print $ add [1,2,3] [1,1,1] -- [2,3,4]
main = print $ len [1,1,1] -- 1.7320508075688772
main = print $ dot [2,0,0] [2,0,0] -- 4
Of course, this is not the most efficient solution, but it is insightful to look at, as one can say map, zipWith and such generalize those vector operations. There is one function I couldn't implement elegantly, though: the cross product. Since a possible n-dimensional generalization of the cross product is the n-dimensional matrix determinant, how can I implement matrix multiplication elegantly?
Edit: yes, I asked a completely unrelated question to the problem I set up. Fml.
It just so happens I have some code lying around for doing n-dimensional matrix operations which I thought was quite cute when I wrote it at least:
{-# LANGUAGE NoMonomorphismRestriction #-}
module MultiArray where
import Control.Arrow
import Control.Monad
import Data.Ix
import Data.Maybe
import Data.Array (Array)
import qualified Data.Array as A
-- {{{ from Dmwit.hs
deleteAt n xs = take n xs ++ drop (n + 1) xs
insertAt n x xs = take n xs ++ x : drop n xs
doublify f g xs ys = f (uncurry g) (zip xs ys)
any2 = doublify any
all2 = doublify all
-- }}}
-- makes the most sense when ls and hs have the same length
instance Ix a => Ix [a] where
    range     = sequence . map range . uncurry zip
    inRange   = all2 inRange . uncurry zip
    rangeSize = product . uncurry (zipWith (curry rangeSize))
    index (ls, hs) xs = fst . foldr step (0, 1) $ zip indices sizes where
        indices = zipWith index (zip ls hs) xs
        sizes   = map rangeSize $ zip ls hs
        step (i, b) (s, p) = (s + p * i, p * b)
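-- (Illustration, mine:) indices enumerate row-major, e.g.
--   range ([0,0],[1,1])       == [[0,0],[0,1],[1,0],[1,1]]
--   index ([0,0],[1,1]) [1,0] == 2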
fold :: (Enum i, Ix i) => ([a] -> b) -> Int -> Array [i] a -> Array [i] b
fold f n a = A.array newBound assocs where
    (oldLowBound, oldHighBound) = A.bounds a
    (newLowBoundBeg , dimLow : newLowBoundEnd ) = splitAt n oldLowBound
    (newHighBoundBeg, dimHigh: newHighBoundEnd) = splitAt n oldHighBound
    assocs = [ (beg ++ end, f [a A.! (beg ++ i : end) | i <- [dimLow..dimHigh]])
             | beg <- range (newLowBoundBeg, newHighBoundBeg)
             , end <- range (newLowBoundEnd, newHighBoundEnd)
             ]
    newBound = (newLowBoundBeg ++ newLowBoundEnd, newHighBoundBeg ++ newHighBoundEnd)
flatten a = check a >> return value where
    check = guard . (1==) . length . fst . A.bounds
    value = A.ixmap ((head *** head) . A.bounds $ a) return a
elementWise :: (MonadPlus m, Ix i) => (a -> b -> c) -> Array i a -> Array i b -> m (Array i c)
elementWise f a b = check >> return value where
    check = guard $ A.bounds a == A.bounds b
    value = A.listArray (A.bounds a) (zipWith f (A.elems a) (A.elems b))
unsafeFlatten a = fromJust $ flatten a
unsafeElementWise f a b = fromJust $ elementWise f a b
matrixMult a b = fold sum 1 $ unsafeElementWise (*) a' b' where
    aBounds = (join (***) (!!0)) $ A.bounds a
    bBounds = (join (***) (!!1)) $ A.bounds b
    a' = copy 2 bBounds a
    b' = copy 0 aBounds b
bijection f g a = A.ixmap ((f *** f) . A.bounds $ a) g a
unFlatten = bijection return head
matrixTranspose = bijection reverse reverse
copy n (low, high) a = A.ixmap (newBounds a) (deleteAt n) a where
    newBounds = (insertAt n low *** insertAt n high) . A.bounds
The cute bit here is matrixMult, which is one of the only operations that is specialized to two-dimensional arrays. It expands its first argument along one dimension (by putting a copy of the two-dimensional object into each slice of the three-dimensional object); expands its second along another; does pointwise multiplication (now in a three-dimensional array); then collapses the fabricated third dimension by summing. Quite nice.
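A small usage sketch (mine, not part of the original module), multiplying two 2x2 matrices indexed from [0,0]:
example :: [Integer]
example = A.elems (matrixMult a b) -- [19,22,43,50], i.e. [[19,22],[43,50]]
    where
        a = A.listArray ([0,0],[1,1]) [1,2,3,4] -- [[1,2],[3,4]]
        b = A.listArray ([0,0],[1,1]) [5,6,7,8] -- [[5,6],[7,8]]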

Use vector to manipulate Chars instead of lists

I have some code that compiles and works, and then some that doesn't.
My concern was that the first version was so bloated that it crashed when run on arguments that were too big, so I wrote a second version with performance in mind.
The second version doesn't even compile. Please advise.
import System.Environment (getArgs)
import Data.List (nub)
import System.Random
import Control.Applicative ( (<$>) )
import Control.Monad (replicateM)
randomItem :: [a] -> IO a
randomItem xs = (xs!!) <$> randomRIO (0, length xs - 1)
genFromMask :: [String] -> IO String
genFromMask = mapM randomItem
genMeSome :: [String] -> Int -> IO [String]
genMeSome mask n = do
    glist <- replicateM (n*10) (genFromMask mask)
    return $ take n $ nub glist
writeIt :: FilePath -> Int -> [String] -> IO ()
writeIt fi n mask = do
    glist <- genMeSome mask n
    writeFile fi $ unlines glist
maj :: String
maj = ['A'..'Z']
numa :: String
numa = ['0'..'9']
-- | Some regions don't use any of the free ranges
genBra :: [String]
genBra = ["VWXYZ",maj,maj," ",numa,numa,numa,numa]
genAus :: [String]
genAus = [maj,maj,maj," ",numa,numa,numa]
main :: IO ()
main = do
    args <- getArgs
    case args of
        (mo:fi:n:_) -> case mo of
            "aus" -> writeIt fi (read n) genAus
            "bra" -> writeIt fi (read n) genBra
            _     -> error "country is not supported"
        _ -> error "wrong input, format is: genLicensePlate country file number"
And here is the second:
import System.Environment (getArgs)
import System.Random
import Crypto.Random.AESCtr (makeSystem)
import Control.Applicative ( (<$>) )
import qualified Data.Vector as V
import qualified Data.Text as T
import qualified Data.Text.IO as T
nubV :: V.Vector a -> V.Vector a
nubV va
    | V.null va              = V.empty
    | V.any (== headV) tailV = nubV tailV
    | otherwise              = headV `V.cons` nubV tailV
    where
        headV = V.head va
        tailV = V.tail va
randomItem :: RandomGen g => g -> V.Vector a -> (a, g)
randomItem g xs =
    (xs V.! fst shamble, snd shamble)
    where
        shamble = randomR (0, V.length xs - 1) g
genFromMask :: RandomGen g => g -> V.Vector (V.Vector a) -> V.Vector a
genFromMask g xs =
    if V.null xs
        then V.empty
        else fst paket `V.cons` genFromMask (snd paket) (V.tail xs)
    where
        paket = randomItem g (V.head xs)
genMeSome :: RandomGen g => g -> V.Vector (V.Vector a) -> Int -> V.Vector (V.Vector a)
genMeSome g mask n =
    V.take n $ nubV $ V.replicateM (n*10) (genFromMask g mask)
writeIt :: RandomGen g => g -> FilePath -> Int -> V.Vector (V.Vector a) -> IO ()
writeIt g fi n mask =
    T.writeFile fi $ T.unlines $ T.pack $ V.toList (V.map V.toList $ genMeSome g mask n)
maj = V.fromList ['A'..'Z']
numa = V.fromList ['0'..'9']
vspa = V.fromList " "
vtir = V.fromList "-"
-- | Some regions don't use any of the free ranges
genBra = V.fromList [static,maj,maj,vspa,numa,numa,numa,numa]
    where
        static = V.fromList "VWXYZ"
genAus = V.fromList [maj,maj,maj,vspa,numa,numa,numa]
main :: IO ()
main = do
    g <- makeSystem
    args <- getArgs
    case args of
        (mo:fi:n:_) -> case mo of
            "aus" -> writeIt g fi (read n) genAus
            "bra" -> writeIt g fi (read n) genBra
            _     -> error "country is not supported"
        _ -> error "wrong input, format is: genLicensePlate country file number"
I am trying to generate fake license plates, to populate an anonymous database.
EDIT1:
Here are the errors:
genLicensePlate.hs:22:12:
    No instance for (Eq a)
      arising from a use of `=='
    In the first argument of `V.any', namely `(== headV)'
    In the expression: V.any (== headV) tailV
    In a stmt of a pattern guard for
      an equation for `nubV':
        V.any (== headV) tailV
genLicensePlate.hs:48:52:
    Couldn't match expected type `Char' with actual type `[a]'
    Expected type: V.Vector Char
      Actual type: V.Vector [a]
    In the first argument of `V.toList', namely
      `(V.map V.toList $ genMeSome g mask n)'
    In the second argument of `($)', namely
      `V.toList (V.map V.toList $ genMeSome g mask n)'
EDIT2:
So the general idea is to use a mask to generate random strings.
For example, myFunc g [['A'..'Z'],['A'..'Z']] gives "AA" or "ZZ" or "BA" or "FG", etc...
Then I use this function to make a lot of those strings based on the mask.
After that I remove duplicates and take as many as needed (since I generate 10 times the number asked for, I'm OK even with duplicates).
Finally I write them to a file.
I hope it is clearer now.
Kind regards,
Sar
nubV needs an Eq constraint, since it compares elements (but you really should use a Set or HashSet or so to get a better algorithm)
nubV :: Eq a => V.Vector a -> V.Vector a
nubV va
    | V.null va              = V.empty
    | V.any (== headV) tailV = nubV tailV
    | otherwise              = headV `V.cons` nubV tailV
    where
        headV = V.head va
        tailV = V.tail va
And in writeIt, you lack a map,
writeIt :: RandomGen g => g -> FilePath -> Int -> V.Vector (V.Vector a) -> IO ()
writeIt g fi n mask =
    T.writeFile fi $ T.unlines $ map T.pack $ V.toList (V.map V.toList $ genMeSome g mask n)
    --                           ^^^
since you get a list of lists of Char from V.toList (V.map V.toList $ genMeSome g mask n).
That fixes the two reported errors.
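A sketch of that Set suggestion (mine, not from the answer); it needs Ord rather than just Eq, and unlike nubV above, which keeps the last occurrence of each duplicated element, it keeps the first:
import qualified Data.Set as S
-- O(n log n) de-duplication instead of nubV's O(n^2)
nubOrdV :: Ord a => V.Vector a -> V.Vector a
nubOrdV = V.fromList . go S.empty . V.toList
    where
        go _ [] = []
        go seen (x:xs)
            | x `S.member` seen = go seen xs
            | otherwise         = x : go (S.insert x seen) xs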

Data Parallel Haskell Prefix Sum

I'm playing with some Data Parallel Haskell code and found myself in need of a prefix sum. However, I didn't see any basic operator for a prefix sum in the dph package.
I rolled my own but, since I'm new to dph, I'm not sure if it's properly taking advantage of parallelization:
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}
module PrefixSum ( scanP ) where
import Data.Array.Parallel (lengthP, indexedP, mapP, zipWithP, concatP, filterP, singletonP, sliceP, (+:+), (!:))
import Data.Array.Parallel.Prelude.Int ((<=), (-), (==), Int, mod)
-- hide prelude
import qualified Prelude
-- assuming zipWithP (a -> b -> c) given
-- [:a:] of length n and
-- [:b:] of length m, n /= m
-- will return
-- [:c:] of length min n m
scanP :: (a -> a -> a) -> [:a:] -> [:a:]
scanP f xs = if lengthP xs <= 1
               then xs
               else head +:+ tail
    where -- [: x_0, x_2, ..., x_2n :]
          evens = mapP snd . filterP (even . fst) $ indexedP xs
          -- [: x_1, x_3, ... :]
          odds = mapP snd . filterP (odd . fst) $ indexedP xs
          lenEvens = lengthP evens
          lenOdds = lengthP odds
          -- calculate the prefix sums [:w:] of the pair sums [:z:]
          psums = scanP f $ zipWithP f evens odds
          -- calculate the total prefix sums as
          -- [: x_0, w_0, f w_0 x_2, w_1, f w_1 x_4, ...,
          head = singletonP (evens !: 0)
          body = concatP . zipWithP (\p e -> [: p, f p e :]) psums $ sliceP 1 lenOdds evens
          -- ending at either
          -- ... w_{n-1}, f w_{n-1} x_2n :]
          -- or
          -- ... w_{n-1}, f w_{n-1} x_2n, w_n :]
          -- depending on whether the length of [:x:] is 2n+1 or 2n+2
          tail = if lenEvens == lenOdds then body +:+ singletonP (psums !: (lenEvens - 1)) else body
-- reimplement some of Prelude so it can be vectorised
f $ x = f x
infixr 0 $
(.) f g y = f (g y)
snd (a,b) = b
fst (a,b) = a
even n = n `mod` 2 == 0
odd n = n `mod` 2 == 1
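For reference (my example, not from the question), this computes an inclusive scan; given that zipWithP truncates to the shorter argument as noted above, it should evaluate as:
scanP (+) [: 1, 2, 3, 4 :] == [: 1, 3, 6, 10 :]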
Parallel prefix scans are supported; in fact, they're rather fundamental. So just pass (+) as your associative operator.
