I'm looking for a technique that allows memoization between subsequent fold calls on a list that is being prepended to.
I looked at the memoize library, but it doesn't seem to support memoization of higher-order functions, which is what folds are.
I also tried the technique of a lazily evaluated map of results, but to no avail.
Here's a simple example:
module Main where
import Data.Time
printAndMeasureTime :: Show a => a -> IO ()
printAndMeasureTime a = do
  startTime <- getCurrentTime
  print a
  stopTime <- getCurrentTime
  putStrLn $ " in " ++ show (diffUTCTime stopTime startTime)
main = do
  let as = replicate 10000000 1
  printAndMeasureTime $ foldr (-) 0 as -- just to resolve thunks
  printAndMeasureTime $ sum as
  printAndMeasureTime $ sum (1:as) -- recomputed from scratch, could it reuse previous computation result?
  printAndMeasureTime $ length (as)
  printAndMeasureTime $ length (1:as) -- recomputed from scratch, could it reuse previous computation result?
and the output:
0
in 1.125098223s
10000000
in 0.096558168s
10000001
in 0.104047058s
10000000
in 0.037727126s
10000001
in 0.041266456s
Times suggest that folds are computed from scratch. Is there a way to make the subsequent folds reuse previous fold results?
Make a data type!
module List (List, _elements, _sum, _length, toList, cons) where
data List = List
  { _elements :: [Int]
  , _sum :: !Int
  , _length :: !Int
  }
toList :: [Int] -> List
toList xs = List xs (sum xs) (length xs)
cons :: Int -> List -> List
cons x (List xs t n) = List (x:xs) (x+t) (1+n)
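Note that the List type is exported, but the List constructor is not, so the only ways to construct a List are the toList function (commonly called a "smart constructor") and cons. Both keep the cached _sum and _length consistent with _elements, so the sum and length of a prepended list are available in constant time. One caveat: because the field labels are exported, record-update syntax such as xs { _sum = 0 } can still produce an inconsistent List; exporting plain accessor functions instead of the field labels closes that gap.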
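The same trick is not limited to sum and length. It works for any right fold, because foldr f z (x:xs) = f x (foldr f z xs): if the fold of the tail is cached, prepending one element costs a single application of f. Below is a minimal sketch of that generalization; the names Folded, fromListF and consF are hypothetical, not from any library, and in a real module you would hide the Folded constructor just as List hides its constructor. Only prepending is cheap; any other change to the list still means refolding.

```haskell
-- A list paired with the cached result of folding it from the right.
-- Hypothetical type: a generalization of the List type above.
data Folded a b = Folded
  { foldStep :: a -> b -> b   -- the right-fold step function
  , elements :: [a]           -- the underlying list
  , result   :: !b            -- cached: foldr foldStep z elements
  }

-- Smart constructor: fold the whole list once, up front.
fromListF :: (a -> b -> b) -> b -> [a] -> Folded a b
fromListF f z xs = Folded f xs (foldr f z xs)

-- Prepending reuses the cached result, because
-- foldr f z (x:xs) == f x (foldr f z xs).
consF :: a -> Folded a b -> Folded a b
consF x (Folded f xs r) = Folded f (x : xs) (f x r)
```

For example, fromListF (+) 0 caches a sum and fromListF (\_ n -> n + 1) 0 caches a length; each consF then updates the cached value in constant time.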
Related
I was testing the performance of the partition function for lists and got what seem to be strange results.
We have that partition p xs == (filter p xs, filter (not . p) xs), but Data.List implements partition with a single traversal over the list, which is supposed to make it faster. Yet the results I got suggest that it may be better to use the implementation that uses two traversals.
Here is the minimal code that shows what I'm seeing:
import Criterion.Main
import System.Random
import Data.List (partition)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
  where
    (x, gen') = random gen
    xs = randList gen' (n - 1)
main = do
  gen <- getStdGen
  let arg10000000 = randList gen 10000000
  defaultMain [
      bgroup "filters -- split list in half " [
          bench "partition100" $ nf (partition (>= 50)) arg10000000
        , bench "mypartition100" $ nf (mypartition (>= 50)) arg10000000
        ]
    ]
I ran the tests both with and without -O, and both times the double traversal was faster.
I am using ghc-7.10.3 with criterion-1.1.1.0
My questions are:
Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and (filter p xs, filter (not . p) xs) will only do two traversals if both elements of the tuple are used.
Does this have something to do with the way lists are handled in Haskell?
Thanks a lot!
There is no black-and-white answer to this question. To dissect the problem, consider the following code:
import Control.DeepSeq
import Data.List (partition)
import System.Environment (getArgs)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
main :: IO ()
main = do
  let cnt = 10000000
      xs = take cnt $ concat $ repeat [1 .. 100 :: Int]
  args <- getArgs
  putStrLn $ unwords $ "Args:" : args
  case args of
    [percent, fun]
      -> let p = (read percent >=)
         in case fun of
              "partition"      -> print $ rnf $ partition p xs
              "mypartition"    -> print $ rnf $ mypartition p xs
              "partition-ds"   -> deepseq xs $ print $ rnf $ partition p xs
              "mypartition-ds" -> deepseq xs $ print $ rnf $ mypartition p xs
              _                -> err
    _ -> err
  where
    err = putStrLn "Sorry, I do not understand."
I do not use Criterion, so that I have better control over the order of evaluation. To get timings, I use the +RTS -s runtime option. The different test cases are selected via command-line arguments: the first argument defines the percentage of the data for which the predicate holds, and the second chooses between the different tests.
The tests distinguish two cases:
The data is generated lazily (2nd argument partition or mypartition).
The data is already fully evaluated in memory (2nd argument partition-ds or mypartition-ds).
The result of the partitioning is always evaluated from left to right, i.e. starting with the list that contains all the elements for which the predicate holds.
In case 1, partition has the advantage that elements of the first resulting list can be garbage-collected before all elements of the input list have even been produced. This is especially beneficial if the predicate matches many elements, i.e. if the first command-line argument is large.
In case 2, partition cannot play out this advantage, since all elements are already in memory.
For mypartition, in either case all elements are held in memory after the first resulting list has been evaluated, because they are needed again to compute the second resulting list. Therefore there is not much of a difference between the two cases.
It seems that the more memory is retained, the harder garbage collection has to work. Therefore partition is well suited if the predicate matches many elements and the data is generated lazily.
Conversely, if the predicate does not match many elements or all elements are already in memory, mypartition performs better since, unlike partition, its recursion does not have to build and take apart pairs.
The Stackoverflow question “Irrefutable pattern does not leak memory in recursion, but why?” might give some more insights about the handling of pairs in the recursion of partition.
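For comparison, here is a sketch of a single-traversal partition written with foldr and a lazy (irrefutable) pattern on the accumulated pair; it closely follows the shape of the definition in base's Data.List, but partition1 is a hypothetical name used for illustration. The pair built for every element, and the ~ that defers matching on it, is the "dealing with pairs" overhead mentioned above; without the ~, each step would force the pair for the entire rest of the list before producing anything.

```haskell
-- Single traversal: every element is routed into one side of a pair.
partition1 :: (a -> Bool) -> [a] -> ([a], [a])
partition1 p = foldr select ([], [])
  where
    -- The lazy pattern ~(ts, fs) defers matching on the pair built for
    -- the rest of the list, so both results can be consumed incrementally.
    select x ~(ts, fs)
      | p x       = (x : ts, fs)
      | otherwise = (ts, x : fs)

-- Two traversals, as in the question.
mypartition :: (a -> Bool) -> [a] -> ([a], [a])
mypartition p l = (filter p l, filter (not . p) l)
```

Both return the same pair; thanks to the lazy pattern, partition1 even works on infinite input, e.g. take 3 (fst (partition1 even [1 ..])) yields [2,4,6].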
I'm new to Haskell and I've been trying to figure out this problem for a while now. I want to generate a list of random size using randomRIO and have that list be populated with random numbers using randomIO. I have tried to approach this problem by creating a function which takes the length randomly generated by randomRIO, like so:
x <- randomRIO(1,5)
let y = randList x []
the function itself is something like this:
randList :: Int -> IO [Int] -> IO [Int]
randList 0 xs = return [xs]
randList g xs = do
  t <- randomIO
  let a = t:randList (g-1) xs
  return [a]
I'm not sure how to handle the IO monad in a recursive function, but this is how I'm thinking of it. Any help is appreciated, thanks!
You can use replicateM to repeatedly execute randomIO, generating a new number each time. You call randomRIO once up front to decide on the length of the list:
import Control.Monad (replicateM)
import System.Random (randomIO, randomRIO)
randList :: IO [Int]
randList = do
  len <- randomRIO (1,5)
  replicateM len randomIO
Now, your definition wasn't actually very far off. A couple of things though:
Given that you expect to be able to call randList x [], the second argument of randList clearly is a plain list, not an IO action of some sort. So your type should be
randList :: Int -> [Int] -> IO [Int]
In your first pattern match
randList 0 xs = return [xs]
Remember that xs is already a list. So when you do return [xs] you will get an IO [[Int]], a list of lists. What you want here is a plain return xs.
In your second definition
randList g xs = do
  t <- randomIO
  let a = t:randList (g-1) xs
  return [a]
The expression t:randList ... makes no sense. The right-hand side of : must be a list. randList does not yield a list though, it yields an IO action. What you actually want to do is to treat the second argument of randList (the list) as an "accumulator", which is gradually built up. So you want to generate a number, add it to the accumulator, and then recurse with g decreased by one:
randList :: Int -> [Int] -> IO [Int]
randList 0 xs = return xs
randList g xs = do
  t <- randomIO
  randList (g-1) (t:xs)
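The accumulator pattern does not depend on IO or on random numbers; it works in any monad. Below is a sketch of a hypothetical repeatAcc helper that runs an action n times and collects the results, much like replicateM. Because each result is consed onto the accumulator, the results come out in reverse order of execution, which is irrelevant for random numbers but worth a final reverse if the order matters.

```haskell
-- Run an action n times, collecting the results in an accumulator.
-- Results come out in reverse order of execution.
repeatAcc :: Monad m => Int -> m a -> m [a]
repeatAcc n0 action = go n0 []
  where
    go n acc
      | n <= 0    = return acc
      | otherwise = do
          x <- action
          go (n - 1) (x : acc)
```

In the question's setting this would be used as x <- randomRIO (1, 5) followed by ys <- repeatAcc x randomIO.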
I am trying to write a function that takes a list and gives a permutation of it. In ghci I can do something like this:
>let xs=[1..10]
>ys <- shuffleM xs
Both xs and ys have the type [Integer], and ys is indeed a permutation of xs. I want to get the same effect in a program, because after shuffling I need to use ys further. How can it be done?
You could do
import System.Random.Shuffle (shuffleM) -- assuming shuffleM comes from random-shuffle
main :: IO ()
main = do
  let xs = [1..10]
  ys <- shuffleM xs
  print $ doSomething ys
doSomething :: [Integer] -> Integer
doSomething = sum
For example. I'm not sure where shuffleM comes from, but if it's from the random-shuffle library then you just have to use it inside a monad that implements MonadRandom, which includes IO. Then you can process the shuffled list just like you would any other list.
Simulating a lottery of 6 numbers chosen from 40, I want to create a list of numbers in Haskell using the system random generator but eliminate duplicates, which often arise.
If I have the following:
import Control.Monad (forM)
import System.Random
main :: IO ()
main = do
  rs <- forM [1..6] $ \_x -> randomRIO (1, 40) :: (IO Int)
  print rs
This is halfway there. But how do I filter out duplicates? It seems to me I need a while loop of some sort to construct the list, skipping elements that are already in it until the list has the required size. If I could generate an infinite list of random numbers and filter it inside the IO monad, I am sure that would work, but I do not know how to approach this. It seems that while loops are generally avoided in Haskell, so I am uncertain what the true Haskeller's way is here. Is this a legitimate use case for a while loop, and if so, how does one write one?
The function you are looking for is nub from Data.List, which filters out duplicates.
import Data.List
import System.Random
main = do
  g <- newStdGen
  print . take 6 . nub $ (randomRs (1,40) g :: [Int])
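This works because nub is lazy: take 6 . nub only inspects as much of the infinite list returned by randomRs as it needs to find six distinct values. nub is quadratic, which does not matter for six numbers; for larger draws, a Set-based variant is a common alternative. Below is a sketch using Data.Set from the containers package; takeDistinct is a hypothetical name. Like take k . nub, it never terminates on an infinite list that contains fewer than k distinct values.

```haskell
import qualified Data.Set as Set

-- Take the first k distinct elements of a (possibly infinite) list,
-- remembering the values seen so far in a Set for O(log n) lookups.
takeDistinct :: Ord a => Int -> [a] -> [a]
takeDistinct k = go k Set.empty
  where
    go n _ _
      | n <= 0 = []
    go _ _ [] = []
    go n seen (x:xs)
      | x `Set.member` seen = go n seen xs
      | otherwise           = x : go (n - 1) (Set.insert x seen) xs
```

With a generator g, takeDistinct 6 (randomRs (1, 40) g :: [Int]) plays the role of take 6 . nub above.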
If you don't mind using a library, then install the random-shuffle package and use it like this:
import System.Random.Shuffle
import Control.Monad.Random
main1 = do
  perm <- evalRandIO $ shuffleM [1..10]
  print perm
If you want to see how to implement a naive Fisher-Yates shuffle using lists in Haskell, have a look at this code:
import Control.Monad.Random
shuffle2 :: RandomGen g => [a] -> g -> ([a], g)
shuffle2 xs g = go [] g (length xs) xs
  where
    go perm g n avail
      | n == 0    = (perm, g)
      | otherwise = let (i, g') = randomR (0, n-1) g
                        a = avail !! i
                        -- can also use splitAt to define avail':
                        avail' = take i avail ++ drop (i+1) avail
                    in go (a:perm) g' (n-1) avail'
main = do
  perm <- evalRandIO $ liftRand $ shuffle2 [1..10]
  print perm
The parameters to the go helper function are:
perm - the constructed permutation so far
g - the current generator value
n - the length of the available items
avail - the available items - i.e. items not yet selected to be part of the permutation
go simply adds a random element from avail to the permutation being constructed and recursively calls itself with the new avail list and new generator.
To only draw k random elements from xs, just start go at k instead of length xs:
shuffle2 xs k g = go [] g k xs
...
You could also use a temporary array (in the ST or IO monad) to implement a Fisher-Yates-style algorithm. The shuffleM function in random-shuffle uses yet another, completely different approach, which you might find interesting.
Update: Here is an example of using an ST array in a Fisher-Yates-style algorithm:
import Control.Monad.Random
import Data.Array.ST
import Control.Monad
import Control.Monad.ST (runST, ST)
shuffle3 :: RandomGen g => Int -> g -> ([Int], g)
shuffle3 n g0 = runST $ do
  arr <- newListArray (1,n) [1..n] :: ST s (STUArray s Int Int)
  let step g i = do
        -- pick j uniformly from [i..n]; drawing j from [1..n] would bias the permutation
        let (j,g') = randomR (i,n) g
        -- swap i and j
        a <- readArray arr i
        b <- readArray arr j
        writeArray arr j a
        writeArray arr i b
        return g'
  g' <- foldM step g0 [1..n]
  perm <- getElems arr
  return (perm, g')
main = do
  perm <- evalRandIO $ liftRand $ shuffle3 20
  print perm
I've used the Fisher-Yates shuffle in C++ with a decent random number generator to great success. This approach is very efficient if you are willing to allocate an array holding the numbers 1 to 40.
Doing this strictly in IO requires taking nub apart and moving the duplicate check into the tail recursion:
import System.Random
randsf :: (Eq a, Num a, Random a) => [a] -> IO [a]
randsf rs
  | length rs >= 6 = return rs -- stop at exactly 6 numbers
  | otherwise = do
      r <- randomRIO (1,40)
      if elem r rs
        then randsf rs
        else randsf (r:rs)
main = do
  rs <- randsf [] :: IO [Int]
  print rs
If you know what you are doing, unsafeInterleaveIO from System.IO.Unsafe can be handy: it lets you generate lazy lists from IO. Functions like getContents work this way.
import Control.Monad
import System.Random
import System.IO.Unsafe
import Data.List
rands :: (Eq a, Num a, Random a) => IO [a]
rands = do
  r <- randomRIO (1,40)
  unsafeInterleaveIO $ liftM (r:) rands
main = do
  rs <- rands :: IO [Int]
  print . take 6 $ nub rs
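To see what unsafeInterleaveIO buys you, here is a sketch of a hypothetical lazyRepeatIO helper, a variant of rands that turns any IO action into a lazy infinite list. Only the first action runs immediately; each further action runs when the corresponding part of the list is demanded. A counter in an IORef (newCounter, also hypothetical) stands in for randomRIO so that the number of executed actions can be observed.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- An infinite list whose elements are produced by running the action on
-- demand: the first action runs now, the rest of the list is deferred
-- until it is forced.
lazyRepeatIO :: IO a -> IO [a]
lazyRepeatIO action = do
  x  <- action
  xs <- unsafeInterleaveIO (lazyRepeatIO action)
  return (x : xs)

-- A stand-in for randomRIO: returns 1, 2, 3, ... on successive calls.
newCounter :: IO (IO Int)
newCounter = do
  ref <- newIORef 0
  return (atomicModifyIORef' ref (\n -> (n + 1, n + 1)))
```

With randomness, rs <- lazyRepeatIO (randomRIO (1, 40)) followed by take 6 (nub rs) behaves like the code above. The usual caveat of lazy IO applies: when the actions run depends on evaluation order, which is why this is only advisable if you know what you are doing.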
You commented:
The goal is to learn how to build a list monadically using filtering. It's a raw newbie question
Maybe you should change the question title then! Anyway, this is quite a common task. I usually define a combinator with a general monadic type that does what I want, give it a descriptive name (which I didn't quite succeed in here :-) and then use it, like below:
import Control.Monad
import System.Random
-- | 'myUntil': accumulate a list with unique action results until the list
-- satisfies a test
myUntil :: (Monad m, Eq a) => ([a] -> Bool) -> m a -> m [a]
myUntil test action = myUntil' test [] action where
  myUntil' test resultSoFar action = do
    if test resultSoFar then
      return resultSoFar
    else do
      x <- action
      let newResults = if x `elem` resultSoFar then resultSoFar
                       else resultSoFar ++ [x] -- x:resultSoFar
      myUntil' test newResults action
main :: IO ()
main = do
  let enough xs = length xs == 6
      drawNumber = randomRIO (1, 40::Int) -- lottery numbers range from 1 to 40
  numbers <- myUntil enough drawNumber
  print numbers
NB: this is not the optimal way to get your 6 distinct numbers; it is meant as an example of how, in general, to build a list monadically using a filter that works on the entire list.
It is in essence the same as Vektorweg's longest answer, but it uses a combinator with a much more general type. That is the way I like to do it, and it may be more useful for you, given the comment quoted at the top of this answer.
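The commented-out x:resultSoFar in myUntil hints at a cheaper alternative to resultSoFar ++ [x], which copies the whole list on every step. Below is a sketch of that variant; myUntilRev and the replay helper are hypothetical names. It conses new results onto the front and reverses once at the end, so the test sees the list in reverse order, which is harmless for a length-based test like enough. replay is only a deterministic stand-in for a random source, so the behaviour can be checked.

```haskell
import Data.IORef

-- Like myUntil, but accumulate in reverse (O(1) per step) and restore
-- the original order once at the end. The test sees the reversed list.
myUntilRev :: (Monad m, Eq a) => ([a] -> Bool) -> m a -> m [a]
myUntilRev test action = go []
  where
    go acc
      | test acc  = return (reverse acc)
      | otherwise = do
          x <- action
          go (if x `elem` acc then acc else x : acc)

-- Hypothetical stand-in for randomRIO: replays a fixed list of values.
replay :: [a] -> IO (IO a)
replay xs = do
  ref <- newIORef xs
  return $ do
    ys <- readIORef ref
    case ys of
      []       -> ioError (userError "replay: supply exhausted")
      (y:rest) -> writeIORef ref rest >> return y
```

In the lottery program, numbers <- myUntilRev enough drawNumber is a drop-in replacement for the myUntil call.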
I'm totally new to Haskell so apologies if the question is silly.
What I want to do is recursively build a list while at the same time building up an accumulated value based on the recursive calls. This is for a problem I'm doing for a Coursera course, so I won't post the exact problem but something analogous.
Say for example I wanted to take a list of ints and double each one (ignoring for the purpose of the example that I could just use map), but I also wanted to count up how many times the number '5' appears in the list.
So to do the doubling I could do this:
foo [] = []
foo (x:xs) = x * 2 : foo xs
So far so easy. But how can I also maintain a count of how many times x is a five? The best solution I've got is to use an explicit accumulator like this, which I don't like as it reverses the list, so you need to do a reverse at the end:
foo total acc [] = (total, reverse acc)
foo total acc (x:xs) = foo (if x == 5 then total + 1 else total) (x*2 : acc) xs
But I feel like this should be able to be handled nicer by the State monad, which I haven't used before, but when I try to construct a function that will fit the pattern I've seen I get stuck because of the recursive call to foo. Is there a nicer way to do this?
EDIT: I need this to work for very long lists, so any recursive calls need to be tail-recursive too. (The example I have here manages to be tail-recursive thanks to Haskell's 'tail recursion modulo cons').
Using the State monad, it could look something like this:
import Control.Monad.State
foo :: [Int] -> State Int [Int]
foo [] = return []
foo (x:xs) = do
  i <- get
  put $ if x==5 then (i+1) else i
  r <- foo xs
  return $ (x*2):r
main :: IO ()
main = do
  let (lst, count) = runState (foo [1,2,5,6,5,5]) 0
  print lst
  print count
This is a simple fold:
foo :: [Integer] -> ([Integer], Int)
foo [] = ([], 0)
foo (x : xs) = let (rs, n) = foo xs
               in (2 * x : rs, if x == 5 then n + 1 else n)
or expressed using foldr:
foo' :: [Integer] -> ([Integer], Int)
foo' = foldr f ([], 0)
  where
    -- the lazy pattern ~(rs, n) mirrors the let in foo, so the list is produced incrementally
    f x ~(rs, n) = (2 * x : rs, if x == 5 then n + 1 else n)
The accumulated value is a pair holding the results of both operations.
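base also provides a combinator for exactly this shape of problem, threading an accumulator from left to right while producing a new list: mapAccumL from Data.List. A minimal sketch, where doubleAndCountFives is a hypothetical name; note that mapAccumL returns the pair as (accumulator, list), the reverse of foo's order.

```haskell
import Data.List (mapAccumL)

-- Thread a running count of 5s from left to right while mapping (* 2).
-- mapAccumL returns the final accumulator paired with the mapped list.
doubleAndCountFives :: [Integer] -> (Int, [Integer])
doubleAndCountFives = mapAccumL step 0
  where
    step n x = (if x == 5 then n + 1 else n, 2 * x)
```

For example, doubleAndCountFives [1, 5, 2, 5] is (2, [2, 10, 4, 10]).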
Notes:
Have a look at Beautiful folding. It shows a nice way to make such computations composable.
You can use State for the same thing as well, by viewing each element as a stateful computation. This is a bit overkill, but certainly possible. In fact, any fold can be expressed as a sequence of State computations:
import Control.Monad
import Control.Monad.State
-- I used a slightly non-standard signature for a left fold
-- for simplicity.
foldl' :: (b -> a -> a) -> a -> [b] -> a
foldl' f z xs = execState (mapM_ (modify . f) xs) z
The function mapM_ first maps each element of xs to a stateful computation by modify . f :: b -> State a (). Then it combines the list of such computations into one of type State a () (it discards the results of the monadic computations and keeps only the effects). Finally, we run this stateful computation on z.
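To make the claim concrete without depending on mtl, here is a sketch with a tiny hand-rolled State monad. The State type, modify, execState and foldlViaState below are local stand-ins that mirror the mtl names; they are not the library itself. Running mapM_ (modify . f) over a list applies f to the state once per element, from left to right, which is exactly a left fold.

```haskell
-- Minimal State monad: a function from a state to a result and a new state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s -> let (a, s') = g s in runState (k a) s'

modify :: (s -> s) -> State s ()
modify f = State $ \s -> ((), f s)

execState :: State s a -> s -> s
execState m s = snd (runState m s)

-- Same argument order as the answer's foldl': f :: b -> a -> a.
foldlViaState :: (b -> a -> a) -> a -> [b] -> a
foldlViaState f z xs = execState (mapM_ (modify . f) xs) z
```

For example, foldlViaState (:) [] "abc" is "cba", the same as foldl (flip (:)) [] "abc".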