Benchmarking Filter and Partition

Benchmarking Filter and Partition - haskell

I was testing the performance of the partition function for lists and got some strange results, I think.
We have that partition p xs == (filter p xs, filter (not . p) xs) but we chose the first implementation because it only performs a single traversal over the list. Yet, the results I got say that it maybe be better to use the implementation that uses two traversals.
Here is the minimal code that shows what I'm seeing
import Criterion.Main
import System.Random
import Data.List (partition)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
randList :: RandomGen g => g -> Integer -> [Integer]
randList gen 0 = []
randList gen n = x:xs
where
(x, gen') = random gen
xs = randList gen' (n - 1)
main = do
gen <- getStdGen
let arg10000000 = randList gen 10000000
defaultMain [
bgroup "filters -- split list in half " [
bench "partition100" $ nf (partition (>= 50)) arg10000000
, bench "mypartition100" $ nf (mypartition (>= 50)) arg10000000
]
]
I ran the tests both with -O and without it and both times I get that the double traversals is better.
I am using ghc-7.10.3 with criterion-1.1.1.0
My questions are:
Is this expected?
Am I using Criterion correctly? I know that laziness can be tricky and (filter p xs, filter (not . p) xs) will only do two traversals if both elements of the tuple are used.
Does this has to do something with the way lists are handled in Haskell?
Thanks a lot!

There is no black or white answer to the question. To dissect the problem consider the following code:
import Control.DeepSeq
import Data.List (partition)
import System.Environment (getArgs)
mypartition :: (a -> Bool) -> [a] -> ([a],[a])
mypartition p l = (filter p l, filter (not . p) l)
main :: IO ()
main = do
let cnt = 10000000
xs = take cnt $ concat $ repeat [1 .. 100 :: Int]
args <- getArgs
putStrLn $ unwords $ "Args:" : args
case args of
[percent, fun]
-> let p = (read percent >=)
in case fun of
"partition" -> print $ rnf $ partition p xs
"mypartition" -> print $ rnf $ mypartition p xs
"partition-ds" -> deepseq xs $ print $ rnf $ partition p xs
"mypartition-ds" -> deepseq xs $ print $ rnf $ mypartition p xs
_ -> err
_ -> err
where
err = putStrLn "Sorry, I do not understand."
I do not use Criterion to have a better control about the order of evaluation. To get timings, I use the +RTS -s runtime option. The different test case are executed using different command line options. The first command line option defines for which percentage of the data the predicate holds. The second command line option chooses between different tests.
The tests distinguish two cases:
The data is generated lazily (2nd argument partition or mypartition).
The data is already fully evaluated in memory (2nd argument partition-ds or mypartition-ds).
The result of the partitioning is always evaluated from left to right, i.e. starting with the list that contains all the elements for which the predicate holds.
In case 1 partition has the advantage that elements of the first resulting list get discarded before all elements of the input list were even produced. Case 1 is especially good, if the predicate matches many elements, i.e. the first command line argument is large.
In case 2, partition cannot play out this advantage, since all elements are already in memory.
For mypartition, in any case all elements are held in memory after the first resulting list is evaluated, because they are needed again to compute the second resulting list. Therefore there is not much of a difference between the two cases.
It seems, the more memory is used, the harder garbage collection gets. Therefore partition is well suited, if the predicate matches many elements and the lazy variant is used.
Conversely, if the predicate does not match many elements or all elements are already in memory, mypartition performs better, since its recursion does not deal with pairs in contrast to partition.
The Stackoverflow question “Irrefutable pattern does not leak memory in recursion, but why?” might give some more insights about the handling of pairs in the recursion of partition.

Related

Memoization for subsequent fold calls

I'm looking for a technique that allows for memoization between subsequent fold calls against the lists that is being prepended.
I looked at memoize library but this doesn't seem to support memoization of higher-order functions, which is the case for folds.
I also tried the technique with lazy evaluated map of results but to no avail.
Here's simple example code:
module Main where
import Data.Time
printAndMeasureTime :: Show a => a -> IO ()
printAndMeasureTime a = do
startTime <- getCurrentTime
print a
stopTime <- getCurrentTime
putStrLn $ " in " ++ show (diffUTCTime stopTime startTime)
main = do
let as = replicate 10000000 1
printAndMeasureTime $ foldr (-) 0 as -- just to resolve thunks
printAndMeasureTime $ sum as
printAndMeasureTime $ sum (1:as) -- recomputed from scratch, could it reuse previous computation result?
printAndMeasureTime $ length (as)
printAndMeasureTime $ length (1:as) -- recomputed from scratch, could it reuse previous computation result?
and the output:
0
in 1.125098223s
10000000
in 0.096558168s
10000001
in 0.104047058s
10000000
in 0.037727126s
10000001
in 0.041266456s
Times suggest that folds are computed from scratch. Is there a way to make the subsequent folds reuse previous fold results?

Make a data type!
module List (List, _elements, _sum, _length, toList, cons) where
data List = List
{ _elements :: [Int]
, _sum :: !Int
, _length :: !Int
}
toList :: [Int] -> List
toList xs = List xs (sum xs) (length xs)
cons :: Int -> List -> List
cons x (List xs t n) = List (x:xs) (x+t) (1+n)
Note that the List type is exported, but the List constructor is not, so that the only way to construct a List is using the toList function (commonly called a "smart constructor").

Haskell parallel search with early abort

I'd like to search through a list, testing each element for property X and then return when an element with property X is found.
This list is very large and would benefit from parallelism, but the cost of the spark is rather high relative to the compute time. parListChunk would be great, but then it must search through the entire list.
Is there some way I can write something like parListChunk but with early abort?
This is the naive search code:
hasPropertyX :: Object -> Bool
anyObjectHasPropertyX :: [Object] -> Bool
anyObjectHasPropertyX [] = False
anyObjectHasPropertyX l
| hasPropertyX (head l) == True = True
| otherwise = anyObjectHasPropertyX (tail l)
and this is my first attempt at parallelism:
anyObjectHasPropertyXPar [] = False
anyObjectHasPropertyXPar [a] = hasPropertyX a
anyObjectHasPropertyXPar (a:b:rest) = runEval $ do c1 <- rpar (force (hasPropertyX a))
c2 <- rpar (force (hasPropertyX b))
rseq c1
rseq c2
if (c1 == True) || (c2 == True) then return True else return (anyObjectHasPropertyXPar rest)
This does run slightly faster than the naive code (even with -N1, oddly enough), but not by much (it helps a little by extending the number of parallel computations). I believe it's not benefitting much because it has to spark one thread for each element in the list.
Is there an approach similar to parListChunk that will only spark n threads and that allows for an early abort?
Edit: I'm having problems thinking about this because it seems that I would need to monitor the return value of all the threads. If I omit the rseq's and have something like
if (c1 == True) || (c2 == True) then ...
Is the runtime environment intelligent enough to monitor both threads and continue when either one of them returns?

I don't think you're going to have much luck using Control.Parallel.Strategies. A key feature of this module is that it expresses "deterministic parallelism" such that the result of the program is unaffected by the parallel evaluation. The problem you've described is fundamentally non-deterministic because threads are racing to find the first match.
Update: I see now that you're only returning True if the element is found, so the desired behavior is technically deterministic. So, perhaps there is a way to trick the Strategies module into working. Still, the implementation below seems to meet the requirements.
Here's an implementation of a parallel find parFind that runs in the IO monad using Control.Concurrent primitives and seems to do what you want. Two MVars are used: runningV keeps count of how many threads are still running to allow the last thread standing to detect search failure; and resultV is used to return Just the result or Nothing when search failure is detected by that last thread. Note that it is unlikely to perform better than a single-threaded implementation unless the test (your hasPropertyX above) is substantially more work than the list traversal, unlike this toy example.
import Control.Monad
import Control.Concurrent
import Data.List
import System.Environment
-- Thin a list to every `n`th element starting with index `i`
thin :: Int -> Int -> [a] -> [a]
thin i n = unfoldr step . drop i
where step [] = Nothing
step (y:ys) = Just (y, drop (n-1) ys)
-- Use `n` parallel threads to find first element of `xs` satisfying `f`
parFind :: Int -> (a -> Bool) -> [a] -> IO (Maybe a)
parFind n f xs = do
resultV <- newEmptyMVar
runningV <- newMVar n
comparisonsV <- newMVar 0
threads <- forM [0..n-1] $ \i -> forkIO $ do
case find f (thin i n xs) of
Just x -> void (tryPutMVar resultV (Just x))
Nothing -> do m <- takeMVar runningV
if m == 1
then void (tryPutMVar resultV Nothing)
else putMVar runningV (m-1)
result <- readMVar resultV
mapM_ killThread threads
return result
myList :: [Int]
myList = [1..1000000000]
-- Use `n` threads to find first element equal to `y` in `myList`
run :: Int -> Int -> IO ()
run n y = do x <- parFind n (== y) myList
print x
-- e.g., stack ghc -- -O2 -threaded SearchList.hs
-- time ./SearchList +RTS -N4 -RTS 4 12345 # find 12345 using 4 threads -> 0.018s
-- time ./SearchList +RTS -N4 -RTS 4 -1 # full search w/o match -> 6.7s
main :: IO ()
main = do [n,y] <- getArgs
run (read n) (read y)
Also, note that this version runs the threads on interleaved sublists rather than dividing the main list up into consecutive chunks. I did it this way because (1) it was easier to demonstrate that "early" elements were found quickly; and (2) my huge list means that memory usage can explode if the whole list needs to be kept in memory.
In fact, this example is a bit of a performance time bomb -- its memory usage is nondeterministic and can probably explode if one thread falls way behind so that a substantial portion of the whole list needs to be kept in memory.
In a real world example where the whole list is probably being kept in memory and the property test is expensive, you may find that breaking the list into chunks is faster.

Haskell bottom-up DP space leak

Apologies if this is too specific, I am new here and not exactly sure what is reasonable. I have been bashing my head against this problem for hours with nothing to show for it. The following code is my implementation of a competitive programming problem.
module Main where
import Data.List (foldl', groupBy)
import Debug.Trace
type Case = (Int, [(Int, Int)])
type Soln = Int
main = interact handle
handle :: String -> String
handle = fmt . solve . parse
fmt :: Soln -> String
fmt s = (show s) ++ "\n"
parse :: String -> Case
parse s = (l, fs)
where
(l:_:fs') = (map read) $ words s
fs = pairs fs'
pairs :: [a] -> [(a, a)]
pairs [] = []
pairs (a:b:s) = (a, b):(pairs s)
solve :: Case -> Soln
solve c#(l, fs) = last $ foldl' run [0..l] f
where
f = concat $ map rep $ map combine $ groupBy samev fs
samev a b = (snd a) == (snd b)
combine a = (sum $ map fst $ a, snd $ head $ a)
rep (n, v) = replicate (min n (l `div` v)) v
run :: [Int] -> Int -> [Int]
run b v = (take v b) ++ (zipWith min b (drop v b))
-- run b v = (take v b) ++ (zipMin b (drop v b))
zipMin :: [Int] -> [Int] -> [Int]
zipMin [] _ = []
zipMin _ [] = []
zipMin (a:as) (b:bs) = (min a b):(zipMin as bs)
The intent is that this works like a bottom-up dynamic programming solution generating each row of the DP table from the previous using the fold in solve. In theory GHC should be able to optimize out all the old rows of the table. However, running this program on a moderately large input with approximately l = 2000 and length f = 5000, I get this:
> time ./E < E.in
0
1.98user 0.12system 0:02.10elapsed 99%CPU (0avgtext+0avgdata 878488maxresident)k
0inputs+0outputs (0major+219024minor)pagefaults 0swaps
That's using 878 MB of memory at peak! The table I am generating is only 10,000 Ints, and in theory I only need one row at a time! It seems obvious that this is some form of thunk leak or other space leak. Profiling reveals that run is consuming 99% of total runtime and memory. Digging further indicated that 98% of that was in the zipWith call. Interestingly, replacing the call to zipWith min with my custom zipMin function produces a significant improvement:
> time ./E < E.in
0
1.39user 0.08system 0:01.48elapsed 99%CPU (0avgtext+0avgdata 531400maxresident)k
0inputs+0outputs (0major+132239minor)pagefaults 0swaps
That's just 70% the run time, and 60% the memory! I tried all sorts to make this work. I know (++) is generally a bad idea, so I replaced the lists in run with Data.Sequence.Seq Int... and it got slower and used more memory. I am not particularly experienced with Haskell, but I am at my wit's end here. I am sure the answer to this problem exists somehwere on SO, but I am too new to Haskell to be able to find it, it seems.
Any help any of you can offer is very much appreciated. I would love an explanation of exactly what I have done wrong, how to diagnose it in future, and how to fix it.
Thanks in advance.
EDIT:
After following Steven's Excellent advice and replacing my lists with unboxed vectors the performance is... uh... signficantly improved:
> time ./E < E.in
0
0.01user 0.00system 0:00.02elapsed 80%CPU (0avgtext+0avgdata 5000maxresident)k
24inputs+0outputs (0major+512minor)pagefaults 0swaps

So, by using foldl' you have ensured that the accumulator will be in WHNF. Putting a list in WHNF only evaluates the first element of the list. The remainder of the list exists as a thunk, and will be passed around as a thunk to the subsequent calls of your fold. Since you are interested in multiple positions in the list at once (that is, you are dropping some parts of it in the zipWith) large portions of the lists are being kept from previous iterations.
The structure you need here is an unboxed vector. An unboxed vector will ensure that everything is maximally strict, and will run in far less memory.

How to split a [String] in to [[String]] based on length

I'm trying to split a list of Strings in to a List of Lists of Strings
so like in the title [String] -> [[String]]
This has to be done based on length of characters, so that the Lists in the output are no longer than 10. So if input was length 20 this would be broken down in to 2 lists and if length 21 in to 3 lists.
I'm not sure what to use to do this, I don't even know how to brake down a list in to a list of lists never mind based on certain length.
For example if the limit was 5 and the input was:
["abc","cd","abcd","ab"]
The output would be:
[["abc","cd"],["abcd"],["ab"]]
I'd like to be pointed in the right direction and what methods to use, list comprehension? recursion?

Here's an intuitive solution:
import Data.List (foldl')
breakup :: Int -> [[a]] -> [[[a]]]
breakup size = foldl' accumulate [[]]
where accumulate broken l
| length l > size = error "Breakup size too small."
| sum (map length (last broken ++ [l])) <= size
= init broken ++ [last broken ++ [l]]
| otherwise = broken ++ [[l]]
Now, let's go through it line-by-line:
breakup :: Int -> [[a]] -> [[[a]]]
Since you hinted that you may want to generalize the function to accept different size limits, our type signature reflects this. We also generalize beyond [String] (that is, [[Char]]), since our problem is not specific to [[Char]], and could equally apply to any [[a]].
breakup size = foldl' accumulate [[]]
We're using a left fold because we want to transform a list, left-to-right, into our target, which will be a list of sub-lists. Even though we're not concerned with efficiency, we're using Data.List.foldl' instead of Prelude's own foldl because this is standard practice. You can read more about foldl vs. foldl' here.
Our folding function is called accumulate. It will consider a new item and decide whether to place it in the last-created sub-list or to start a new sub-list. To make that judgment, it uses the size we passed in. We start with an initial value of [[]], that is, a list with one empty sub-list.
Now the question is, how should you accumulate your target?
where accumulate broken l
We're using broken to refer to our constructed target so far, and l (for "list") to refer to the next item to process. We'll use guards for the different cases:
| length l > size = error "Breakup size too small."
We need to raise an error if the item surpasses the size limit on its own, since there's no way to place it in a sub-list that satisfies the size limit. (Alternatively, we could build a safe function by wrapping our return value in the Maybe monad, and that's something you should definitely try out on your own.)
| sum (map length (last broken ++ [l])) <= size
= init broken ++ [last broken ++ [l]]
The guard condition is sum (map length (last broken ++ [l])) <= size, and the return value for this guard is init broken ++ [last broken ++ [l]]. Translated into plain English, we might say, "If the item can fit in the last sub-list without going over the size limit, append it there."
| otherwise = broken ++ [[l]]
On the other hand, if there isn't enough "room" in the last sub-list for this item, we start a new sub-list, containing only this item. When the accumulate helper is applied to the next item in the input list, it will decide whether to place that item in this sub-list or start yet another sub-list, following the same logic.
There you have it. Don't forget to import Data.List (foldl') up at the top. As another answer points out, this is not a performant solution if you plan to process 100,000 strings. However, I believe this solution is easier to read and understand. In many cases, readability is the more important optimization.
Thanks for the fun question. Good luck with Haskell, and happy coding!

You can do something like this:
splitByLen :: Int -> [String] -> [[String]]
splitByLen n s = go (zip s $ scanl1 (+) $ map length s) 0
where go [] _ = []
go xs prev = let (lst, rest) = span (\ (x, c) -> c - prev <= n) xs
in (map fst lst) : go rest (snd $ last lst)
And then:
*Main> splitByLen 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
In case there is a string longer than n, this function will fail. Now, what you want to do in those cases depends on your requirements and that was not specified in your question.
[Update]
As requested by #amar47shah, I made a benchmark comparing his solution (breakup) with mine (splitByLen):
import Data.List
import Data.Time.Clock
import Control.DeepSeq
import System.Random
main :: IO ()
main = do
s <- mapM (\ _ -> randomString 10) [1..10000]
test "breakup 10000" $ breakup 10 s
test "splitByLen 10000" $ splitByLen 10 s
putStrLn ""
r <- mapM (\ _ -> randomString 10) [1..100000]
test "breakup 100000" $ breakup 10 r
test "splitByLen 100000" $ splitByLen 10 r
test :: (NFData a) => String -> a -> IO ()
test s a = do time1 <- getCurrentTime
time2 <- a `deepseq` getCurrentTime
putStrLn $ s ++ ": " ++ show (diffUTCTime time2 time1)
randomString :: Int -> IO String
randomString n = do
l <- randomRIO (1,n)
mapM (\ _ -> randomRIO ('a', 'z')) [1..l]
Here are the results:
breakup 10000: 0.904012s
splitByLen 10000: 0.005966s
breakup 100000: 150.945322s
splitByLen 100000: 0.058658s

Here is another approach. It is clear from the problem that the result is a list of lists and we need a running length and an inner list to keep track of how much we have accumulated (We use foldl' with these two as input). We then describe what we want which is basically:
If the length of the current input string itself exceeds the input length, we ignore that string (you may change this if you want a different behavior).
If the new length after we have added the length of the current string is within our input length, we add it to the current result list.
If the new length exceeds the input length, we add the result so far to the output and start a new result list.
chunks len = reverse . map reverse . snd . foldl' f (0, [[]]) where
f (resSoFar#(lenSoFar, (currRes: acc)) curr
| currLength > len = resSoFar -- ignore
| newLen <= len = (newLen, (curr: currRes):acc)
| otherwise = (currLength, [curr]:currRes:acc)
where
newLen = lenSoFar + currLength
currLength = length curr
Every time we add a result to the output list, we add it to the front hence we need reverse . map reverse at the end.
> chunks 5 ["abc","cd","abcd","ab"]
[["abc","cd"],["abcd"],["ab"]]
> chunks 5 ["abc","cd","abcdef","ab"]
[["abc","cd"],["ab"]]

Here is an elementary approach. First, the type String doesn't matter, so we can define our function in terms of a general type a:
breakup :: [a] -> [[a]]
I'll illustrate with a limit of 3 instead of 10. It'll be obvious how to implement it with another limit.
The first pattern will handle lists which are of size >= 3 and the the second pattern handles all of the other cases:
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]
It is important to have the patterns in this order. That way the second pattern will only be used when the first pattern does not match, i.e. when there are less than 3 elements in the list.
Examples of running this on some inputs:
breakup [1..5] -> [ [1,2,3], [4,5] ]
breakup [1..4] -> [ [1,2,3], [4] ]
breakup [1..2] -> [ [1,2] ]
breakup [1..3] -> [ [1,2,3], [] ]
We see these is an extra [] when we run the function on [1..3]. Fortunately this is easy to fix by inserting another rule before the last one:
breakup [] = []
The complete definition is:
breakup :: [a] -> [[a]]
breakup [] = []
breakup (a1 : a2 : a3 : as) = [a1, a2, a3] : breakup as
breakup as = [ as ]

Haskell: how to operate the string type in a do block

I want to make a function that firstly divides a list l to two list m and n. Then create two thread to find out the longest palindrome in the two list. My code is :
import Control.Concurrent (forkIO)
import System.Environment (getArgs)
import Data.List
import Data.Ord
main = do
l <- getArgs
forkIO $ putStrLn $ show $ longestPalindr $ mList l
forkIO $ putStrLn $ show $ longestPalindr $ nList l
longestPalindr x =
snd $ last $ sort $
map (\l -> (length l, l)) $
map head $ group $ sort $
filter (\y -> y == reverse y) $
concatMap inits $ tails x
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
Now I can compile it, but the result is a [ ]. When I just run the longestPalindr and mList , I get the right result. I thought the logic here is right. So what is the problem?

The question title may need to be changed, as this is no longer about type errors.
The functionality of the program can be fixed by simply mapping longestPalindr across the two halves of the list. In your code, you are finding the longest palindrome across [[Char]], so the result length is usually just 1.
I've given a simple example of par and pseq. This just suggests to the compiler that it may be smart to evaluate left and right independently. It doesn't guarantee parallel evaluation, but rather leaves it up to the compiler to decide.
Consult Parallel Haskell on the wiki to understand sparks, compile with the -threaded flag, then run it with +RTS -N2. Add -stderr for profiling, and see if there is any benefit to sparking here. I would expect negative returns until you start to feed it longer lists.
For further reading on functional parallelism, take a look at Control.Parallel.Strategies. Manually wrangling threads in Haskell is only really needed in nondeterministic scenarios.
import Control.Parallel (par, pseq)
import System.Environment (getArgs)
import Data.List
import Data.Ord
import Control.Function (on)
main = do
l <- getArgs
let left = map longestPalindr (mList l)
right = map longestPalindr (nList l)
left `par` right `pseq` print $ longest (left ++ right)
longestPalindr x = longest pals
where pals = nub $ filter (\y -> y == reverse y) substrings
substrings = concatMap inits $ tails x
longest = maximumBy (compare `on` length)
mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l

For reference, please read the Parallelchapter from Simon Marlow's book.
http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
As others have stated, using par from the Eval monad seems to be the correct approach here.
Here is a simplified view of your problem. You can test it out by compiling with +RTS -threaded -RTSand then you can use Thread Scope to profile your performance.
import Control.Parallel.Strategies
import Data.List (maximumBy, subsequences)
import Data.Ord
isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs
-- * note while subsequences is correct, it is asymptotically
-- inefficient due to nested foldr calls
getLongestPalindrome :: Ord a => [a] -> Int
getLongestPalindrome = length . maximum' . filter isPalindrome . subsequences
where maximum' :: Ord a => [[a]] -> [a]
maximum' = maximumBy $ comparing length
--- Do it in parallel, in a monad
-- rpar rpar seems to fit your case, according to Simon Marlow's book
-- http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
main :: IO ()
main = do
let shorter = [2,3,4,5,4,3,2]
longer = [1,2,3,4,5,4,3,2,1]
result = runEval $ do
a <- rpar $ getLongestPalindrome shorter
b <- rpar $ getLongestPalindrome longer
if a > b -- 'a > b' will always be false in this case
then return (a,"shorter")
else return (b,"longer")
print result
-- This will print the length of the longest palindrome along w/ the list name
-- Don't forget to compile w/ -threaded and use ThreadScope to check
-- performance and evaluation

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Benchmarking Filter and Partition - haskell

Related

Memoization for subsequent fold calls

Haskell parallel search with early abort

Haskell bottom-up DP space leak

How to split a [String] in to [[String]] based on length

Haskell: how to operate the string type in a do block

Categories

Resources