Why is MergeSort in Haskell faster when implemented with foldl'? - haskell

I have implemented two versions of Merge Sort in Haskell like follows:
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
and
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 [] = []
mergeSort2 (x:[]) = [x]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
where halves = splitList xs
where 'merge' and 'splitList' are implemented as follows:
merge :: (Ord a) => [a] -> [a] -> [a]
merge [] [] = []
merge xs [] = xs
merge [] ys = ys
merge all_x#(x:xs) all_y#(y:ys)
| x < y = x:merge xs all_y
| otherwise = y:merge all_x ys
splitList :: [a] -> ([a], [a])
splitList zs = go zs [] [] where
go [] xs ys = (xs, ys)
go [x] xs ys = (x:xs, ys)
go (x:y:zs) xs ys = go zs (x:xs) (y:ys)
Doing last $ mergeSort2 [1000000,999999..0] in ghci results in showing the number 1000000 after more than a minute of processing, while doing last $ mergeSort1 [1000000,999999..0] results in showing the last element only after 5 seconds.
I can understand why mergeSort1 uses much less memory than mergeSort2 because of the tail-recursiveness of foldl' and so.
What I can't understand is why mergeSort1 is faster than mergeSort2 by such a big difference ?
Could it be that splitList is the bottleneck in mergeSort2, generating two new lists every call?

As is,
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
where halves = splitList xs
is an infinite recursion, since you haven't given a base case (you need to specify the result for lists of length < 2). After that is fixed, mergeSort2 is still relatively slow due to the splitList which requires a complete traversal in each step and builds two new lists, not allowing to process anything before that is completed. A simple
splitList zs = splitAt h zs where h = length zs `quot` 2
does much better.
Your mergeSort1, however, is not a merge sort at all, it is an insertion sort.
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
That does particularly well on reverse-sorted input, but if you give it sorted or random input, it scales quadratically.
So mergeSort1 was faster because you gave it optimal input, where it finishes in linear time.

Related

Haskell - Non-exhaustive pattern for a reason I don't understand

So I'm trying to write a function that, given two lists of integers, adds the ith even number of each list and returns them in another list. In case one of the list doesn't have an ith even number, a 0 is considered. For example, if the lists are [1,2,1,4,6] and [2,2], it returns [4,6,6] ([2+2,4+2,6+0]). I have the following code:
addEven :: [Int] -> [Int] -> [Int]
addEeven [] [] = []
addEeven (x:xs) [] = filter (\g -> g `mod`2 == 0) (x:xs)
addEven [] (y:ys) = filter (\g -> g `mod` 2 == 0) (y:ys)
addEven (x:xs) (y:ys) = (a + b):(addEven as bs)
where
(a:as) = filter (\g -> g `mod` 2 == 0) (x:xs)
(b:bs) = filter (\g -> g `mod` 2 == 0) (y:ys)
When I run that with the previous example, I get:
[4,6*** Exception: ex.hs:(4,1)-(8,101): Non-exhaustive patterns in function addEven
I really can't see what I'm missing, since it doesn't work with any input I throw at it.
A filter might eliminate elements, hence filter (\g -> gmod2 == 0) is not said to return any elements, and thus the patterns (a:as) and (b:bs) might fail.
That being said, I think you make the problem too complex here. You can first define a helper function that adds two elements of a list:
addList :: Num a => [a] -> [a] -> [a]
addList (x:xs) (y:ys) = (x+y) : addList xs ys
addList xs [] = xs
addList [] ys = ys
Then we do the filter on the two parameters, and make a function addEven that looks like:
addEven :: Integral a => [a] -> [a] -> [a]
addEven xs ys = addList (filter even xs) (filter even ys)
or with on :: (b -> b -> c) -> (a -> b) -> a -> a -> c:
import Data.Function(on)
addEven :: Integral a => [a] -> [a] -> [a]
addEven = addList `on` filter even
While using filter is very instinctive in this case, perhaps using filter twice and then summing up the results might be slightly ineffficient for large lists. Why don't we do the job all at once for a change..?
addMatches :: [Int] -> [Int] -> [Int]
addMatches [] [] = []
addMatches [] ys = filter even ys
addMatches xs [] = filter even xs
addMatches xs ys = first [] xs ys
where
first :: [Int] -> [Int] -> [Int] -> [Int]
first rs [] ys = rs ++ filter even ys
first rs (x:xs) ys = rs ++ if even x then second [x] xs ys
else first [] xs ys
second :: [Int] -> [Int] -> [Int] -> [Int]
second [r] xs [] = [r] ++ filter even xs
second [r] xs (y:ys) = if even y then first [r+y] xs ys
else second [r] xs ys
λ> addMatches [1,2,1,4,6] [2,2]
[4,6,6]

Haskell - Weave two lists together in chunks of n size?

I am practicing some Haskell exam paper questions, and have come across the following
Define a Haskell function weaveHunks which takes an int and
two lists and weaves them together in hunks of the given size.
Be sure to declare its type signature.
Example:
weaveHunks 3 "abcdefghijklmno" "ABCDEFGHIJKLMNO"
=> "abcABCdefDEFghiGHIjklJKLmnoMNO"
I have found the following on Stack Overflow, which is just too weave two lists together but only in chunks of 1
weaveHunks :: [a] -> [a] -> [a]
weaveHunks xs [] = xs
weaveHunks [] ys = ys
weaveHunks (x:xs) (y:ys) = x : y : weaveHunks xs ys
I am having problems adjusting this to take chunks fo n size, I am very new to Haskell but this is what I have so far
weaveHunks :: Int -> [a] -> [a] -> [a]
weaveHunks n xs [] = xs
weaveHunks n [] ys = ys
weaveHunks n xs ys = (take n xs) : (take n ys) : weaveHunks n (drop n xs) (drop n ys)
I am getting an error on the last line
(Couldn't match type a' with[a]')
Is (drop n xs) not a list?
You're very close!
By using the : operator to prepend the hunks, you're expressing that take n xs is one element of the result list, take n ys the next, and so on. But actually in both cases it's multiple elements you're prepending. That's the [a] that should actually be just a.
The solution is to use the ++ operator instead, which prepends an entire list rather than just a single element.
This is the full solution as I'd write it:
weaveHunks :: Int -> [a] -> [a] -> [a]
weaveHunks _ xs [] = xs
weaveHunks _ [] ys = ys
weaveHunks n xs ys = xHunk ++ yHunk ++ weaveHunks n xRemain yRemain
where [(xHunk, xRemain), (yHunk, yRemain)] = splitAt n <$> [xs,ys]
As #leftaroundabout said, since your appending lists of type [a], you need to use ++ instead of :. With this in mind, your code would then look like this:
weaveHunks :: Int -> [a] -> [a] -> [a]
weaveHunks _ xs [] = xs
weaveHunks _ [] ys = ys
weaveHunks n xs ys = (take n xs) ++ (take n ys) ++ weaveHunks n (drop n xs) (drop n ys)
If your interested, you can also use library functions to do this task:
import Data.List.Split
weaveHunks :: Int -> [a] -> [a] -> [a]
weaveHunks n xs ys = concat $ zipWith (++) (chunksOf n xs) (chunksOf n ys)
Note: chunksOf is from Data.List.Split, which splits the list into sublists of length n, so the type of this function is Int -> [a] -> [[a]]. zipWith zips two lists based on a condition, in this case concatenation ++. concat turns a list of [[a]] into [a].

Sorting a list of lists in Haskell

I want to write a function that takes a list of sorted lists, then merges everything together and sorts them again.
I managed to write this so far:
merge_:: Ord a => [[a]] -> [a] --takes in the list and merges it
merge_ [] = []
merge_ (x:xs) = x ++ merge_ xs
isort:: Ord a => [a] -> [a] --Sorts a list
isort [] = []
isort (a:x) = ins a (isort x)
where
ins a [] = [a]
ins a (b:y) | a<= b = a:(b:y)
| otherwise = b: (ins a y)
I haven't been able to find a way to combine these two in one function in a way that makes sense. Note that I'm not allowed to use things such as ('.', '$'..etc) (homework)
We start simple. How do we merge two sorted lists?
mergeTwo :: Ord a => [a] -> [a] -> [a]
mergeTwo [] ys = ys
mergeTwo xs [] = xs
mergeTwo (x:xs) (y:ys)
| x <= y = x : mergeTwo xs (y:ys)
| otherwise = y : mergeTwo (x:xs) ys
How do we merge multiple? Well, we start with the first and the second and merge them together. Then we merge the new one and the third together:
mergeAll :: Ord a => [[a]] -> [a]
mergeAll (x:y:xs) = mergeAll ((mergeTwo x y) : xs)
mergeAll [x] = x
mergeAll _ = []
Allright. Now, to sort all elements, we need to create a list from every element, and then merge them back. Let's write a function that creates a list for a single item:
toList :: a -> [a]
toList x = -- exercise
And now a function to wrap all elements in lists:
allToList :: [a] -> [[a]]
allToList = -- exercise
And now we're done. We simply need to use allToList and then mergeAll:
isort :: Ord a => [a] -> [a]
isort xs = mergeAll (allToList xs)
Note that this exercise got a lot easier since we've split it into four functions.
Exercises (which might not be possible for you(r homework))
Write toList and allToList.
Try a list comprehension for allToList. Try a higher order function for allToList.
Write isort point-free (with (.)).
Check whether there is already a toList function with the same type. Use that one.
Rewrite mergeAll using foldr
Try this (not tested):
merge :: Ord a => [a] -> [a] -> [a]
merge [] l1 = l1
merge l1 [] = l1
merge (e1:l1) (e2:l2)
| e1<e2 = e1:merge l1 (e2:l2)
| otherwise = e2:merge (e1:l1) l2

Haskell Merge Sort

This is an implementation of Mergesort using higher order functions,guards,where and recursion.
However getting an error from compiler 6:26: parse error on input ‘=’
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = (length xs) `div` 2
I can't see whats wrong? or rather I don't understand the compiler.
Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:
sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[])
where
mergeAll [] = []
mergeAll [t] = t
mergeAll xs = mergeAll (mergePairs xs)
mergePairs (x:y:xs) = merge x y:mergePairs xs
mergePairs xs = xs
where merge is already given by others.
Another msort implementation in Haskell;
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys)
| otherwise = y:merge (x:xs) ys
halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
where lhx = length xs `div` 2
msort :: Ord a => [a] -> [a]
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
where (left,right) = halve xs
Haskell is an indentation sensitive programming language, you simply need to fix that (btw. if you are using tabs change that to using spaces).
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = length xs `div` 2
None of these solutions is as smart as Haskell's own solution, which runs on the idea that in the worst case scenario's these proposed algorithms is still run Theta (n log n) even if the list to be sorted is already trivially sorted.
Haskell's solution is to merge lists of strictly decreasing (and increasing values). The simplified code looks like:
mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
where op x [] = [[x]]
op x ((y:xs):xss) | x <= y = (x:y:xs):xss
| otherwise = [x]:(y:xs):xss`
This will run Theta(n)
Haskell's version is smarter still because it will do an up run and a down run.
As usual I am in awe with the cleverness of Haskell!

How to extract the same elements from two lists in Haskell?

here's my question:
How to extract the same elements from two equal length lists to another list?
For example: given two lists [2,4,6,3,2,1,3,5] and [7,3,3,2,8,8,9,1] the answer should be [1,2,3,3]. Note that the order is immaterial. I'm actually using the length of the return list.
I tried this:
sameElem as bs = length (nub (intersect as bs))
but the problem is nub removes all the duplications. The result of using my function to the former example is 3 the length of [1,3,2] instead of 4 the length of [1,3,3,2]. Is there a solution? Thank you.
Since the position seems to be irrelevant, you can simply sort the lists beforehand and then traverse both lists:
import Data.List (sort)
intersectSorted :: Ord a => [a] -> [a] -> [a]
intersectSorted (x:xs) (y:ys)
| x == y = x : intersectSorted xs ys
| x < y = intersectSorted xs (y:ys)
| x > y = intersectSorted (x:xs) ys
intersectSorted _ _ = []
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = intersectSorted (sort xs) (sort ys)
Note that it's also possible to achieve this with a Map:
import Data.Map.Strict (fromListWith, assocs, intersectionWith, Map)
type Counter a = Map a Int
toCounter :: Ord a => [a] -> Counter a
toCounter = fromListWith (+) . flip zip (repeat 1)
intersectCounter :: Ord a => Counter a -> Counter a -> Counter a
intersectCounter = intersectionWith min
toList :: Counter a -> [a]
toList = concatMap (\(k,c) -> replicate c k) . assocs
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = toList $ intersectCounter (toCounter xs) (toCounter ys)
You could write a function for this. There is probably a more elegant version of this involving lambda's or folds, but this does work for your example:
import Data.List
same (x:xs) ys = if x `elem` ys
then x:same xs (delete x ys)
else same xs ys
same [] _ = []
same _ [] = []
The delete x ys in the then-clause is important, without that delete command items from the first list that occur at least once will be counted every time they're encountered.
Note that the output is not sorted, since you were only interested in the length of the resulting list.
import Data.List (delete)
mutuals :: Eq a => [a] -> [a] -> [a]
mutuals [] _ = []
mutuals (x : xs) ys | x `elem` ys = x : mutuals xs (delete x ys)
| otherwise = mutuals xs ys
gives
mutuals [2,4,6,3,2,1,3,5] [7,3,3,2,8,8,9,1] == [2,3,1,3]

Resources