Haskell Merge Sort - haskell

This is an implementation of merge sort using higher-order functions, guards, where and recursion.
However, I am getting an error from the compiler: 6:26: parse error on input ‘=’
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = (length xs) `div` 2
I can't see what's wrong, or rather, I don't understand the compiler.

Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:
sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[])
  where
    mergeAll [] = []
    mergeAll [t] = t
    mergeAll xs = mergeAll (mergePairs xs)
    mergePairs (x:y:xs) = merge x y:mergePairs xs
    mergePairs xs = xs
Here merge is the usual two-list merge already given in the other answers.
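For completeness, here is a self-contained sketch of this bottom-up approach. The merge function is the standard one from the other answers, the sort is renamed sortBottomUp here only to avoid clashing with Data.List.sort, and passes is a hypothetical helper added purely to show the intermediate rounds.

```haskell
-- Standard two-way merge of two sorted lists (as given in the other answers).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- One round: merge neighbouring lists pairwise.
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (x:y:xs) = merge x y : mergePairs xs
mergePairs xs       = xs

-- Bottom-up merge sort: start from singletons, repeat rounds until one list is left.
sortBottomUp :: Ord a => [a] -> [a]
sortBottomUp = mergeAll . map (:[])
  where
    mergeAll []  = []
    mergeAll [t] = t
    mergeAll xs  = mergeAll (mergePairs xs)

-- Hypothetical helper (not part of the answer): the list of lists after each round.
passes :: Ord a => [a] -> [[[a]]]
passes = go . map (:[])
  where
    go xss
      | length xss <= 1 = [xss]
      | otherwise       = xss : go (mergePairs xss)
```

For example, passes [4,1,3,2] evaluates to [[[4],[1],[3],[2]],[[1,4],[2,3]],[[1,2,3,4]]], so no list is ever split in half.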

Another msort implementation in Haskell:
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys
halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
  where lhx = length xs `div` 2
msort :: Ord a => [a] -> [a]
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
  where (left,right) = halve xs

Haskell is an indentation-sensitive programming language; you simply need to fix the indentation so that the bindings under where all start in the same column (and if you are using tabs, switch to spaces).
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
    | length xs < 2 = xs
    | otherwise = merge (mergeSort merge first) (mergeSort merge second)
    where first = take half xs
          second = drop half xs
          half = length xs `div` 2
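To try the corrected definition, mergeSort needs a merge function as its first argument. A minimal sketch, assuming the usual two-way merge; the names mergeAsc and mergeDesc are made up here and do not come from the question.

```haskell
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
    | length xs < 2 = xs
    | otherwise = merge (mergeSort merge first) (mergeSort merge second)
    where first = take half xs
          second = drop half xs
          half = length xs `div` 2

-- Assumed helper: merge two ascending lists into one ascending list.
mergeAsc :: Ord a => [a] -> [a] -> [a]
mergeAsc [] ys = ys
mergeAsc xs [] = xs
mergeAsc (x:xs) (y:ys)
    | x <= y    = x : mergeAsc xs (y:ys)
    | otherwise = y : mergeAsc (x:xs) ys

-- Assumed helper: the same idea for descending order.
mergeDesc :: Ord a => [a] -> [a] -> [a]
mergeDesc [] ys = ys
mergeDesc xs [] = xs
mergeDesc (x:xs) (y:ys)
    | x >= y    = x : mergeDesc xs (y:ys)
    | otherwise = y : mergeDesc (x:xs) ys
```

With these, mergeSort mergeAsc [3,1,2] gives [1,2,3] and mergeSort mergeDesc [3,1,2] gives [3,2,1]; the order is decided entirely by the function that is passed in.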

None of these solutions is as smart as Haskell's own solution, which is built on the observation that the proposed algorithms always run in Theta(n log n) time, even if the list to be sorted is already trivially sorted.
Haskell's solution is to split the input into runs of already ordered values (strictly decreasing runs as well as increasing ones) and then merge those runs. The simplified code looks like this (unwrap, single and pairWith are small helper functions whose definitions are omitted):
mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
  where op x [] = [[x]]
        op x ((y:xs):xss) | x <= y = (x:y:xs):xss
                          | otherwise = [x]:(y:xs):xss
On an already sorted list this runs in Theta(n).
Haskell's version is smarter still because it detects both ascending and descending runs.
As usual I am in awe with the cleverness of Haskell!
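The snippet above leaves unwrap, single and pairWith undefined. The following sketch fills them in with assumed definitions (they are guesses that make the code run, including on the empty list, and are not taken from any library), so that the run-detection idea can be tried out.

```haskell
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Split the input into maximal non-decreasing runs.
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
  where
    op x [] = [[x]]
    op x ((y:ys):yss)
      | x <= y    = (x:y:ys) : yss
      | otherwise = [x] : (y:ys) : yss
    op x ([]:yss) = [x] : yss  -- cannot occur: runs are never empty

-- Assumed helper: True once at most one run is left.
single :: [a] -> Bool
single xss = null (drop 1 xss)

-- Assumed helper: combine neighbouring runs pairwise.
pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith f (x:y:xs) = f x y : pairWith f xs
pairWith _ xs       = xs

-- Assumed helper: extract the final run (or [] for empty input).
unwrap :: [[a]] -> [a]
unwrap []     = []
unwrap (xs:_) = xs

mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))
```

For instance, runs [1,2,3,2,5] is [[1,2,3],[2,5]], while runs [1..5] is a single run, so a sorted input needs no merging at all.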

Related

Sorting a list of lists in Haskell

I want to write a function that takes a list of sorted lists, then merges everything together and sorts them again.
I managed to write this so far:
merge_ :: Ord a => [[a]] -> [a] -- takes in the list and merges it
merge_ [] = []
merge_ (x:xs) = x ++ merge_ xs
isort :: Ord a => [a] -> [a] -- sorts a list
isort [] = []
isort (a:x) = ins a (isort x)
  where
    ins a [] = [a]
    ins a (b:y) | a <= b = a : (b:y)
                | otherwise = b : ins a y
I haven't been able to find a way to combine these two into one function in a way that makes sense. Note that I'm not allowed to use things such as '.', '$', etc. (this is homework).
We start simple. How do we merge two sorted lists?
mergeTwo :: Ord a => [a] -> [a] -> [a]
mergeTwo [] ys = ys
mergeTwo xs [] = xs
mergeTwo (x:xs) (y:ys)
  | x <= y = x : mergeTwo xs (y:ys)
  | otherwise = y : mergeTwo (x:xs) ys
How do we merge multiple? Well, we start with the first and the second and merge them together. Then we merge the new one and the third together:
mergeAll :: Ord a => [[a]] -> [a]
mergeAll (x:y:xs) = mergeAll ((mergeTwo x y) : xs)
mergeAll [x] = x
mergeAll _ = []
All right. Now, to sort all elements, we need to create a list from every element, and then merge them back. Let's write a function that creates a list for a single item:
toList :: a -> [a]
toList x = -- exercise
And now a function to wrap all elements in lists:
allToList :: [a] -> [[a]]
allToList = -- exercise
And now we're done. We simply need to use allToList and then mergeAll:
isort :: Ord a => [a] -> [a]
isort xs = mergeAll (allToList xs)
Note that this exercise got a lot easier since we've split it into four functions.
Exercises (which might not be allowed for your homework)
Write toList and allToList.
Try a list comprehension for allToList. Try a higher order function for allToList.
Write isort point-free (with (.)).
Check whether there is already a toList function with the same type. Use that one.
Rewrite mergeAll using foldr
Try this (not tested):
merge :: Ord a => [a] -> [a] -> [a]
merge [] l1 = l1
merge l1 [] = l1
merge (e1:l1) (e2:l2)
  | e1<e2 = e1:merge l1 (e2:l2)
  | otherwise = e2:merge (e1:l1) l2

How to extract the same elements from two lists in Haskell?

Here's my question:
How to extract the same elements from two equal length lists to another list?
For example: given two lists [2,4,6,3,2,1,3,5] and [7,3,3,2,8,8,9,1] the answer should be [1,2,3,3]. Note that the order is immaterial. I'm actually using the length of the return list.
I tried this:
sameElem as bs = length (nub (intersect as bs))
but the problem is that nub removes all the duplicates. Applied to the example above, my function returns 3, the length of [1,3,2], instead of 4, the length of [1,3,3,2]. Is there a solution? Thank you.
Since the position seems to be irrelevant, you can simply sort the lists beforehand and then traverse both lists:
import Data.List (sort)
intersectSorted :: Ord a => [a] -> [a] -> [a]
intersectSorted (x:xs) (y:ys)
  | x == y = x : intersectSorted xs ys
  | x < y = intersectSorted xs (y:ys)
  | x > y = intersectSorted (x:xs) ys
intersectSorted _ _ = []
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = intersectSorted (sort xs) (sort ys)
Note that it's also possible to achieve this with a Map:
import Data.Map.Strict (fromListWith, assocs, intersectionWith, Map)
type Counter a = Map a Int
toCounter :: Ord a => [a] -> Counter a
toCounter = fromListWith (+) . flip zip (repeat 1)
intersectCounter :: Ord a => Counter a -> Counter a -> Counter a
intersectCounter = intersectionWith min
toList :: Counter a -> [a]
toList = concatMap (\(k,c) -> replicate c k) . assocs
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = toList $ intersectCounter (toCounter xs) (toCounter ys)
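Since the question only needs the length of the result, the counter idea can be shortened. The following is a sketch under that assumption; sameCount is a hypothetical name, and the module is imported qualified so that nothing clashes with the Prelude.

```haskell
import qualified Data.Map.Strict as Map

-- Count how often each element occurs.
toCounter :: Ord a => [a] -> Map.Map a Int
toCounter xs = Map.fromListWith (+) [(x, 1) | x <- xs]

-- Hypothetical helper: size of the multiset intersection, computed
-- from the counts without rebuilding the list of common elements.
sameCount :: Ord a => [a] -> [a] -> Int
sameCount xs ys =
  sum (Map.elems (Map.intersectionWith min (toCounter xs) (toCounter ys)))
```

For the lists in the question, sameCount [2,4,6,3,2,1,3,5] [7,3,3,2,8,8,9,1] is 4: the keys 1 and 2 contribute one each and the key 3 contributes two.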
You could write a function for this. There is probably a more elegant version involving lambdas or folds, but this does work for your example:
import Data.List
same (x:xs) ys = if x `elem` ys
                   then x:same xs (delete x ys)
                   else same xs ys
same [] _ = []
same _ [] = []
The delete x ys in the then-clause is important: without it, items from the first list that occur at least once in the second would be counted every time they are encountered.
Note that the output is not sorted, since you were only interested in the length of the resulting list.
import Data.List (delete)
mutuals :: Eq a => [a] -> [a] -> [a]
mutuals [] _ = []
mutuals (x : xs) ys | x `elem` ys = x : mutuals xs (delete x ys)
                    | otherwise = mutuals xs ys
gives
mutuals [2,4,6,3,2,1,3,5] [7,3,3,2,8,8,9,1] == [2,3,1,3]

merge finite sorted list in Haskell

I am new to Haskell. I am wondering how to write a function in Haskell that accepts finite sorted lists of integers and merges them into one sorted list. Any code is appreciated!
If your goal is just to merge two lists, this is not so complicated:
merge :: Ord a => [a] -> [a] -> [a]
This says that merge takes two lists and produces a list, for any type with a defined ordering relation.
merge [] x = x
merge x [] = x
This says that if you merge the empty list with anything, you get that anything.
merge (x:xs) (y:ys) | y < x = y : merge (x:xs) ys
merge (x:xs) (y:ys) | otherwise = x : merge xs (y:ys)
This says that when you merge two lists, if the first element of the second list is lower, it should go on the front of the new list; otherwise you should use the first element of the first list.
EDIT: Note that, unlike some of the other solutions, the merge above is both O(n) and stable. Look those terms up on Wikipedia if you don't know what they mean.
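Stability is only visible when elements can compare equal without being identical. The sketch below generalises the merge above to compare on a key; mergeOn is a hypothetical name introduced for this illustration, and the guard is the same y < x test, so ties are still taken from the first list.

```haskell
-- Hypothetical variant of the merge above that compares elements by a key.
-- An element of the second list is taken only when its key is strictly
-- smaller, so on ties the element from the first list comes first.
mergeOn :: Ord k => (a -> k) -> [a] -> [a] -> [a]
mergeOn _ [] ys = ys
mergeOn _ xs [] = xs
mergeOn key (x:xs) (y:ys)
  | key y < key x = y : mergeOn key (x:xs) ys
  | otherwise     = x : mergeOn key xs (y:ys)
```

For example, mergeOn fst [(1,'a'),(2,'a')] [(1,'b'),(2,'b')] yields [(1,'a'),(1,'b'),(2,'a'),(2,'b')]: within each key, the 'a' element from the first list stays ahead of the 'b' element from the second.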
If your goal is to merge a list of lists you generally want to do this bottom up by merging two lists at a time
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs [] = []
mergePairs [ls] = [ls]
mergePairs (x:y:ls) = (merge x y):mergePairs ls
merges :: Ord a => [[a]] -> [a]
merges [] = []
merges [x] = x
merges ls = merges $ mergePairs ls
It can be shown that this is asymptotically optimal if all the initial lists are the same length (O(m n log n), where m is the length of the sorted lists and n is the number of sorted lists).
This can lead to an asymptotically efficient merge sort:
mergeSort :: Ord a => [a] -> [a]
mergeSort ls = merges $ map (\x -> [x]) ls
This should do it, without requiring that the lists be finite:
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = if x < y
                        then x:(merge xs (y:ys))
                        else y:(merge (x:xs) ys)
merge [] xs = xs
merge xs [] = xs
In English: check the first element of each list, make the lesser one the next element of the result, then merge the lists that remain.
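Because this merge produces each output element after looking only at the heads of its arguments, laziness lets it work on infinite sorted lists as long as only a finite prefix is demanded. A small sketch; firstTen is a name made up for the example.

```haskell
merge :: Ord a => [a] -> [a] -> [a]
merge (x:xs) (y:ys) = if x < y
                        then x : merge xs (y:ys)
                        else y : merge (x:xs) ys
merge [] xs = xs
merge xs [] = xs

-- Merge the infinite lists of even and odd numbers,
-- but only ever evaluate the first ten results.
firstTen :: [Integer]
firstTen = take 10 (merge [0,2..] [1,3..])
```

Here firstTen evaluates to [0,1,2,3,4,5,6,7,8,9].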
(sort . concat) [[30..32],[1..3]] == [1,2,3,30,31,32]

Why is MergeSort in Haskell faster when implemented with foldl'?

I have implemented two versions of merge sort in Haskell as follows:
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
and
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 [] = []
mergeSort2 (x:[]) = [x]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
  where halves = splitList xs
where 'merge' and 'splitList' are implemented as follows:
merge :: (Ord a) => [a] -> [a] -> [a]
merge [] [] = []
merge xs [] = xs
merge [] ys = ys
merge all_x@(x:xs) all_y@(y:ys)
    | x < y = x:merge xs all_y
    | otherwise = y:merge all_x ys
splitList :: [a] -> ([a], [a])
splitList zs = go zs [] [] where
    go [] xs ys = (xs, ys)
    go [x] xs ys = (x:xs, ys)
    go (x:y:zs) xs ys = go zs (x:xs) (y:ys)
Doing last $ mergeSort2 [1000000,999999..0] in ghci results in showing the number 1000000 after more than a minute of processing, while doing last $ mergeSort1 [1000000,999999..0] results in showing the last element only after 5 seconds.
I can understand why mergeSort1 uses much less memory than mergeSort2, because of the tail recursion of foldl' and so on.
What I can't understand is why mergeSort1 is faster than mergeSort2 by such a big margin.
Could it be that splitList is the bottleneck in mergeSort2, generating two new lists every call?
As is,
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
  where halves = splitList xs
is an infinite recursion, since you haven't given a base case (you need to specify the result for lists of length < 2). After that is fixed, mergeSort2 is still relatively slow due to the splitList which requires a complete traversal in each step and builds two new lists, not allowing to process anything before that is completed. A simple
splitList zs = splitAt h zs where h = length zs `quot` 2
does much better.
Your mergeSort1, however, is not a merge sort at all, it is an insertion sort.
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
That does particularly well on reverse-sorted input, but if you give it sorted or random input, it scales quadratically.
So mergeSort1 was faster because you gave it optimal input, where it finishes in linear time.
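The difference can be made visible by counting comparisons. The sketch below instruments the merge from the question; mergeCounting and comparisons are hypothetical names, and the counting is only meant for small inputs.

```haskell
import Data.List (foldl')

-- The merge from the question, additionally returning the number
-- of element comparisons it performed.
mergeCounting :: Ord a => [a] -> [a] -> ([a], Int)
mergeCounting xs [] = (xs, 0)
mergeCounting [] ys = (ys, 0)
mergeCounting allX@(x:xs) allY@(y:ys)
  | x < y     = let (r, c) = mergeCounting xs allY in (x : r, c + 1)
  | otherwise = let (r, c) = mergeCounting allX ys in (y : r, c + 1)

-- Total comparisons made by the foldl'-based mergeSort1 on a given input.
comparisons :: Ord a => [a] -> Int
comparisons = snd . foldl' step ([], 0)
  where
    step (acc, n) x =
      let (acc', c) = mergeCounting [x] acc
      in (acc', n + c)
```

With this, comparisons [100,99..1] is 99 (one comparison per inserted element), whereas comparisons [1..100] is 4950, which is 100 * 99 / 2: each new element has to walk past the whole accumulator.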

can you get a count on matches in a list comprehension (trying to insertionSort on a qsort after threshold)

I come from a C++ background so I'm not sure if I'm even going about this properly. But what I'm trying to do is write up quick sort but fallback to insertion sort if the length of a list is less than a certain threshold. So far I have this code:
import Data.List (insert)
insertionSort :: (Ord a) => [a] -> [a]
insertionSort [] = []
insertionSort (x:xs) = insert x (insertionSort xs)
quickSort :: (Ord a) => [a] -> [a]
quickSort x = qsHelper x (length x)
qsHelper :: (Ord a) => [a] -> Int -> [a]
qsHelper [] _ = []
qsHelper (x:xs) n
    | n <= 10 = insertionSort (x:xs) -- sort the whole list; insertionSort xs would drop x
    | otherwise = qsHelper before (length before) ++ [x] ++ qsHelper after (length after)
    where
        before = [a | a <- xs, a < x]
        after = [a | a <- xs, a >= x]
Now what I'm concerned about is calculating the length of each list every time. I don't fully understand how Haskell optimizes things or the complete effects of lazy evaluation on code like the above. But it seems like calculating the length of the list for each before and after list comprehension is not a good thing? Is there a way for you to extract the number of matches that occurred in a list comprehension while performing the list comprehension?
I.e. if we had [x | x <- [1,2,3,4,5], x > 3] (which results in [4,5]) could I get the count of [4,5] without using a call to length?
Thanks for any help/explanations!
Short answer: no.
Less short answer: yes, you can fake it. import Data.Monoid, then
    | otherwise = qsHelper before lenBefore ++ [x] ++ qsHelper after lenAfter
    where
        (before, Sum lenBefore) = mconcat [([a], Sum 1) | a <- xs, a < x]
        (after, Sum lenAfter) = mconcat [([a], Sum 1) | a <- xs, a >= x]
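As a standalone illustration of this trick, the pair monoid combines the lists with (++) and the Sum counters with (+) at the same time. A minimal sketch; filterCount is a hypothetical helper name, not a library function.

```haskell
import Data.Monoid (Sum(..))

-- Hypothetical helper: keep the elements satisfying p and count them,
-- using the Monoid instance for pairs of ([a], Sum Int).
filterCount :: (a -> Bool) -> [a] -> ([a], Int)
filterCount p xs = (kept, n)
  where
    (kept, Sum n) = mconcat [([a], Sum 1) | a <- xs, p a]
```

For the example from the question, filterCount (> 3) [1,2,3,4,5] gives ([4,5],2).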
Better answer: you don't want to.
Common reasons to avoid length include:
- its running time is O(N)
  - but it costs us O(N) to build the list anyway
- it forces the list spine to be strict
  - but we're sorting the list: we have to (at least partially) evaluate each element in order to know which is the minimum; the list spine is already forced to be strict
- if you don't care how long the list is, just whether it's shorter/longer than another list or a threshold, length is wasteful: it will walk all the way to the end of the list regardless
  - BINGO
isLongerThan :: Int -> [a] -> Bool
isLongerThan _ [] = False
isLongerThan 0 _ = True
isLongerThan n (_:xs) = isLongerThan (n-1) xs
quickSort :: (Ord a) => [a] -> [a]
quickSort [] = []
quickSort (x:xs)
    | not (isLongerThan 10 (x:xs)) = insertionSort (x:xs) -- include x, otherwise it is lost
    | otherwise = quickSort before ++ [x] ++ quickSort after
    where
        before = [a | a <- xs, a < x]
        after = [a | a <- xs, a >= x]
The real inefficiency here though is in before and after. They both step through the entire list, comparing each element against x. So we are stepping through xs twice, and comparing each element against x twice. We only have to do it once.
(before, after) = partition (< x) xs
partition is in Data.List.
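Putting these pieces together gives something like the following sketch. It assumes a threshold of 10 as in the question, uses foldr insert [] as a compact insertion sort, and passes the whole list (pivot included) to the fallback.

```haskell
import Data.List (insert, partition)

-- Insertion sort, written as a fold over Data.List.insert.
insertionSort :: Ord a => [a] -> [a]
insertionSort = foldr insert []

-- True if the list has more than n elements; inspects at most n + 1 cells.
isLongerThan :: Int -> [a] -> Bool
isLongerThan _ []     = False
isLongerThan 0 _      = True
isLongerThan n (_:xs) = isLongerThan (n - 1) xs

-- Quicksort that falls back to insertion sort for short lists.
quickSort :: Ord a => [a] -> [a]
quickSort [] = []
quickSort list@(x:xs)
  | not (isLongerThan 10 list) = insertionSort list
  | otherwise                  = quickSort before ++ [x] ++ quickSort after
  where
    -- a single pass splits xs into elements below x and the rest
    (before, after) = partition (< x) xs
```

For example, quickSort (reverse [1..50]) returns [1..50]; the recursion switches to insertionSort once a sublist has ten elements or fewer.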
No, there is no way to use list comprehensions to simultaneously do a filter and count the number of found elements. But if you are worried about this performance hit, you should not be using the list comprehensions the way you are in the first place: You are filtering the list twice, hence applying the predicate <x and its negation to each element. A better variant would be
(before, after) = partition (< x) xs
Starting from that, it is not hard to write a function
partitionAndCount :: (a -> Bool) -> [a] -> (([a],Int), ([a],Int))
that simultaneously partitions the list and counts the elements in each of the returned lists:
((before, lengthBefore), (after, lengthAfter)) = partitionAndCount (< x) xs
Here is a possible implementation (with a slightly reordered type):
{-# LANGUAGE BangPatterns #-}
import Control.Arrow
partitionAndCount :: (a -> Bool) -> [a] -> (([a], [a]), (Int, Int))
partitionAndCount p = go 0 0
  where go !c1 !c2 [] = (([],[]),(c1,c2))
        go !c1 !c2 (x:xs) = if p x
                              then first (first (x:)) (go (c1 + 1) c2 xs)
                              else first (second (x:)) (go c1 (c2 + 1) xs)
And here you can see it in action:
*Main> partitionAndCount (>=4) [1,2,3,4,5,3,4,5]
(([4,5,4,5],[1,2,3,3]),(4,4))

Resources