Merge sort in Haskell - haskell

I am new to Haskell and I am trying to implement a few known algorithms in it.
I have implemented merge sort on strings. I am a bit disappointed with the
performance of my Haskell implementation compared to C and Java implementations.
On my machine (Ubuntu Linux, 1.8 GHz), C (gcc 4.3.3) sorts 1 000 000 strings in 1.85 s,
Java (Java SE 1.6.0_14) in 3.68 s, Haskell (GHC 6.8.2) in 25.89 s.
With larger input (10 000 000 strings), C takes 21.81 s, Java takes 59.68 s, Haskell
starts swapping and I preferred to stop the program after several minutes.
Since I am new to Haskell, I would be interested to know if my implementation can
be made more time / space efficient.
Thank you in advance for any hint
Giorgio
My implementation:
merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) = if x < y
                        then x : (merge xs (y:ys))
                        else y : (merge (x:xs) ys)
mergeSort :: [String] -> [String]
mergeSort xs = if (l < 2)
                 then xs
                 else merge h t
  where l = length xs
        n = l `div` 2
        s = splitAt n xs
        h = mergeSort (fst s)
        t = mergeSort (snd s)

Try this version:
mergesort :: [String] -> [String]
mergesort = mergesort' . map wrap
mergesort' :: [[String]] -> [String]
mergesort' [] = []
mergesort' [xs] = xs
mergesort' xss = mergesort' (merge_pairs xss)
merge_pairs :: [[String]] -> [[String]]
merge_pairs [] = []
merge_pairs [xs] = [xs]
merge_pairs (xs:ys:xss) = merge xs ys : merge_pairs xss
merge :: [String] -> [String] -> [String]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  = if x > y
      then y : merge (x:xs) ys
      else x : merge xs (y:ys)
wrap :: String -> [String]
wrap x = [x]
Splitting the list first is a bad idea. Instead, just make a list of one-element lists. Haskell is lazy, so this will be done at the right time.
Then merge pairs of lists until you have only one list.

In Haskell, a string is a lazy list of characters and has the same overhead as any other list. If I remember right from a talk I heard Simon Peyton Jones give in 2004, the space cost in GHC is 40 bytes per character. For an apples-to-apples comparison you probably should be sorting Data.ByteString values, which are designed to give performance comparable to other languages.
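As a rough sketch of that suggestion, the following sorts the lines of a strict ByteString with the library sort. It assumes the bytestring package that ships with GHC; sortLines is a hypothetical helper name, not something from the question, and Data.ByteString.Char8 is only appropriate for ASCII or Latin-1 input.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.List (sort)

-- Hypothetical helper: sort the lines of a strict ByteString.
-- Each line is a packed byte array rather than a linked list of Char.
sortLines :: B.ByteString -> B.ByteString
sortLines = B.unlines . sort . B.lines
```

A program could apply it to standard input with B.interact sortLines. Whether this closes the gap to C on a given machine would have to be measured.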

Better way to split the list to avoid the issue CesarB points out:
split [] = ([], [])
split [x] = ([x], [])
split (x : y : rest) = (x : xs, y : ys)
  where (xs, ys) = split rest
mergeSort [] = []
mergeSort [x] = [x]
mergeSort xs = merge (mergeSort ys) (mergeSort zs)
  where (ys, zs) = split xs
EDIT: Fixed.

I am not sure if this is the cause of your problem, but remember that lists are a sequential data structure. In particular, both length xs and splitAt n xs will take an amount of time proportional to the length of the list (O(n)).
In C and Java, you are most probably using arrays, which take constant time for both operations (O(1)).
Edit: answering your question on how to make it more efficient, you can use arrays in Haskell too.
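One possible reading of "use arrays" is sketched below. It assumes the immutable Data.Array from the array package that ships with GHC, and mergeSortArr is an illustrative name. The list is copied into an array once, and the recursion then works on index ranges, so length and splitAt are no longer called at every level. The merge step still builds lists, so this is not equivalent to an in-place C sort.

```haskell
import Data.Array (listArray, (!))

-- Sketch: top-down merge sort that splits by index instead of splitAt.
mergeSortArr :: Ord a => [a] -> [a]
mergeSortArr xs = go 0 (n - 1)
  where
    n = length xs                     -- traversed once, not per level
    arr = listArray (0, n - 1) xs     -- O(1) indexing from here on
    go lo hi
      | lo > hi   = []                -- empty range
      | lo == hi  = [arr ! lo]        -- single element
      | otherwise = merge (go lo mid) (go (mid + 1) hi)
      where mid = (lo + hi) `div` 2
    merge [] rs = rs
    merge ls [] = ls
    merge (l:ls) (r:rs)
      | l <= r    = l : merge ls (r:rs)
      | otherwise = r : merge (l:ls) rs
```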

Related

Haskell Permutations with very limited functions

I have to implement a function in Haskell that takes a list [Int] and gives a list [[Int]] with all permutations, but I'm only allowed to use:
[], :, True, False, comparisons, &&, ||, and not
permutations [] = [[]]
permutations xs = [(y:zs) | (y,ys) <- picks xs, zs <- permutations ys]
  where
    picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs]
My idea was to use something like that, but I have to replace the <-.
As mentioned by chepner in the comments, a few missing elementary library functions can easily be re-implemented “on the spot”.
The Wikipedia article on permutations leads us to, among many other things, the Steinhaus–Johnson–Trotter algorithm, which seems well suited to linked lists.
For this algorithm, an essential building block is a function we could declare as:
spread :: a -> [a] -> [[a]]
For example, the expression spread 4 [1,2,3] has to put 4 at all possible positions within [1,2,3], thus evaluating to: [[4,1,2,3],[1,4,2,3],[1,2,4,3],[1,2,3,4]]. To get all permutations of [1,2,3,4], you just need to apply spread 4 to all permutations of [1,2,3]. And it is easy to write spread in recursive fashion:
spread :: a -> [a] -> [[a]]
spread x [] = [[x]]
spread x (y:ys) = (x:y:ys) : (map (y:) (spread x ys))
And permutations can thus be obtained like this:
permutations :: [a] -> [[a]]
permutations [] = [[]]
permutations (x:xs) = concat (map (spread x) (permutations xs))
Overall, a rules-compliant version of the source code would go like this, with its own local versions of the map and concat Prelude functions:
permutations :: [a] -> [[a]]
permutations [] = [[]]
permutations (x:xs) = myConcat (myMap (spread x) (permutations xs))
  where
    myMap fn [] = []
    myMap fn (z:zs) = (fn z) : (myMap fn zs)
    myConcat [] = []
    myConcat ([]:zss) = myConcat zss
    myConcat ((z:zs):zss) = z : (myConcat (zs:zss))
    spread z [] = [[z]]
    spread z (y:ys) = ( z:y:ys) : (myMap (y:) (spread z ys))

Haskell Error on program

I'm working on a problem and it keeps giving me this error:
Exception: Prelude.tail: empty list
Here is my code so far:
lxP :: Eq a => [[a]] -> [a]
lxP [] = []
lxP xss
  | any null xss = []
  | otherwise = loop xss []
  where loop :: Eq b => [[b]] -> [b] -> [b]
        loop xss acc =
          let xs = concatMap (take 1) xss
          in if any (\x -> x /= head xs) (tail xs)
               then reverse acc
               else loop (map tail xss) (head xs : acc)
Any idea if my indentation is the problem or is it something with the code?
PS. How could I improve the efficiency?
I can’t quite work out what your function is supposed to do. It doesn’t make sense to me as a reasonable thing to want. Here’s what it looks like to me:
You take some list of lists (let’s say it’s a matrix for now as I’m about to talk about columns) and you want to return the longest prefix of the first row such that each element is in a constant column.
So let’s try to write this in a more idiomatic way.
Now we want to look for constantness in columns, but what should we do if the rows aren’t the same length? I’m going to decide that we’ll just ignore them and imagine shoving all elements upwards so that there are no gaps. Let’s convert rows to columns:
transpose :: [[a]] -> [[a]]
transpose xss = t xss where
  n = maximum (map length xss)
  t [] = replicate n []
  t (xs:xss) = join xs (t xss)
  join [] yss = yss
  join (x:xs) (ys:yss) = (x:ys) : join xs yss
So now we can write the function.
myWeirdFunction xss
  | any null xss = []
  | otherwise = map head $ takeWhile constant $ transpose xss where
    constant (x:xs) = c x xs
    c x (y:ys) | y == x = c x ys
               | True = False
    c x [] = True
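If the intent really is the longest prefix shared by every row, a direct fold may be simpler. The sketch below is written under that assumption; commonPrefix and longestCommonPrefix are illustrative names. It uses pattern matching instead of head and tail, which sidesteps the Prelude.tail exception from the question (there, tail appears to end up applied to an empty list once the rows run out).

```haskell
-- Longest prefix on which two lists agree.
commonPrefix :: Eq a => [a] -> [a] -> [a]
commonPrefix (x:xs) (y:ys)
  | x == y = x : commonPrefix xs ys
commonPrefix _ _ = []               -- mismatch, or either list exhausted

-- Longest prefix shared by all rows; [] when there are no rows.
longestCommonPrefix :: Eq a => [[a]] -> [a]
longestCommonPrefix [] = []
longestCommonPrefix xss = foldr1 commonPrefix xss
```

For rows of different lengths this stops at the shortest row, which is not the same as the "shove upwards" behaviour of the transpose-based version.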

Haskell Merge Sort

This is an implementation of merge sort using higher-order functions, guards, where, and recursion.
However, I am getting an error from the compiler: 6:26: parse error on input ‘=’
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = (length xs) `div` 2
I can't see what's wrong, or rather, I don't understand the compiler.
Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:
sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[])
  where
    mergeAll [] = []
    mergeAll [t] = t
    mergeAll xs = mergeAll (mergePairs xs)
    mergePairs (x:y:xs) = merge x y:mergePairs xs
    mergePairs xs = xs
where merge is already given by others.
Another msort implementation in Haskell:
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys
halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
  where lhx = length xs `div` 2
msort :: Ord a => [a] -> [a]
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
  where (left,right) = halve xs
Haskell is an indentation-sensitive programming language; you simply need to fix the indentation (by the way, if you are using tabs, switch to spaces).
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
  | length xs < 2 = xs
  | otherwise = merge (mergeSort merge first) (mergeSort merge second)
  where first = take half xs
        second = drop half xs
        half = length xs `div` 2
None of these solutions is as smart as Haskell's own solution, which starts from the observation that the algorithms proposed here still take Theta(n log n) time even if the list to be sorted is already trivially sorted.
Haskell's solution is to merge runs of strictly decreasing (and increasing) values. The simplified code looks like:
mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
  where op x [] = [[x]]
        op x ((y:xs):xss) | x <= y = (x:y:xs):xss
                          | otherwise = [x]:(y:xs):xss
This will run in Theta(n) time on a list that is already sorted.
Haskell's version is smarter still because it will do an up run and a down run.
As usual I am in awe with the cleverness of Haskell!
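The simplified code above leaves unwrap, single, pairWith and merge undefined. A minimal runnable sketch follows, with those helpers filled in as assumptions (they are guesses at what the names mean, not taken from the library source). It only detects non-decreasing runs, as in the simplified version.

```haskell
mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))

-- Split the input into maximal non-decreasing runs.
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
  where
    op x ((y:ys):yss)
      | x <= y = (x:y:ys) : yss     -- x extends the current run
    op x yss = [x] : yss            -- otherwise x starts a new run

-- Assumed helper: True once at most one run is left.
single :: [a] -> Bool
single = null . drop 1

-- Assumed helper: combine neighbouring runs two at a time.
pairWith :: (a -> a -> a) -> [a] -> [a]
pairWith f (x:y:rest) = f x y : pairWith f rest
pairWith _ rest = rest

-- Assumed helper: extract the final run ([] for empty input).
unwrap :: [[a]] -> [a]
unwrap [] = []
unwrap (xs:_) = xs

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys
```

On sorted input, runs returns a single run, so no merging happens at all. On reverse-sorted input every element is its own run, which is the case the extra "down run" detection is meant to handle.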

interleaving two strings, preserving order: functional style

In this question, the author brings up an interesting programming question: given two strings, find the possible 'interleaved' permutations of them that preserve the order of the original strings.
I generalized the problem to n strings instead of 2 in OP's case, and came up with:
-- charCandidate is a function that finds possible character from given strings.
-- input : list of strings
-- output : a list of tuple, whose first value holds a character
-- and second value holds the rest of strings with that character removed
-- i.e ["ab", "cd"] -> [('a', ["b", "cd"])] ..
charCandidate xs = charCandidate' xs []
charCandidate' :: [String] -> [String] -> [(Char, [String])]
charCandidate' [] _ = []
charCandidate' ([]:xs) prev =
  charCandidate' xs prev
charCandidate' (x@(c:rest):xs) prev =
  (c, prev ++ [rest] ++ xs) : charCandidate' xs (x:prev)
interleavings :: [String] -> [String]
interleavings xs = interleavings' xs []
-- interleavings is a function that repeatedly applies 'charCandidate' function, to consume
-- the tuple and build permutations.
-- stops looping if there is no more tuple from charCandidate.
interleavings' :: [String] -> String -> [String]
interleavings' xs prev =
  let candidates = charCandidate xs
  in case candidates of
       [] -> [prev]
       _ -> concat . map (\(char, ys) -> interleavings' ys (prev ++ [char])) $ candidates
-- test case
input :: [String]
input = ["ab", "cd"]
-- interleavings input == ["abcd","acbd","acdb","cabd","cadb","cdab"]
It works, but I'm quite concerned about the code:
it is ugly. no point-free!
explicit recursion and additional function argument prev to preserve states
using tuples as intermediate form
How can I rewrite the above program to be more "haskellic", concise, readable and more conforming to "functional programming"?
I think I would write it this way. The main idea is to treat creating an interleaving as a nondeterministic process which chooses one of the input strings to start the interleaving and recurses.
Before we start, it will help to have a utility function that I have used countless times. It gives a convenient way to choose an element from a list and know which element it was. This is a bit like your charCandidate', except that it operates on a single list at a time (and is consequently more widely applicable).
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
  go xs [] = []
  go xs (y:ys) = (xs, y, ys) : go (y:xs) ys
With that in hand, it is easy to make some non-deterministic choices using the list monad. Notionally, our interleavings function should probably have a type like [NonEmpty a] -> [[a]] which promises that each incoming string has at least one character in it, but the syntactic overhead of NonEmpty is too annoying for a simple exercise like this, so we'll just give wrong answers when this precondition is violated. You could also consider making this a helper function and filtering out empty lists from your top-level function before running this.
interleavings :: [[a]] -> [[a]]
interleavings [] = [[]]
interleavings xss = do
  (xssL, h:xs, xssR) <- zippers xss
  t <- interleavings ([xs | not (null xs)] ++ xssL ++ xssR)
  return (h:t)
You can see it go in ghci:
> interleavings ["abc", "123"]
["abc123","ab123c","ab12c3","ab1c23","a123bc","a12bc3","a12b3c","a1bc23","a1b23c","a1b2c3","123abc","12abc3","12ab3c","12a3bc","1abc23","1ab23c","1ab2c3","1a23bc","1a2bc3","1a2b3c"]
> interleavings ["a", "b", "c"]
["abc","acb","bac","bca","cba","cab"]
> permutations "abc" -- just for fun, to compare
["abc","bac","cba","bca","cab","acb"]
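The suggestion of filtering out empty lists at the top level can be sketched as follows. The names interleavingsNonEmpty and interleavingsTotal are illustrative; the first is the function above under a different name, repeated so the block stands alone, and the second is the assumed wrapper.

```haskell
zippers :: [a] -> [([a], a, [a])]
zippers = go []
  where
    go _ [] = []
    go xs (y:ys) = (xs, y, ys) : go (y:xs) ys

-- Helper: only correct when no input list is empty.
interleavingsNonEmpty :: [[a]] -> [[a]]
interleavingsNonEmpty [] = [[]]
interleavingsNonEmpty xss = do
  (xssL, h:xs, xssR) <- zippers xss
  t <- interleavingsNonEmpty ([xs | not (null xs)] ++ xssL ++ xssR)
  return (h:t)

-- Wrapper: drop empty inputs first so the precondition always holds.
interleavingsTotal :: [[a]] -> [[a]]
interleavingsTotal = interleavingsNonEmpty . filter (not . null)
```

Without the filter, an input such as ["", "a"] yields no results at all, because the pattern h:xs never matches the empty string and the recursion never reaches the base case.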
This is the fastest implementation I've come up with so far. It interleaves a list of lists pairwise.
interleavings :: [[a]] -> [[a]]
interleavings = foldr (concatMap . interleave2) [[]]
This horribly ugly mess is the best way I could find to interleave two lists. It's intended to be asymptotically optimal (which I believe it is); it's not very pretty. The constant factors could be improved by using a special-purpose queue (such as the one used in Data.List to implement inits) rather than sequences, but I don't feel like including that much boilerplate.
{-# LANGUAGE BangPatterns #-}
import Data.Monoid
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
interleave2 :: [a] -> [a] -> [[a]]
interleave2 xs ys = interleave2' mempty xs ys []
interleave2' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2' !prefix xs ys rest =
  (toList prefix ++ xs ++ ys)
    : interleave2'' prefix xs ys rest
interleave2'' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2'' !prefix [] _ = id
interleave2'' !prefix _ [] = id
interleave2'' !prefix xs@(x : xs') ys@(y : ys') =
  interleave2' (prefix |> y) xs ys' .
  interleave2'' (prefix |> x) xs' ys
Using foldr over interleave2
interleave :: [[a]] -> [[a]]
interleave = foldr ((concat .) . map . iL2) [[]] where
  iL2 [] ys = [ys]
  iL2 xs [] = [xs]
  iL2 (x:xs) (y:ys) = map (x:) (iL2 xs (y:ys)) ++ map (y:) (iL2 (x:xs) ys)
Another approach would be to use the list monad:
interleavings xs ys = interl xs ys ++ interl ys xs where
  interl [] ys = [ys]
  interl xs [] = [xs]
  interl xs ys = do
    i <- [1..(length xs)]
    let (h, t) = splitAt i xs
    map (h ++) (interl ys t)
So the recursive part will alternate between the two lists, taking all from 1 to N elements from each list in turns and then produce all possible combinations of that. Fun use of the list monad.
Edit: Fixed bug causing duplicates
Edit: Answer to dfeuer. It turned out tricky to do code in the comment field. An example of solutions that do not use length could look something like:
interleavings xs ys = interl xs ys ++ interl ys xs where
  interl [] ys = [ys]
  interl xs [] = [xs]
  interl xs ys = splits xs >>= \(h, t) -> map (h ++) (interl ys t)
  splits [] = []
  splits (x:xs) = ([x], xs) : map (\(h, t) -> (x:h, t)) (splits xs)
The splits function feels a bit awkward. It could be replaced by use of takeWhile or break in combination with splitAt, but that solution ended up a bit awkward as well. Do you have any suggestions?
(I got rid of the do notation just to make it slightly shorter)
Combining the best ideas from the existing answers and adding some of my own:
import Control.Monad
interleave [] ys = return ys
interleave xs [] = return xs
interleave (x : xs) (y : ys) =
  fmap (x :) (interleave xs (y : ys)) `mplus` fmap (y :) (interleave (x : xs) ys)
interleavings :: MonadPlus m => [[a]] -> m [a]
interleavings = foldM interleave []
This is not the fastest you can get, but it should be good in terms of generality and simplicity.

How do I make a list of substrings?

I am trying to make a list of substrings of a string, where each substring has one element fewer than the one before it.
e.g "1234" would result in ["1234","123","12","1"]
I would like to achieve this using only the Prelude (no imports), so I can't use subsequences.
I am new to Haskell, and I know some of the problems with my code but don't currently know how to fix them.
slist :: String -> [String]
slist (x:xs) = (take (length (x:xs)) (x:xs)) ++ slist xs
How can I do this recursively using
Edit: I would like to do this by using init recursively.
slist :: String -> [String]
slist [] = []
-- slist xs = [xs] ++ (slist $ init xs)
slist xs = xs : (slist $ init xs)
main = do
  print $ slist "1234"
Here's a very lazy version suitable for working on infinite lists. Each element of each resulting list after the first only requires O(1) amortized time to compute it no matter how far into the list we look.
The general idea is: for each length n we intend to drop off the end we split the list into a queue of items of length n and the remainder of the list. To yield results, we first check there's another item in the list that can take a place in the queue, then yield the first item in the queue. When we reach the end of the list we discard the remaining items from the queue.
import Data.Sequence (Seq, empty, fromList, ViewL (..), viewl, (|>))
starts :: [a] -> [[a]]
starts = map (uncurry shiftThrough) . splits
shiftThrough :: Seq a -> [a] -> [a]
shiftThrough queue [] = []
shiftThrough queue (x:xs) = q1:shiftThrough qs xs
  where
    (q1 :< qs) = viewl (queue |> x)
splits finds all the initial sequences of a list, each paired with the remaining tail of the list.
splits :: [a] -> [(Seq a, [a])]
splits = go empty
  where
    go s [] = []
    go s (x:xs) = (s,x:xs):go (s |> x) xs
We can write dropping from the end of a list in terms of the same strategy.
dropEnd :: Int -> [a] -> [a]
dropEnd n = uncurry (shiftThrough . fromList) . splitAt n
These use Data.Sequence's amortized O(n) construction of a sequence fromList, O(1) appending to the end of sequence with |> and O(1) examining the start of a sequence with viewl.
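To make those three operations concrete, here is a small sketch that moves the first element of a sequence to the end. It assumes the containers package that ships with GHC; rotate and rotateList are illustrative names and are not part of the answer's code.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, ViewL (..), fromList, viewl, (|>))

-- Move the front element to the back: one viewl and one |>.
rotate :: Seq a -> Seq a
rotate s = case viewl s of
  EmptyL    -> s              -- nothing to rotate
  x :< rest -> rest |> x

-- List wrapper, paying the O(n) fromList conversion once.
rotateList :: [a] -> [a]
rotateList = toList . rotate . fromList
```

shiftThrough performs the same two steps in the opposite order: it appends the incoming element with |> and then takes the oldest one off the front with viewl.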
This is fast enough to query things like (starts [1..]) !! 80000 very quickly and (starts [1..]) !! 8000000 in a few seconds.
Look ma, no imports
A simple purely functional implementation of a queue is a pair of lists, one containing the things to output next in order and one containing the most recent things added. Whenever something is added it's added to the beginning of the added list. When something is needed the item is removed from the beginning of the next list. When there are no more items left to remove from the next list it is replaced by the added list in reverse order, and the added list is set to []. This has amortized O(1) running time since each item will be added once, removed once, and reversed once, however many of the reversals will happen all at once.
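The description above can be sketched as a standalone type. The names Queue, emptyQ, push, pop and drain are illustrative assumptions; the answer's own code inlines the same logic rather than defining a type.

```haskell
-- First list: items to output next, in order.
-- Second list: recently added items, newest first.
data Queue a = Queue [a] [a]

emptyQ :: Queue a
emptyQ = Queue [] []

-- Adding puts the item at the beginning of the added list.
push :: a -> Queue a -> Queue a
push x (Queue next added) = Queue next (x : added)

-- Removing takes from the next list, refilling it from the
-- reversed added list when it runs dry.
pop :: Queue a -> Maybe (a, Queue a)
pop (Queue (y:next) added) = Just (y, Queue next added)
pop (Queue [] [])          = Nothing
pop (Queue [] added)       = pop (Queue (reverse added) [])

-- Remove everything, oldest first.
drain :: Queue a -> [a]
drain q = case pop q of
  Nothing      -> []
  Just (x, q') -> x : drain q'
```

For example, pushing 1, 2 and 3 in that order and then draining gives them back in the same order, with a single reverse happening on the first pop.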
delay uses the queue logic described above to implement the same thing as shiftThrough from the previous section. xs is the list of things that were recently added and ys is the list of things to use next.
delay :: [a] -> [a] -> [a]
delay ys = traverse step ([],ys)
  where
    step (xs, ys) x = step' (x:xs) ys
    step' xs [] = step' [] (reverse xs)
    step' xs (y:ys) = (y, (xs, ys))
traverse is almost a scan
traverse :: (s -> a -> (b, s)) -> s -> [a] -> [b]
traverse f = go
  where
    go _ [] = []
    go s (x:xs) = y : go s' xs
      where (y, s') = f s x
We can define starts in terms of delay and another version of splits that returns lists.
starts :: [a] -> [[a]]
starts = map (uncurry delay) . splits
splits :: [a] -> [([a], [a])]
splits = go []
  where
    go s [] = []
    go s (x:xs) = (reverse s, x:xs):go (x:s) xs
This has very similar performance to the implementation using Seq.
Here's a somewhat convoluted version:
slist xs = go (zip (repeat xs) [lenxs, lenxs - 1..1])
  where lenxs = length xs
        go [] = []
        go (x:xs) = (take (snd x) (fst x)) : go xs
main = do
  print $ slist "1234"
Updated answer to list all possible substrings (not just starting from the root).
slist :: [t] -> [[t]]
slist [] = []
slist xs = xs : (slist $ init xs ) -- Taken from Pratik Deoghare's post
all_substrings :: [t] -> [[t]]
all_substrings (x:[]) = [[x]]
all_substrings (x:xs) = slist z ++ all_substrings xs
  where z = x:xs
λ> all_substrings "1234"
["1234","123","12","1","234","23","2","34","3","4"]
