More efficient powerset algorithm haskell - haskell

I Have a powerset function which creates a list [[a]] but the largest [a] is worked out first, meaning the whole algorithm has to run before I can get the smaller values.
I need a function which returns a powerset, in ascending order, so I could take the first n values of the function and the whole algorithm would not need to run.
Current simple algorithm
powerset :: [a] -> [[a]]
powerset [] = [[]]
powerset (x:xs) = [x:ps | ps <- powerset xs] ++ powerset xs

I don't understand what you mean by ascending order, but consider this solution:
powerset' :: [a] -> [[a]]
powerset' = loop [[]]
where
loop :: [[a]] -> [a] -> [[a]]
loop acc [] = acc
loop acc (x:xs) = loop (acc ++ fmap (\e -> e ++ [x]) acc) xs
We start with the powerset of the empty list, which is [[]], and expand it for each new element we encounter in the input list. The expansion is by appending the new element in each sublist we already emitted.
It requires that we append elements to the sublists exponentially many times, so I also considered using Data.DList from the dlist package that provides an efficient snoc operator that appends new elements to the end of the list:
import Data.DList
powerset :: [a] -> [[a]]
powerset xs = toList <$> loop [empty] xs
where
loop :: [DList a] -> [a] -> [DList a]
loop acc [] = acc
loop acc (y:ys) = loop (acc ++ fmap (`snoc` y) acc) ys
In my (rough) experiments, though, the first solution uses way less memory in the REPL and thus finishes faster for bigger input lists.
In both cases, this is what you get at the end:
$> powerset [1,2,3]
[[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3]]
$> powerset_original [1,2,3]
[[1],[1,2],[1,3],[1,2,3],[],[2],[3],[2,3]]

Related

Haskell Merge Sort

This is an implementation of Mergesort using higher order functions,guards,where and recursion.
However getting an error from compiler 6:26: parse error on input ‘=’
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = (length xs) `div` 2
I can't see whats wrong? or rather I don't understand the compiler.
Halving a list is not an O(1) operation but O(n), so the given solutions introduce additional costs compared to the imperative version of merge sort. One way to avoid halving is to simply start merging directly by making singletons and then merging every two consecutive lists:
sort :: (Ord a) => [a] -> [a]
sort = mergeAll . map (:[])
where
mergeAll [] = []
mergeAll [t] = t
mergeAll xs = mergeAll (mergePairs xs)
mergePairs (x:y:xs) = merge x y:mergePairs xs
mergePairs xs = xs
where merge is already given by others.
Another msort implementation in Haskell;
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys) | x < y = x:merge xs (y:ys)
| otherwise = y:merge (x:xs) ys
halve :: [a] -> ([a],[a])
halve xs = (take lhx xs, drop lhx xs)
where lhx = length xs `div` 2
msort :: Ord a => [a] -> [a]
msort [] = []
msort [x] = [x]
msort xs = merge (msort left) (msort right)
where (left,right) = halve xs
Haskell is an indentation sensitive programming language, you simply need to fix that (btw. if you are using tabs change that to using spaces).
mergeSort :: ([a] -> [a] -> [a]) -> [a] -> [a]
mergeSort merge xs
| length xs < 2 = xs
| otherwise = merge (mergeSort merge first) (mergeSort merge second)
where first = take half xs
second = drop half xs
half = length xs `div` 2
None of these solutions is as smart as Haskell's own solution, which runs on the idea that in the worst case scenario's these proposed algorithms is still run Theta (n log n) even if the list to be sorted is already trivially sorted.
Haskell's solution is to merge lists of strictly decreasing (and increasing values). The simplified code looks like:
mergesort :: Ord a => [a] -> [a]
mergesort xs = unwrap (until single (pairWith merge) (runs xs))
runs :: Ord a => [a] -> [[a]]
runs = foldr op []
where op x [] = [[x]]
op x ((y:xs):xss) | x <= y = (x:y:xs):xss
| otherwise = [x]:(y:xs):xss`
This will run Theta(n)
Haskell's version is smarter still because it will do an up run and a down run.
As usual I am in awe with the cleverness of Haskell!

interleaving two strings, preserving order: functional style

In this question, the author brings up an interesting programming question: given two string, find possible 'interleaved' permutations of those that preserves order of original strings.
I generalized the problem to n strings instead of 2 in OP's case, and came up with:
-- charCandidate is a function that finds possible character from given strings.
-- input : list of strings
-- output : a list of tuple, whose first value holds a character
-- and second value holds the rest of strings with that character removed
-- i.e ["ab", "cd"] -> [('a', ["b", "cd"])] ..
charCandidate xs = charCandidate' xs []
charCandidate' :: [String] -> [String] -> [(Char, [String])]
charCandidate' [] _ = []
charCandidate' ([]:xs) prev =
charCandidate' xs prev
charCandidate' (x#(c:rest):xs) prev =
(c, prev ++ [rest] ++ xs) : charCandidate' xs (x:prev)
interleavings :: [String] -> [String]
interleavings xs = interleavings' xs []
-- interleavings is a function that repeatedly applies 'charCandidate' function, to consume
-- the tuple and build permutations.
-- stops looping if there is no more tuple from charCandidate.
interleavings' :: [String] -> String -> [String]
interleavings' xs prev =
let candidates = charCandidate xs
in case candidates of
[] -> [prev]
_ -> concat . map (\(char, ys) -> interleavings' ys (prev ++ [char])) $ candidates
-- test case
input :: [String]
input = ["ab", "cd"]
-- interleavings input == ["abcd","acbd","acdb","cabd","cadb","cdab"]
it works, however I'm quite concerned with the code:
it is ugly. no point-free!
explicit recursion and additional function argument prev to preserve states
using tuples as intermediate form
How can I rewrite the above program to be more "haskellic", concise, readable and more conforming to "functional programming"?
I think I would write it this way. The main idea is to treat creating an interleaving as a nondeterministic process which chooses one of the input strings to start the interleaving and recurses.
Before we start, it will help to have a utility function that I have used countless times. It gives a convenient way to choose an element from a list and know which element it was. This is a bit like your charCandidate', except that it operates on a single list at a time (and is consequently more widely applicable).
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
go xs [] = []
go xs (y:ys) = (xs, y, ys) : go (y:xs) ys
With that in hand, it is easy to make some non-deterministic choices using the list monad. Notionally, our interleavings function should probably have a type like [NonEmpty a] -> [[a]] which promises that each incoming string has at least one character in it, but the syntactic overhead of NonEmpty is too annoying for a simple exercise like this, so we'll just give wrong answers when this precondition is violated. You could also consider making this a helper function and filtering out empty lists from your top-level function before running this.
interleavings :: [[a]] -> [[a]]
interleavings [] = [[]]
interleavings xss = do
(xssL, h:xs, xssR) <- zippers xss
t <- interleavings ([xs | not (null xs)] ++ xssL ++ xssR)
return (h:t)
You can see it go in ghci:
> interleavings ["abc", "123"]
["abc123","ab123c","ab12c3","ab1c23","a123bc","a12bc3","a12b3c","a1bc23","a1b23c","a1b2c3","123abc","12abc3","12ab3c","12a3bc","1abc23","1ab23c","1ab2c3","1a23bc","1a2bc3","1a2b3c"]
> interleavings ["a", "b", "c"]
["abc","acb","bac","bca","cba","cab"]
> permutations "abc" -- just for fun, to compare
["abc","bac","cba","bca","cab","acb"]
This is fastest implementation I've come up with so far. It interleaves a list of lists pairwise.
interleavings :: [[a]] -> [[a]]
interleavings = foldr (concatMap . interleave2) [[]]
This horribly ugly mess is the best way I could find to interleave two lists. It's intended to be asymptotically optimal (which I believe it is); it's not very pretty. The constant factors could be improved by using a special-purpose queue (such as the one used in Data.List to implement inits) rather than sequences, but I don't feel like including that much boilerplate.
{-# LANGUAGE BangPatterns #-}
import Data.Monoid
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
interleave2 :: [a] -> [a] -> [[a]]
interleave2 xs ys = interleave2' mempty xs ys []
interleave2' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2' !prefix xs ys rest =
(toList prefix ++ xs ++ ys)
: interleave2'' prefix xs ys rest
interleave2'' :: Seq a -> [a] -> [a] -> [[a]] -> [[a]]
interleave2'' !prefix [] _ = id
interleave2'' !prefix _ [] = id
interleave2'' !prefix xs#(x : xs') ys#(y : ys') =
interleave2' (prefix |> y) xs ys' .
interleave2'' (prefix |> x) xs' ys
Using foldr over interleave2
interleave :: [[a]] -> [[a]]
interleave = foldr ((concat .) . map . iL2) [[]] where
iL2 [] ys = [ys]
iL2 xs [] = [xs]
iL2 (x:xs) (y:ys) = map (x:) (iL2 xs (y:ys)) ++ map (y:) (iL2 (x:xs) ys)
Another approach would be to use the list monad:
interleavings xs ys = interl xs ys ++ interl ys xs where
interl [] ys = [ys]
interl xs [] = [xs]
interl xs ys = do
i <- [1..(length xs)]
let (h, t) = splitAt i xs
map (h ++) (interl ys t)
So the recursive part will alternate between the two lists, taking all from 1 to N elements from each list in turns and then produce all possible combinations of that. Fun use of the list monad.
Edit: Fixed bug causing duplicates
Edit: Answer to dfeuer. It turned out tricky to do code in the comment field. An example of solutions that do not use length could look something like:
interleavings xs ys = interl xs ys ++ interl ys xs where
interl [] ys = [ys]
interl xs [] = [xs]
interl xs ys = splits xs >>= \(h, t) -> map (h ++) (interl ys t)
splits [] = []
splits (x:xs) = ([x], xs) : map ((h, t) -> (x:h, t)) (splits xs)
The splits function feels a bit awkward. It could be replaced by use of takeWhile or break in combination with splitAt, but that solution ended up a bit awkward as well. Do you have any suggestions?
(I got rid of the do notation just to make it slightly shorter)
Combining the best ideas from the existing answers and adding some of my own:
import Control.Monad
interleave [] ys = return ys
interleave xs [] = return xs
interleave (x : xs) (y : ys) =
fmap (x :) (interleave xs (y : ys)) `mplus` fmap (y :) (interleave (x : xs) ys)
interleavings :: MonadPlus m => [[a]] -> m [a]
interleavings = foldM interleave []
This is not the fastest possible you can get, but it should be good in terms of general and simple.

How do I make a list of substrings?

I am trying to make a list of all substrings where each substring has one less element of the originial string.
e.g "1234" would result in ["1234","123","12","1"]
I would like to achieve this only using prelude (no import) so cant use subsequences.
I am new to Haskell, and I know some of the problems with my code but don't currently know how to fix them.
slist :: String -> [String]
slist (x:xs) = (take (length (x:xs)) (x:xs)) ++ slist xs
How can I do this recursively using
Edit: would like to this by using init recursively
slist :: String -> [String]
slist [] = []
-- slist xs = [xs] ++ (slist $ init xs)
slist xs = xs : (slist $ init xs)
main = do
print $ slist "1234"
Here's a very lazy version suitable for working on infinite lists. Each element of each resulting list after the first only requires O(1) amortized time to compute it no matter how far into the list we look.
The general idea is: for each length n we intend to drop off the end we split the list into a queue of items of length n and the remainder of the list. To yield results, we first check there's another item in the list that can take a place in the queue, then yield the first item in the queue. When we reach the end of the list we discard the remaining items from the queue.
import Data.Sequence (Seq, empty, fromList, ViewL (..), viewl, (|>))
starts :: [a] -> [[a]]
starts = map (uncurry shiftThrough) . splits
shiftThrough :: Seq a -> [a] -> [a]
shiftThrough queue [] = []
shiftThrough queue (x:xs) = q1:shiftThrough qs xs
where
(q1 :< qs) = viewl (queue |> x)
splits finds all the initial sequences of a list together with the tailing list.
splits :: [a] -> [(Seq a, [a])]
splits = go empty
where
go s [] = []
go s (x:xs) = (s,x:xs):go (s |> x) xs
We can write dropping from the end of a list in terms of the same strategy.
dropEnd :: Int -> [a] -> [a]
dropEnd n = uncurry (shiftThrough . fromList) . splitAt n
These use Data.Sequence's amortized O(n) construction of a sequence fromList, O(1) appending to the end of sequence with |> and O(1) examining the start of a sequence with viewl.
This is fast enough to query things like (starts [1..]) !! 80000 very quickly and (starts [1..]) !! 8000000 in a few seconds.
Look ma, no imports
A simple purely functional implementation of a queue is a pair of lists, one containing the things to output next in order and one containing the most recent things added. Whenever something is added it's added to the beginning of the added list. When something is needed the item is removed from the beginning of the next list. When there are no more items left to remove from the next list it is replaced by the added list in reverse order, and the added list is set to []. This has amortized O(1) running time since each item will be added once, removed once, and reversed once, however many of the reversals will happen all at once.
delay uses the queue logic described above to implement the same thing as shiftThrough from the previous section. xs is the list of things that were recently added and ys is the list of things to use next.
delay :: [a] -> [a] -> [a]
delay ys = traverse step ([],ys)
where
step (xs, ys) x = step' (x:xs) ys
step' xs [] = step' [] (reverse xs)
step' xs (y:ys) = (y, (xs, ys))
traverse is almost a scan
traverse :: (s -> a -> (b, s)) -> s -> [a] -> [b]
traverse f = go
where
go _ [] = []
go s (x:xs) = y : go s' xs
where (y, s') = f s x
We can define starts in terms of delay and another version of splits that returns lists.
starts :: [a] -> [[a]]
starts = map (uncurry delay) . splits
splits :: [a] -> [([a], [a])]
splits = go []
where
go s [] = []
go s (x:xs) = (reverse s, x:xs):go (x:s) xs
This has very similar performance to the implementation using Seq.
Here's a somewhat convoluted version:
slist xs = go (zip (repeat xs) [lenxs, lenxs - 1..1])
where lenxs = length xs
go [] = []
go (x:xs) = (take (snd x) (fst x)) : go xs
main = do
print $ slist "1234"
Updated answer to list all possible substrings (not just starting from the root).
slist :: [t] -> [[t]]
slist [] = []
slist xs = xs : (slist $ init xs ) # Taken from Pratik Deoghare's post
all_substrings:: [t] -> [[t]]
all_substrings (x:[]) = [[x]]
all_substrings (x:xs) = slist z ++ all_substrings xs
where z = x:xs
λ> all_substrings "1234"
["1234","123","12","1","234","23","2","34","3","4"]

How to consider previous elements when mapping over a list?

I'm stuck at making a function in Haskell wich has to do the following:
For each integer in a list check how many integers in front of it are smaller.
smallerOnes [1,2,3,5] will have the result [(1,0), (2,1), (3,2), (5,3)]
At the moment I have:
smallerOnes :: [Int] -> [(Int,Int)]
smallerOnes [] = []
smallerOnes (x:xs) =
I don't have any clue on how to tackle this problem. Recursion is probably the way of thinking here but at that point I'm losing it.
It is beneficial here not to start with a base case, but rather with a main case.
Imagine we've already processed half the list. Now we are faced with the rest of the list, say x:xs. We want to know how many integers "before it" are smaller than x; so we need to know these elements, say ys: length [y | y<-ys, y<x] will be the answer.
So you'll need to use an internal function that will maintain the prefix ys, produce the result for each x and return them in a list:
smallerOnes :: [Int] -> [(Int,Int)]
smallerOnes [] = []
smallerOnes xs = go [] xs
where
go ys (x:xs) = <result for this x> : <recursive call with updated args>
go ys [] = []
This can also be coded using some built-in higher-order functions, e.g.
scanl :: (a -> b -> a) -> a -> [b] -> [a]
which will need some post-processing (like map snd or something) or more directly with
mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
mapAccumL is in Data.List.
import Data.List (inits)
smallerOnes :: [Int] -> [(Int,Int)]
smallerOnes xs = zipWith (\x ys -> (x, length $ filter (< x) ys)) xs (inits xs)

Why is MergeSort in Haskell faster when implemented with foldl'?

I have implemented two versions of Merge Sort in Haskell like follows:
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
and
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 [] = []
mergeSort2 (x:[]) = [x]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
where halves = splitList xs
where 'merge' and 'splitList' are implemented as follows:
merge :: (Ord a) => [a] -> [a] -> [a]
merge [] [] = []
merge xs [] = xs
merge [] ys = ys
merge all_x#(x:xs) all_y#(y:ys)
| x < y = x:merge xs all_y
| otherwise = y:merge all_x ys
splitList :: [a] -> ([a], [a])
splitList zs = go zs [] [] where
go [] xs ys = (xs, ys)
go [x] xs ys = (x:xs, ys)
go (x:y:zs) xs ys = go zs (x:xs) (y:ys)
Doing last $ mergeSort2 [1000000,999999..0] in ghci results in showing the number 1000000 after more than a minute of processing, while doing last $ mergeSort1 [1000000,999999..0] results in showing the last element only after 5 seconds.
I can understand why mergeSort1 uses much less memory than mergeSort2 because of the tail-recursiveness of foldl' and so.
What I can't understand is why mergeSort1 is faster than mergeSort2 by such a big difference ?
Could it be that splitList is the bottleneck in mergeSort2, generating two new lists every call?
As is,
mergeSort2 :: (Ord a) => [a] -> [a]
mergeSort2 xs = (mergeSort2 $ fst halves) `merge` (mergeSort2 $ snd halves)
where halves = splitList xs
is an infinite recursion, since you haven't given a base case (you need to specify the result for lists of length < 2). After that is fixed, mergeSort2 is still relatively slow due to the splitList which requires a complete traversal in each step and builds two new lists, not allowing to process anything before that is completed. A simple
splitList zs = splitAt h zs where h = length zs `quot` 2
does much better.
Your mergeSort1, however, is not a merge sort at all, it is an insertion sort.
mergeSort1 :: (Ord a) => [a] -> [a]
mergeSort1 xs = foldl' (\acc x -> merge [x] acc) [] xs
That does particularly well on reverse-sorted input, but if you give it sorted or random input, it scales quadratically.
So mergeSort1 was faster because you gave it optimal input, where it finishes in linear time.

Resources