Is `group list by size` a fold?


I came across this problem: grouping the elements of a list into chunks of the same size, so that
> groupBy 3 [1..10]
[[1,2,3], [4,5,6], [7,8,9], [10]]
Nothing really hard to do, but I was surprised that I couldn't find an existing function for it.
My first try was
groupBy _ [] = []
groupBy n xs = g : groupBy n gs
  where (g, gs) = splitAt n xs
So far so good: it works, even on infinite lists. However, I don't like the first line, groupBy _ [] = []. It seems like a good candidate for a fold, but I couldn't figure it out.
So can this function be written as a fold or as a one-liner?
Update
My attempt at a one liner:
groupBy' n l = map (map snd) $ groupBy ((==) `on` fst) $ concatMap (replicate n) [1..] `zip` l
It took me ten times longer to write than the initial attempt.
Update 2
Following Ganesh's answer, and using unfoldr and the help of pointfree, I came up with this convoluted point-free solution:
groupBy' n = unfoldr $ listToMaybe . (ap (>>) (return.splitAt n))

You can do it as a fold with some gymnastics, but it's much nicer as an unfold:
unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))
[You'll need to import Data.List if you haven't already]
The type of unfoldr is:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The idea of unfoldr is that a generating function decides whether to stop (Nothing) or keep going (Just). If the result is Just then the first element of the tuple is the next element of the output list, and the second element is passed to the generating function again.
As @leftroundabout pointed out in a comment on the question, an unfold is much more natural here because it treats the output list elements as similar to each other, whereas a fold treats the input list elements as similar to each other. In this case the need to start a new sublist every n elements of the input list makes a fold harder.
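For reference, here is the unfold written out as a complete definition. This is a minimal sketch: the name chunksOfSize is made up for illustration, and it assumes n > 0 (with n = 0, splitAt never consumes any input and the unfold would not terminate).

```haskell
import Data.List (unfoldr)

-- Hypothetical name; splits a list into chunks of n elements (assumes n > 0).
chunksOfSize :: Int -> [a] -> [[a]]
chunksOfSize n = unfoldr step
  where
    step [] = Nothing              -- nothing left: stop unfolding
    step xs = Just (splitAt n xs)  -- emit one chunk, continue with the rest
```

Because unfoldr produces each chunk before looking at the rest of the input, this also works on infinite lists, e.g. take 2 (chunksOfSize 2 [1..]).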

Related

Intermediate value in simple Haskell function

I need a function to double every other number in a list. This does the trick:
doubleEveryOther :: [Integer] -> [Integer]
doubleEveryOther [] = []
doubleEveryOther (x:[]) = [x]
doubleEveryOther (x:(y:zs)) = x : 2 * y : doubleEveryOther zs
However, the catch is that I need to double every other number starting from the right - so if the length of the list is even, the first one will be doubled, etc.
I understand that in Haskell it's tricky to operate on lists backwards, so my plan was to reverse the list, apply my function, then output the reverse again. I have a reverseList function:
reverseList :: [Integer] -> [Integer]
reverseList [] = []
reverseList xs = last xs : reverseList (init xs)
But I'm not quite sure how to incorporate it into my original function. I got to something like this:
doubleEveryOther :: [Integer] -> [Integer]
doubleEveryOther [] = []
doubleEveryOther (x:[]) = [x]
doubleEveryOther (x:(y:zs)) =
| rev_list = reverseList (x:(y:zs))
| rev_list = [2 * x, y] ++ doubleEveryOther zs
I'm not exactly sure of the syntax of a function that includes intermediate values like this.
In case it's relevant, this is for Exercise 2 in CIS 194 HW 1.

This is a very simple combination of the two functions you've already created:
doubleEveryOtherFromRight = reverseList . doubleEveryOther . reverseList
Note that your reverseList is already defined in the standard Prelude as reverse, so you didn't need to define it yourself.
I'm aware that the above solution isn't very efficient, because both uses of reverse need to pass through the entire list. I'll leave it to others to suggest more efficient versions, but hopefully this illustrates the power of function composition to build more complex computations out of simpler ones.
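Put together as a self-contained sketch, using the Prelude's reverse in place of reverseList (the two-clause doubleEveryOther below is a compact restatement of the function from the question):

```haskell
-- Doubles every other element, starting with the second from the left.
doubleEveryOther :: [Integer] -> [Integer]
doubleEveryOther (x : y : zs) = x : 2 * y : doubleEveryOther zs
doubleEveryOther xs = xs -- zero or one element left: nothing to double

-- Reverse, double from the (new) left, then restore the original order.
doubleEveryOtherFromRight :: [Integer] -> [Integer]
doubleEveryOtherFromRight = reverse . doubleEveryOther . reverse
```

For example, doubleEveryOtherFromRight [8,7,6,5] evaluates to [16,7,12,5].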

As Lorenzo points out, you can make one pass to determine if the list has an odd or even length, then a second pass to actually construct the new list. It might be simpler, though, to separate the two tasks.
doubleFromRight ls = zipWith ($) (cycle fs) ls -- [f0 ls0, f1 ls1, f2 ls2, ...]
  where fs = if odd (length ls)
               then [id, (*2)]
               else [(*2), id]
So how does this work? First, we observe that to create the final result, we need to apply one of two functions (id or (*2)) to each element of ls. zipWith can do that if we have a list of appropriate functions. The interesting part of its definition is basically
zipWith f (x:xs) (y:ys) = f x y : zipWith f xs ys
When f is ($), we're just applying a function from one list to the corresponding element in the other list.
We want to zip ls with an infinite alternating list of id and (*2). The question is, which function should that list start with? It should always end with id, because the rightmost element is never doubled, so the starting item is determined by the length of ls. An odd-length list requires us to start with id; an even one, with (*2).

Most of the other solutions show you how to either use the building blocks you already have or building blocks available in the standard library to build your function. I think it's also instructive to see how you might build it from scratch, so in this answer I discuss one idea for that.
Here's the plan: we're going to walk all the way to the end of the list, then walk back to the front. We'll build our new list during our walk back from the end. The way we'll build it as we walk back is by alternating between (multiplicative) factors of 1 and 2, multiplying our current element by our current factor and then swapping factors for the next step. At the end we'll return both the final factor and the new list. So:
doubleFromRight_ :: Num a => [a] -> (a, [a])
doubleFromRight_ [] = (1, [])
doubleFromRight_ (x:xs) =
  -- not at the end yet, keep walking
  let (factor, xs') = doubleFromRight_ xs
  -- on our way back to the front now
  in (3-factor, factor*x:xs')
If you like, you can write a small wrapper that throws away the factor at the end.
doubleFromRight :: Num a => [a] -> [a]
doubleFromRight = snd . doubleFromRight_
In ghci:
> doubleFromRight [1..5]
[1,4,3,8,5]
> doubleFromRight [1..6]
[2,2,6,4,10,6]
Modern practice would be to hide the helper function doubleFromRight_ inside a where block in doubleFromRight; and since the slightly modified name doesn't actually tell you anything new, we'll use the community standard name internally. Those two changes might land you here:
doubleFromRight :: Num a => [a] -> [a]
doubleFromRight = snd . go where
  go [] = (1, [])
  go (x:xs) = let (factor, xs') = go xs in (3-factor, factor*x:xs')
An advanced Haskeller might then notice that go fits into the shape of a fold and write this:
doubleFromRight :: Num a => [a] -> [a]
doubleFromRight = snd . foldr (\x (factor, xs) -> (3-factor, factor*x:xs)) (1,[])
But I think it's perfectly fine in this case to stop one step earlier with the explicit recursion; it may even be more readable in this case!

If we really want to avoid calculating the length, we can define
import Data.List (foldl')

doubleFromRight :: Num a => [a] -> [a]
doubleFromRight xs = zipWith ($)
    (foldl' (\a _ -> drop 1 a) (cycle [(2*), id]) xs)
    xs
This walks along the input list and the infinite cycled list of functions, [(2*), id, (2*), id, ...], in lockstep, dropping one function per input element. When the input list is exhausted, the function list is in the right phase to be applied, pairwise, to the input list again, this time for real.
So in effect it does measure the length (of course), it just doesn't count in integers but in the list elements so to speak.
If the length of the list is even, the first element will be doubled, otherwise the second, as you've specified in the question:
> doubleFromRight [1..4]
[2,2,6,4]
> doubleFromRight [1..5]
[1,4,3,8,5]
The foldl' function processes the list left-to-right. Its type is
foldl' :: (b -> a -> b) -> b -> [a] -> b
--        reducer_func     acc  xs     result
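To make the "skipping along" visible, the following sketch runs the same foldl' over a cycle of labels instead of functions. The helper name phase and the labels are made up for illustration: 'D' stands for (2*) and 'I' for id.

```haskell
import Data.List (foldl')

-- Hypothetical helper: drops one label per input element, exactly as the
-- answer's foldl' drops one function per input element.
phase :: [a] -> String
phase xs = foldl' (\labels _ -> drop 1 labels) (cycle "DI") xs
```

For a four-element input the remaining labels start with 'D' (the first element is doubled); for a five-element input they start with 'I' (the first element is left alone). The result is infinite, so inspect it with take.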

Whenever you have to work on consecutive terms in a list, zip with a list comprehension is an easy way to go. zip takes two lists and returns a list of tuples, so you can either zip the list with its tail or zip it with its indices. What I mean is
doubleFromRight :: [Int] -> [Int]
doubleFromRight ls = [if odd i /= oddness then 2*x else x | (i,x) <- zip [1..] ls]
  where
    oddness = odd . length $ ls
This way you number every element, starting from 1, and double an element when its index has the opposite parity to the length of the list (that is, to the index of the last element, which must stay undoubled); otherwise you leave it as is.
I am not 100% sure this is more efficient, though. If anyone could point that out in the comments, that would be great.

A better way of optimizing for permutation of function compositions over an input?

I have a list of functions and their 'apply priority'.
It looks like this. Length of it is 33
listOfAllFunctions = [ (f1, 1)
                     , (f2, 2)
                     , ...
                     , ...
                     , (f33, 33)
                     ]
What I want to do is generate a list of permutations of the above list with no duplicates and I only want 8 unique elements in the inner list.
Which I'm implementing like this
prioratizedFunctions :: [[(MyDataType -> MyDataType, Int)]]
prioratizedFunctions = nubBy removeDuplicates
                     $ sortBy (comparing snd)
                   <$> take 8
                   <$> permutations listOfAllFunctions
where removeDuplicates is defined like
removeDuplicates a b = map snd a == map snd b
Lastly I'm turning the sublists which'd be [(MyDataType -> MyDataType, Int)] to a composition of functions and a [Int]
with this function
compFunc :: [(MyDataType -> MyDataType, Int)] -> MyDataType -> (MyDataType, [Int])
compFunc listOfDataAndInts target = ( foldr ((.) . fst) id listOfDataAndInts target
                                    , map snd listOfDataAndInts )
Applying the above function like this (flip compFunc) target <$> prioratizedFunctions
All of the above is a simplified version of the actual code, but it should provide the gist of it.
The problem is that this code takes practically forever to execute. From some prototyping I think the blame of it falls on my implementation of permutations function inside prioratizedFunctions.
So I was wondering, is there a better way of doing what I want (basically generating permutation of listOfAllFunctions where each list only contains 8 elements, every list of elements sorted by their priority with snd and containing no duplicate list)
or is the problem inherently a long process?

I was generating unnecessary permutations.
This choose function is basically a non-deterministic take function
choose 0 xs = [[]]
choose n [] = []
choose n (x:xs) = map (x:) (choose (n-1) xs) ++ choose n xs
which improved performance by a lot.
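A self-contained version of choose with a type signature, as a sketch. It keeps the elements of each selection in their original order, so if the input list is already sorted by priority, every result is sorted too and no two results contain the same elements; that is presumably why the sortBy and nubBy steps from the question are no longer needed.

```haskell
-- choose k xs: every way to pick k elements from xs, keeping their order.
choose :: Int -> [a] -> [[a]]
choose 0 _ = [[]]  -- exactly one way to pick nothing
choose _ [] = []   -- no way to pick from an empty list
choose n (x : xs) =
  map (x :) (choose (n - 1) xs) -- selections that include x
    ++ choose n xs              -- selections that skip x
```

For example, choose 2 [1,2,3] gives [[1,2],[1,3],[2,3]]. For 8 out of 33 elements this produces 13,884,156 selections, whereas permutations has to walk through all 33! orderings of the list.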

Generating subsets of set. Laziness?

I have written a function that generates the subsets of a set. It causes a stack overflow when I use it like this: subsets [1..]. That would be "normal" behaviour in "normal" (non-lazy) languages. Now I would like to improve my function to be lazy.
P.S. I don't understand laziness (and I'm trying to), so perhaps my problem seems strange to you. Please explain. :)
P.S. 2 Feel free to tell me about my shortcomings in Haskell ;)
subsets :: [a] -> [[a]]
subsets (x:xs) = (map (\ e -> x:e) (subsets xs)) ++ (subsets xs)
subsets [] = [[]]

There are two problems with that function. First, it recurses twice, which makes it exponentially more inefficient than necessary (if we disregard the exponential number of results...), because each subtree is recalculated every time for all overlapping subsets; this can be fixed by letting both recursive calls share the same value:
subsets' :: [a] -> [[a]]
subsets' [] = [[]]
subsets' (x:xs) = let s = subsets' xs
                  in map (x:) s ++ s
This will already allow you to calculate length $ subsets' [1..25] in a few seconds, while length $ subsets [1..25] takes... well, I didn't wait ;)
The other issue is that with your version, when you give it an infinite list, it will recurse on the infinite tail of that list first. To generate all finite subsets in a meaningful way, we need to ensure two things: first, we must build up each set from smaller sets (to ensure termination), and second, we should ensure a fair order (ie., not generate the list [[1], [2], ...] first and never get to the rest). For this, we start from [[]] and recursively add the current element to everything we have already generated, and then remember the new list for the next step:
subsets'' :: [a] -> [[a]]
subsets'' l = [[]] ++ subs [[]] l
  where subs previous (x:xs) = let next = map (x:) previous
                               in next ++ subs (previous ++ next) xs
        subs _ [] = []
Which results in this order:
*Main> take 100 $ subsets'' [1..]
[[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1],[4],[4,1],[4,2],[4,2,1],[4,3],[4,3,1],[4,3,2],[4,3,2,1],[5],[5,1],[5,2],[5,2,1],[5,3],[5,3,1],[5,3,2],[5,3,2,1],[5,4],[5,4,1],[5,4,2],[5,4,2,1],[5,4,3],[5,4,3,1],[5,4,3,2],[5,4,3,2,1],[6],[6,1],[6,2],[6,2,1],[6,3],[6,3,1],[6,3,2],[6,3,2,1],[6,4],[6,4,1],[6,4,2],[6,4,2,1],[6,4,3],[6,4,3,1],[6,4,3,2],[6,4,3,2,1],[6,5],[6,5,1],[6,5,2],[6,5,2,1],[6,5,3],[6,5,3,1],[6,5,3,2],[6,5,3,2,1],[6,5,4],[6,5,4,1],[6,5,4,2],[6,5,4,2,1],[6,5,4,3],[6,5,4,3,1],[6,5,4,3,2],[6,5,4,3,2,1],[7],[7,1],[7,2],[7,2,1],[7,3],[7,3,1],[7,3,2],[7,3,2,1],[7,4],[7,4,1],[7,4,2],[7,4,2,1],[7,4,3],[7,4,3,1],[7,4,3,2],[7,4,3,2,1],[7,5],[7,5,1],[7,5,2],[7,5,2,1],[7,5,3],[7,5,3,1],[7,5,3,2],[7,5,3,2,1],[7,5,4],[7,5,4,1],[7,5,4,2],[7,5,4,2,1],[7,5,4,3],[7,5,4,3,1],[7,5,4,3,2],[7,5,4,3,2,1],[7,6],[7,6,1],[7,6,2],[7,6,2,1]]

You can't generate all the subsets of an infinite set: they form an uncountable set. Cardinality makes it impossible.
At most, you can try to generate all the finite subsets. For that, you can't recurse all the way down to [] before producing anything, since on an infinite list you'll never reach []. You need to proceed inductively from the beginning of the list, instead of the end.
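One way to see that the finite subsets, unlike all subsets, can be listed one after another: each natural number k picks out a finite subset through its binary digits (bit i set means "include the i-th element"). The sketch below is only an illustration of that correspondence; the names nthSubset and finiteSubsets are made up, and it assumes an infinite input list (on a finite list the results start repeating once k runs out of elements to select).

```haskell
-- The k-th finite subset of xs: bit i of k decides whether xs !! i is included.
nthSubset :: [a] -> Integer -> [a]
nthSubset xs k = reverse [x | (x, True) <- zip xs (bits k)]
  where
    -- Binary digits of a non-negative number, least significant first.
    bits :: Integer -> [Bool]
    bits 0 = []
    bits m = odd m : bits (m `div` 2)

-- Every finite subset of an infinite list appears at some finite position.
finiteSubsets :: [a] -> [[a]]
finiteSubsets xs = map (nthSubset xs) [0 ..]
```

Here take 8 (finiteSubsets [1..]) yields [[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1]].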

A right fold solution would be:
powerset :: Foldable t => t a -> [[a]]
powerset xs = []: foldr go (const []) xs [[]]
  where go x f a = let b = (x:) <$> a in b ++ f (a ++ b)
then:
> take 8 $ powerset [1..]
[[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1]]

How do I split a list into sublists at certain points?

How do I manually split [1,2,3,4,5,6,7] into [[1],[2],[3],[4],[5],[6],[7]]? Manually means without using break.
Then, how do I split a list into sublists according to a predicate? Like so
f even [[1],[2],[3],[4],[5],[6],[7]] == [[1],[2,3],[4,5],[6,7]]
PS: this is not homework, and I've tried for hours to figure it out on my own.

To answer your first question, this is rather an element-wise transformation than a split. The appropriate function to do this is
map :: (a -> b) -> [a] -> [b]
Now, you need a function (a -> b) where b is [a], as you want to transform an element into a singleton list containing the same type. Here it is:
mkList :: a -> [a]
mkList a = [a]
so
map mkList [1,2,3,4,5,6,7] == [[1],[2],...]
As for your second question: if you are not allowed (homework?) to use break, are you allowed to use takeWhile and dropWhile, which together produce the two halves of the result of break?
Anyway, for a solution without them ("manually"), just use simple recursion with an accumulator:
f p [] = []
f p (x:xs) = go [x] xs
  where go acc [] = [acc]
        go acc (y:ys) | p y       = acc : go [y] ys
                      | otherwise = go (acc++[y]) ys
This will traverse your entire list recursively, always remembering what the current sublist is, and when you reach an element where p applies, outputting the current sublist and starting a new one.
Note that go first receives [x] instead of [] to provide for the case where the first element already satisfies p x and we don't want an empty first sublist to be output.
Also, this operates on the original list ([1..7]) instead of [[1],[2]...]. But you can use it on the transformed one as well:
> map concat $ f (odd . head) [[1],[2],[3],[4],[5],[6],[7]]
[[1,2],[3,4],[5,6],[7]]

For the first, you can use a list comprehension:
>>> [[x] | x <- [1,2,3,4,5,6]]
[[1], [2], [3], [4], [5], [6]]
For the second problem, you can use the Data.List.Split module provided by the split package:
import Data.List.Split
f :: (a -> Bool) -> [[a]] -> [[a]]
f predicate = split (keepDelimsL $ whenElt predicate) . concat
This first concatenates the list, because the functions from split work on lists and not on lists of lists. The resulting single list is then split again using functions from the split package.

First:
map (: [])
Second:
f p xs =
  let rs = foldr (\[x] ~(a:r) -> if (p x) then ([]:(x:a):r) else ((x:a):r))
                 [[]] xs
  in case rs of ([]:r) -> r ; _ -> rs
foldr's operation is easy enough to visualize:
foldr g z [a,b,c, ...,x] = g a (g b (g c (.... (g x z) ....)))
So when writing the combining function, it is expecting two arguments, 1st of which is "current element" of a list, and 2nd is "result of processing the rest". Here,
g [x] ~(a:r) | p x       = ([]:(x:a):r)
             | otherwise = ((x:a):r)
So visualizing it working from the right, it just adds into the most recent sublist, and opens up a new sublist if it must. But since lists are actually accessed from the left, we keep it lazy with the lazy pattern, ~(a:r). Now it works even on infinite lists:
Prelude> take 9 $ f odd $ map (:[]) [1..]
[[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18]]
The pattern for the 1st argument reflects the peculiar structure of your expected input lists.
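The same fold can be adapted to an ordinary list instead of a list of singletons. This is a sketch under that assumption; the name splitBefore is made up for illustration, and the only change to the combining function is that it matches a plain element x rather than [x].

```haskell
-- Starts a new sublist in front of every element that satisfies p.
splitBefore :: (a -> Bool) -> [a] -> [[a]]
splitBefore p xs = case foldr step [[]] xs of
  [] : rest -> rest -- drop the empty group opened before the first element
  groups -> groups
  where
    -- The lazy pattern lets the head of the result be produced before the
    -- rest of the input has been folded, so infinite lists still work.
    step x ~(a : r)
      | p x = [] : (x : a) : r
      | otherwise = (x : a) : r
```

With this version, splitBefore even [1..7] gives [[1],[2,3],[4,5],[6,7]], the result asked for in the question.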

How to select every n-th element from a list [duplicate]

Closed 11 years ago.
Possible Duplicate:
How to get every Nth element of an infinite list in Haskell?
Simple task: we have a list and want to keep only every nth element of that list.
What is the most idiomatic way to do it in Haskell?
Off the top of my head, it is something like:
dr n [] = []
dr n (x : xs) = x : (dr n $ drop n xs)
but I have a strong feeling that I'm overcomplicating the problem.

My variant would be:
each :: Int -> [a] -> [a]
each n = map head . takeWhile (not . null) . iterate (drop n)
Fast and plays well with laziness.

Your solution is fine, but here are three other solutions using functions from Haskell's base library.
dr1 m = concatMap (take 1) . iterate (drop m)
Of course, this will never terminate (because iterate never terminates). So perhaps a better solution would be to use unfoldr:
{-# LANGUAGE TupleSections #-}
import Data.List (unfoldr)
import Data.Maybe (listToMaybe)
dr2 m = unfoldr ((\x-> fmap (,drop m x) (listToMaybe x)))
The function you pass to an unfold can get a bit ugly if you don't know GHC extensions and concepts such as functors. Here's that solution again without the fancy footwork (untested):
dr2 m = unfoldr ((\x -> case listToMaybe x of
                    Nothing -> Nothing
                    Just i -> Just (i,drop m x)))
If you don't like unfolds then consider a zip and a filter:
dr3 m = map snd . filter ((== 1) . fst) . zip (cycle [1..m])
Review
Understand that all these solutions are slightly different. Learning why will make you a better Haskell programmer. dr1 uses iterate and will thus never terminate (perhaps this is OK for infinite lists, but probably not a good overall solution):
> dr1 99 [1..400]
[1,100,199,298,397^CInterrupted.
The dr2 solution will show every mth value by skipping values in the unfold. The unfold passes both the value to be used for the next unfolding and the result of the current unfolding in a single tuple.
> dr2 99 [1..400]
[1,100,199,298,397]
The dr3 solution is slightly longer but probably easier for a beginner to understand. First you tag every element in the list with a cycle of [1..m, 1..m, 1..m, ...]. Second, you select only the elements tagged with a 1, effectively skipping m-1 of the elements. Third, you remove the tags.
> dr3 99 [1..400]
[1,100,199,298,397]
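For experimenting with the terminating variants side by side, here is a self-contained sketch that needs no language extensions. The names everyNth and everyNthZip are made up for illustration (they correspond to dr2 and dr3), and both assume m > 0.

```haskell
import Data.List (unfoldr)

-- Unfold variant (like dr2): emit the head, then skip ahead m elements.
everyNth :: Int -> [a] -> [a]
everyNth m = unfoldr step
  where
    step [] = Nothing
    step xs@(x : _) = Just (x, drop m xs)

-- Zip-and-filter variant (like dr3): tag with 1..m, keep the elements tagged 1.
everyNthZip :: Int -> [a] -> [a]
everyNthZip m = map snd . filter ((== 1) . fst) . zip (cycle [1 .. m])
```

Both stop at the end of a finite list and both remain usable lazily on infinite ones, e.g. take 3 (everyNth 2 [1..]).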

Lots of ways to shave this yak! Here's yet another:
import Data.List.Split -- from the "split" package on Hackage
dr n = map head . chunksOf n -- chunksOf was called chunk in older releases of split

Try this:
getEach :: Int -> [a] -> [a]
getEach _ [] = []
getEach n list
  | n < 1 = []
  | otherwise = foldr (\i acc -> list !! (i - 1):acc) [] [n, (2 * n)..(length list)]
Then in GHCi:
*Main> getEach 2 [1..10]
[2,4,6,8,10]
