Merge multiple lists if condition is true


I've been trying to wrap my head around this for a while now, but it seems like my lack of Haskell experience just won't get me through it. I couldn't find a similar question here on Stack Overflow (most of them are about merging all sublists, without any condition).
So here it goes. Let's say I have a list of lists like this:
[[1, 2, 3], [3, 5, 6], [20, 21, 22]]
Is there an efficient way to merge lists if some condition is true? Let's say I need to merge lists that share at least one element. For the example above, the result would be:
[[1, 2, 3, 3, 5, 6], [20, 21, 22]]
Another example (when all lists can be merged):
[[1, 2], [2, 3], [3, 4]]
And its result:
[[1, 2, 2, 3, 3, 4]]
Thanks for your help!

I don't know what to say about efficiency, but we can break down what's going on and get several different functionalities at least. Particular functionalities might be optimizable, but it's important to clarify exactly what's needed.
Let me rephrase the question: For some set X, some binary relation R, and some binary operation +, produce a set Q = {x+y | x in X, y in X, xRy}. So for your example, we might have X being some set of lists, R being "xRy if and only if there's at least one element in both x and y", and + being ++.
A naive implementation might just copy the set-builder notation itself
shareElement :: Eq a => [a] -> [a] -> Bool
shareElement xs ys = or [x == y | x <- xs, y <- ys]
v1 :: (a -> a -> Bool) -> (a -> a -> b) -> [a] -> [b]
v1 (?) (<>) xs = [x <> y | x <- xs, y <- xs, x ? y]
then p = v1 shareElement (++) :: Eq a => [[a]] -> [[a]] might achieve what you want. Except it probably doesn't.
Prelude> p [[1], [1]]
[[1,1],[1,1],[1,1],[1,1]]
The most obvious problem is that we get four copies: two from merging each list with itself, two from merging the lists with each other "in both directions". The problem occurs because a list isn't a set, so duplicates aren't eliminated. Of course, that's an easy fix: we'll just use Set everywhere
import qualified Data.Set as Set
v2 :: Ord b => (a -> a -> Bool) -> (a -> a -> b) -> Set.Set a -> Set.Set b
v2 (?) (<>) = Set.fromList . v1 (?) (<>) . Set.toList
So we can try again, p = v2 (shareElement `on` Set.toList) Set.union (with on imported from Data.Function), with
Prelude Set> p $ Set.fromList $ map Set.fromList [[1,2], [2,1]]
fromList [fromList [1,2]]
which seems to work. Note that we have to "go through" List because Set can't be made an instance of Monad or Applicative due to its Ord constraint.
I'd also note that there's a lot of lost behavior in Set. For instance, we fight either throwing away order information in the list or having to handle both x <> y and y <> x when our relation is symmetric.
Some more convenient versions can be written like
v3 :: (Ord a, Monoid a) => (a -> a -> Bool) -> Set.Set a -> Set.Set a
v3 r = v2 r mappend
and more efficient ones can be built if we assume that the relation is, say, an equivalence relation, since then instead of an O(n^2) operation we can do it in O(nd), where d is the number of equivalence classes (partitions) of the relation.
Generally, it's a really interesting problem.
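For the concrete case in the question (merge lists that share an element, directly or through a chain of other lists), a minimal sketch could fold over the input and absorb every existing group that overlaps the incoming list. The name mergeOverlapping is made up for illustration, and the approach is quadratic, so treat it as a starting point rather than an efficient solution.

```haskell
import Data.List (intersect, partition)

-- Hypothetical helper (not from the answer above): merge every list that
-- shares an element with another, directly or transitively.
mergeOverlapping :: Eq a => [[a]] -> [[a]]
mergeOverlapping = foldl step []
  where
    -- Split the groups built so far into those touching xs and the rest,
    -- then fuse the touching ones together with xs.
    step groups xs =
      let (touching, rest) = partition (not . null . intersect xs) groups
      in rest ++ [concat touching ++ xs]
```

On the two inputs from the question this should give [[1,2,3,3,5,6],[20,21,22]] and [[1,2,2,3,3,4]]. Note that the order of the resulting groups can differ from the input order, because a merged group is always appended at the end.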

I just happened to write something similar here: Finding blocks in arrays
You can adapt it like this (although I'm not too sure about the efficiency):
import Data.List (delete, intersect)
example1 = [[1, 2, 3], [3, 5, 6], [20, 21, 22]]
example2 = [[1, 2], [2, 3], [3, 4]]
objects zs = map concat . solve zs $ [] where
  areConnected x y = not . null . intersect x $ y
  solve [] result = result
  solve (x:xs) result =
    let result' = solve' xs [x]
    in solve (foldr delete xs result') (result':result)
  solve' xs result =
    let ys = filter (\y -> any (areConnected y) result) xs
    in if null ys
         then result
         else solve' (foldr delete xs ys) (ys ++ result)
OUTPUT:
*Main> objects example1
[[20,21,22],[3,5,6,1,2,3]]
*Main> objects example2
[[3,4,2,3,1,2]]

Related

Split ranges in Haskell

Given a list like:
[1, 2, 2, 6, 7, 8, 10, 11, 12, 15]
Split it into blandly increasing ranges (maybe equal):
[[1, 2, 2], [6, 7, 8], [10, 11, 12], [15]]
I tried using a recursive approach:
splitRanges [] = [[]]
splitRanges (x:y:xs)
  | x `elem` [y, y + 1] = [x, y] : splitRanges xs
  | otherwise = xs
So if the item is one less or equal to the item after I fuse them. But it says I am trying to build an infinite type:
Occurs check: cannot construct the infinite type: a0 = [a0]
Expected type: [[a0]]
Actual type: [a0]
But what does [the fact that it is monotone] have to do with how the list is split?
That being strictly increasing would give different results.
Or are you really trying to say something else?
I hope I am not.
Will the list always be monotone?
No, splitting a monotone list means making it into just one sub-list.
If not, how should that affect the results?
If it is not monotone, you will have many sublists.
Is it always broken into groups of three?
No, the groups may contain n elements.
More examples would be good
splitRanges [1, 3] == [[1], [3]]
splitRanges [1, 2, 5] == [[1, 2], [5]]
splitRanges [0, 0, 1] == [[0, 0, 1]]
splitRanges [1, 5, 7, 9] == [[1], [5], [7], [9]]
I appreciate hints rather than full answers, as I would like to improve myself, copy-pasting is not improvement.

Try breaking the problem into more manageable parts.
First, how would you split just one blandly increasing range from the start of a list? Let's guess that should be splitOne :: [Integer] -> ([Integer], [Integer]).
Second, how can you repeatedly apply splitOne to the left-over list? Try implementing splitMany :: [Integer] -> [[Integer]] by using splitOne.
For splitOne, what should you be trying to find? The first position to split at. What are "split positions"? Let's make that up.
split   0     1     2     3     4     …
list  [ | x1, | x2, | x3, | x4, | x5, …]
So a split at 0 is ([], [x1,x2,x3,x4,x5,…]), and a split at 3 is ([x1,x2,x3],[x4,x5,…]). What relationship can you see between the split position and the split list?
How do you determine the first split position of the list? Let's say that is implemented as firstSplitPos :: [Integer] -> Integer. What is the first split position of an empty list?
Can you now implement splitOne using firstSplitPos?
One Possible Answer
import Data.List (find)
-- What are the adjacencies for:
-- 1) empty lists?
-- 2) lists with one element?
-- 3) lists with more than one element?
--
-- Bonus: rewrite in point-free form using <*>
--
adjacencies :: [a] -> [(a,a)]
adjacencies xxs = zip xxs (drop 1 xxs)
-- Bonus: rewrite in point-free form
--
withIndices :: [a] -> [(Int,a)]
withIndices xxs = zip [0..] xxs
-- This is the most involved part of the answer. Pay close
-- attention to:
-- 1) empty lists
-- 2) lists with one element
-- 3) lists which are a blandly increasing sequence
--
firstSplitPos :: (Eq a, Num a) => [a] -> Int
firstSplitPos xxs = maybe (length xxs) pos (find q searchList)
  where q (_,(a,b)) = a /= b && a + 1 /= b
        searchList = withIndices (adjacencies xxs)
        -- Why is the split position one more than the index?
        pos (i,_) = i + 1
--
-- Bonus: rewrite in point-free form using <*>
--
splitOne :: (Eq a, Num a) => [a] -> ([a],[a])
splitOne xxs = splitAt (firstSplitPos xxs) xxs
splitMany :: (Eq a, Num a) => [a] -> [[a]]
-- What happens if we remove the case for []?
splitMany [] = []
splitMany xxs = let (l, r) = splitOne xxs in l : splitMany r
Another Approach
This is my explanation of Carsten's solution. It is already succinct but I have elected for a variation which does not use a 2-tuple.
We know that Haskell lists are defined inductively. To demonstrate this, we can define an equivalent data type.
data List a = Cons a (List a) -- Cons = (:)
            | Nil -- Nil = []
Then ask the question: can we use induction on lists for the solution? If so, we only have to solve two cases: Cons and Nil. The type signature of foldr shows us exactly that:
foldr :: (a -> b -> b) -- Cons case
      -> b -- Nil case
      -> [a] -- The list
      -> b -- The result
What if the list is Nil? Then the only blandly increasing sequence is the empty sequence. Therefore:
nilCase = [[]]
We might want nilCase = [] instead, as that also seems reasonable — i.e. there are no blandly increasing sequences.
Now you need some imagination. In the Cons case we only get to look at one new element at a time. With this new element, we could decide whether it belongs to the right-adjacent sequence or if it begins a new sequence.
What do I mean by right-adjacent? In [5,4,1,2,2,7], 1 belongs to the right-adjacent sequence [2,2].
How might this look?
-- The rest of the list is empty
consCase new [] = [new] : []
-- The right-adjacent sequence is empty
consCase new ([]:ss) = [new] : ss
-- The right-adjacent sequence is non-empty
-- Why `new + 1 == x` and not `new == x + 1`?
consCase new sss@(xxs@(x:_):ss)
  | new == x || new + 1 == x = (new:xxs):ss
  | otherwise = [new]:sss
Now that we solved the Nil case and the Cons case, we are done!
splitRanges = foldr consCase nilCase

It would be useful and idiomatic to write your function to take a predicate, instead of writing your split condition into the function itself:
splitBy2 :: (a -> a -> Bool) -> [a] -> [[a]]
splitBy2 ok xs = snd $ f xs [] []
  where f (a:b:xs) acc_list acc_out_lists | ok a b = ...
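As a rough illustration of the predicate-taking idea (not a completion of the elided splitBy2 above), here is a separate sketch that starts a new group whenever the predicate fails for two neighbouring elements. The name splitWith and the helper go are hypothetical.

```haskell
-- Hypothetical sketch: group a list into runs in which every pair of
-- neighbouring elements satisfies the predicate `ok`.
splitWith :: (a -> a -> Bool) -> [a] -> [[a]]
splitWith _ [] = []
splitWith ok (x:xs) = go x [] xs
  where
    -- prev is the last element of the current run;
    -- acc holds the earlier elements of that run in reverse order.
    go prev acc [] = [reverse (prev : acc)]
    go prev acc (y:ys)
      | ok prev y = go y (prev : acc) ys
      | otherwise = reverse (prev : acc) : go y [] ys
```

With the condition from the question written as a predicate, splitWith (\a b -> (b - a) `elem` [0, 1]) [1, 2, 2, 6, 7, 8, 10, 11, 12, 15] should produce [[1,2,2],[6,7,8],[10,11,12],[15]]. Keeping the condition outside the function means the same code also covers strictly increasing runs by passing (<).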

I hope you don't mind spoiling part of it, but as the comments are discussing what you want (and I hope I've got it) maybe you are interested in another possible solution?
I don't want to spoil it all but I think you can easily work this out:
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = g . foldr f ([],[])
  where f x ([],xss) = ([x],xss)
        f x (y:ys,xss)
          | abs (x-y) <= 1 = undefined
          | otherwise = undefined
        g (ys,xss) = undefined
you just have to fill in the undefined holes
The idea is just to fold the list from the right, accumulating your inner lists in the first item of the tuple as long as the elements are not too far apart; and if they are, to push the accumulated list to the second item.
If done correctly it will yield:
λ> blandly [1,3]
[[1],[3]]
λ> blandly [1,2,5]
[[1,2],[5]]
λ> blandly [0,0,1]
[[0,0,1]]
λ> blandly [1,5,7,9]
[[1],[5],[7],[9]]
which seems to be what you want
1 hour later - I think I can post my solution - just stop reading if you don't want to get spoiled
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = uncurry (:) . foldr f ([],[])
  where f x ([],xs) = ([x],xs)
        f x (y:ys,xs)
          | abs (x-y) <= 1 = (x:y:ys,xs)
          | otherwise = ([x],(y:ys):xs)
Maybe I have a slight misunderstanding here (the examples did not specify it), but if you want only monotonically increasing inner lists you just have to change the abs part:
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = uncurry (:) . foldr f ([],[])
  where f x ([],xss) = ([x],xss)
        f x (y:ys,xss)
          | 0 <= y-x
            && y-x <= 1 = (x:y:ys,xss)
          | otherwise = ([x],(y:ys):xss)

Haskell: List combination for Integers

I have a given list, e.g. [2, 3, 5, 587], and I want a complete list of its combinations, i.e. something like [2, 2*3, 2*5, 2*587, 3, 3*5, 3*587, 5, 5*587, 587]. Since I am at beginner level with Haskell, I am curious what such a list manipulation would look like.
Additionally, if the computation of the base list is expensive, how would this influence the cost of the function? (Assume the list has limited values, i.e. < 20.)
Rem.: The list could be ordered afterwards, but I really have no clue whether ordering is cheaper within the function or afterwards.

The others have explained how to make pairs, so I concern myself here with getting the combinations.
If you want the combinations of all lengths, that's just the power set of your list, and can be computed the following way:
powerset :: [a] -> [[a]]
powerset (x:xs) = let xs' = powerset xs in xs' ++ map (x:) xs'
powerset [] = [[]]
-- powerset [1, 2] === [[],[2],[1],[1,2]]
-- you can take the products:
-- map product $ powerset [1, 2] == [1, 2, 1, 2]
There's an alternative powerset implementation in Haskell that's considered sort of a classic:
import Control.Monad
powerset = filterM (const [True, False])
You could look at the source of filterM to see how it works essentially the same way as the other powerset above.
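To make the connection concrete, here is a sketch of the same idea spelled out in the list monad: for each element, nondeterministically choose whether to keep it. The name powersetM is hypothetical, and this is an illustration of the principle rather than the actual library source of filterM.

```haskell
import Control.Monad (filterM)

-- Hypothetical illustration: every element is either kept (True) or
-- dropped (False), and the list monad enumerates all combinations.
powersetM :: [a] -> [[a]]
powersetM [] = [[]]
powersetM (x:xs) = do
  keep <- [True, False]
  rest <- powersetM xs
  return (if keep then x : rest else rest)

-- The classic one-liner, for comparison.
powersetClassic :: [a] -> [[a]]
powersetClassic = filterM (const [True, False])
```

Both functions should enumerate the subsets in the same order, e.g. [[1,2],[1],[2],[]] for the input [1, 2], which differs from the order produced by the explicit recursive powerset above.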
On the other hand, if you'd like to have all the combinations of a certain size, you could do the following:
combsOf :: Int -> [a] -> [[a]]
combsOf n _ | n < 1 = [[]]
combsOf n (x:xs) = combsOf n xs ++ map (x:) (combsOf (n - 1) xs)
combsOf _ _ = []
-- combsOf 2 [1, 2, 3] === [[2,3],[1,3],[1,2]]

So it seems what you want is all pairs of products from the list:
ghci> :m +Data.List
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- bs ]
[6,10,1174,15,1761,2935]
But you also want the initial numbers:
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- 1:bs ]
[2,6,10,1174,3,15,1761,5,2935,587]
This uses a list comprehension, but this could also be done with regular list operations:
ghci> concatMap (\(a:bs) -> a : map (a*) bs) . init $ tails [2, 3, 5, 587]
[2,6,10,1174,3,15,1761,5,2935,587]
The latter is a little easier to explain:
Data.List.tails produces all the suffixes of a list:
ghci> tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587],[]]
Prelude.init drops the last element from a list. Here I use it to drop the empty suffix, since processing that causes an error in the next step.
ghci> init [[2,3,5,587],[3,5,587],[5,587],[587],[]]
[[2,3,5,587],[3,5,587],[5,587],[587]]
ghci> init $ tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587]]
Prelude.concatMap runs a function over each element of a list, and combines the results into a flattened list. So
ghci> concatMap (\a -> replicate a a) [1,2,3]
[1,2,2,3,3,3]
\(a:bs) -> a : map (a*) bs does a couple of things:
It pattern matches on its argument, asserting that it is a list with at least one element (which is why I dropped the empty list with init), and binds the first element to a and the remaining elements to bs.
It produces a list whose first element is a itself (a :), followed by
each of the later elements multiplied by a (map (a*) bs).

You can get the suffixes of a list using Data.List.tails.
This gives you a list of lists, you can then do the inner multiplications you want on this list with a function like:
prodAll [] = []
prodAll (h:t) = h:(map (* h) $ t)
You can then map this function over each inner list and concatenate the results:
f :: Num a => [a] -> [a]
f = concat . map prodAll . tails

How can I convert this binary recursive function into a tail-recursive form?

There is a clear way to convert binary recursion to tail recursion for sets closed under a function, i.e. integers with addition for the Fibonacci sequence:
(Using Haskell)
fib :: Int -> Int
fib n = fib' 0 1 n
fib' :: Int -> Int -> Int -> Int
fib' x y n
  | n < 1 = y
  | otherwise = fib' y (x + y) (n - 1)
This works because we have our desired value, y, and our operation, x + y, where x + y returns an integer just like y does.
However, what if I want to use a set that is not closed under a function? I want to take a function that splits a list into two lists and then does the same to those two lists (i.e. like recursively creating a binary tree), where I stop when another function magically says when to stop when it looks at the resulting split:
[1, 2, 3, 4, 5] -> [[1, 3, 4], [2, 5]] -> [[1, 3], [4], [2], [5]]
That is,
splitList :: [Int] -> [[Int]]
splitList intList
  | length intList < 2 = [intList]
  | magicFunction x y > 0 = splitList x ++ splitList y
  | otherwise = [intList]
  where
    x = some sublist of intList
    y = the other sublist of intList
Now, how can this binary recursion be converted to tail recursion? The prior method won't work directly: for Fibonacci the result has the same type as the inputs (Int + Int -> Int), but here it does not (splitting an [Int] yields a [[Int]]). As such, the accumulator would need to be changed (I assume).

There is a general trick to make any function tail recursive: rewrite it in continuation-passing style (CPS). The basic idea behind CPS is that every function takes an additional parameter--a function to call when it is done. Then, instead of returning a value, the original function calls the function that was passed in. This latter function is called a "continuation" because it continues the computation on to its next step.
To illustrate this idea, I'm just going to use your function as an example. Note the changes to the type signature as well as the structure of the code:
splitListCPS :: [Int] -> ([[Int]] -> r) -> r
splitListCPS intList cont
  | length intList < 2 = cont [intList]
  | magicFunction x y > 0 = splitListCPS x $ \ r₁ ->
                            splitListCPS y $ \ r₂ ->
                            cont $ r₁ ++ r₂
  | otherwise = cont [intList]
You can then wrap this up into a normal-looking function as follows:
splitList :: [Int] -> [[Int]]
splitList intList = splitListCPS intList (\ r -> r)
If you follow the slightly convoluted logic, you'll see that these two functions are equivalent. The tricky bit is the recursive case. There, we immediately call splitListCPS with x. The function \ r₁ -> ... tells splitListCPS what to do when it's done--in this case, call splitListCPS with the next argument (y). Finally, once we have both results, we just combine them and pass that into the original continuation (cont). So at the end, we get the same result we had originally (namely splitList x ++ splitList y) but instead of returning it, we just use the continuation.
Also, if you look through the above code, you'll note that all the recursive calls are in tail position. At each step, our last action is always either a recursive call or using the continuation. With a clever compiler, this sort of code can actually be fairly efficient.
In a certain sense, this technique is actually similar to what you did for fib; however, instead of maintaining an accumulator value we sort of maintain an accumulator of the computation we're doing.
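Since the question leaves the splitting and the "magic" stopping test unspecified, the following runnable sketch plugs in made-up stand-ins purely so the CPS version can be tried out: halve cuts a list in the middle and shouldSplit keeps splitting while more than two elements remain. Both names and their behaviour are assumptions for illustration, not part of the original problem.

```haskell
-- Hypothetical stand-in for "some sublist / the other sublist".
halve :: [Int] -> ([Int], [Int])
halve xs = splitAt (length xs `div` 2) xs

-- Hypothetical stand-in for the magic stopping test.
shouldSplit :: [Int] -> [Int] -> Bool
shouldSplit a b = length a + length b > 2

-- Same shape as splitListCPS above: every recursive call is in tail
-- position, and the pending work lives in the continuation.
splitListCPS :: [Int] -> ([[Int]] -> r) -> r
splitListCPS xs cont
  | length xs < 2 = cont [xs]
  | shouldSplit a b =
      splitListCPS a $ \r1 ->
      splitListCPS b $ \r2 ->
      cont (r1 ++ r2)
  | otherwise = cont [xs]
  where (a, b) = halve xs

-- Wrap it up by passing the identity continuation.
splitList :: [Int] -> [[Int]]
splitList xs = splitListCPS xs id
```

With these stand-ins, splitList [1, 2, 3, 4, 5] should evaluate to [[1,2],[3],[4,5]]. Swapping in a different halve or shouldSplit changes the result but not the structure of the recursion.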

You don't generally want tail-recursion in Haskell. What you do want is productive corecursion, describing what in SICP is called an iterative process.
You can fix the type inconsistency in your function by enclosing initial input in a list. In your example
[1, 2, 3, 4, 5] -> [[1, 3, 4], [2, 5]] -> [[1, 3], [4], [2], [5]]
only the first arrow is inconsistent, so change it into
[[1, 2, 3, 4, 5]] -> [[1, 3, 4], [2, 5]] -> [[1, 3], [4], [2], [5]]
which illustrates the process of iteratively applying concatMap splitList1, where
splitList1 xs
  | null $ drop 1 xs = [xs]
  | magic a b > 0 = [a,b] -- (B)
  | otherwise = [xs]
  where (a,b) = splitSomeHow xs
You want to stop if no (B) case was fired at a certain iteration.
(edit: removed the intermediate version)
But it is much better to produce the portions of the output that are ready, as soon as possible:
splitList :: [Int] -> [[Int]]
splitList xs = g [xs] -- explicate the stack
  where
    g [] = []
    g (xs : t)
      | null $ drop 1 xs = xs : g t
      | magic a b > 0 = g (a : b : t)
      | otherwise = xs : g t
      where (a,b) = splitSomeHow xs
-- magic a b = 1
-- splitSomeHow = splitAt 2
Don't forget to compile with the -O2 flag.

Reorganizing a list of lists

How to solve this problem?
The problem is to reorder the list-of-lists of doubles:
[ [a, b, c], [aa, bb, cc] ]
into this:
[ [a, aa], [b, bb], [c, cc] ]
After poking about I came up with the following (a function that digs deeper and deeper into the sublists, taking their heads and joining them together):
organize xs = organize' xs head
--recursive function (type stolen from ghci)
organize':: [[a]] -> ([a] -> b) -> [b]
organize' [] f = []
organize' xs f = (map f xs)++(organize' xs (f . tail))
This doesn't work too well (although I thought it did) - in my joy of success I completely missed the error:
Exception: Prelude.head: empty list

Your mention of "doubles" implies that you want a list of 2-tuples (ie, "doubles"), rather than a list of 2-element lists. (Or perhaps this wording was particular to my Function Programming 101 lecturer!)
In which case, zip does exactly this:
zip [1, 2, 3] [4, 5, 6] = [(1,4),(2,5),(3,6)]
If you do need a list of 2-element lists (instead of tuples), you can use zipWith:
organize [xs,ys] = zipWith (\x y -> [x,y]) xs ys
Or are you looking for something that will work with any number of lists? In that case (as others have commented) transpose from Data.List is what you're after:
transpose [[1,2,3],[4,5,6]] = [[1,4],[2,5],[3,6]]
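If you would rather keep a hand-written recursion in the spirit of your attempt, the missing piece is a stopping condition. A minimal sketch, assuming it is acceptable to stop as soon as any sublist runs out (so ragged input is truncated, unlike transpose, which keeps the leftover elements):

```haskell
-- Sketch of a hand-rolled variant: take the heads of all sublists,
-- then recurse on the tails, stopping once any sublist is empty.
organize :: [[a]] -> [[a]]
organize xss
  | null xss || any null xss = []
  | otherwise = map head xss : organize (map tail xss)
```

Here head and tail are only applied after the guard has confirmed that every sublist is non-empty, which is what avoids the Prelude.head: empty list exception. For [[1,2,3],[4,5,6]] this should give [[1,4],[2,5],[3,6]].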

Haskell: surprising behavior of "groupBy"

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type
(a -> a -> Bool)
However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:
ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Instead I'd expect runs of strictly increasing numbers, like this:
[[1,2,3],[2,4],[1,5,9]]
What am I missing?

Have a look at the ghc implementation of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
Now compare these two outputs:
Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]
In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). Concretely, span (eq x) compares every later element with the first element x of the group, not with its immediate predecessor. The problem here is that the less-than relation is neither reflexive nor symmetric.
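The first step of each of the two calls can be reproduced directly with span, which makes the difference visible. The names firstStep and secondStep are just labels for this illustration.

```haskell
-- First group of groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]:
-- everything after the 1 is compared against 1 itself.
firstStep :: ([Int], [Int])
firstStep = span (1 <) [2, 3, 2, 4, 1, 5, 9]
-- ([2,3,2,4],[1,5,9])

-- First group of groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]:
-- nothing is greater than 8, so 8 ends up alone.
secondStep :: ([Int], [Int])
secondStep = span (8 <) [2, 3, 2, 4, 1, 5, 9]
-- ([],[2,3,2,4,1,5,9])
```

In the first case the second 2 stays in the group because 1 < 2 holds, even though 3 < 2 does not; the neighbouring pair is never compared.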
Edit: The following implementation only assumes transitivity:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ [x] = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x' = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs

The fact that "<" isn't an equality test.
You might expect some behavior because you'd implement it differently, but that isn't what it promises.
An example of why what it outputs is a reasonable answer is if it sweeps through it, doing
[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]
Now has 3 groups of equal elements. So it checks if any of them are in fact the same:
Since it knows all elements in each group are equal, it can just look at the first element in each: 1, 2 and 1.
1 < 2? Yes! So it merges the first two groups.
1 < 1? No! So it leaves the last group be.
And now it's compared all elements for equality.
...only, you didn't pass it the kind of function it expected.
In short, when it wants an equality test, give it an equality test.

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.

I'd just like to point out that the groupBy function also requires your list to be sorted before being applied.
For example:
equalityOp :: Eq a => (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y
testData = [(1, 2), (1, 4), (2, 3)]
correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]
otherTestData = [(1, 2), (2, 3), (1, 4)]
incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]
This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq [] = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
  where similarResults = filter (eq x) xs
        differentResults = filter (not . eq x) xs
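When the key has an Ord instance, another common option is to sort first and then use the standard groupBy. A minimal sketch, where groupByKey is a hypothetical name and the Ord constraint is an extra assumption that the filter-based version above does not need:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: bring equal keys next to each other with a
-- stable sort, then group neighbouring pairs that share a key.
groupByKey :: Ord k => [(k, v)] -> [[(k, v)]]
groupByKey = groupBy ((==) `on` fst) . sortOn fst
```

Because sortOn is stable, pairs with the same key keep their relative order, so groupByKey [(1, 2), (2, 3), (1, 4)] should give [[(1,2),(1,4)],[(2,3)]]. The groups come out ordered by key rather than by first appearance, which is the main behavioural difference from groupBy'.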
