Generate a list of unique combinations from a list - haskell

I want to generate a list of all unique ways to choose two from a list of numbers in Haskell. So from the list [1,2,3] I would like [[1,2],[2,3],[1,3]]. Order is not important so I want to avoid producing both [1,2] and [2,1] for example.
My current solution is:
pairs :: Ord a => [a] -> [[a]]
pairs x = nub $ map sort $ map (take 2) (permutations x)
This isn't a particularly nice solution however and it certainly has some serious performance issues. Is there a simple and efficient solution to this problem?

pairs xs = [[x1, x2] | (x1:xs1) <- tails xs, x2 <- xs1]
...assuming the list starts out unique, or you could compose this with nub otherwise.

Related

Take symmetrical pairs, of different numbers from list

I have a list like this:
[(2,3),(2,5),(2,7),(3,2),(3,4),(3,6),(4,3),(4,5),(4,7),(5,2),(5,4),(5,6),(6,3),(6,5),(6,7),(7,2),(7,4),(7,6)]
The digits are from [2..7]. I want to take a set where there are any symmetrical pairs. e.g. [(1,2),(2,1)], but those two numbers aren't used again in the set. An example would be:
[(3,6),(6,3),(2,5),(5,2),(4,7),(7,4)]
I wanted to first put symmetric pairs together as I thought it might be easier to work with so i created this function, which actually creates the pairs and puts them in another list
g xs = [ (y,x):(x,y):[] | (x,y) <- xs ]
with which the list turns to this:
[[(3,2),(2,3)],[(5,2),(2,5)],[(7,2),(2,7)],[(2,3),(3,2)],[(4,3),(3,4)],[(6,3),(3,6)],[(3,4),(4,3)],[(5,4),(4,5)],[(7,4),(4,7)],[(2,5),(5,2)],[(4,5),(5,4)],[(6,5),(5,6)],[(3,6),(6,3)],[(5,6),(6,5)],[(7,6),(6,7)],[(2,7),(7,2)],[(4,7),(7,4)],[(6,7),(7,6)]]
Then from here I was hoping to somehow remove duplicates.
I made a function that will look at all of the fst elements of all of the pairs:
flatList xss = [ x | xs <- xss, (x,y) <- xs ]
to use with another function to remove the duplicates.
h (x:xs) | (fst (head x)) `elem` (flatList xs) = h xs
| otherwise = (head x):(last x):(h xs)
which gives me the list
[(3,6),(6,3),(5,6),(6,5),(2,7),(7,2),(4,7),(7,4),(6,7),(7,6)]
which has duplicate numbers. That function only takes into account the first element of the first pair in the list of lists,the problem is when I also take into account the first element of the second pair (or the second element of the first pair):
h (x:xs) | (fst (head x)) `elem` (flatList xs) || (fst (last x)) `elem` (flatList xs) = h xs
| otherwise = (head x):(last x):(h xs)
I only get these two pairs:
[(6,7),(7,6)]
I see that the problem is that this method of deleting duplicates grabs the last repeated element, and would work with a list of digits, but not a list of pairs, as it misses pairs it needs to take.
Is there another way to solve this, or an alteration I could make?
It probably makes more sense to use a 2-tuple of 2-tuples in your list comprehension, since that makes it more easy to do pattern matching, and thus "by contract" enforces the fact that there are two items. We thus can construct 2-tuples that contain the 2-tuples with:
g :: Eq a => [(a, a)] -> [((a, a), (a, a))]
g xs = [ (t, s) | (t#(x,y):ts) <- tails xs, let s = (y, x), elem s ts ]
Here the elem s ts checks if the "swapped" 2-tuple occurs in the rest of the list.
Then we still need to filter the elements. We can make use of a function that uses an accumulator for the thus far obtained items:
h :: Eq a => [((a, a), (a, a))] -> [(a, a)]
h = go []
where go _ [] = []
go seen ((t#(x, y), s):xs)
| notElem x seen && notElem y seen = t : s : go (x:y:seen) xs
| otherwise = go seen xs
For the given sample input, we thus get:
Prelude Data.List> (h . g) [(2,3),(2,5),(2,7),(3,2),(3,4),(3,6),(4,3),(4,5),(4,7),(5,2),(5,4),(5,6),(6,3),(6,5),(6,7),(7,2),(7,4),(7,6)]
[(2,3),(3,2),(4,5),(5,4),(6,7),(7,6)]
after reading a few times your question, I got an elegant solution to your problem. Thinking that if you have a list of pairs without any repeated number, you can get the list of swapped pairs easily, solving your problem. So your problem can be reduce to given a list, get the list of all pairs using each number just one.
For a given list, there are many solutions to this, ex: for [1,2,3,4] valid solutions are: [(2,4),(4,2),(1,3),(3,1)] and [(2,3),(3,2),(1,4),(4,1)], etc... The approach here is:
take a permutation if the original list (say [1,4,3,2])
pick one element for each half and pair them together (for simplicity, you can pick consecutive elements too)
for each pair, create a the swapped pair and put all together
By doing so you end up with a list of non repeating numbers of pairs and its symmetric. More over, looping around all permutaitons, you can get all the solutions to your problem.
import Data.List (permutations, splitAt)
import Data.Tuple (swap)
-- This function splits a list by the half of the length
splitHalf :: [a] -> ([a], [a])
splitHalf xs = splitAt (length xs `quot` 2) xs
-- This zip a pair of list into a list of pairs
zipHalfs :: ([a], [a]) -> [(a,a)]
zipHalfs (xs, ys) = zip xs ys
-- Given a list of tuples, creates a larger list with all tuples and all swapped tuples
makeSymetrics :: [(a,a)] -> [(a,a)]
makeSymetrics xs = foldr (\t l -> t:(swap t):l) [] xs
-- This chain all of the above.
-- Take all permutations of xs >>> for each permutations >>> split it in two >>> zip the result >>> make swapped pairs
getPairs :: [a] -> [[(a,a)]]
getPairs xs = map (makeSymetrics . zipHalfs . splitHalf) $ permutations xs
>>> getPairs [1,2,3,4]
[[(1,3),(3,1),(2,4),(4,2)],[(2,3),(3,2),(1,4),(4,1)] ....

A better way of optimizing for permutation of function compositions over an input?

I have a list of functions and their 'apply priority'.
It looks like this. Length of it is 33
listOfAllFunctions = [ (f1, 1)
, (f2, 2)
, ...
, ...
, (f33, 33)
]
What I want to do is generate a list of permutations of the above list with no duplicates and I only want 8 unique elements in the inner list.
Which I'm implementing like this
prioratizedFunctions :: [[(MyDataType -> MyDataType, Int)]]
prioratizedFunctions = nubBy removeDuplicates
$ sortBy (comparing snd)
<$> take 8
<$> permutations listOfAllFunctions
where removeDuplicates is defined like
removeDuplicates a b = map snd a == map snd b
Lastly I'm turning the sublists which'd be [(MyDataType -> MyDataType, Int)] to a composition of functions and a [Int]
with this function
compFunc :: [(MyDataType -> MyDataType, Int)] -> MyDataType -> (MyDataType, [Int])
compFunc listOfDataAndInts target = (foldr ((.) . fst) id listOfDataAndInts target
, map snd listOfDataAndInts)
Applying the above function like this (flip compFunc) target <$> prioratizedFunctions
All of the above is a simplified version of the actual code but it should provide the gist it.
The problem is that this code takes practically forever to execute. From some prototyping I think the blame of it falls on my implementation of permutations function inside prioratizedFunctions.
So I was wondering, is there a better way of doing what I want (basically generating permutation of listOfAllFunctions where each list only contains 8 elements, every list of elements sorted by their priority with snd and containing no duplicate list)
or is the problem inherently a long process?
I was generating unnecessary permutations.
This choose function is basically a non-deterministic take function
choose 0 xs = [[]]
choose n [] = []
choose n (x:xs) = map (x:) (choose (n-1) xs) ++ choose n xs
which improved performance by a lot.

Generate all changes of signs

I have many lists. Say for short [[1,2],[3,4]].
I need to generate all changes of signs of each element. Thus, for the short example, the result would be
[[1,2],[3,4],[-1,2],[1,-2],[-1,-2],[-3,4],[3,-4],[-3,-4]]
Is there a package to perform such an operation ? Otherwise what algorithm could I use ? (I confess I have not thought a lot about it ...).
It this can help, all my lists have the same length.
Edit
Hmm.. maybe an idea like that:
x = [[2*i,2*j] | i <- [1, -1], j <- [-1,1]]
x
[[2,-2],[2,2],[-2,-2],[-2,2]]
The problem can be broken down to 2 steps:
For a given list of numbers, generate all the possible signs
For the list of lists, apply the function from (1) to each list, then concat the results.
For 1. you can write a simple recursive function that first processes the tail of the list, then for each resulting combination, it generates two versions for the two signs.
signs :: [Int] -> [[Int]]
signs [] = [[]]
signs (x : xs)
= let ps = signs xs
in map (x :) ps ++ map ((-x) :) ps
For 2. simply map the signs function over the input, and concat them. This is what the concatMap function does:
signsAll :: [[Int]] -> [[Int]]
signsAll = concatMap signs
I tried to use an applicative here, because it looks like you are almost there
[id,negate] <*> [3,4]
but it turned out that I need sequence and map, which, in this case, can be combined into a traverse:
traverse (\x->[x,-x]) [3,4]
[[3,4],[3,-4],[-3,4],[-3,-4]]
As others mentioned, now you need concatMap for your function:
concatMap (traverse (\x->[x,-x])) [[3,4],[1,2]]
[[3,4],[3,-4],[-3,4],[-3,-4],[1,2],[1,-2],[-1,2],[-1,-2]]

Generating subsets of set. Laziness?

I have written a function generating subsets of subset. It caused stack overflow when I use in the following way subsets [1..]. And it is "normal" behaviour when it comes to "normal" (no-lazy) languages. And now, I would like to improve my function to be lazy.
P.S. I don't understand laziness ( And I try to understand it) so perhaps my problem is strange for you- please explain. :)
P.S. 2 Feel free to say me something about my disability in Haskell ;)
subsets :: [a] -> [[a]]
subsets (x:xs) = (map (\ e -> x:e) (subsets xs)) ++ (subsets xs)
subsets [] = [[]]
There's two problems with that function. First, it recurses twice, which makes it exponentially more ineffiecient than necessary (if we disregard the exponential number of results...), because each subtree is recalculated every time for all overlapping subsets; this can be fixed by leting the recursive call be the same value:
subsets' :: [a] -> [[a]]
subsets' [] = [[]]
subsets' (x:xs) = let s = subsets' xs
in map (x:) s ++ s
This will already allow you to calculate length $ subsets' [1..25] in a few seconds, while length $ subsets [1..25] takes... well, I didn't wait ;)
The other issue is that with your version, when you give it an infinite list, it will recurse on the infinite tail of that list first. To generate all finite subsets in a meaningful way, we need to ensure two things: first, we must build up each set from smaller sets (to ensure termination), and second, we should ensure a fair order (ie., not generate the list [[1], [2], ...] first and never get to the rest). For this, we start from [[]] and recursively add the current element to everything we have already generated, and then remember the new list for the next step:
subsets'' :: [a] -> [[a]]
subsets'' l = [[]] ++ subs [[]] l
where subs previous (x:xs) = let next = map (x:) previous
in next ++ subs (previous ++ next) xs
subs _ [] = []
Which results in this order:
*Main> take 100 $ subsets'' [1..]
[[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1],[4],[4,1],[4,2],[4,2,1],[4,3],[4,3,1],[4,3,2],[4,3,2,1],[5],[5,1],[5,2],[5,2,1],[5,3],[5,3,1],[5,3,2],[5,3,2,1],[5,4],[5,4,1],[5,4,2],[5,4,2,1],[5,4,3],[5,4,3,1],[5,4,3,2],[5,4,3,2,1],[6],[6,1],[6,2],[6,2,1],[6,3],[6,3,1],[6,3,2],[6,3,2,1],[6,4],[6,4,1],[6,4,2],[6,4,2,1],[6,4,3],[6,4,3,1],[6,4,3,2],[6,4,3,2,1],[6,5],[6,5,1],[6,5,2],[6,5,2,1],[6,5,3],[6,5,3,1],[6,5,3,2],[6,5,3,2,1],[6,5,4],[6,5,4,1],[6,5,4,2],[6,5,4,2,1],[6,5,4,3],[6,5,4,3,1],[6,5,4,3,2],[6,5,4,3,2,1],[7],[7,1],[7,2],[7,2,1],[7,3],[7,3,1],[7,3,2],[7,3,2,1],[7,4],[7,4,1],[7,4,2],[7,4,2,1],[7,4,3],[7,4,3,1],[7,4,3,2],[7,4,3,2,1],[7,5],[7,5,1],[7,5,2],[7,5,2,1],[7,5,3],[7,5,3,1],[7,5,3,2],[7,5,3,2,1],[7,5,4],[7,5,4,1],[7,5,4,2],[7,5,4,2,1],[7,5,4,3],[7,5,4,3,1],[7,5,4,3,2],[7,5,4,3,2,1],[7,6],[7,6,1],[7,6,2],[7,6,2,1]]
You can't generate all the subsets of an infinite set: they form an uncountable set. Cardinality makes it impossible.
At most, you can try to generate all the finite subsets. For that, you can't proceed by induction, from [] onwards, since you'll never reach []. You need to proceed inductively from the beginning of the list, instead of the end.
A right fold solution would be:
powerset :: Foldable t => t a -> [[a]]
powerset xs = []: foldr go (const []) xs [[]]
where go x f a = let b = (x:) <$> a in b ++ f (a ++ b)
then:
\> take 8 $ powerset [1..]
[[],[1],[2],[2,1],[3],[3,1],[3,2],[3,2,1]]

Long working of program that count Ints

I want to write program that takes array of Ints and length and returns array that consist in position i all elements, that equals i, for example
[0,0,0,1,3,5,3,2,2,4,4,4] 6 -> [[0,0,0],[1],[2,2],[3,3],[4,4,4],[5]]
[0,0,4] 7 -> [[0,0],[],[],[],[4],[],[]]
[] 3 -> [[],[],[]]
[2,2] 3 -> [[],[],[2,2]]
So, that's my solution
import Data.List
import Data.Function
f :: [Int] -> Int -> [[Int]]
f ls len = g 0 ls' [] where
ls' = group . sort $ ls
g :: Int -> [[Int]] -> [[Int]] -> [[Int]]
g val [] accum
| len == val = accum
| otherwise = g (val+1) [] (accum ++ [[]])
g val (x:xs) accum
| len == val = accum
| val == head x = g (val+1) xs (accum ++ [x])
| otherwise = g (val+1) (x:xs) (accum ++ [[]])
But query f [] 1000000 works really long, why?
I see we're accumulating over some data structure. I think foldMap. I ask "Which Monoid"? It's some kind of lists of accumulations. Like this
newtype Bunch x = Bunch {bunch :: [x]}
instance Semigroup x => Monoid (Bunch x) where
mempty = Bunch []
mappend (Bunch xss) (Bunch yss) = Bunch (glom xss yss) where
glom [] yss = yss
glom xss [] = xss
glom (xs : xss) (ys : yss) = (xs <> ys) : glom xss yss
Our underlying elements have some associative operator <>, and we can thus apply that operator pointwise to a pair of lists, just like zipWith does, except that when we run out of one of the lists, we don't truncate, rather we just take the other. Note that Bunch is a name I'm introducing for purposes of this answer, but it's not that unusual a thing to want. I'm sure I've used it before and will again.
If we can translate
0 -> Bunch [[0]] -- single 0 in place 0
1 -> Bunch [[],[1]] -- single 1 in place 1
2 -> Bunch [[],[],[2]] -- single 2 in place 2
3 -> Bunch [[],[],[],[3]] -- single 3 in place 3
...
and foldMap across the input, then we'll get the right number of each in each place. There should be no need for an upper bound on the numbers in the input to get a sensible output, as long as you are willing to interpret [] as "the rest is silence". Otherwise, like Procrustes, you can pad or chop to the length you need.
Note, by the way, that when mappend's first argument comes from our translation, we do a bunch of ([]++) operations, a.k.a. ids, then a single ([i]++), a.k.a. (i:), so if foldMap is right-nested (which it is for lists), then we will always be doing cheap operations at the left end of our lists.
Now, as the question works with lists, we might want to introduce the Bunch structure only when it's useful. That's what Control.Newtype is for. We just need to tell it about Bunch.
instance Newtype (Bunch x) [x] where
pack = Bunch
unpack = bunch
And then it's
groupInts :: [Int] -> [[Int]]
groupInts = ala' Bunch foldMap (basis !!) where
basis = ala' Bunch foldMap id [iterate ([]:) [], [[[i]] | i <- [0..]]]
What? Well, without going to town on what ala' is in general, its impact here is as follows:
ala' Bunch foldMap f = bunch . foldMap (Bunch . f)
meaning that, although f is a function to lists, we accumulate as if f were a function to Bunches: the role of ala' is to insert the correct pack and unpack operations to make that just happen.
We need (basis !!) :: Int -> [[Int]] to be our translation. Hence basis :: [[[Int]]] is the list of images of our translation, computed on demand at most once each (i.e., the translation, memoized).
For this basis, observe that we need these two infinite lists
[ [] [ [[0]]
, [[]] , [[1]]
, [[],[]] , [[2]]
, [[],[],[]] , [[3]]
... ...
combined Bunchwise. As both lists have the same length (infinity), I could also have written
basis = zipWith (++) (iterate ([]:) []) [[[i]] | i <- [0..]]
but I thought it was worth observing that this also is an example of Bunch structure.
Of course, it's very nice when something like accumArray hands you exactly the sort of accumulation you need, neatly packaging a bunch of grungy behind-the-scenes mutation. But the general recipe for an accumulation is to think "What's the Monoid?" and "What do I do with each element?". That's what foldMap asks you.
The (++) operator copies the left-hand list. For this reason, adding to the beginning of a list is quite fast, but adding to the end of a list is very slow.
In summary, avoid adding things to the end of a list. Try to always add to the beginning instead. One simple way to do that is to build the list backwards, and then reverse it at the end. A more devious trick is to use "difference lists" (Google it). Another possibility is to use Data.Sequence rather than a list.
The first thing that should be noted is the most obvious way to implement this is use a data structure that allows random access, an array is an obviously choice. Note that you need to add the elements to the array multiple times and somehow "join them".
accumArray is perfect for this.
So we get:
f l i = elems $ accumArray (\l e -> e:l) [] (0,i-1) (map (\e -> (e,e)) l)
And we're good to go (see full code here).
This approach does involve converting the final array back into a list, but that step is very likely faster than say sorting the list, which often involves scanning the list at least a few times for a list of decent size.
Whenever you use ++ you have to recreate the entire list, since lists are immutable.
A simple solution would be to use :, but that builds a reversed list. However that can be fixed using reverse, which results in only building two lists (instead of 1 million in your case).
Your concept of glomming things onto an accumulator is a very useful one, and both MathematicalOrchid and Guvante show how you can use that concept reasonably efficiently. But in this case, there is a simpler approach that is likely also faster. You started with
group . sort $ ls
and this was a very good place to start! You get a list that's almost the one you want, except that you need to fill in some blanks. How can we figure those out? The simplest way, though probably not quite the most efficient, is to work with a list of all the numbers you want to count up to: [0 .. len-1].
So we start with
f ls len = g [0 .. len-1] (group . sort $ ls)
where
?
How do we define g? By pattern matching!
f ls len = g [0 .. len-1] (group . sort $ ls)
where
-- We may or may not have some lists left,
-- but we counted as high as we decided we
-- would
g [] _ = []
-- We have no lists left, so the rest of the
-- numbers are not represented
g ns [] = map (const []) ns
-- This shouldn't be possible, because group
-- doesn't make empty lists.
g _ ([]:_) = error "group isn't working!"
-- Finally, we have some work to do!
g (n:ns) xls#(xl#(x:_):xls')
| n == x = xl : g ns xls'
| otherwise = [] : g ns xls
That was nice, but making the list of numbers isn't free, so you might be wondering how you can optimize it. One method I invite you to try is using your original technique of keeping a separate counter, but following this same sort of structure.

Resources