Haskell: surprising behavior of "groupBy" - haskell

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type
(a -> a -> Bool)
However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:
ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Instead I'd expect runs of strictly increasing numbers, like this:
[[1,2,3],[2,4],[1,5,9]]
What am I missing?

Have a look at the ghc implementation of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
Now compare these two outputs:
Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]
In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). The problem here is that the less-than relation is not reflexive, nor symmetric.
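To make that concrete, the first step of the definition above can be traced by hand on the question's input. This is only an illustrative sketch; the names firstRun and restOfList are made up here and are not part of Data.List.

```haskell
-- First groupBy step for [1,2,3,2,4,1,5,9]: x is 1, and every later element
-- is tested with (1 <), i.e. against the head of the group, not its neighbour.
firstRun, restOfList :: [Int]
(firstRun, restOfList) = span (1 <) [2, 3, 2, 4, 1, 5, 9]
-- firstRun   == [2,3,2,4]  (all greater than 1, so they join the group of 1)
-- restOfList == [1,5,9]    (span stops at the first element not greater than 1)
```

This is why the second 2 ends up in the first group: it is compared with 1, never with the 3 before it.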
Edit: The following implementation only assumes transitivity:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ [x] = [[x]]
groupBy' cmp (x:xs@(x':_)) | cmp x x' = (x:y):ys
                           | otherwise = [x]:r
  where r@(y:ys) = groupBy' cmp xs

The fact that "<" isn't an equality test.
You might expect some behavior because you'd implement it differently, but that isn't what it promises.
To see why its output is a reasonable answer, imagine that it first sweeps through the list, doing
[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]
Now it has 3 groups of (supposedly) equal elements. So it checks whether any of them are in fact the same:
Since it knows all elements in each group are equal, it can just look at the first element of each: 1, 2 and 1.
1 < 2? Yes! So it merges the first two groups.
1 < 1? No! So it leaves the last group alone.
And now it's compared all elements for equality.
...only, you didn't pass it the kind of function it expected.
In short, when it wants an equality test, give it an equality test.

The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.
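The linked implementation is not reproduced here, so the following is only a minimal sketch of what an adjacent-element version could look like. The name groupAdjacentBy is hypothetical; it is not a Data.List function.

```haskell
-- Hypothetical helper: a new group starts whenever the relation fails
-- for two *neighbouring* elements (rather than for the group's head).
groupAdjacentBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupAdjacentBy rel = foldr step []
  where
    -- x joins the group to its right if it is related to that group's first element,
    -- which is x's immediate right-hand neighbour in the input.
    step x ((y:ys):rest) | rel x y = (x:y:ys) : rest
    -- otherwise (no group yet, or the relation fails) x starts a group of its own.
    step x acc = [x] : acc
```

With this sketch, groupAdjacentBy (<) [1, 2, 3, 2, 4, 1, 5, 9] gives [[1,2,3],[2,4],[1,5,9]], the result the question expected.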

I'd just like to point out that groupBy only groups adjacent elements, so the list has to be sorted (or at least have equal elements next to each other) before groupBy is applied.
For example:
equalityOp :: Eq a => (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y
testData = [(1, 2), (1, 4), (2, 3)]
correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]
otherTestData = [(1, 2), (2, 3), (1, 4)]
incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]
This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq [] = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
  where similarResults = filter (eq x) xs
        differentResults = filter (not . eq x) xs
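When the key has an Ord instance, another option is to sort first and then use the ordinary groupBy. The sketch below assumes the data are key/value pairs as in the example above; groupByKey is a made-up name.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: sort by key so equal keys become adjacent,
-- then group neighbouring pairs whose keys are equal.
groupByKey :: Ord k => [(k, v)] -> [[(k, v)]]
groupByKey = groupBy ((==) `on` fst) . sortOn fst
```

Because sortOn is a stable sort, pairs with the same key keep their original relative order, so groupByKey [(1, 2), (2, 3), (1, 4)] yields [[(1,2),(1,4)],[(2,3)]]. Unlike the filter-based version, this needs Ord on the key, but it avoids comparing every element with every group.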

Related

Count all occurrences of each element in a list

How do I efficiently count all occurrences of each element in a list? I thought of using an associative list or some hash map, but immutability gets in the way and it's not clear how a (hopefully) elegant solution should arise.
The signature could be like this:
countOccurences :: [a] -> [(a, Int)]
Example:
countOccurences [1, 1, 2, 3, 1, 2, 4]
results in
[(1, 3), (2, 2), (3, 1), (4, 1)]
(order is not important though).
group . sort will produce an output list such as
> group . sort $ [1, 1, 2, 3, 1, 2, 4]
[[1,1,1],[2,2],[3],[4]]
Hence,
> map (head &&& length) . group . sort $ [1, 1, 2, 3, 1, 2, 4]
[(1,3),(2,2),(3,1),(4,1)]
So, we obtain
import Data.List (group, sort)
import Control.Arrow ((&&&))
countOccurences :: Ord a => [a] -> [(a, Int)]
countOccurences = map (head &&& length) . group . sort
It should require only O(n log n) time.
Since chi provided a solution using group . sort, here is one that uses Data.Map:
import qualified Data.Map.Strict as M
import Data.Map.Strict (Map)
histogram :: Ord a => [a] -> Map a Int
histogram = M.fromListWith (+) . (`zip` [1,1..])
This also uses O(n log n) time.
"I thought of using an associative list or some hash map, but immutability gets in the way"
Data.Map is a tree-based associative map, so maybe this representation is for you.
If you'd rather like an [(a, Int)], M.assocs can convert the Data.Map back:
countOccurrences :: Ord a => [a] -> [(a, Int)]
countOccurrences = M.assocs . histogram
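For comparison, the same histogram can be built with an explicit left fold, which shows how an immutable map is "updated" by threading a new version through each step. This is a sketch only; countWithFold is a name chosen for illustration.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- Hypothetical variant: start from the empty map and, for every element,
-- produce a new map in which that element's counter is one higher.
countWithFold :: Ord a => [a] -> [(a, Int)]
countWithFold = M.toList . foldl' bump M.empty
  where
    -- insertWith adds 1 to an existing count, or inserts 1 for a new key.
    bump m x = M.insertWith (+) x 1 m
```

For the example input, countWithFold [1, 1, 2, 3, 1, 2, 4] gives [(1,3),(2,2),(3,1),(4,1)], in ascending key order.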

Split ranges in Haskell

Given a list like:
[1, 2, 2, 6, 7, 8, 10, 11, 12, 15]
Split it into blandly increasing ranges (maybe equal):
[[1, 2, 2], [6, 7, 8], [10, 11, 12], [15]]
I tried using a recursive approach:
splitRanges [] = [[]]
splitRanges (x:y:xs)
  | x `elem` [y, y + 1] = [x, y] : splitRanges xs
  | otherwise = xs
So if the item is one less or equal to the item after I fuse them. But it says I am trying to build an infinite type:
Occurs check: cannot construct the infinite type: a0 = [a0]
Expected type: [[a0]]
Actual type: [a0]
But what does [the fact that it is monotone] have to do with how the list is split?
That being strictly increasing would give different results.
Or are you really trying to say something else?
I hope I am not.
Will the list always be monotone?
No, splitting a monotone list means making it into just one sub-list.
If not, how should that affect the results?
If it is not monotone, you will have many sublists.
Is it always broken into groups of three?
No, the groups may contain n elements.
More examples would be good
splitRanges [1, 3] == [[1], [3]]
splitRanges [1, 2, 5] == [[1, 2], [5]]
splitRanges [0, 0, 1] == [[0, 0, 1]]
splitRanges [1, 5, 7, 9] == [[1], [5], [7], [9]]
I appreciate hints rather than full answers, as I would like to improve myself, copy-pasting is not improvement.
Try breaking the problem into more manageable parts.
First, how would you split just one blandly increasing range from the start of a list? Let's guess that should be splitOne :: [Integer] -> ([Integer], [Integer]).
Second, how can you repeatedly apply splitOne to the left-over list? Try implementing splitMany :: [Integer] -> [[Integer]] by using splitOne.
For splitOne, what should you be trying to find? The first position to split at. What are "split positions"? Let's make that up.
split   0     1     2     3     4     …
list  [ | x1, | x2, | x3, | x4, | x5, …]
So a split at 0 is ([], [x1,x2,x3,x4,x5,…]), and a split at 3 is ([x1,x2,x3],[x4,x5,…]). What relationship can you see between the split position and the split list?
How do you determine the first split position of the list? Let's say that is implemented as firstSplitPos :: [Integer] -> Int. What is the first split position of an empty list?
Can you now implement splitOne using firstSplitPos?
One Possible Answer
import Data.List (find)
-- What are the adjacencies for:
-- 1) empty lists?
-- 2) lists with one element?
-- 3) lists with more than one element?
--
-- Bonus: rewrite in point-free form using <*>
--
adjacencies :: [a] -> [(a,a)]
adjacencies xxs = zip xxs (drop 1 xxs)
-- Bonus: rewrite in point-free form
--
withIndices :: [a] -> [(Int,a)]
withIndices xxs = zip [0..] xxs
-- This is the most involved part of the answer. Pay close
-- attention to:
-- 1) empty lists
-- 2) lists with one element
-- 3) lists which are a blandly increasing sequence
--
firstSplitPos :: (Eq a, Num a) => [a] -> Int
firstSplitPos xxs = maybe (length xxs) pos (find q searchList)
  where q (_,(a,b)) = a /= b && a + 1 /= b
        searchList = withIndices (adjacencies xxs)
        -- Why is the split position one more than the index?
        pos (i,_) = i + 1
--
-- Bonus: rewrite in point-free form using <*>
--
splitOne :: (Eq a, Num a) => [a] -> ([a],[a])
splitOne xxs = splitAt (firstSplitPos xxs) xxs
splitMany :: (Eq a, Num a) => [a] -> [[a]]
-- What happens if we remove the case for []?
splitMany [] = []
splitMany xxs = let (l, r) = splitOne xxs in l : splitMany r
Another Approach
This is my explanation of Carsten's solution. It is already succinct but I have elected for a variation which does not use a 2-tuple.
We know that Haskell lists are defined inductively. To demonstrate this, we can define an equivalent data type.
data List a = Cons a (List a) -- Cons = (:)
            | Nil                -- Nil = []
Then ask the question: can we use induction on lists for the solution? If so, we only have to solve two cases: Cons and Nil. The type signature of foldr shows us exactly that:
foldr :: (a -> b -> b) -- Cons case
      -> b             -- Nil case
      -> [a]           -- The list
      -> b             -- The result
What if the list is Nil? Then the only blandly increasing sequence is the empty sequence. Therefore:
nilCase = [[]]
We might want nilCase = [] instead, as that also seems reasonable — i.e. there are no blandly increasing sequences.
Now you need some imagination. In the Cons case we only get to look at one new element at a time. With this new element, we could decide whether it belongs to the right-adjacent sequence or if it begins a new sequence.
What do I mean by right-adjacent? In [5,4,1,2,2,7], 1 belongs to the right-adjacent sequence [2,2].
How might this look?
-- The rest of the list is empty
consCase new [] = [new] : []
-- The right-adjacent sequence is empty
consCase new ([]:ss) = [new] : ss
-- The right-adjacent sequence is non-empty
-- Why `new + 1 == x` and not `new == x + 1`?
consCase new sss@(xxs@(x:_):ss)
  | new == x || new + 1 == x = (new:xxs):ss
  | otherwise = [new]:sss
Now that we solved the Nil case and the Cons case, we are done!
splitRanges = foldr consCase nilCase
It would be useful and idiomatic to write your function to take a predicate, instead of writing your split condition into the function itself:
splitBy2 :: (a -> a -> Bool) -> [a] -> [[a]]
splitBy2 ok xs = snd $ f xs [] []
  where f (a:b:xs) acc_list acc_out_lists | ok a b = ...
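The outline above is left elided, so the following is not the answerer's completion, just one possible sketch of a predicate-taking splitter that uses an accumulator. The name splitByAdjacent and its helper go are assumptions made for this example.

```haskell
-- Hypothetical helper: cut the list between two neighbours whenever
-- the predicate `ok` does not hold for them.
splitByAdjacent :: (a -> a -> Bool) -> [a] -> [[a]]
splitByAdjacent _  []     = []
splitByAdjacent ok (x:xs) = go x [] xs
  where
    -- prev is the last element seen; acc holds the earlier elements of the
    -- current run in reverse order.
    go prev acc [] = [reverse (prev:acc)]
    go prev acc (y:ys)
      | ok prev y = go y (prev:acc) ys                  -- y extends the current run
      | otherwise = reverse (prev:acc) : go y [] ys     -- close the run, start a new one at y
```

The question's splitting rule then becomes an argument rather than part of the function, e.g. splitByAdjacent (\a b -> b == a || b == a + 1) [1, 2, 5] gives [[1,2],[5]].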
I hope you don't mind spoiling part of it, but as the comments are discussing what you want (and I hope I've got it) maybe you are interested in another possible solution?
I don't want to spoil it all but I think you can easily work this out:
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = g . foldr f ([],[])
  where f x ([],xss) = ([x],xss)
        f x (y:ys,xss)
          | abs (x-y) <= 1 = undefined
          | otherwise = undefined
        g (ys,xss) = undefined
you just have to fill in the undefined holes
The idea is just to fold the list from the right, accumulating the current inner list in the first item of the tuple as long as the elements are not too far apart; when they are, push that inner list onto the second item.
If done correctly it will yield:
λ> blandly [1,3]
[[1],[3]]
λ> blandly [1,2,5]
[[1,2],[5]]
λ> blandly [0,0,1]
[[0,0,1]]
λ> blandly [1,5,7,9]
[[1],[5],[7],[9]]
which seems to be what you want
1 hour later - I think I can post my solution - just stop reading if you don't want to get spoiled
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = uncurry (:) . foldr f ([],[])
  where f x ([],xs) = ([x],xs)
        f x (y:ys,xs)
          | abs (x-y) <= 1 = (x:y:ys,xs)
          | otherwise = ([x],(y:ys):xs)
maybe I have a slight misunderstanding here (the examples did not specify it) - but if you want only monotonically increasing inner lists, you just have to change the abs part:
blandly :: (Ord a, Num a) => [a] -> [[a]]
blandly = uncurry (:) . foldr f ([],[])
  where f x ([],xss) = ([x],xss)
        f x (y:ys,xss)
          | 0 <= y-x
            && y-x <= 1 = (x:y:ys,xss)
          | otherwise = ([x],(y:ys):xss)

Get n elements of list having the highest property

I'm new to Haskell and trying to implement some genetic algorithms.
Currently I am failing to select the n best elements of a list of individuals (where each individual is itself a list).
An individual is created as follows:
ind1 :: [Int]
ind1 = [1, 1, 1, 1, 1, 1, 1]
ind2 :: [Int]
ind2 = [0, 0, 0, 0, 0, 0, 0]
The appropriate population consists of a list of those individuals:
pop :: [[Int]]
pop = [ind1, ind2]
What I want to achieve is to get the best n individuals of the population, where the "best" is determined by the sum of its elements, e.g.,
> sum ind1
7
> sum ind2
0
I started creating a function for creating tuples with individual and its quality:
f x = [(ind, sum ind) | ind <- x]
so at least I got something like this:
[([1, 1, 1, 1, 1, 1, 1], 7), ([0, 0, 0, 0, 0, 0, 0], 0)]
How do I get from here to the expected result? I do not even manage to get the "fst" of the tuple where "snd == max".
I started with recursive approaches as seen in different topics, but unfortunately without reasonable result.
Any suggestions, probably also where to read?
Thank you!
The best choice here is to use sortBy from Data.List:
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
The sortBy function is higher order, so it takes a function as one of its arguments. The function it needs is one that takes two elements and returns an Ordering value (LT, EQ or GT). You can write your own custom comparison function, but the Data.Ord module has comparing, which exists to help with writing these comparison functions:
comparing :: Ord b => (a -> b) -> (a -> a -> Ordering)
Hopefully you can see how comparing pairs with sortBy: you pass it a function that converts your type to a known comparable type, and you get back a function of the right type to pass to sortBy. So in practice you can do
import Data.List (sortBy)
import Data.Ord (comparing)
-- Some types to make things more readable
type Individual = [Int]
type Fitness = Int
-- Here's our fitness function (change as needed)
fitness :: Individual -> Fitness
fitness = sum
-- Redefining so it can be used with `map`
f :: Individual -> (Individual, Fitness)
f ind = (ind, fitness ind)
-- If you do want to see the fitness of the top n individuals
solution1 :: Int -> [Individual] -> [(Individual, Fitness)]
solution1 n inds = take n $ sortBy (flip $ comparing snd) $ map f inds
-- If you just want the top n individuals
solution2 :: Int -> [Individual] -> [Individual]
solution2 n inds = take n $ sortBy (flip $ comparing fitness) inds
The flip in the arguments to sortBy forces the sort to be descending instead of the default ascending, so the first n values returned from sortBy will be the n values with the highest fitness in descending order. If you wanted to try out different fitness functions then you could do something like
fittestBy :: (Individual -> Fitness) -> Int -> [Individual] -> [Individual]
fittestBy fit n = take n . sortBy (flip $ comparing fit)
Then you'd have
solution2 = fittestBy sum
But you could also have
solution3 = fittestBy product
if you wanted to change your fitness function to be the product rather than the sum.
Use sortBy (from Data.List) and on (from Data.Function).
> take 2 $ sortBy (flip compare `on` sum) [[1,2],[0,4],[1,1]]
[[0,4],[1,2]]
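The same descending sort can also be written with sortOn and the Down wrapper from Data.Ord, which reverses the ordering of the key. This is a small sketch; topN is a name invented for the example.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical helper: the n individuals with the largest element sum.
-- Wrapping the key in Down makes sortOn order from highest to lowest.
topN :: Int -> [[Int]] -> [[Int]]
topN n = take n . sortOn (Down . sum)
```

For instance, topN 2 [[1,2],[0,4],[1,1]] returns [[0,4],[1,2]], matching the GHCi output above. sortOn also computes each sum only once, which may matter if the fitness function is expensive.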

Haskell: List combination for Integers

I have a given list, e.g. [2, 3, 5, 587], and I want the complete list of its combinations, i.e. something like [2, 2*3, 2*5, 2*587, 3, 3*5, 3*587, 5, 5*587, 587]. Since I am at beginner level with Haskell, I am curious what such a list manipulation would look like.
Additionally, if computing the base list is expensive, how would that affect the cost of the function? (Assume the list is bounded, i.e. < 20.)
Rem.: The list could be sorted afterwards, but I really have no idea whether it is cheaper to do that within the function or afterwards.
The others have explained how to make pairs, so I concern myself here with getting the combinations.
If you want the combinations of all lengths, that's just the power set of your list, and can be computed the following way:
powerset :: [a] -> [[a]]
powerset (x:xs) = let xs' = powerset xs in xs' ++ map (x:) xs'
powerset [] = [[]]
-- powerset [1, 2] === [[],[2],[1],[1,2]]
-- you can take the products:
-- map product $ powerset [1, 2] == [1, 2, 1, 2]
There's an alternative powerset implementation in Haskell that's considered sort of a classic:
import Control.Monad
powerset = filterM (const [True, False])
You could look at the source of filterM to see how it works essentially the same way as the other powerset above.
On the other hand, if you'd like to have all the combinations of a certain size, you could do the following:
combsOf :: Int -> [a] -> [[a]]
combsOf n _ | n < 1 = [[]]
combsOf n (x:xs) = combsOf n xs ++ map (x:) (combsOf (n - 1) xs)
combsOf _ _ = []
-- combsOf 2 [1, 2, 3] === [[2,3],[1,3],[1,2]]
So it seems what you want is all pairs of products from the list:
ghci> :m +Data.List
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- bs ]
[6,10,1174,15,1761,2935]
But you also want the initial numbers:
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- 1:bs ]
[2,6,10,1174,3,15,1761,5,2935,587]
This uses a list comprehension, but this could also be done with regular list operations:
ghci> concatMap (\(a:bs) -> a : map (a*) bs) . init $ tails [2, 3, 5, 587]
[2,6,10,1174,3,15,1761,5,2935,587]
The latter is a little easier to explain:
Data.List.tails produces all the suffixes of a list:
ghci> tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587],[]]
Prelude.init drops the last element from a list. Here I use it to drop the empty suffix, since processing that causes an error in the next step.
ghci> init [[2,3,5,587],[3,5,587],[5,587],[587],[]]
[[2,3,5,587],[3,5,587],[5,587],[587]]
ghci> init $ tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587]]
Prelude.concatMap runs a function over each element of a list, and combines the results into a flattened list. So
ghci> concatMap (\a -> replicate a a) [1,2,3]
[1,2,2,3,3,3]
\(a:bs) -> a : map (a*) bs does a couple of things:
It pattern matches on its argument, asserting that it is a list with at least one element (which is why I dropped the empty list with init), and binds the initial element to a and the later elements to bs.
It produces a list that has the same first element as the argument (a:), but multiplies each of the later elements by a (map (a*) bs).
You can get the suffixes of a list using Data.List.tails.
This gives you a list of lists, you can then do the inner multiplications you want on this list with a function like:
prodAll [] = []
prodAll (h:t) = h:(map (* h) $ t)
You can then map this function over each inner list and concatenate the results:
f :: Num a => [a] -> [a]
f = concat . map prodAll . tails

Merge multiple lists if condition is true

I've been trying to wrap my head around this for a while now, but it seems like my lack of Haskell experience just won't get me through it. I couldn't find a similar question here on Stackoverflow (most of them are related to merging all sublists, without any condition)
So here it goes. Let's say I have a list of lists like this:
[[1, 2, 3], [3, 5, 6], [20, 21, 22]]
Is there an efficient way to merge lists if some sort of condition is true? Let's say I need to merge lists that share at least one element. In case of example, result would be:
[[1, 2, 3, 3, 5, 6], [20, 21, 22]]
Another example (when all lists can be merged):
[[1, 2], [2, 3], [3, 4]]
And its result:
[[1, 2, 2, 3, 3, 4]]
Thanks for your help!
I don't know what to say about efficiency, but we can break down what's going on and get several different functionalities at least. Particular functionalities might be optimizable, but it's important to clarify exactly what's needed.
Let me rephrase the question: For some set X, some binary relation R, and some binary operation +, produce a set Q = {x+y | x in X, y in X, xRy}. So for your example, we might have X being some set of lists, R being "xRy if and only if there's at least one element in both x and y", and + being ++.
A naive implementation might just copy the set-builder notation itself
shareElement :: Eq a => [a] -> [a] -> Bool
shareElement xs ys = or [x == y | x <- xs, y <- ys]
v1 :: (a -> a -> Bool) -> (a -> a -> b) -> [a] -> [b]
v1 (?) (<>) xs = [x <> y | x <- xs, y <- xs, x ? y]
then p = v1 shareElement (++) :: Eq a => [[a]] -> [[a]] might achieve what you want. Except it probably doesn't.
Prelude> p [[1], [1]]
[[1,1],[1,1],[1,1],[1,1]]
The most obvious problem is that we get four copies: two from merging the lists with themselves, two from merging the lists with each other "in both directions". The problem occurs because List isn't the same as Set so we can't kill uniques. Of course, that's an easy fix, we'll just use Set everywhere
import qualified Data.Set as Set
v2 :: Ord b => (a -> a -> Bool) -> (a -> a -> b) -> Set.Set a -> Set.Set b
v2 (?) (<>) = Set.fromList . v1 (?) (<>) . Set.toList
So we can try again, with p = v2 (shareElement `on` Set.toList) Set.union (on comes from Data.Function):
Prelude Set> p $ Set.fromList $ map Set.fromList [[1,2], [2,1]]
fromList [fromList [1,2]]
which seems to work. Note that we have to "go through" List because Set can't be made an instance of Monad or Applicative due to its Ord constraint.
I'd also note that there's a lot of lost behavior in Set. For instance, we fight either throwing away order information in the list or having to handle both x <> y and y <> x when our relation is symmetric.
Some more convenient versions can be written like
v3 :: (Monoid a, Ord a) => (a -> a -> Bool) -> Set.Set a -> Set.Set a
v3 r = v2 r mappend
and more efficient ones can be built if we assume that the relation is, say, an equivalence relation, since then instead of an O(n^2) operation we can do it in O(nd), where d is the number of equivalence classes of the relation.
Generally, it's a really interesting problem.
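As a rough illustration of the O(nd) idea, and assuming the relation really is an equivalence relation, each element only needs to be compared with one representative per class found so far. The function classesBy below is a hypothetical sketch, not something from the answer or from a library.

```haskell
-- Hypothetical helper: partition a list into equivalence classes.
-- ASSUMPTION: eq is reflexive, symmetric and transitive; otherwise the
-- result depends on which representative happens to be compared.
classesBy :: (a -> a -> Bool) -> [a] -> [[a]]
classesBy eq = foldr place []
  where
    -- No class matched: x starts a new class.
    place x [] = [[x]]
    -- Compare x with a single representative (the first element) of each class.
    place x (c@(r:_):cs)
      | eq x r    = (x:c) : cs
      | otherwise = c : place x cs
    -- Empty classes are never created here; skip them defensively.
    place x ([]:cs) = place x cs
```

For example, classesBy (\a b -> a `mod` 3 == b `mod` 3) [1..7] produces [[1,4,7],[3,6],[2,5]]. Note that "shares at least one element" from the question is not transitive, so this sketch does not apply to it directly.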
I just happened to write something similar here: Finding blocks in arrays
You can just modify it so (although I'm not too sure about the efficiency):
import Data.List (delete, intersect)
example1 = [[1, 2, 3], [3, 5, 6], [20, 21, 22]]
example2 = [[1, 2], [2, 3], [3, 4]]
objects zs = map concat . solve zs $ [] where
  areConnected x y = not . null . intersect x $ y
  solve [] result = result
  solve (x:xs) result =
    let result' = solve' xs [x]
    in solve (foldr delete xs result') (result':result) where
    solve' xs result =
      let ys = filter (\y -> any (areConnected y) result) xs
      in if null ys
           then result
           else solve' (foldr delete xs ys) (ys ++ result)
OUTPUT:
*Main> objects example1
[[20,21,22],[3,5,6,1,2,3]]
*Main> objects example2
[[3,4,2,3,1,2]]
