Count all occurences of each element in a list - haskell

How do I efficiently count all occurences of each element in a list? I thought of using an associative list or some hash map, but immutability gets in the way and it's not evidently clear how an (hopefully) elegant solution should arise.
The signature could be like this:
countOccurences :: [a] -> [(a, Int)]
Example:
countOccurences [1, 1, 2, 3, 1, 2, 4]
results in
[(1, 3), (2, 2), (3, 1), (4, 1)]
(order is not important though).

group . sort will produce an output list such as
> group . sort $ [1, 1, 2, 3, 1, 2, 4]
[[1,1,1],[2,2],[3],[4]]
Hence,
> map (head &&& length) . group . sort $ [1, 1, 2, 3, 1, 2, 4]
[(1,3),(2,2),(3,1),(4,1)]
So, we obtain
import Data.List (group, sort)
import Control.Arrow ((&&&))
countOccurences :: Ord a => [a] -> [(a, Int)]
countOccurences = map (head &&& length) . group . sort
It should require only O(n log n) time.

Since chi provided a solution using group . sort, here is one that uses Data.Map:
import qualified Data.Map.Strict as M
import Data.Map.Strict (Map)
histogram :: Ord a => [a] -> Map a Int
histogram = M.fromListWith (+) . (`zip` [1,1..])
This also uses O(n log n) time.
I thought of using an associative list or some hash map, but immutability gets in the way
Data.Map is a tree-based associative map, so maybe this representation is for you.
If you'd rather like an [(a, Int)], M.assocs can convert the Data.Map back:
countOccurrences :: Ord a => [a] -> [(a, Int)]
countOccurrences = M.assocs . histogram

Related

Haskell: List combination for Integers

I have a given list, e.g. [2, 3, 5, 587] and I want to have a complete list of the combination. So want something like [2, 2*3,2*5, 2*587, 3, 3*5, 3*587, 5, 5*587, 587]. Since I am on beginner level with Haskell I am curious how a list manipulation would look like.
Additionally I am curious if the computation of the base list might be expensive how would this influence the costs of the function? (If I would assume the list has limit values, i.e < 20)
Rem.: The order of the list could be done afterwards, but I have really no clue if this is cheaper within the function or afterwards.
The others have explained how to make pairs, so I concern myself here with getting the combinations.
If you want the combinations of all lengths, that's just the power set of your list, and can be computed the following way:
powerset :: [a] -> [[a]]
powerset (x:xs) = let xs' = powerset xs in xs' ++ map (x:) xs'
powerset [] = [[]]
-- powerset [1, 2] === [[],[2],[1],[1,2]]
-- you can take the products:
-- map product $ powerset [1, 2] == [1, 2, 1, 2]
There's an alternative powerset implementation in Haskell that's considered sort of a classic:
import Control.Monad
powerset = filterM (const [True, False])
You could look at the source of filterM to see how it works essentially the same way as the other powerset above.
On the other hand, if you'd like to have all the combinations of a certain size, you could do the following:
combsOf :: Int -> [a] -> [[a]]
combsOf n _ | n < 1 = [[]]
combsOf n (x:xs) = combsOf n xs ++ map (x:) (combsOf (n - 1) xs)
combsOf _ _ = []
-- combsOf 2 [1, 2, 3] === [[2,3],[1,3],[1,2]]
So it seems what you want is all pairs of products from the list:
ghci> :m +Data.List
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- bs ]
[6,10,1174,15,1761,2935]
But you also want the inital numbers:
ghci> [ a * b | a:bs <- tails [2, 3, 5, 587], b <- 1:bs ]
[2,6,10,1174,3,15,1761,5,2935,587]
This uses a list comprehension, but this could also be done with regular list operations:
ghci> concatMap (\a:bs -> a : map (a*) bs) . init $ tails [2, 3, 5, 587]
[2,6,10,1174,3,15,1761,5,2935,587]
The latter is a little easier to explain:
Data.List.tails produces all the suffixes of a list:
ghci> tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587],[]]
Prelude.init drops the last element from a list. Here I use it to drop the empty suffix, since processing that causes an error in the next step.
ghci> init [[2,3,5,587],[3,5,587],[5,587],[587],[]]
[[2,3,5,587],[3,5,587],[5,587],[587]]
ghci> init $ tails [2, 3, 5, 587]
[[2,3,5,587],[3,5,587],[5,587],[587]]
Prelude.concatMap runs a function over each element of a list, and combines the results into a flattened list. So
ghci> concatMap (\a -> replicate a a) [1,2,3]
[1, 2, 2, 3, 3, 3]
\(a:bs) -> a : map (a*) bs does a couple things.
I pattern match on my argument, asserting that it matches an list with at least one element (which is why I dropped the empty list with init) and stuffs the initial element into a and the later elements into bs.
This produces a list that has the same first element as the argument a:, but
Multiplies each of the later elements by a (map (a*) bs).
You can get the suffixes of a list using Data.List.tails.
This gives you a list of lists, you can then do the inner multiplications you want on this list with a function like:
prodAll [] = []
prodAll (h:t) = h:(map (* h) $ t)
You can then map this function over each inner list and concatenate the results:
f :: Num a => [a] -> [a]
f = concat . map prodAll . tails

Convert List of Tuples to List of Lists Haskell

I have [("m","n"),("p","q"),("r","s")]. How can I convert it to [["m","n"],["p","q"],["r","s"]]?
Can anyone please help me? Thanks.
Write a single function to convert a pair to a list:
pairToList :: (a, a) -> [a]
pairToList (x,y) = [x,y]
Then you only have to map pairToList:
tuplesToList :: [(a,a)] -> [[a]]
tuplesToList = map pairToList
Or in a single line:
map (\(x,y) -> [x,y])
Using lens you can do this succinctly for arbitrary length homogenous tuples:
import Control.Lens
map (^..each) [("m","n"),("p","q"),("r","s")] -- [["m","n"],["p","q"],["r","s"]]
map (^..each) [(1, 2, 3)] -- [[1, 2, 3]]
Note though that the lens library is complex and rather beginner-unfriendly.
List comprehension version:
[[x,y] | (x,y) <- [("m","n"),("p","q"),("r","s")]]

Merge multiple lists if condition is true

I've been trying to wrap my head around this for a while now, but it seems like my lack of Haskell experience just won't get me through it. I couldn't find a similar question here on Stackoverflow (most of them are related to merging all sublists, without any condition)
So here it goes. Let's say I have a list of lists like this:
[[1, 2, 3], [3, 5, 6], [20, 21, 22]]
Is there an efficient way to merge lists if some sort of condition is true? Let's say I need to merge lists that share at least one element. In case of example, result would be:
[[1, 2, 3, 3, 5, 6], [20, 21, 22]]
Another example (when all lists can be merged):
[[1, 2], [2, 3], [3, 4]]
And it's result:
[[1, 2, 2, 3, 3, 4]]
Thanks for your help!
I don't know what to say about efficiency, but we can break down what's going on and get several different functionalities at least. Particular functionalities might be optimizable, but it's important to clarify exactly what's needed.
Let me rephrase the question: For some set X, some binary relation R, and some binary operation +, produce a set Q = {x+y | x in X, y in X, xRy}. So for your example, we might have X being some set of lists, R being "xRy if and only if there's at least one element in both x and y", and + being ++.
A naive implementation might just copy the set-builder notation itself
shareElement :: Eq a => [a] -> [a] -> Bool
shareElement xs ys = or [x == y | x <- xs, y <- ys]
v1 :: (a -> a -> Bool) -> (a -> a -> b) -> [a] -> [b]
v1 (?) (<>) xs = [x <> y | x <- xs, y <- xs, x ? y]
then p = v1 shareElement (++) :: Eq a => [[a]] -> [[a]] might achieve what you want. Except it probably doesn't.
Prelude> p [[1], [1]]
[[1,1],[1,1],[1,1],[1,1]]
The most obvious problem is that we get four copies: two from merging the lists with themselves, two from merging the lists with each other "in both directions". The problem occurs because List isn't the same as Set so we can't kill uniques. Of course, that's an easy fix, we'll just use Set everywhere
import Data.Set as Set
v2 :: (a -> a -> Bool) -> (a -> a -> b) -> Set.Set a -> Set.Set b
v2 (?) (<>) = Set.fromList . v1 (?) (<>) . Set.toList
So we can try again, p = v2 (shareElementonSet.toList) Set.union with
Prelude Set> p $ Set.fromList $ map Set.fromList [[1,2], [2,1]]
fromList [fromList [1,2]]
which seems to work. Note that we have to "go through" List because Set can't be made an instance of Monad or Applicative due to its Ord constraint.
I'd also note that there's a lot of lost behavior in Set. For instance, we fight either throwing away order information in the list or having to handle both x <> y and y <> x when our relation is symmetric.
Some more convenient versions can be written like
v3 :: Monoid a => (a -> a -> Bool) -> [a] -> [a]
v3 r = v2 r mappend
and more efficient ones can be built if we assume that the relationship is, say, an equality relation since then instead of having an O(n^2) operation we can do it in O(nd) where d is the number of partitions (cosets) of the relation.
Generally, it's a really interesting problem.
I just happened to write something similar here: Finding blocks in arrays
You can just modify it so (although I'm not too sure about the efficiency):
import Data.List (delete, intersect)
example1 = [[1, 2, 3], [3, 5, 6], [20, 21, 22]]
example2 = [[1, 2], [2, 3], [3, 4]]
objects zs = map concat . solve zs $ [] where
areConnected x y = not . null . intersect x $ y
solve [] result = result
solve (x:xs) result =
let result' = solve' xs [x]
in solve (foldr delete xs result') (result':result) where
solve' xs result =
let ys = filter (\y -> any (areConnected y) result) xs
in if null ys
then result
else solve' (foldr delete xs ys) (ys ++ result)
OUTPUT:
*Main> objects example1
[[20,21,22],[3,5,6,1,2,3]]
*Main> objects example2
[[3,4,2,3,1,2]]

How to group similar items in a list using Haskell?

Given a list of tuples like this:
dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
How to group items of dic resulting in a list grp where,
grp = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
I'm actually a newcomer to Haskell...and seems to be falling in love with it..
Using group or groupBy in Data.List will only group similar adjacent items in a list.
I wrote an inefficient function for this, but it results in memory failures as I need to process a very large coded string list. Hope you would help me find a more efficient way.
Whenever possible, reuse library code.
import Data.Map
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]
Try it out in ghci:
*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
EDIT In the comments, some folks are worried about whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated; you can find out by experimenting, or you can sidestep the whole issue using difference lists:
sortAndGroup assocs = ($[]) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR
sortAndGroup = fmap ($[]) . M.fromListWith (.) . map (fmap (:))
These alternatives are about the same length as the original, but they're a bit less readable to me.
Here's my solution:
import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)
myGroup :: (Eq a, Ord a) => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
. sortBy (comparing fst)
This works by first sorting the list with sortBy:
[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
then grouping the list elements by the associated key with groupBy:
[(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
and then transforming the grouped items to tuples with map:
[[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]`)
Testing:
> myGroup dic
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
Also you can use TransformListComp extension, for example:
Prelude> :set -XTransformListComp
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
If the list is not sorted on the first element, I don't think you can do better than O(nlog(n)).
One simple way would be to just sort and then use anything from the answer of second part.
You can use from Data.Map a map like Map k [a] to use first element of tuple as key and keep on adding to the values.
You can write your own complex function, which even after you all the attempts will still take O(nlog(n)).
If list is sorted on the first element as is the case in your example, then the task is trivial for something like groupBy as given in the answer by #Mikhail or use foldr and there are numerous other ways.
An example of using foldr is here:
grp :: Eq a => [(a,b)] -> [(a,[b])]
grp = foldr f []
where
f (z,s) [] = [(z,[s])]
f (z,s) a#((x,y):xs) | x == z = (x,s:y):xs
| otherwise = (z,[s]):a
{-# LANGUAGE TransformListComp #-}
import GHC.Exts
import Data.List
import Data.Function (on)
process :: [(Integer, String)] -> [(Integer, [String])]
process list = [(the a, b) | let info = [ (x, y) | (x, y) <- list, then sortWith by y ], (a, b) <- info, then group by a using groupWith]

Haskell: surprising behavior of "groupBy"

I'm trying to figure out the behavior of the library function groupBy (from Data.List), which purports to group elements of a list by an "equality test" function passed in as the first argument. The type signature suggests that the equality test just needs to have type
(a -> a -> Bool)
However, when I use (<) as the "equality test" in GHCi 6.6, the results are not what I expect:
ghci> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Instead I'd expect runs of strictly increasing numbers, like this:
[[1,2,3],[2,4],[1,5,9]]
What am I missing?
Have a look at the ghc implementation of groupBy:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
where (ys,zs) = span (eq x) xs
Now compare these two outputs:
Prelude List> groupBy (<) [1, 2, 3, 2, 4, 1, 5, 9]
[[1,2,3,2,4],[1,5,9]]
Prelude List> groupBy (<) [8, 2, 3, 2, 4, 1, 5, 9]
[[8],[2,3],[2,4],[1,5,9]]
In short, what happens is this: groupBy assumes that the given function (the first argument) tests for equality, and thus assumes that the comparison function is reflexive, transitive and symmetric (see equivalence relation). The problem here is that the less-than relation is not reflexive, nor symmetric.
Edit: The following implementation only assumes transitivity:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' _ [x] = [[x]]
groupBy' cmp (x:xs#(x':_)) | cmp x x' = (x:y):ys
| otherwise = [x]:r
where r#(y:ys) = groupBy' cmp xs
The fact that "<" isn't an equality test.
You might expect some behavior because you'd implement it differently, but that isn't what it promises.
An example of why what it outputs is a reasonable answer is if it sweeps through it, doing
[1, 2, 3, 2, 4, 1, 5, 9] ->
[[1,2,3], [2,4], [1,5,9]]
Now has 3 groups of equal elements. So it checks if any of them are in fact the same:
Since it knows all elements in each group is equal, it can just look at the first element in each, 1, 2 and 1.
1 > 2? Yes! So it merges the first two groups.
1 > 1? No! So it leaves the last group be.
And now it's compared all elements for equality.
...only, you didn't pass it the kind of function it expected.
In short, when it wants an equality test, give it an equality test.
The problem is that the reference implementation of groupBy in the Haskell Report compares elements against the first element, so the groups are not strictly increasing (they just have to be all bigger than the first element). What you want instead is a version of groupBy that tests on adjacent elements, like the implementation here.
I'd just like to point out that the groupBy function also requires your list to be sorted before being applied.
For example:
equalityOp :: (a, b1) -> (a, b2) -> Bool
equalityOp x y = fst x == fst y
testData = [(1, 2), (1, 4), (2, 3)]
correctAnswer = groupBy equalityOp testData == [[(1, 2), (1, 4)], [(2, 3)]]
otherTestData = [(1, 2), (2, 3), (1, 4)]
incorrectAnswer = groupBy equalityOp otherTestData == [[(1, 2)], [(2, 3)], [(1, 4)]]
This behaviour comes about because groupBy is using span in its definition. To get reasonable behaviour which doesn't rely on us having the underlying list in any particular order we can define a function:
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' eq [] = []
groupBy' eq (x:xs) = (x:similarResults) : (groupBy' eq differentResults)
where similarResults = filter (eq x) xs
differentResults = filter (not . eq x) xs

Resources