Haskell equivalent to Scala's groupBy - haskell

Scala has a function groupBy on lists that accepts a function for extracting keys from list items, and returns another list where the items are tuples consisting of the key and the list of items producing that key. In other words, something like this:
List(1,2,3,4,5,6,7,8,9).groupBy(_ % 2)
// List((0, List(2,4,6,8)), (1, List(1,3,5,7,9)))
(Actually, it looks like in current versions it provides a Map instead, but that's not important). C# has an even more useful version that lets you map the values at the same time (very useful if, say, your key function is just extracting part of a tuple).
Haskell has a groupBy, but it's somewhat different - it groups runs of things according to some comparison function.
Before I go and write it, is there an equivalent of Scala's groupBy in Haskell? Hoogle doesn't have anything for what I'd expect the signature to look like (below), but I may have just got it wrong.
Eq b => (a -> b) -> [a] -> [(b,[a])]

You can write the function yourself rather easily, but you need to place an Ord or Hashable constraint on the result of the classifier function if you want an efficient solution. Example:
import Control.Arrow ((&&&))
import Data.List
import Data.Function
myGroupBy :: (Ord b) => (a -> b) -> [a] -> [(b, [a])]
myGroupBy f = map (f . head &&& id)
. groupBy ((==) `on` f)
. sortBy (compare `on` f)
> myGroupBy (`mod` 2) [1..9]
[(0,[2,4,6,8]),(1,[1,3,5,7,9])]
You can also use a hash map like Data.HashMap.Strict instead of sorting for expected linear time.

Specifically, the following should work:
scalaGroupBy f = groupBy ((==) `on` f) . sortBy (comparing f)
modulo that this doesn't get you the result of f in each group, but if you really need it you can always post-process with
map (\xs -> (f (head xs), xs)) . scalaGroupBy f

This isn't a function in the List library.
You can write it as the composition of sortBy and groupBy.

Putting a trace in f reveals that, with #Niklas solution, f is evaluated 3 times for each element on any list of length 2 or more. I took the liberty of modifying it so that f is applied to each element only once. It's not clear however whether the cost of creating and destroying tuples is less than the cost of evaluating f multiple times (since f can be arbitrary).
import Control.Arrow ((&&&))
import Data.List
import Data.Function
myGroupBy' :: (Ord b) => (a -> b) -> [a] -> [(b, [a])]
myGroupBy' f = map (fst . head &&& map snd)
. groupBy ((==) `on` fst)
. sortBy (compare `on` fst)
. map (f &&& id)

This solution will break and group by on (f x), regardless wether it is sorted or not
f = (`mod` (2::Int))
list = [1,3,4,6,8,9] :: [Int]
myGroupBy :: Eq t => (b -> t) -> [b] -> [(t, [b])]
myGroupBy f (z:zs) = reverse $ foldl (g f) [(f z,[z])] zs
where
-- folding function
g f ((tx, xs):previous) y = if (tx == ty)
then (tx, y:xs):previous
else (ty, [y]):(tx, reverse xs):previous
where ty = f y
main = print $ myGroupBy f list
result:
[(1,[1,3]),(0,[4,6,8]),(1,[9])]

Since Scala groupBy returns an immutable HashMap, which does not require ordering, the corresponding Haskell implementation should return a HashMap as well.
import qualified Data.HashMap.Strict as M
scalaGroupBy :: (Eq k, Hashable k) => (v -> k) -> [v] -> M.HashMap k [v]
scalaGroupBy f l = M.fromListWith (++) [ (f a, [a]) | a <- l]

We can also use the SQL-like then group by syntax in list comprehension, which requires TransformListComp language extension.
Since Scala groupBy returns a Map, we can call fromDistinctAscList to convert the list comprehension to a Map.
$ stack repl --package containers
Prelude> :set -XTransformListComp
Prelude> import Data.Map.Strict ( fromDistinctAscList, Map )
Prelude Data.Map.Strict> import GHC.Exts ( groupWith, the )
Prelude Data.Map.Strict GHC.Exts> :{
Prelude Data.Map.Strict GHC.Exts| scalaGroupBy f l =
Prelude Data.Map.Strict GHC.Exts| fromDistinctAscList
Prelude Data.Map.Strict GHC.Exts| [ (the key, value)
Prelude Data.Map.Strict GHC.Exts| | value <- l
Prelude Data.Map.Strict GHC.Exts| , let key = f value
Prelude Data.Map.Strict GHC.Exts| , then group by key using groupWith
Prelude Data.Map.Strict GHC.Exts| ]
Prelude Data.Map.Strict GHC.Exts| :}
Prelude Data.Map.Strict GHC.Exts> :type scalaGroupBy
scalaGroupBy :: Ord b => (t -> b) -> [t] -> Map b [t]
Prelude Data.Map.Strict GHC.Exts> scalaGroupBy (`mod` 2) [1, 2, 3, 4, 5, 6, 7, 8, 9]
fromList [(0,[2,4,6,8]),(1,[1,3,5,7,9])]
The only difference from Scala groupBy is that the above implementation returns a sorted map instead of a hash map. For implementation that returns a hash map, see my other answer at https://stackoverflow.com/a/64204797/955091.

Related

Removing duplicate elements in a Seq

wondering how to implement nub over a Seq a
I get that one could do:
nubSeq :: Seq a -> Seq a
nubSeq = fromList . nub . toList
Just wondering is there something standard that does not convert to Lists in order to call nub :: [a]->[a]?
An implementation that occurred to me, based obviously on nub, is:
nubSeq :: (Eq a) => Seq a -> Seq a
nubSeq = Data.Sequence.foldrWithIndex
(\_ x a -> case x `Data.Sequence.elemIndexR` a of
Just _ -> a
Nothing -> a |> x) Data.Sequence.empty
But there must be something more elegant?
thanks.
Not sure whether this qualifies as more elegant but it splits the concerns in independent functions (caveat: you need an Ord constraint on a):
seqToNubMap takes a Seq and outputs a Map associating to each a the smallest index at which it appeared in the sequence
mapToList takes a Map of values and positions and produces a list of values in increasing order according to the specified positions
nubSeq combines these to generate a sequence without duplicates
The whole thing should be O(n*log(n)), I believe:
module NubSeq where
import Data.Map as Map
import Data.List as List
import Data.Sequence as Seq
import Data.Function
seqToNubMap :: Ord a => Seq a -> Map a Int
seqToNubMap = foldlWithIndex (\ m k v -> insertWith min v k m) Map.empty
mapToList :: Ord a => Map a Int -> [a]
mapToList = fmap fst . List.sortBy (compare `on` snd) . Map.toList
nubSeq :: Ord a => Seq a -> Seq a
nubSeq = Seq.fromList . mapToList . seqToNubMap
Or a simpler alternative following #DavidFletcher's comment:
nubSeq' :: forall a. Ord a => Seq a -> Seq a
nubSeq' xs = Fold.foldr cons nil xs Set.empty where
cons :: a -> (Set a -> Seq a) -> (Set a -> Seq a)
cons x xs seen
| x `elem` seen = xs seen
| otherwise = x <| xs (Set.insert x seen)
nil :: Set a -> Seq a
nil _ = Seq.empty
Another way with an Ord constraint - use a scan to make the sets of
elements that appear in each prefix of the list. Then we can filter out
any element that's already been seen.
import Data.Sequence as Seq
import Data.Set as Set
nubSeq :: Ord a => Seq a -> Seq a
nubSeq xs = (fmap fst . Seq.filter (uncurry notElem)) (Seq.zip xs seens)
where
seens = Seq.scanl (flip Set.insert) Set.empty xs
Or roughly the same thing as a mapAccumL:
nubSeq' :: Ord a => Seq a -> Seq a
nubSeq' = fmap fst . Seq.filter snd . snd . mapAccumL f Set.empty
where
f s x = (Set.insert x s, (x, x `notElem` s))
(If I was using lists I would use Maybes instead of the pairs with
Bool, then use catMaybes instead of filtering. There doesn't seem to be catMaybes
for Sequence though.)
I think your code should be pretty efficient. Since Sequences are tree data structures using another tree type data structure like Map or HashMap to store and lookup the previous items doesn't make too much sense to me.
Instead i take the first item and check it's existence in the rest. If exists i drop that item and proceed the same with the rest recursively. If not then construct a new sequence with first element is the unique element and the rest is the result of nubSeq fed by the rest. Should be typical. I use ViewPatterns.
{-# LANGUAGE ViewPatterns #-}
import Data.Sequence as Seq
nubSeq :: Eq a => Seq a -> Seq a
nubSeq (viewl -> EmptyL) = empty
nubSeq (viewl -> (x :< xs)) | elemIndexL x xs == Nothing = x <| nubSeq xs
| otherwise = nubSeq xs
*Main> nubSeq . fromList $ [1,2,3,4,4,2,3,6,7,1,2,3,4]
fromList [6,7,1,2,3,4]

Filter Duplicate elements from a [[String]] Haskell

I have a list in the form [["A1","A1","A1"] .. ["G3","G3","G3"]] which contains many duplicate elements like ["A1","A2","A3"] and ["A3","A2","A1"].
How do I filter out such duplicate elements?
if check the above two elements for equality, it shows false
*Main> ["A1","A2","A3"] == ["A3","A2","A1"]
False
nubBy :: (a -> a -> Bool) -> [a] -> [a] is a relevant function that removes duplicates from a list via an arbitrary equality test.
A version of the function you're looking for is:
import Data.List (sort, nubBy)
removeDuplicates' :: Ord a => [[a]] -> [[a]]
removeDuplicates' = nubBy (\l1 l2 = sort l1 == sort l2)
Of course, this does require that a is an Ord, not just an Eq, as well as using sort, which is (as stated below) an expensive function. So it is certainly not ideal. However, I don't know specifically how you want to do the equality tests on those lists, so I'll leave the details to you.
#AJFarmar's answer solves the issue. But it can be done a bit more efficient: since sort is an expensive function. We want to save on such function calls.
We can use:
import Data.List(nubBy, sort)
import Data.Function(on)
removeDuplicates' :: Ord a => [[a]] -> [[a]]
removeDuplicates' = map snd . nubBy ((==) `on` fst) . map ((,) =<< sort)
what we here do is first construct a map ((,) =<< sort). This means that for every element x in the original list, we construct a tuple (sort x,x). Now we will perform a nubBy on the first elements of the two tuples we want to sort. After we have sorted, we will perform a map snd where we - for every tuple (sort x,x) return the second item.
We can generalize this by constructing a nubOn function:
import Data.List(nubBy)
import Data.Function(on)
nubOn :: Eq b => (a -> b) -> [a] -> [a]
nubOn f = map snd . nubBy ((==) `on` fst) . map ((,) =<< f)
In that case removeDuplicates' is nubOn sort.
You may not even need to sort. You just need to see if all items are the same like;
\xs ys -> length xs == (length . filter (== True) $ (==) <$> xs <*> ys)
you just need to know that (==) <$> ["A1","A2","A3"] <*> ["A3","A2","A1"] would in fact return [False,False,True,False,True,False,True,False,False]
As per #rampion 's rightful comment let's take it further and import Data.Set then it gets pretty dandy.
import Data.Set as S
equity :: Ord a => [a] -> [a] -> Bool
equity = (. S.fromList) . (==) . S.fromList
*Main> equity ["A1","A2","A3"] ["A3","A2","A1"]
True

How can I filter this list?

I have [a] that can be converted to [b]. Each a is distinct, but each b may not be. I want to filter my [a] on the condition that the filtered [a] contains no duplicates when converted to [b].
Can someone help me to achieve this?
Edit
To serve as assistance, I'll provide an example.
as = [1..10]
conv = even
bs = map even as
-- bs = [False,True,False,True,False,True,False,True,False,True]
-- filter <cond> as -- [1,2]
Assume that f is the function that converts from a to b. You can then proceed in three steps:
You pair each element of your list with its image under f: map (id &&& f);
You remove every pair of which the second element has already appeared in the now obtained list: nubBy (on (==) snd);
You drop the second component of each pair: map fst.
Hence:
import Control.Arrow ((&&&))
import Data.Function (on)
import Data.List (nubBy)
filterOn :: Eq b => (a -> b) -> [a] -> [a]
filterOn f = map fst . nubBy ((==) `on` snd) . map (id &&& f)
For example:
> filterOn even [1 .. 10]
[1,2]
It is impossible to do this with a cond :: a -> Bool function and solely filter, i.e. with (filter cond) [1..10] yielding [1,2].
The problem is that filter looks at each element in your array exactly once, and you have no information about previous elements.

Haskell: how to map a tuple?

In Haskell, I can easily map a list:
map (\x -> 2*x) [1,2]
gives me [2,4]. Is there any "mapTuple" function which would work like that?
mapTuple (\x -> 2*x) (1,2)
with the result being (2,4).
Here's a rather short point-free solution:
import Control.Monad (join)
import Control.Arrow ((***))
mapTuple = join (***)
Searching at Hoogle gives no exact matches for (a -> b) -> (a, a) -> (b, b), which is the type you require, but it is pretty easy to do yourself:
mapTuple :: (a -> b) -> (a, a) -> (b, b)
mapTuple f (a1, a2) = (f a1, f a2)
Note, you will have to define a new function for 3-tuples, 4-tuples etc - although such a need might be a sign, that you are not using tuples like they were intended: In general, tuples hold values of different types, so wanting to apply a single function to all values is not very common.
You could use Bifunctor:
import Control.Monad (join)
import Data.Bifunctor (bimap)
join bimap (2*) (1,2)
This works not only for pairs, but for a number of other types as well, e.g. for Either.
Bifunctor is in base as of version 4.8. Previously it was provided by the bifunctors package.
You can also use lens to map tuples:
import Control.Lens
mapPair = over both
Or you can map over tuples with upto 10 elements:
mapNtuple f = traverseOf each (return . f)
You can use arrows from module Control.Arrow to compose functions that work on tuples.
Prelude Control.Arrow> let f = (*2) *** (*2)
Prelude Control.Arrow> f (1,2)
(2,4)
Prelude Control.Arrow> let f' = (*2) *** (*3)
Prelude Control.Arrow> f (2,2)
(4,4)
Prelude Control.Arrow> f' (2,2)
(4,6)
Your mapTuple then becomes
mapTuple f = f *** f
If with your question you asked for a function that maps over tuples of arbitrary arity, then I'm afraid you can't because they would have different types (e.g. the tuple types (a,b) and (a,b,c) are totally different and unrelated).
Here is another way:
mapPair :: (a -> b) -> (a, a) -> (b, b) -- this is the inferred type
mapPair f = uncurry ((,) `on` f)
You need Data.Function imported for on function.
To add another solution to this colourful set... You can also map over arbitrary n-tuples using Scrap-Your-Boilerplate generic programming. For example:
import Data.Data
import Data.Generics.Aliases
double :: Int -> Int
double = (*2)
tuple :: (Int, Int, Int, Int)
tuple = gmapT (mkT double) (1,2,3,4)
Note that the explicit type annotations are important, as SYB selects the fields by type. If one makes one tuple element type Float, for example, it wouldn't be doubled anymore.
Yes, for tuples of 2 items, you can use first and second to map the contents of a tuple (Don't worry about the type signature; a b c can be read as b -> c in this situation). For larger tuples, you should consider using a data structure and lenses instead.
The extra package provides the both function in the Data.Tuple.Extra module. From the docs:
Apply a single function to both components of a pair.
> both succ (1,2) == (2,3)
both :: (a -> b) -> (a, a) -> (b, b)
You can also use Applicatives which have additional benefit of giving you possibility to apply different functions for each tuple element:
import Control.Applicative
mapTuple :: (a -> a') -> (b -> b') -> (a, b) -> (a', b')
mapTuple f g = (,) <$> f . fst <*> g . snd
Inline version:
(\f -> (,) <$> f . fst <*> f . snd) (*2) (3, 4)
or with different map functions and without lambda:
(,) <$> (*2) . fst <*> (*7) . snd $ (3, 4)
Other possibility would be to use Arrows:
import Control.Arrow
(+2) . fst &&& (+2) . snd $ (2, 3)
I just added a package tuples-homogenous-h98 to Hackage that solves this problem. It adds newtype wrappers for tuples and defines Functor, Applicative, Foldable and Traversable instances for them. Using the package you can do things like:
untuple2 . fmap (2 *) . Tuple2 $ (1, 2)
or zip tuples like:
Tuple2 ((+ 1), (*2)) <*> Tuple2 (1, 10)
The uniplate package provides the descend function in the Data.Generics.Uniplate.Data module. This function will apply the function everywhere the types match, so can be applied to lists, tuples, Either, or most other data types. Some examples:
descend (\x -> 2*x) (1,2) == (2,4)
descend (\x -> 2*x) (1,"test",Just 2) == (2,"test",Just 4)
descend (\x -> 2*x) (1,2,3,4,5) == (2,4,6,8,10)
descend (\x -> 2*x) [1,2,3,4,5] == [2,4,6,8,10]
Yes, you would do:
map (\x -> (fst x *2, snd x *2)) [(1,2)]
fst grabs the first data entry in a tuple, and snd grabs the second; so, the line of code says "take a tuple, and return another tuple with the first and second items double the previous."

Function to show the lowest represented element in a list

If you have a list such as this in Haskell:
data TestType = A | B | C deriving (Ord, Eq, Show)
List1 :: [TestType]
List1 = [A,B,C,B,C,A,B,C,C,C]
Is it possible to write a function to determin which element is represented the least in a list (so in this case 'A')
My initial thought was to write a helper function such as this but now I am not sure if this is the right approach:
appears :: TestType -> [TestType] -> Int
appears _ [] = 0
appears x (y:ys) | x==y = 1 + (appears x ys)
| otherwise = appears x ys
I am still fairly new to Haskell, so apologies for the potentially silly question.
Many thanks
Slightly alternative version to Matt's approach
import Data.List
import Data.Ord
leastFrequent :: Ord a => [a] -> a
leastFrequent = head . minimumBy (comparing length) . group . sort
You can build a map counting how often each item occurs in the list
import qualified Data.Map as Map
frequencies list = Map.fromListWith (+) $ zip list (repeat 1)
Then you can find the least/most represented using minimumBy or maximumBy from Data.List on the list of Map.assocs of the frequency map, or even sort it by frequency using sortBy.
module Frequencies where
import Data.Ord
import Data.List
import qualified Data.Map as Map
frequencyMap :: Ord a => [a] -> Map.Map a Int
frequencyMap list = Map.fromListWith (+) $ zip list (repeat 1)
-- Caution: leastFrequent will cause an error if called on an empty list!
leastFrequent :: Ord a => [a] -> a
leastFrequent = fst . minimumBy (comparing snd) . Map.assocs . frequencyMap
ascendingFrequencies :: Ord a => [a] -> [(a,Int)]
ascendingFrequencies = sortBy (comparing snd) . Map.assocs . frequencyMap
Here's another way to do it:
sort the list
group the list
find the length of each group
return the group with the shortest length
Example:
import GHC.Exts
import Data.List
fewest :: (Eq a) => [a] -> a
fewest xs = fst $ head sortedGroups
where
sortedGroups = sortWith snd $ zip (map head groups) (map length groups)
groups = group $ sort xs
A less elegant idea would be:
At first sort and group the list
then pairing the cases with their number of representations
at last sort them relative to their num of representations
In code this looks like
import Data.List
sortByRepr :: (Ord a) => [a] ->[(a,Int)]
sortByRepr xx = sortBy compareSnd $ map numOfRepres $ group $ sort xx
where compareSnd x y = compare (snd x) (snd y)
numOfRepres x = (head x, length x)
the least you get by applying head to the resulting list.

Resources