What is groupBy supposed to do? - haskell

I wrote something using Data.List.groupBy. It didn't work as expected, so I ended up writing my own version of groupBy; after all, I'm not sure what the Data.List one is supposed to do (there is no real documentation).
Anyway, my tests pass with my version of groupBy, whereas they fail with the Data.List one.
I found (thanks to QuickCheck) a case where the two functions behave differently, and I still don't understand why there is a difference between the two versions. Is the Data.List version buggy, or is mine? (Of course mine is a naive implementation and is probably not the most efficient way to do this.)
Here is the code:
import qualified Data.List as DL
import Data.Function (on)
import Test.QuickCheck
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' eq (x:xs) = xLike : groupBy' eq xNotLike
  where
    xLike = x : [ e | e <- xs, x `eq` e ]
    xNotLike = [ e | e <- xs, not $ x `eq` e ]
head' [] = Nothing
head' (x:xs) = Just x
prop_a s = groupBy' by s == DL.groupBy by s
  where
    types = s :: [String]
    by = (==) `on` head'
Running quickCheck prop_a in GHCi finds the counterexample ["","a",""]:
*Main> groupBy' ((==) `on` head') ["","a",""]
[["",""],["a"]] -- correct in my opinion
*Main> DL.groupBy ((==) `on` head') ["","a",""]
[[""],["a"],[""]] -- incorrect
What's happening? I can't believe there is a bug in the Haskell Platform.

Your version is O(n^2), which can be unacceptably slow in real-world use.[1]
The standard version avoids this by only grouping adjacent elements if they are equivalent. Hence,
*Main> groupBy ((==) `on` head') ["", "", "a"]
will yield the result you're after.
A simple way to obtain "universal grouping" with groupBy is to first sort the list, if that's feasible for the data type.
*Main> groupBy ((==) `on` head') $ DL.sort ["", "a", ""]
The complexity of this is only O(n log n).
[1] This didn't prevent the committee from specifying nub as O(n^2)...
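Building on the sort-first suggestion: when the criterion is "equal under some key", sorting on that same key guarantees that all equivalent elements end up adjacent, and it does not require the elements themselves to be Ord. A sketch using sortOn (in Data.List since base 4.8); groupOn is a hypothetical helper name, not a library function:

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical helper: group *all* elements with equal keys, not just
-- adjacent runs. sortOn is stable, so elements inside a group keep their
-- original relative order; the groups themselves come out in key order.
groupOn :: Ord k => (a -> k) -> [a] -> [[a]]
groupOn key = groupBy ((==) `on` key) . sortOn key

-- The key used in the question: the first character, if there is one.
head' :: [a] -> Maybe a
head' [] = Nothing
head' (x:_) = Just x
```

With this, groupOn head' ["","a",""] gives [["",""],["a"]], the result the question expected, in O(n log n).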

Data.List.groupBy in Haskell is a usability mistake! A user-friendly groupBy should behave like this:
groupByWellBehaved p = foldr (\x rest -> if null rest
                                           then [[x]]
                                           else if p x (head (head rest))
                                             then (x : head rest) : tail rest
                                             else [x] : rest) []
Perhaps there is a better implementation, but at least this is O(n).
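To make the difference concrete: Data.List.groupBy compares each element with the first element of its current group, whereas the foldr definition above compares each element with its immediate neighbour. The two agree for equivalence relations such as the one in the question (both return [[""],["a"],[""]] there, since neither groups non-adjacent elements), and differ for predicates like (<). A small sketch, restating groupByWellBehaved with pattern matching instead of head/tail (intended to behave identically):

```haskell
import qualified Data.List as DL

-- groupByWellBehaved from the answer, restated with pattern matching.
-- Each new element x is compared with y, the first element of the group
-- to its right, i.e. x's immediate successor in the input.
groupByWellBehaved :: (a -> a -> Bool) -> [a] -> [[a]]
groupByWellBehaved p = foldr step []
  where
    step x ((y:ys) : groups)
      | p x y = (x : y : ys) : groups
    step x groups = [x] : groups
```

For example, DL.groupBy (<) [1,2,3,2,3] is [[1,2,3,2,3]] (every element is greater than 1), while groupByWellBehaved (<) [1,2,3,2,3] is [[1,2,3],[2,3]].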


Haskell - Filtering a list of tuples

Consider this list of tuples:
[(57,48),(58,49),(59,50),(65,56),(65,47),(65,57),(65,49), (41, 11)]
I want to remove a tuple (a, b) if its second element b is equal to the first element of another tuple, and also remove all the tuples with the same a that come after it. For example:
The second element of (65,57) is 57, and the first tuple in the list, (57,48), has 57 as its first element, so (65,57) should be removed, along with all tuples after it that start with 65, namely (65,49). The tuples that come before it, (65,56) and (65,47), should stay in the list.
Does anyone have an idea how to do this?
For efficiency (a single pass), you can keep two sets: one for the elements you've already seen as first elements of tuples, and one for the first elements of tuples you've deleted (so that any later tuple with the same first element is deleted too).
Something like:
{-# LANGUAGE PackageImports #-}
import "lens" Control.Lens (contains, (.~), (^.), (&))
import "yjtools" Data.Function.Tools (applyUnless, applyWhen)
import qualified "containers" Data.IntSet as Set
filterTuples :: Foldable t => t (Int, Int) -> [(Int, Int)]
filterTuples = flip (foldr go $ const []) (Set.empty, Set.empty)
  where
    go p@(x,y) go' (fsts, deletes) =
      let seenFst = fsts ^. contains y
          shouldDelete = seenFst || deletes ^. contains x
          fsts' = fsts & contains x .~ True
          -- remember x, so that later tuples starting with x are deleted too
          deletes' = deletes & applyWhen seenFst (contains x .~ True)
      in applyUnless shouldDelete (p:) $ go' (fsts', deletes')
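For comparison, a hedged sketch of the same two-set idea without lens or yjtools, using plain recursion over Data.IntSet from containers. Like the code above, it only counts first elements seen earlier in the list; filterTuplesSets is a hypothetical name:

```haskell
import qualified Data.IntSet as IntSet

filterTuplesSets :: [(Int, Int)] -> [(Int, Int)]
filterTuplesSets = go IntSet.empty IntSet.empty
  where
    -- fsts: first elements seen so far
    -- deletes: first elements whose later tuples must be dropped
    go _ _ [] = []
    go fsts deletes (p@(x, y) : rest)
      | shouldDelete = go fsts' deletes' rest
      | otherwise    = p : go fsts' deletes' rest
      where
        seenFst      = y `IntSet.member` fsts
        shouldDelete = seenFst || x `IntSet.member` deletes
        fsts'        = IntSet.insert x fsts
        deletes'     = if seenFst then IntSet.insert x deletes else deletes
```

On the list from the question this keeps (57,48), (58,49), (59,50), (65,56), (65,47) and (41,11).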
You could start by creating a list of the distinct first elements, e.g.:
Prelude Data.List> firsts = nub $ fst <$> [(57,48),(58,49),(59,50),(65,56),(65,47),(65,57),(65,49), (41, 11)]
Prelude Data.List> firsts
[57,58,59,65,41]
You could use break or span as Robin Zigmond suggests. You'll need a predicate for that. You could use elem, like this:
Prelude Data.List> elem 48 firsts
False
Prelude Data.List> elem 49 firsts
False
...
Prelude Data.List> elem 57 firsts
True
If you're concerned that elem is too inefficient, you could experiment with creating a Set and use the member function instead.
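A sketch of that suggestion, using Data.Set from containers (Set.fromList and Set.member). The names pairs, firsts and splitPoint are illustrative. It builds the set of first elements once and uses membership as the predicate for break:

```haskell
import qualified Data.Set as Set

pairs :: [(Int, Int)]
pairs = [(57,48),(58,49),(59,50),(65,56),(65,47),(65,57),(65,49),(41,11)]

-- Distinct first elements, stored in a Set so that each membership test
-- is O(log n) instead of the O(n) of elem on a list.
firsts :: Set.Set Int
firsts = Set.fromList (map fst pairs)

-- Split the list just before the first pair whose second element
-- is also the first element of some pair.
splitPoint :: ([(Int, Int)], [(Int, Int)])
splitPoint = break (\(_, b) -> b `Set.member` firsts) pairs
```

Here splitPoint is ([(57,48),(58,49),(59,50),(65,56),(65,47)], [(65,57),(65,49),(41,11)]); the second half still needs the tuples starting with 65 filtered out.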
Perhaps try using mapAccumL, starting with the initial list as the accumulator. Also maintain a Predicate as a parameter that decides what has been seen; at each step of the traversal it determines whether the current tuple can be output.
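One possible reading of this suggestion, sketched under assumptions: instead of a Predicate value, the accumulator here is an IntSet of first elements whose later tuples must be dropped, membership in the set of all first elements plays the role of the predicate, and each step emits Just a tuple to keep or Nothing to drop (collected with catMaybes). It also assumes "another tuple" may appear anywhere in the list, so the set of first elements is built from the whole input up front. filterTuplesAccum is a hypothetical name:

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.IntSet as IntSet

filterTuplesAccum :: [(Int, Int)] -> [(Int, Int)]
filterTuplesAccum ps = catMaybes (snd (mapAccumL step IntSet.empty ps))
  where
    -- Assumption: every first element in the input counts.
    firsts = IntSet.fromList (map fst ps)
    -- deleted: first elements whose subsequent tuples are dropped
    step deleted p@(a, b)
      | a `IntSet.member` deleted = (deleted, Nothing)
      | b `IntSet.member` firsts  = (IntSet.insert a deleted, Nothing)
      | otherwise                 = (deleted, Just p)
```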
I'm an absolute beginner in Haskell, so there is probably a much more elegant/efficient solution for this. But anyway, I wanted to share the solution I came up with:
filterTuples :: [(Int, Int)] -> [(Int, Int)]
filterTuples [] = []
filterTuples (x:xs) = x : filterTuples (concat ((fst temp) : [filter (\z -> fst z /= del) (snd temp)]))
  where
    del = fst (head (snd temp))
    temp = break (\y -> (snd y == fst x)) xs
(Glad for feedback on how to improve this)
f consumes a list of pairs xs and produces a new list of pairs ys. ys contains every pair (a, b) in xs, except pairs whose second element b occurs as the first element of some pair. When such a pair (a, b) is encountered, subsequent pairs that have a as their first element are excluded from ys as well.
import Data.List (nub)
f xs = go xs [] []
  where
    go [] ys zs = ys
    go (x@(a,b):xs) ys zs
      | b `elem` as = go xs ys (a:zs)
      | a `elem` zs = go xs ys zs
      | otherwise = [x] ++ go xs ys zs
    as = (nub . fst . unzip) xs

elemIndex in conjunction with splitAt Haskell

I'm pretty new to Haskell and can't figure out how to solve the following problem that I have.
I have to implement the following function "ca", which takes an element and a list and deletes all of the elements in the list after the input element:
ca:: Eq a => a -> [a] -> [a]
I'm not allowed to change any of the function types and have so far come up with the following code:
ca x xs = let (ys, zs) = splitAt (elemIndex x xs) xs in ys
This produces the following error:
couldn't match expected type 'Int' with the actual type 'Maybe Int'
I understand why this error occurs, but I don't understand how to fix it. Any help would be appreciated.
Instead of using indexes, you can use the break function, which splits a list just before the position where a predicate first becomes True.
> break (== 'x') "aabaaccxaabbcc"
("aabaacc","xaabbcc")
Or, since you're discarding the second part anyway, you can use takeWhile.
> takeWhile (/= 'x') "aabaaccxaabbcc"
"aabaacc"
This should typecheck, although the function is unsafe now:
import Data.List
import Data.Maybe (fromJust)
ca x xs = let (ys, zs) = splitAt (fromJust $ elemIndex x xs) xs in ys
The reason is that elemIndex has the type elemIndex :: Eq a => a -> [a] -> Maybe Int, and you want to extract the Int out of the Maybe and pass it to splitAt. That is what fromJust does: it extracts the value from a Just (and throws an exception when given Nothing, which is why the function is now unsafe).
You can try writing a safe alternative of this function using maybe or a similar function.
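One possible safe alternative, sketched with maybe instead of fromJust. What to return when x does not occur is a decision the question leaves open; this sketch assumes the list should come back unchanged:

```haskell
import Data.List (elemIndex)

-- Keep the elements before the first occurrence of x.
-- Assumption: if x is absent, return the whole list.
ca :: Eq a => a -> [a] -> [a]
ca x xs = maybe xs (\n -> fst (splitAt n xs)) (elemIndex x xs)
```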
You can do this:
let
  Just n = elemIndex x xs
  (ys, zs) = splitAt n xs
in ys
However, if the specified element isn't present, this will throw an exception due to a "pattern match failure". A better way is this:
case elemIndex x xs of
  Just n -> let (ys, zs) = splitAt n xs in ys
  Nothing -> {- decide what to do if there's no match! -}
However, I concur with Hammar. You're not really trying to split the list into two parts; you're only interested in one of those parts. So rather than hunt through the list, find the element you want, return its index, and then hunt through the list a second time, why not just use break or takeWhile?
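Applied to the ca signature from the question, the takeWhile suggestion could look like the sketch below. It assumes that, when x is absent, the whole list should be returned, which is simply what takeWhile does:

```haskell
-- Keep everything before the first occurrence of x (x itself and the
-- rest are dropped). A single pass, and no Maybe to unwrap.
ca :: Eq a => a -> [a] -> [a]
ca x = takeWhile (/= x)
```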

How can I efficiently filter a pair of lists in Haskell?

I'm working on a simple problem on Programming Praxis: remove all duplicates from a list without changing the order. Assuming the elements are in class Ord, I came up with the following:
import Data.Set (Set)
import qualified Data.Set as Set
buildsets::Ord a => [a] -> [Set a]
buildsets = scanl (flip Set.insert) Set.empty
nub2::Ord a => [a] -> [a]
nub2 thelist = map fst $ filter (not . uncurry Set.member) (zip thelist (buildsets thelist))
As you can see, the buildsets function gets me most of the way there, but that last step (nub2) of putting everything together looks absolutely horrible. Is there a cleaner way to accomplish this?
Since we have to filter the list and we should probably use some set to keep records, we might as well use filterM with the state monad:
import qualified Data.Set as S
import Control.Monad (filterM)
import Control.Monad.State.Strict
nub2 :: Ord a => [a] -> [a]
nub2 = (`evalState` S.empty) . filterM go where
  go x = state $ \s -> if S.member x s
                         then (False, s)
                         else (True, S.insert x s)
If I wanted to golf the function somewhat, I'd do the following:
import Control.Arrow ((&&&))
nub2 = (`evalState` S.empty) . filterM (\x -> state (S.notMember x &&& S.insert x))
Simple recursion looks ok to me.
g xs = go xs S.empty where
  go [] _ = []
  go (x:xs) a | S.member x a = go xs a
              | otherwise = x : go xs (S.insert x a)
Based directly on Sassa NF's suggestion, but with a slight type change for cleanliness:
import Data.List (unfoldr)
import Data.Maybe (catMaybes)
import qualified Data.Set as Set
g x = catMaybes $ unfoldr go (Set.empty, x)
  where
    go (_,[]) = Nothing
    go (s,(x:xs)) = Just (if Set.member x s then Nothing else Just x,
                          (Set.insert x s, xs))
Sometimes it really cleans up code to pull out and name subpieces. (In some ways, this really is the Haskell way to comment code.)
This is wordier than what you did above, but I think it is much easier to understand.
First I start with some definitions:
type Info=([Int], S.Set Int) --This is the remaining and seen items at a point in the list
item=head . fst --The current item
rest=fst --Future items
seen=snd --The items already seen
Then I add two self descriptive helper functions:
itemHasBeenSeen::Info->Bool
itemHasBeenSeen info = item info `S.member` seen info
moveItemToSet::Info->Info
moveItemToSet info = (tail $ rest info, item info `S.insert` seen info)
With this the program becomes:
nub2::[Int]->[Int]
nub2 theList =
    map item
  $ filter (not . itemHasBeenSeen)
  $ takeWhile (not . null . rest)
  $ iterate moveItemToSet start
  where start = (theList, S.empty)
Reading from bottom to top (just as the data flows), you can easily see what is happening:
start=(theList, S.empty), start with the full list, and an empty set.
iterate moveItemToSet start, repeatedly move the first item of the list into the set, saving each iteration of Info in a list.
takeWhile (not . null . rest)- Stop the iteration when you run out of elements.
filter (not . itemHasBeenSeen)- Remove items that have already been seen.
map item- Throw away the helper values....

Haskell - Most frequent value

How can I get the most frequent value in a list? For example:
[1,3,4,5,6,6] -> output 6
[1,3,1,5] -> output 1
I'm trying to do it with my own functions, but I can't get it to work. Can you help me?
My code:
del x [] = []
del x (y:ys) = if x /= y
                 then y:del x y
                 else del x ys
obj x []= []
obj x (y:ys) = if x== y then y:obj x y else(obj x ys)
tam [] = 0
tam (x:y) = 1+tam y
fun (n1:[]) (n:[]) [] =n1
fun (n1:[]) (n:[]) (x:s) =if (tam(obj x (x:s)))>n then fun (x:[]) ((tam(obj x (x:s))):[]) (del x (x:s)) else(fun (n1:[]) (n:[]) (del x (x:s)))
rep (x:s) = fun (x:[]) ((tam(obj x (x:s))):[]) (del x (x:s))
Expanding on Satvik's last suggestion, you can use (&&&) :: (b -> c) -> (b -> c') -> (b -> (c, c')) from Control.Arrow (Note that I substituted a = (->) in that type signature for simplicity) to cleanly perform a decorate-sort-undecorate transform.
mostCommon list = fst . maximumBy (compare `on` snd) $ elemCount
  where elemCount = map (head &&& length) . group . sort $ list
The head &&& length function has type [b] -> (b, Int). It converts a list into a tuple of its first element and its length, so when it is combined with group . sort you get a list of each distinct value in the list along with the number of times it occurred.
Also, you should think about what happens when you call mostCommon []. Clearly there is no sensible value, since there is no element at all. As it stands, all the solutions proposed (including mine) just fail on an empty list, which is not good Haskell. The normal thing to do would be to return a Maybe a, where Nothing indicates an error (in this case, an empty list) and Just a represents a "real" return value. e.g.
mostCommon :: Ord a => [a] -> Maybe a
mostCommon [] = Nothing
mostCommon list = Just ... -- your implementation here
This is much nicer, as partial functions (functions that are undefined for some input values) are horrible from a code-safety point of view. You can manipulate Maybe values using pattern matching (matching on Nothing and Just x) and the functions in Data.Maybe (preferably fromMaybe and maybe rather than fromJust).
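Filling in the skeleton above with the decorate-sort-undecorate approach from this answer gives something like the following sketch (one way to do it, not the only one):

```haskell
import Control.Arrow ((&&&))
import Data.Function (on)
import Data.List (group, maximumBy, sort)

-- Total version of mostCommon: Nothing for an empty list.
mostCommon :: Ord a => [a] -> Maybe a
mostCommon [] = Nothing
mostCommon list = Just (fst (maximumBy (compare `on` snd) elemCount))
  where
    -- (value, number of occurrences) for each distinct value;
    -- group never produces an empty sublist, so head is safe here.
    elemCount = map (head &&& length) (group (sort list))
```

Callers then decide what an empty input means, e.g. fromMaybe 0 (mostCommon xs).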
In case you would like to get some ideas from code that does what you wish to achieve, here is an example:
import Data.List (nub, maximumBy)
import Data.Function (on)
mostCommonElem list = fst $ maximumBy (compare `on` snd) elemCounts where
  elemCounts = nub [(element, count) | element <- list, let count = length (filter (==element) list)]
Here are a few suggestions:
del can be implemented using filter rather than writing your own recursion. There is also a mistake in your definition: you needed to pass ys, not y, when deleting.
del x = filter (/=x)
obj is similar to del, with a different filter predicate. As in del, your definition needs ys rather than y in the recursive call.
obj x = filter (==x)
tam is just length function
-- tam = length
You don't need to keep a list for n1 and n. I have also made your code more readable, although I have not made any changes to your algorithm.
fun n1 n [] = n1
fun n1 n xs@(x:s) | length (obj x xs) > n = fun x (length $ obj x xs) (del x xs)
                  | otherwise = fun n1 n $ del x xs
rep xs@(x:s) = fun x (length $ obj x xs) (del x xs)
Another way, not very efficient but much more readable, is:
import Data.List
import Data.Ord
rep :: Ord a => [a] -> a
rep = head . head . sortBy (flip $ comparing length) . group . sort
I will try to explain briefly what this code does. You need to find the most frequent element of the list, so the first idea that should come to mind is to find the frequency of every element. group is a function that combines adjacent equal elements:
> group [1,2,2,3,3,3,1,2,4]
[[1],[2,2],[3,3,3],[1],[2],[4]]
So I used sort to bring equal elements next to each other:
> sort [1,2,2,3,3,3,1,2,4]
[1,1,2,2,2,3,3,3,4]
> group . sort $ [1,2,2,3,3,3,1,2,4]
[[1,1],[2,2,2],[3,3,3],[4]]
Finding element with the maximum frequency just reduces to finding the sublist with largest number of elements. Here comes the function sortBy with which you can sort based on given comparing function. So basically I have sorted on length of the sublists (The flip is just to make the sorting descending rather than ascending).
> sortBy (flip $ comparing length) . group . sort $ [1,2,2,3,3,3,1,2,4]
[[2,2,2],[3,3,3],[1,1],[4]]
Now you can just take head two times to get the element with the largest frequency.
Let's assume you already have an argmax function. You can write your own or, even better, reuse the list-extras package. I strongly suggest you take a look at the package anyway.
Then, it's quite easy:
import Data.List.Extras.Argmax ( argmax )
-- >> mostFrequent [3,1,2,3,2,3]
-- 3
mostFrequent xs = argmax f xs
  where f x = length $ filter (==x) xs
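If you would rather not depend on list-extras, argmax is easy to write yourself, as suggested above. A minimal sketch using maximumBy and comparing; this argmax is a hypothetical stand-in, its tie-breaking may differ from the list-extras version, and it fails on an empty list:

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical stand-in for argmax: an element of a non-empty list
-- that maximises f (errors on an empty list, like maximumBy).
argmax :: Ord b => (a -> b) -> [a] -> a
argmax f = maximumBy (comparing f)

-- Same definition as above; still O(n^2) because of the repeated filter.
mostFrequent :: Eq a => [a] -> a
mostFrequent xs = argmax f xs
  where f x = length (filter (== x) xs)
```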

Algorithm - How to delete duplicate elements in a Haskell list

I'm having a problem creating a function similar to the nub function.
I need this function to remove duplicate elements from a list.
An element is a duplicate when two elements have the same email, and it should keep the newer one (the one closer to the end of the list).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
            | otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy, and specify an equality function that compares things the way you want.
And I guess you could reverse the list before and after calling it if you want to keep the last element instead of the first.
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Since the definition is recursive, you need a base case and a recursive case. (The code below calls it rem' because rem already exists in the Prelude.)
In the base case xs = [] and rem' f xs = [], since the result of removing all duplicate elements from [] is []:
rem' :: (a -> a -> Bool) -> [a] -> [a]
rem' f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing all elements a' such that f a a' = True from the list as. This is simply the function filter (\a' -> not $ f a a') applied to the list as. Then rem' f (a:as) is a followed by the result of recursively calling rem' f on as', that is, a : rem' f as':
rem' f (a:as) = a : rem' f (filter (\a' -> not $ f a a') as)
Replace f by a function comparing your list elements for the appropriate equality (e-mail addresses).
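As a concrete instance, here is a sketch that plugs an e-mail comparison into the rem function from this answer (named rem' here to avoid clashing with Prelude's rem) and reverses the list before and after, so that the newest registration (the one closest to the end) is kept. It assumes, as the question's [name,email,...] layout suggests, that the e-mail is the field at index 1; sameEmail and rmDup are illustrative names:

```haskell
import Data.Function (on)

-- rem from the answer, renamed; keeps the first of each duplicate group.
rem' :: (a -> a -> Bool) -> [a] -> [a]
rem' _ [] = []
rem' f (a:as) = a : rem' f (filter (\a' -> not (f a a')) as)

-- Assumption: a registration is a list of fields, e-mail at index 1.
type Regist = [String]

sameEmail :: Regist -> Regist -> Bool
sameEmail = (==) `on` (!! 1)

-- Reverse so the newest entry comes first, deduplicate, restore order.
rmDup :: [Regist] -> [Regist]
rmDup = reverse . rem' sameEmail . reverse
```

Like nubBy, this is O(n^2) in the worst case.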
While nubBy with two reverses is probably the best among the simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't ideal in terms of efficiency: after all, nubBy is O(n^2) in the worst case (when there are no duplicates). The two reverses also take their toll (in the form of memory allocation).
For a more efficient implementation, Data.Set (O(log n) inserts) can be used as an intermediate holder of the "latest non-duplicated element" (Set.insert replaces the older element with the newer one if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
  (==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
  compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
  where
    set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
The nubBy implementation, by contrast, is much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
On a 10M-element list (with lots of duplication, so nub should do relatively well here), there is a 3 times difference in running time and a 700 times difference in memory usage, compiled with GHC -O2:
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.
Why not put everything in a Map from email to Regist (respecting your "keep the newest" rule, of course), and then transform the values of the map back into a list? That's the most efficient way I can think of.
I used Alexei Polkhanov's answer and came up with the following, which removes duplicates from a list of lists whose element type is an instance of the Eq class.
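A minimal sketch of this idea with Data.Map from containers, assuming again that the e-mail is the second field of a registration (email and rmDupMap are illustrative names). Map.fromList keeps the last value for a repeated key, which gives the "keep the newest" rule for free; note that the result comes back ordered by e-mail rather than in the original list order:

```haskell
import qualified Data.Map as Map

type Regist = [String]

-- Assumption: the e-mail is the field at index 1.
email :: Regist -> String
email r = r !! 1

-- For duplicate keys, Map.fromList retains the last value, i.e. the
-- registration closest to the end of the list. O(n log n) overall.
rmDupMap :: [Regist] -> [Regist]
rmDupMap rs = Map.elems (Map.fromList [(email r, r) | r <- rs])
```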
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]
