Create function to remove duplicates from Haskell list

I am trying to build a function that takes the first character of a string and removes all other characters equal to it from the rest of the string, then does the same for the second character, and so on.
For example, "Heello" would become "Helo" and "Chocolate" would become "Choclate".
My original attempt:
removeSuccessor :: String -> String
removeSuccessor x = [c | c <- x, x ! `elem` c]
But that doesn't seem to work. Any suggestions?

You could keep a set of all elements seen and only keep the current one if it hasn't been seen yet:
import Data.Set
removeDups :: Ord a => [a] -> Set a -> [a]
removeDups [] sofar = []
removeDups (x:rest) sofar
  | member x sofar = (removeDups rest sofar)
  | otherwise = x:(removeDups rest (insert x sofar))
Usage:
removeDups "Heello" empty -- "Helo"
removeDups "Chocolate" empty -- "Choclate"
Run time is O(n log n), I think.
Or you can use nub from Data.List:
Prelude> import Data.List
Prelude Data.List> nub "Heello"
"Helo"
Prelude Data.List> nub "Chocolate"
"Choclate"
Run-time is O(n^2).
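The set-based version above can also be wrapped so that callers do not have to pass the initial empty set. This is a minimal sketch, assuming a qualified import of Data.Set to avoid clashes with Prelude names; dedup is a hypothetical name, not something from the answer.

```haskell
import qualified Data.Set as Set

-- | Remove duplicates, keeping the first occurrence of each element.
-- 'dedup' is a hypothetical name chosen for this sketch.
dedup :: Ord a => [a] -> [a]
dedup = go Set.empty
  where
    -- 'seen' holds every element emitted so far.
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert x seen) xs
```

Like the original, this needs an Ord instance, whereas nub only needs Eq.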

Related

Haskell - Filtering a list of tuples

Consider this list of tuples:
[(57,48),(58,49),(59,50),(65,56),(65,47),(65,57),(65,49), (41, 11)]
I want to remove a tuple (a, b) if its second element b is equal to the first element of another tuple, and also remove all tuples with the same a that come after it. For example:
The second element of (65,57) is 57, and the first tuple in the list, (57,48), has 57 as its first element. So (65,57) should be removed, along with all tuples after it that start with 65, namely (65,49). The tuples that come before it, (65,56) and (65,47), should stay in the list.
Does anyone have an idea how to do this?

For efficiency (single pass), you should create two sets, one for elements you've seen as the first elements of tuples, the other for elements you've seen both as first and second elements (ie. delete if matches first element).
Something like,
{-# LANGUAGE PackageImports #-}
import "lens" Control.Lens (contains, (.~), (^.), (&))
import "yjtools" Data.Function.Tools (applyUnless, applyWhen)
import qualified "containers" Data.IntSet as Set
filterTuples :: Foldable t => t (Int, Int) -> [(Int, Int)]
filterTuples = flip (foldr go $ const []) (Set.empty, Set.empty)
  where
    go p@(x,y) go' (fsts, deletes) =
      let seenFst = fsts ^. contains y
          shouldDelete = seenFst || deletes ^. contains x
          fsts' = fsts & contains x .~ True
          deletes' = deletes & applyWhen seenFst (contains y .~ True)
      in applyUnless shouldDelete (p:) $ go' (fsts', deletes')
EDITs: for correctness, clarity, spine-laziness

You could start by creating a distinct set of all the first elements, e.g.:
Prelude Data.List> firsts = nub $ fst <$>
[(57,48),(58,49),(59,50),(65,56),(65,47),
(65,57),(65,49), (41, 11)]
Prelude Data.List> firsts
[57,58,59,65,41]
You could use break or span as Robin Zigmond suggests. You'll need a predicate for that. You could use elem, like this:
Prelude Data.List> elem 48 firsts
False
Prelude Data.List> elem 49 firsts
False
...
Prelude Data.List> elem 57 firsts
True
If you're concerned that elem is too inefficient, you could experiment with creating a Set and use the member function instead.

Perhaps try using mapAccumL starting with the initial list as the accumulator. Then maintain a Predicate as a parameter too which acts as a decider for what has been seen, and this will determine if you can output or not at each step in the traversal.
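One way the mapAccumL idea could look is sketched below. It deviates from the suggestion in one respect: the accumulator here is a set of "banned" first elements rather than the list itself. It also assumes that "first element of another tuple" means any tuple in the whole list. The name filterTuplesAccum is hypothetical.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (catMaybes)
import qualified Data.IntSet as IntSet

-- | Drop a pair (a, b) when b is the first element of some pair in the
-- list, and drop every later pair that starts with the same a.
-- Assumption: "first elements" are collected from the whole input list.
filterTuplesAccum :: [(Int, Int)] -> [(Int, Int)]
filterTuplesAccum ps = catMaybes kept
  where
    firsts = IntSet.fromList (map fst ps)
    -- The accumulator is the set of first elements whose pairs are
    -- to be dropped from now on.
    (_, kept) = mapAccumL step IntSet.empty ps
    step banned p@(a, b)
      | a `IntSet.member` banned = (banned, Nothing)
      | b `IntSet.member` firsts = (IntSet.insert a banned, Nothing)
      | otherwise                = (banned, Just p)
```

On the list from the question this keeps (65,56) and (65,47) and drops (65,57) and (65,49).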

I'm an absolute beginner in haskell, so there probably is a much more elegant/efficient solution for this. But anyways I wanted to share the solution I came up with:
filterTuples :: [(Int, Int)] -> [(Int,Int)]
filterTuples [] = []
filterTuples (x:xs) = x:filterTuples(concat ((fst temp) : [filter (\z -> fst z /= del) (snd temp)]))
  where del = fst (head (snd temp))
        temp = break (\y -> (snd y == fst x)) xs
(Glad for feedback on how to improve this)

f consumes a list of pairs xs and produces a new list of pairs ys. ys contains every pair (a, b) in xs, except any pair whose second element b occurs as the first element of a pair in xs. When such a pair (a, b) is encountered, subsequent pairs that have a as their first element are excluded from ys.
import Data.List (nub)
f xs = go xs [] []
  where
    go [] ys zs = ys
    go (x@(a,b):xs) ys zs
      | b `elem` as = go xs ys (a:zs)
      | a `elem` zs = go xs ys zs
      | otherwise = [x] ++ go xs ys zs
    as = (nub . fst . unzip) xs

Finding elements and their indices in a list

I need to have both the elements of a list satisfying a predicate and the indices of these elements. I can achieve this as follows:
import Data.List (findIndices)
list :: [Int]
list = [3,2,4,1,9]
indices = findIndices (>2) list
elems = [list!!i | i <- indices]
-- same as: elems = filter (>2) list
Is there a package providing a function that gives both the elements and their indices in "one shot"? I'm surprised I can't find such a function anywhere. Otherwise, how could I write such a function, improving on my code above? I don't believe this code is optimal, since it accesses the elements of the list twice. I took a quick look at the source code of findIndices but I don't understand it yet.

You can make it more efficient – avoid the !! access – by filtering a list of (index, element) tuples.
let (indices, elems) = unzip [(i, x) | (i, x) <- zip [0..] list, x > 2]
Split into an appropriate function:
findItems :: (a -> Bool) -> [a] -> [(Int, a)]
findItems predicate = filter (predicate . snd) . zip [0..]
let (indices, elems) = unzip $ findItems (>2) list
There might be a more straightforward way, and I’ll be happy to find out about it :)

I think Ry's suggestion is just fine. For a more direct, and in particular more generic one, you could use lens tooling:
Prelude> import Control.Lens as L
Prelude L> import Control.Arrow as A
Prelude L A> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) [3,2,4,1,9]
([0,2,4],[3,4,9])
This can immediately be used also on arrays (where the index extraction is much more useful)
Prelude L A> import qualified Data.Vector as V
Prelude L A V> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ V.fromList [3,2,4,1,9]
([0,2,4],[3,4,9])
...even on unboxed ones, though these aren't Foldable:
Prelude L A V> import qualified Data.Vector.Unboxed as VU
Prelude L A V VU> import Data.Vector.Generic.Lens as V
ifoldrOf vectorTraverse (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ VU.fromList [3,2,4,1,9]
([0,2,4],[3.0,4.0,9.0])
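For plain lists, the same fold can be written without a lens dependency by zipping with [0..] first. This is a small sketch using only the Prelude; partitionIndexed is a hypothetical name.

```haskell
-- | Return the indices and the values of all elements satisfying the
-- predicate, in a single right fold over the index-tagged list.
partitionIndexed :: (a -> Bool) -> [a] -> ([Int], [a])
partitionIndexed p = foldr step ([], []) . zip [0 ..]
  where
    step (i, x) (is, xs)
      | p x       = (i : is, x : xs)
      | otherwise = (is, xs)
```

For example, partitionIndexed (>2) [3,2,4,1,9] gives ([0,2,4],[3,4,9]), matching the ifoldr output above.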

(indices, elems) = unzip [ item | item <- zip [0..] ls, (snd item) > 2 ]
Not sure that it's any more efficient, but it gets it done in "one shot".

Rotate a matrix in Haskell

I have this type Mat a = [[a]] to represent a matrix in Haskell.
I have to write a function that rotates a matrix, e.g. [[1,2,3],[0,4,5],[0,0,6]] will become [[3,5,6],[2,4,0],[1,0,0]], so I wrote this:
rotateLeft :: Mat a->Mat a
rotateLeft [[]] = []
rotateLeft (h:t) = (map last (h:t)):(rotateLeft (map init (h:t)))
but the output is
[[3,5,6],[2,4,0],[1,0,0],[*** Exception: Prelude.last: empty list
I don't know what to put in the base case to avoid this exception.
Appreciate any help.

I'm an old man in a hurry. I'd do it like this (importing Data.List)
rotl :: [[x]] -> [[x]]
rotl = transpose . map reverse

I think the simplest solution would be:
import Data.List
rotateLeft :: [[a]] -> [[a]]
rotateLeft = reverse . transpose
rotateRight :: [[a]] -> [[a]]
rotateRight = transpose . reverse
Data.List is a standard module.
transpose slices rows into columns, which is almost like rotating, but leaves the columns in the wrong order, so we just reverse them.

Your list won't be empty; it will be a list of empty lists. You can pattern match on the first sublist instead (assuming Mat guarantees that all rows have the same length):
rl [] = []
rl ([]:_) = []
rl m = map last m : (rl (map init m))
rl mat
[[3,5,6],[2,4,0],[1,0,0]]
You're missing the second case.

The problem is that your pattern isn't matching. Stepping through what your code does, we start with:
Prelude> let x = [[1,2,3],[0,4,5],[0,0,6]]
Prelude> :m +Data.List
Prelude Data.List> map last x
[3,5,6]
Prelude Data.List> let y = map init x
Prelude Data.List> y
[[1,2],[0,4],[0,0]]
Prelude Data.List> map last y
[2,4,0]
Prelude Data.List> let z = map init y
Prelude Data.List> z
[[1],[0],[0]]
Prelude Data.List> map last z
[1,0,0]
Prelude Data.List> map init z
[[],[],[]]
So the basic problem is that your base case you're matching on is not [[],[],[]] but is instead [[]], so that pattern doesn't match.
You now have more or less three options: you can (a) try to terminate when the first empty list is seen; this is written in Haskell as any null, using the any function and the null function, both defined in the Prelude; or (b) you can hardcode that this only works for 3x3 matrices, and just match against [[],[],[]], or (c) you can try to terminate when all lists are empty (all null), in which case you can either skip elements that don't exist or wrap everything in the Maybe x datatype, so that missing elements are represented by Nothing while present elements are represented by Just x.
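Option (a) could be sketched as follows. The extra null m check is an addition for this sketch: without it an empty matrix would recurse forever, because any null [] is False. rotateLeftSafe is a hypothetical name.

```haskell
type Mat a = [[a]]

-- | Rotate left by repeatedly peeling off the last column.
-- Stops as soon as the matrix has no rows or any row is exhausted,
-- so 'last' and 'init' are never applied to an empty row.
rotateLeftSafe :: Mat a -> Mat a
rotateLeftSafe m
  | null m || any null m = []
  | otherwise            = map last m : rotateLeftSafe (map init m)
```

With ragged input this stops at the shortest row, so trailing elements of longer rows are silently dropped.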

rotateLeft [] = []
rotateLeft ([]:_) = []
rotateLeft (h:t) = (map last (h:t)):(rotateLeft (map init (h:t)))
The first pattern handles an empty matrix, i.e. a list with no rows at all.
You got the second pattern wrong: if we have a proper matrix (ie elements are of the same length), then the base case is a list of empty lists. However, you wrote [[]], which is only true if the initial list consists of a single list.

What is groupBy supposed to do?

I wrote something using Data.List.groupBy. It didn't work as expected, so I ended up writing my own version of groupBy. After all, I'm not sure what the Data.List one is supposed to do (there is no real documentation).
Anyway, my tests pass with my version of groupBy, whereas they fail with the Data.List one.
I found (thanks to QuickCheck) a case where the two functions behave differently, and I still don't understand why there is a difference between the two versions. Is the Data.List version buggy, or is it mine? (Of course mine is a naive implementation and is probably not the most efficient way to do so.)
Here is the code :
import qualified Data.List as DL
import Data.Function (on)
import Test.QuickCheck
groupBy' :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy' _ [] = []
groupBy' eq (x:xs) = xLike:(groupBy' eq xNotLike) where
    xLike = x:[ e | e <- xs, x `eq` e ]
    xNotLike = [ e | e <- xs, not $ x `eq` e ]
head' [] = Nothing
head' (x:xs) = Just x
prop_a s = (groupBy' by s) == (DL.groupBy by s) where
    types = s :: [String]
    by = (==) `on` head'
Running quickCheck prop_a in GHCi returns the counterexample ["", "a", ""]:
*Main> groupBy' ((==) `on` head') ["","a",""]
[["",""],["a"]] -- correct in my opinion
*Main> DL.groupBy ((==) `on` head') ["","a",""]
[[""],["a"],[""]] -- incorrect
What's happening? I can't believe there is a bug in the haskell-platform.

Your version is O(n^2), which can be unacceptably slow in real-world use [1].
The standard version avoids this by only grouping adjacent elements if they are equivalent. Hence,
*Main> DL.groupBy ((==) `on` head') ["", "", "a"]
will yield the result you're after.
A simple way to obtain "universal grouping" with groupBy is to first sort the list, if that's feasible for the data type.
*Main> DL.groupBy ((==) `on` head') $ DL.sort ["", "a", ""]
The complexity of this is only O(n log n).
[1] This didn't prevent the committee from specifying nub as O(n^2)...
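The sort-then-group idea can be packaged as a helper that groups by a key function. This is a sketch under the assumption that the key type has an Ord instance; groupAllOn is a hypothetical name, and listToMaybe from Data.Maybe plays the role of head' from the question.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)
import Data.Maybe (listToMaybe)

-- | Group all elements that share a key, adjacent or not.
-- 'sortOn' is stable, so elements keep their relative order within a group.
groupAllOn :: Ord k => (a -> k) -> [a] -> [[a]]
groupAllOn key = groupBy ((==) `on` key) . sortOn key

-- The example from the question, grouping strings by their first character.
example :: [[String]]
example = groupAllOn listToMaybe ["", "a", ""]
```

Here example evaluates to [["",""],["a"]]. Note that the groups come out in key order, not in order of first appearance.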

Data.List.groupBy in Haskell is a usability mistake! A user friendly groupBy should behave like this:
groupByWellBehaved p = foldr (\x rest -> if null rest
                                         then [[x]]
                                         else if p x (head (head rest))
                                              then (x : head rest) : (tail rest)
                                              else [x] : rest) []
Perhaps there is a better implementation, but at least this is O(n).
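To make the difference concrete: Data.List.groupBy compares each element with the first element of the current group, while the fold above compares each element with its immediate neighbour. The sketch below restates the fold with pattern matching instead of head and tail (groupAdjacentBy is a hypothetical name) and shows both on a non-equivalence predicate.

```haskell
import Data.List (groupBy)

-- | Like the fold above: x joins the following group when p holds
-- between x and that group's first element, i.e. its right neighbour.
groupAdjacentBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupAdjacentBy p = foldr step []
  where
    step x ((y : ys) : rest)
      | p x y = (x : y : ys) : rest
    step x rest = [x] : rest

sample :: [Int]
sample = [1, 2, 2, 3, 1, 2, 0, 4, 5, 2]

-- Neighbour comparison splits at every descent.
adjacent :: [[Int]]
adjacent = groupAdjacentBy (<=) sample  -- [[1,2,2,3],[1,2],[0,4,5],[2]]

-- Data.List.groupBy compares against the first element of each group.
standard :: [[Int]]
standard = groupBy (<=) sample          -- [[1,2,2,3,1,2],[0,4,5,2]]
```

For a genuine equivalence relation the two agree; they only differ when the predicate is not transitive or not symmetric.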

Algorithm - How to delete duplicate elements in a Haskell list

I'm having a problem creating a function similar to the nub function.
I need this function to remove duplicated elements from a list.
An element is a duplicate when two elements have the same email, and the function should keep the newer one (the one closer to the end of the list).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
            | otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.

Just use nubBy, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.

Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]

Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a function f :: a -> a -> Bool (by which elements are compared for removal) and a list xs. Since the definition is recursive, you need a base case and a recursive case. Note that the name rem clashes with the Prelude's rem, so either add import Prelude hiding (rem) or pick another name.
In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing from as all elements a' such that f a a' = True. This is simply filter (\a' -> not $ f a a') applied to the list as. Then rem f (a:as) is a followed by the result of recursively calling rem f on as', that is, a : rem f as':
rem f (a:as) = a : rem f (filter (\a' -> not $ f a a') as)
Replace f with a function comparing your list elements for the appropriate equality (e-mail addresses).

While nubBy with two reverse's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse's will also take their toll (in the form of memory allocation).
For a more efficient implementation, Data.Set (O(log n) on inserts) can be used as an intermediate "latest non-duplicate element" holder (S.insert replaces the older element with the newer one if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
    (==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
    compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
    where
        set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
The nubBy implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
However, on a 10M-element list (with lots of duplication, so nub should play nice here) there is a 3x difference in running time and a 700x difference in memory usage. Compiled with GHC with -O2:
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).

(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.

Why not put everything in a Map from email to Regist (respecting your "keep the newest" rule, of course), and then transform the values of the map back into a list? That's the most efficient way I can think of.
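A minimal sketch of that Map idea follows. It assumes Regist is a list of strings with the email as the second field, as in the question's isDup, and that records without an email field can simply be dropped. The names emailOf and rmDupByEmail are hypothetical.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

type Regist = [String]

-- | Assumption: the email is the second field of a record.
emailOf :: Regist -> Maybe String
emailOf (_ : e : _) = Just e
emailOf _           = Nothing

-- | Keep only the last record for each email.
-- Map.fromList retains the last value for a repeated key, which gives
-- the "newest wins" rule; the stored index restores list order afterwards.
-- Records too short to have an email are dropped (a choice of this sketch).
rmDupByEmail :: [Regist] -> [Regist]
rmDupByEmail rs =
  map snd . sortOn fst . Map.elems $
    Map.fromList
      [ (e, (i, r))
      | (i, r) <- zip [0 :: Int ..] rs
      , Just e <- [emailOf r]
      ]
```

The surviving records appear in the order of their last occurrence, and the whole pass is O(n log n).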

I used Alexei Polkhanov's answer and came up with the following, so you can remove duplicates from lists whose element type is an instance of the Eq class.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]
