Filter duplicate elements from a [[String]] in Haskell

I have a list of the form [["A1","A1","A1"] .. ["G3","G3","G3"]] which contains many elements I consider duplicates, such as ["A1","A2","A3"] and ["A3","A2","A1"].
How do I filter out such duplicate elements?
If I check the above two elements for equality, the result is False:
*Main> ["A1","A2","A3"] == ["A3","A2","A1"]
False

nubBy :: (a -> a -> Bool) -> [a] -> [a] is a relevant function that removes duplicates from a list via an arbitrary equality test.
A version of the function you're looking for is:
import Data.List (sort, nubBy)
removeDuplicates' :: Ord a => [[a]] -> [[a]]
removeDuplicates' = nubBy (\l1 l2 -> sort l1 == sort l2)
Of course, this requires an Ord instance for a, not just Eq, and it relies on sort, which is (as noted below) an expensive function. So it is certainly not ideal. However, I don't know exactly how you want to test those lists for equality, so I'll leave the details to you.

@AJFarmar's answer solves the issue, but it can be done a bit more efficiently. sort is an expensive function, and in that version both lists are re-sorted every time nubBy compares two of them, so we want to save on such calls.
We can use:
import Data.List(nubBy, sort)
import Data.Function(on)
removeDuplicates' :: Ord a => [[a]] -> [[a]]
removeDuplicates' = map snd . nubBy ((==) `on` fst) . map ((,) =<< sort)
What we do here is first apply map ((,) =<< sort), which turns every element x of the original list into the tuple (sort x, x). We then perform a nubBy that compares only the first components of the tuples, so each list is sorted once instead of on every comparison. Finally, map snd returns the second item of each remaining tuple (sort x, x), i.e. the original list x.
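The point-free (,) =<< sort relies on the monad instance for functions, where (f =<< g) x = f (g x) x; with f = (,) and g = sort that is (sort x, x). A minimal sketch putting the two spellings side by side (decorate and decorate' are hypothetical names used only for illustration):

```haskell
import Data.List (sort)

-- Point-free form from the answer: for functions, (f =<< g) x = f (g x) x,
-- so this pairs each list with its sorted copy as (sort x, x).
decorate :: Ord a => [a] -> ([a], [a])
decorate = (,) =<< sort

-- The same function written with an explicit argument.
decorate' :: Ord a => [a] -> ([a], [a])
decorate' x = (sort x, x)
```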
We can generalize this by constructing a nubOn function:
import Data.List(nubBy)
import Data.Function(on)
nubOn :: Eq b => (a -> b) -> [a] -> [a]
nubOn f = map snd . nubBy ((==) `on` fst) . map ((,) =<< f)
In that case removeDuplicates' is nubOn sort.
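nubBy, and therefore nubOn, still compares each list against every list kept so far, which is quadratic in the number of lists. If an Ord instance on the key is acceptable, one alternative (a different technique from the answer's, not a refinement of it) is to remember the keys already seen in a Data.Set from the containers package. A sketch, where nubOnOrd and removeDuplicatesSet are hypothetical names:

```haskell
import Data.List (sort)
import qualified Data.Set as Set

-- Hypothetical helper: keep the first element of each equivalence class,
-- where two elements are equivalent if their keys are equal. Seen keys are
-- stored in a Set, so each element costs O(log n) Set operations instead
-- of a comparison against every element kept so far.
nubOnOrd :: Ord b => (a -> b) -> [a] -> [a]
nubOnOrd key = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | k `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert k seen) xs
      where
        k = key x

-- The question's case: lists are duplicates if their sorted copies match.
removeDuplicatesSet :: Ord a => [[a]] -> [[a]]
removeDuplicatesSet = nubOnOrd sort
```

As with nubOn sort, the first occurrence of each list is kept, in its original order and unsorted.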

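You may not even need to sort. You just need to check whether every item of one list matches an item of the other, like this:
\xs ys -> length xs == (length . filter (== True) $ (==) <$> xs <*> ys)
Here (==) <$> xs <*> ys compares every element of xs with every element of ys; for example, (==) <$> ["A1","A2","A3"] <*> ["A3","A2","A1"] returns [False,False,True,False,True,False,True,False,False], which contains exactly three Trues. Note that this test is only reliable when both lists have the same length and contain no repeated elements.
Following @rampion's rightful comment, let's take it further and use Data.Set; then it gets pretty dandy.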
import qualified Data.Set as S
equity :: Ord a => [a] -> [a] -> Bool
equity = (. S.fromList) . (==) . S.fromList
*Main> equity ["A1","A2","A3"] ["A3","A2","A1"]
True
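One behavioural difference matters when choosing between the sort-based and Set-based tests: converting to a Data.Set discards repeated elements, so lists with the same distinct elements but different multiplicities compare equal. A small sketch contrasting the two (sameSet, samePermutation and dedupeBySet are hypothetical names):

```haskell
import Data.Function (on)
import Data.List (nubBy, sort)
import qualified Data.Set as S

-- Same *set* of elements: multiplicities are ignored.
sameSet :: Ord a => [a] -> [a] -> Bool
sameSet = (==) `on` S.fromList

-- Same *multiset* of elements (a permutation): multiplicities must match.
samePermutation :: Ord a => [a] -> [a] -> Bool
samePermutation = (==) `on` sort

-- Either predicate can be plugged into nubBy to drop "duplicates".
dedupeBySet :: Ord a => [[a]] -> [[a]]
dedupeBySet = nubBy sameSet
```

Which behaviour is right depends on whether ["A1","A1","A2"] should count as a duplicate of ["A1","A2"] in your data.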

Related

Given a list, how can I perform some transformation only on sub-lists whose each two elements satisfy a binary predicate?

(In my actual use case I have a list of type [SomeType], SomeType having a finite number of constructors, all nullary; in the following I'll use String instead of [SomeType] and use only 4 Chars, to simplify a bit.)
I have a list like this "aaassddddfaaaffddsssadddssdffsdf" where each element can be one of 'a', 's', 'd', 'f', and I want to do some further processing on each contiguous sequence of non-'a's, let's say turning them upper case and reversing the sequence, thus obtaining "aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD". (I've added the reversing requirement to make it clear that the processing involves all the contiguous non-'a's at the same time.)
To process each sub-String (upper-casing and reversing it), I can use this:
import Data.Char (toUpper)
func :: String -> String
func = reverse . map toUpper
But how do I run that func only on the sub-Strings of non-'a's?
My first thought is that Data.List.groupBy can be useful, and the overall solution could be:
import Data.Function (on)
import qualified Data.List
concat $ map (\x -> if head x == 'a' then x else func x)
       $ Data.List.groupBy ((==) `on` (== 'a')) "aaassddddfaaaffddsssadddssdffsdf"
This solution, however, does not convince me, as I'm using == 'a' both when grouping (which to me seems good and unavoidable) and when deciding whether I should turn a group upper case.
I'm looking for advice on how to accomplish this small task in the best way.
You could classify the list elements by the predicate before grouping. Note that I’ve reversed the sense of the predicate to indicate which elements are subject to the transformation, rather than which elements are preserved.
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Arrow ((&&&))
import Data.Function (on)
import Data.List (groupBy)
import Data.Monoid (First(..))
mapSegmentsWhere
  :: forall a. (a -> Bool) -> ([a] -> [a]) -> [a] -> [a]
mapSegmentsWhere p f
  = concatMap (applyMatching . sequenceA)  -- [a]
  . groupBy ((==) `on` fst)                -- [[(First Bool, a)]]
  . map (First . Just . p &&& id)          -- [(First Bool, a)]
  where
    applyMatching :: (First Bool, [a]) -> [a]
    applyMatching (First (Just matching), xs)
      = applyIf matching f xs
applyIf :: forall a. Bool -> (a -> a) -> a -> a
applyIf condition f
  | condition = f
  | otherwise = id
Example use:
> mapSegmentsWhere (/= 'a') (reverse . map toUpper) "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
Here I use the First monoid with sequenceA to merge each group of adjacent matching elements from [(First Bool, a)] into (First Bool, [a]), but you could just as well use something like map (fst . head &&& map snd). You can also skip ScopedTypeVariables if you don't want to write the type signatures; I just included them for clarity.
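A sketch of the map (fst . head &&& map snd) variant mentioned above, which needs neither First, sequenceA nor any language extension (mapSegmentsWhere' is a hypothetical name). It relies on groupBy never producing empty groups, so head is safe here:

```haskell
import Control.Arrow ((&&&))
import Data.Function (on)
import Data.List (groupBy)

-- Tag each element with p, group adjacent elements with the same tag,
-- turn each group into (tag, elements), then apply f where the tag is True.
mapSegmentsWhere' :: (a -> Bool) -> ([a] -> [a]) -> [a] -> [a]
mapSegmentsWhere' p f =
  concatMap apply
    . map (fst . head &&& map snd)  -- [(Bool, [a])]
    . groupBy ((==) `on` fst)       -- [[(Bool, a)]]
    . map (p &&& id)                -- [(Bool, a)]
  where
    apply (matching, xs)
      | matching  = f xs
      | otherwise = xs
```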
If we need to remember the difference between the 'a's and the rest, let's put them in different branches of an Either. In fact, let's define a newtype now that we are at it:
{-# LANGUAGE DeriveFoldable #-}
{-# LANGUAGE DeriveFunctor #-}
{-# LANGUAGE ViewPatterns #-}
import Data.Bifoldable
import Data.Bifunctor
import Data.Char
import Data.List
newtype Bunched a b = Bunched [Either a b] deriving (Functor, Foldable)
instance Bifunctor Bunched where
  bimap f g (Bunched b) = Bunched (fmap (bimap f g) b)
instance Bifoldable Bunched where
  bifoldMap f g (Bunched b) = mconcat (fmap (bifoldMap f g) b)
fmap will let us work over the non-separators. fold will return the concatenation of the non-separators, bifold will return the concatenation of everything. Of course, we could have defined separate functions unrelated to Foldable and Bifoldable, but why avoid already existing abstractions?
To split the list, we can use an unfoldr that alternately searches for as and non-as with the span function:
splitty :: Char -> String -> Bunched String String
splitty c str = Bunched $ unfoldr step (True, str)
  where
    step (_, []) = Nothing
    step (True, span (== c) -> (as, ys)) = Just (Left as, (False, ys))
    step (False, span (/= c) -> (xs, ys)) = Just (Right xs, (True, ys))
Putting it to work:
ghci> bifold . fmap func . splitty 'a' $ "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
Note: Bunched is actually the same as Tannen [] Either from the bifunctors package, if you don't mind the extra dependency.
There are other answers here, but I think they get too excited about iteration abstractions. A manual recursion, alternately taking things that match the predicate and things that don't, makes this problem exquisitely simple:
onRuns :: Monoid m => (a -> Bool) -> ([a] -> m) -> ([a] -> m) -> [a] -> m
onRuns p = go p (not . p) where
  go _ _ _ _ [] = mempty
  go p p' f f' xs = case span p xs of
    (ts, rest) -> f ts `mappend` go p' p f' f rest
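Because onRuns only requires the result to be a Monoid, the same traversal can summarise runs instead of rewriting them. A sketch that repeats onRuns (inner predicates renamed q and q' to avoid shadowing p) and uses the list monoid to collect the lengths of the runs that do not satisfy the predicate; nonMatchingRunLengths is a hypothetical name:

```haskell
-- onRuns from the answer, lightly adapted so this block is self-contained.
onRuns :: Monoid m => (a -> Bool) -> ([a] -> m) -> ([a] -> m) -> [a] -> m
onRuns p = go p (not . p)
  where
    go _ _ _ _ [] = mempty
    go q q' f f' xs = case span q xs of
      (ts, rest) -> f ts `mappend` go q' q f' f rest

-- Matching runs contribute nothing; every non-matching run contributes
-- a one-element list holding its length.
nonMatchingRunLengths :: (a -> Bool) -> [a] -> [Int]
nonMatchingRunLengths p = onRuns p (const []) (\run -> [length run])
```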
Try it out in ghci:
Data.Char> onRuns ('a'==) id (reverse . map toUpper) "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
Here is a simple solution - function process below - that only requires that you define two functions isSpecial and func. Given a constructor from your type SomeType, isSpecial determines whether it is one of those constructors that form a special sublist or not. The function func is the one you included in your question; it defines what should happen with the special sublists.
The code below is for character lists. Just change isSpecial and func to make it work for your lists of constructors.
import Data.Char (toUpper)
isSpecial c = c /= 'a'
func = reverse . map toUpper
turn = map (\x -> ([x], isSpecial x))
amalgamate [] = []
amalgamate [x] = [x]
amalgamate ((xs, xflag) : (ys, yflag) : rest)
  | xflag /= yflag = (xs, xflag) : amalgamate ((ys, yflag) : rest)
  | otherwise      = amalgamate ((xs++ys, xflag) : rest)
work = map (\(xs, flag) -> if flag then func xs else xs)
process = concat . work . amalgamate . turn
Let's try it on your example:
*Main> process "aaassddddfaaaffddsssadddssdffsdf"
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
*Main>
Applying one function at a time, shows the intermediate steps taken:
*Main> turn "aaassddddfaaaffddsssadddssdffsdf"
[("a",False),("a",False),("a",False),("s",True),("s",True),("d",True),
("d",True),("d",True),("d",True),("f",True),("a",False),("a",False),
("a",False),("f",True),("f",True),("d",True),("d",True),("s",True),
("s",True),("s",True),("a",False),("d",True),("d",True),("d",True),
("s",True),("s",True),("d",True),("f",True),("f",True),("s",True),
("d",True),("f",True)]
*Main> amalgamate it
[("aaa",False),("ssddddf",True),("aaa",False),("ffddsss",True),
("a",False),("dddssdffsdf",True)]
*Main> work it
["aaa","FDDDDSS","aaa","SSSDDFF","a","FDSFFDSSDDD"]
*Main> concat it
"aaaFDDDDSSaaaSSSDDFFaFDSFFDSSDDD"
*Main>
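The answer suggests swapping in your own isSpecial and func to handle a list of constructors. A sketch of that, with isSpecial and func passed as parameters and a hypothetical SomeType (constructors A, S, D, F are assumptions standing in for the question's real type); processWith is likewise a hypothetical name:

```haskell
-- Hypothetical stand-in for the question's SomeType: nullary constructors only.
data SomeType = A | S | D | F deriving (Eq, Show)

-- The answer's pipeline, unchanged except that isSpecial and func are arguments.
processWith :: (t -> Bool) -> ([t] -> [t]) -> [t] -> [t]
processWith isSpecial func = concat . work . amalgamate . turn
  where
    -- tag every element with whether it is "special"
    turn = map (\x -> ([x], isSpecial x))
    -- merge neighbouring chunks that carry the same tag
    amalgamate [] = []
    amalgamate [x] = [x]
    amalgamate ((xs, xflag) : (ys, yflag) : rest)
      | xflag /= yflag = (xs, xflag) : amalgamate ((ys, yflag) : rest)
      | otherwise      = amalgamate ((xs ++ ys, xflag) : rest)
    -- transform only the special chunks
    work = map (\(xs, flag) -> if flag then func xs else xs)
```

For example, processWith (/= A) reverse [A, S, D, A, F, F, D] reverses each run of non-A constructors.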
We can just do what you describe, step by step, getting a clear simple minimal code which we can easily read and understand later on:
import Data.Function (on)
import Data.List (groupBy)
foo :: (a -> Bool) -> ([a] -> [a]) -> [a] -> [a]
foo p f xs = [ a
             | g <- groupBy ((==) `on` fst)
                            [(p x, x) | x <- xs]   -- [ (True, 'a'), ... ]
             , let (t:_, as) = unzip g             -- ( [True, ...], "aaa" )
             , a <- if t then as else (f as) ]     -- final concat
-- unzip :: [(b, a)] -> ([b], [a])
We break the list into same-p spans and unpack each group with the help of unzip. Trying it out:
> foo (=='a') reverse "aaabcdeaa"
"aaaedcbaa"
So no, using == 'a' twice is avoidable, and hence not especially good: it introduces an unnecessary constraint on your data type, when all we need is equality on Booleans.

Finding all palindromic word pairs

I came up with a made-up problem, finding all palindromic word pairs (a word together with its reverse) in a vocabulary, and wrote the solution below:
import Data.List
findPairs :: Ord a => [[a]] -> [[[a]]]
findPairs ss =
  filter ((== 2) . length)
  . groupBy ((==) . reverse)
  . sortBy (compare . reverse)
  $ ss
main = do
  print . findPairs . permutations $ ['a'..'c']
-- malfunctioning: only got partial results [["abc","cba"]]
-- expected: [["abc","cba"],["bac","cab"],["bca","acb"]]
Could you help me correct it, if it's worth trying?
Solution
Having benefited from comments by @David Young and @chi, here is the tuned, working code:
import Data.List (delete)
import Data.Set hiding (delete, map)
findPairs :: Ord a => [[a]] -> [([a], [a])]
findPairs ss =
  let
    f [] = []
    f (x : xs) =
      let y = reverse x
      in
        if x /= y
        then
          let ss' = delete y xs
          in (x, y) : f ss'
        else f xs
  in
    f . toList
    . intersection (fromList ss)
    $ fromList (map reverse ss)
import Data.List
import Data.Ord
-- find classes of equivalence by comparing canonical forms (CF)
findEquivalentSets :: Ord b => (a->b) -> [a] -> [[a]]
findEquivalentSets toCanonical =
  filter ((>=2) . length)                              -- has more than one
                                                       -- with the same CF?
  . groupBy ((((== EQ) .) .) (comparing toCanonical))  -- group by CF
  . sortBy (comparing toCanonical)                     -- compare CFs
findPalindromes :: Ord a => [[a]] -> [[[a]]]
findPalindromes = findEquivalentSets (\x -> min x (reverse x))
This function lets us find many kinds of equivalence as long as we can assign some effectively computable canonical form (CF) to our elements.
When looking for palindromic pairs, two strings are equivalent if one is the reverse of the other. The CF is the lexicographically smaller of the string and its reverse.
findAnagrams :: Ord a => [[a]] -> [[[a]]]
findAnagrams = findEquivalentSets sort
In this example, two strings are equivalent if one is an anagram of the other. The CF is the sorted string (banana → aaabnn).
Likewise we can find SOUNDEX equivalents and whatnot.
This is not terribly efficient as one needs to compute the CF on each comparison. We can cache it, at the expense of readability.
findEquivalentSets :: Ord b => (a->b) -> [a] -> [[a]]
findEquivalentSets toCanonical =
  map (map fst)                                  -- strip CF
  . filter ((>=2) . length)                      -- has more than one
                                                 -- with the same CF?
  . groupBy ((((== EQ) .) .) (comparing snd))    -- group by CF
  . sortBy (comparing snd)                       -- compare CFs
  . map (\x -> (x, toCanonical x))               -- pair the element with its CF
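Caching the key this way is the decorate-sort-undecorate pattern; base's Data.List.sortOn applies the same idea to sorting alone, evaluating the key once per element. A different route, sketched here as an alternative rather than as this answer's method, is to group by canonical form with Data.Map.fromListWith from the containers package, which computes each CF once and needs no separate groupBy pass. findEquivalentSetsMap and findAnagramsMap are hypothetical names:

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map

-- Group elements by canonical form in a Map instead of sort + groupBy.
-- fromListWith (++) combines as (new ++ old), so each class is built in
-- reverse input order and reversed back afterwards.
findEquivalentSetsMap :: Ord b => (a -> b) -> [a] -> [[a]]
findEquivalentSetsMap toCanonical =
  filter ((>= 2) . length)              -- keep classes with more than one member
    . map reverse                       -- restore input order within a class
    . Map.elems                         -- classes in ascending order of their CF
    . Map.fromListWith (++)             -- merge elements that share a CF
    . map (\x -> (toCanonical x, [x]))  -- pair each element with its CF

-- Example use, mirroring findAnagrams above.
findAnagramsMap :: Ord a => [[a]] -> [[[a]]]
findAnagramsMap = findEquivalentSetsMap sort
```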
Here's an approach you might want to consider.
Using sort implies that there's some keying function word2key that yields the same value for both words of a palindromic pair. The first one that comes to mind for me is
word2key w = min w (reverse w)
So: map the keying function over the list of words, sort, group by equality, take groups of length 2, and then recover the two words from the key (using the fact that the key is either equal to the word or its reverse).
Writing that, with a couple of local definitions for clarity, gives:
import Data.List (groupBy, sort)
findPals :: (Ord a, Eq a) => [[a]] -> [[[a]]]
findPals = map (key2words . head) .
           filter ((== 2) . length) .
           groupBy (==) .
           sort .
           (map word2key)
  where word2key w = min w (reverse w)
        key2words k = [k, reverse k]
Edit:
I posted my answer in a stale window without refreshing, so missed the very nice response from n.m. above.
Mea culpa.
So I'll atone by mentioning that both answers are variations on the well-known (in Perl circles) "Schwartzian transform", which itself applies a common mathematical pattern, h = f' . g . f: translate a task to an alternate representation in which the task is easier, do the work, then translate back to the original representation.
The Schwartzian transform tuples up a value with its corresponding key, sorts by the key, then pulls the original value back out of the key/value tuple.
The little hack I included above relies on the fact that key2words is the non-deterministic inverse relation of word2key. It is only valid when two words have the same key, but that's exactly the case in the question, and it is ensured by the filter.
import Data.List (groupBy, sort)
overAndBack :: (Ord b, Eq c) => (a -> b) -> ([b] -> [c]) -> (c -> d) -> [a] -> [d]
overAndBack f g f' = map f' . g . sort . map f
findPalPairs :: (Ord a, Eq a) => [[a]] -> [[[a]]]
findPalPairs = overAndBack over just2 back
  where over w = min w (reverse w)
        just2 = filter ((== 2) . length) . groupBy (==)
        back = (\k -> [k, reverse k]) . head
Which demos as
*Main> findPalPairs $ words "I saw no cat was on a chair"
[["no","on"],["saw","was"]]
Thanks for the nice question.

Compute the mode of a list

I want to compute the most common item in a list (its mode).
I did it step by step:
sort the list
group the list
count how many times each number occurs
I don't know how to continue... please help. This is the code I have so far:
group :: Eq a => [a] -> [[a]]
group = groupBy (==)
-- | The 'groupBy' function is the non-overloaded version of 'group'.
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _ [] = []
groupBy eq (x:xs) = (x:ys) : groupBy eq zs
  where (ys,zs) = span (eq x) xs
task4 xs = (map (\l@(x:xs) -> (x,length l)) (group (sortList xs)))
*Main> task4 [1,1,1,2,2,3]
[(1,3),(2,2),(3,1)]
Let's first assume you're only interested in how often the mode turns up. So you'd just use
map length . group $ sortList xs
giving you a list of lengths of the individual groups. Then what's left to do is retrieve the maximum.
Now, that's not all you want. Basically, you want the maximum of your tuple list, but comparing only the length field, not the element. You might thus be tempted to search Hoogle for Ord b => [(a, b)] -> a (or -> (a,b)), but there is no such standard function.
However, if you search Hoogle for maximum (a special form of which is just what you want) and scroll down the result page, you'll find maximumBy. It lets you specify which "property" should be considered for comparison.
The preferred way to use this is the rather self-explanatory
GHCi> :m +Data.Ord Data.List
GHCi> :t maximumBy (comparing snd)
maximumBy (comparing snd) :: Ord a => [(a1, a)] -> (a1, a)
So we pass the information about how to access the crucial field as an argument to the function. And since snd is itself just a function, we can just as well use any other one! So there's no real need to build up the element/run-length tuples in the first place; you might as well just do
maximumBy (comparing length) . group $ sort xs
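which gives the largest group (though that's a bit less efficient, since length is recomputed on every comparison); its head is the mode. In short:
import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)
mode :: Ord a => [a] -> a
mode = head . maximumBy (comparing length) . group . sort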
You're very close. If instead of:
(\l@(x:xs) -> (x,length l))
you wrote:
(\l@(x:xs) -> (length l,x))
then you could simply take maximum of your output:
maximum [(3,1),(2,2),(1,3)] == (3,1)
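Putting that suggestion together with the sort and group steps from the question gives a complete function. A sketch using Data.List's sort in place of the question's own sortList (mode' is a hypothetical name):

```haskell
import Data.List (group, sort)

-- Pair each run's length with its element so that maximum compares lengths
-- first; snd then returns the element. On a tie in length, the larger
-- element wins, because tuples compare the second component next.
-- Partial: maximum fails on an empty list.
mode' :: Ord a => [a] -> a
mode' = snd . maximum . map (\run@(x : _) -> (length run, x)) . group . sort
```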

How can I filter this list?

I have a list [a] whose elements can each be converted to a b, giving a [b]. Each a is distinct, but the resulting bs may not be. I want to filter my [a] so that the filtered list contains no duplicates once converted to [b].
Can someone help me to achieve this?
Edit
To serve as assistance, I'll provide an example.
as = [1..10]
conv = even
bs = map even as
-- bs = [False,True,False,True,False,True,False,True,False,True]
-- filter <cond> as -- [1,2]
Assume that f is the function that converts from a to b. You can then proceed in three steps:
You pair each element of your list with its image under f: map (id &&& f);
You remove every pair of which the second element has already appeared in the now obtained list: nubBy (on (==) snd);
You drop the second component of each pair: map fst.
Hence:
import Control.Arrow ((&&&))
import Data.Function (on)
import Data.List (nubBy)
filterOn :: Eq b => (a -> b) -> [a] -> [a]
filterOn f = map fst . nubBy ((==) `on` snd) . map (id &&& f)
For example:
> filterOn even [1 .. 10]
[1,2]
In general, it is impossible to do this with a cond :: a -> Bool function and filter alone, i.e. with (filter cond) [1..10] yielding [1,2].
The problem is that filter looks at each element of your list exactly once, and cond has no information about the previous elements.
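To make the point concrete: filter becomes sufficient once each element is paired with the elements before it, for example via Data.List.inits. A quadratic sketch for illustration only (filterOnInits is a hypothetical name); the filterOn answer above is simpler:

```haskell
import Data.List (inits)

-- Pair every element with the list of elements that precede it, then keep it
-- only if its image under f does not occur among the earlier images.
-- Quadratic in the length of the list.
filterOnInits :: Eq b => (a -> b) -> [a] -> [a]
filterOnInits f xs = map fst (filter isNew (zip xs (inits xs)))
  where
    isNew (x, before) = f x `notElem` map f before
```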

Function to show the lowest represented element in a list

If you have a list such as this in Haskell:
data TestType = A | B | C deriving (Ord, Eq, Show)
list1 :: [TestType]
list1 = [A,B,C,B,C,A,B,C,C,C]
Is it possible to write a function to determine which element is represented the least in a list (so in this case A)?
My initial thought was to write a helper function such as this but now I am not sure if this is the right approach:
appears :: TestType -> [TestType] -> Int
appears _ [] = 0
appears x (y:ys) | x==y      = 1 + (appears x ys)
                 | otherwise = appears x ys
I am still fairly new to Haskell, so apologies for the potentially silly question.
Many thanks
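The appears helper is a reasonable start. One way to finish it, assuming Enum and Bounded may be added to TestType's deriving clause (an assumption, not part of the question), is to count every constructor with appears and take the one with the smallest count. Unlike sort-and-group approaches, this also reports a constructor that never occurs in the list. A sketch (leastRepresented is a hypothetical name):

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- The question's type; Enum and Bounded are added (assumption) so that
-- [minBound .. maxBound] lists every constructor.
data TestType = A | B | C deriving (Eq, Ord, Show, Enum, Bounded)

-- The question's helper: how many times x occurs in the list.
appears :: TestType -> [TestType] -> Int
appears _ [] = 0
appears x (y : ys)
  | x == y    = 1 + appears x ys
  | otherwise = appears x ys

-- Count every constructor and pick one with the smallest count.
-- Constructors absent from the list have a count of 0.
leastRepresented :: [TestType] -> TestType
leastRepresented xs = minimumBy (comparing (`appears` xs)) [minBound .. maxBound]
```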
A slight variation on Matt's approach:
import Data.List
import Data.Ord
leastFrequent :: Ord a => [a] -> a
leastFrequent = head . minimumBy (comparing length) . group . sort
You can build a map counting how often each item occurs in the list
import qualified Data.Map as Map
frequencies list = Map.fromListWith (+) $ zip list (repeat 1)
Then you can find the least/most represented using minimumBy or maximumBy from Data.List on the list of Map.assocs of the frequency map, or even sort it by frequency using sortBy.
module Frequencies where
import Data.Ord
import Data.List
import qualified Data.Map as Map
frequencyMap :: Ord a => [a] -> Map.Map a Int
frequencyMap list = Map.fromListWith (+) $ zip list (repeat 1)
-- Caution: leastFrequent will cause an error if called on an empty list!
leastFrequent :: Ord a => [a] -> a
leastFrequent = fst . minimumBy (comparing snd) . Map.assocs . frequencyMap
ascendingFrequencies :: Ord a => [a] -> [(a,Int)]
ascendingFrequencies = sortBy (comparing snd) . Map.assocs . frequencyMap
Here's another way to do it:
sort the list
group the list
find the length of each group
return the element of the group with the shortest length
Example:
import GHC.Exts (sortWith)
import Data.List
fewest :: (Ord a) => [a] -> a
fewest xs = fst $ head sortedGroups
  where
    sortedGroups = sortWith snd $ zip (map head groups) (map length groups)
    groups = group $ sort xs
A less elegant idea would be:
first sort and group the list
then pair each value with its number of occurrences
finally sort the pairs by their number of occurrences
In code this looks like
import Data.List
sortByRepr :: (Ord a) => [a] -> [(a,Int)]
sortByRepr xx = sortBy compareSnd $ map numOfRepres $ group $ sort xx
  where compareSnd x y = compare (snd x) (snd y)
        numOfRepres x = (head x, length x)
Applying head to the resulting list gives the least represented element, paired with its count.
