I need to write a simple function for one of my assignments that should remove all the duplicates from a given list except for the first occurrence of the element in the list.
Here is what I wrote:
remDup :: [Int]->[Int]
remDup []=[]
remDup (x:xs)
  | present x xs==True = remDup xs
  | otherwise = x:remDup xs
  where
    present :: Int->[Int]->Bool
    present x [] = False
    present x (y:ys)
      | x==y =True
      | otherwise = present x ys
But this code removes the duplicates except for the last occurrence of the element.
That is, if the given list is [1,2,3,3,2], it produces [1,3,2] instead of [1,2,3].
How to do it the other way around?
How about this idea:
remDup [] = []
remDup (x:xs) = x : remDup ( remove x xs )
where remove x xs removes all occurrences of x from the list xs (implementation left as an exercise.)
For every element you encounter, you simply want to check if you have encountered it before; build up a Set of encountered elements and use that to check if an element should be deleted.
import qualified Data.Set as S
remDup :: [Int] -> [Int]
remDup xs = helper S.empty xs
  where
    helper _ [] = []
    helper s (y:ys)
      | S.member y s = helper s ys                    -- already seen: drop it
      | otherwise    = y : helper (S.insert y s) ys   -- first occurrence: keep it
You could reverse it, run your current duplicate remover, and then reverse the result.
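A minimal sketch of that reverse trick, assuming the function from the question is renamed remDupLast and generalised to any Eq type (both names below are illustrative, not from the original post):

```haskell
-- Keeps the LAST occurrence of each element, like the code in the question.
remDupLast :: Eq a => [a] -> [a]
remDupLast [] = []
remDupLast (x:xs)
  | x `elem` xs = remDupLast xs
  | otherwise   = x : remDupLast xs

-- The last occurrence in the reversed list is the first occurrence in the
-- original list, so reversing before and after flips the behaviour.
remDupFirst :: Eq a => [a] -> [a]
remDupFirst = reverse . remDupLast . reverse
```

With the example from the question, remDupLast [1,2,3,3,2] gives [1,3,2] and remDupFirst [1,2,3,3,2] gives [1,2,3]. The two extra reversals are linear, so the overall cost is still dominated by the quadratic duplicate check.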
So this is what I finally came up with after following user5402's advice.
remDup1 [] = []
remDup1 (x:xs) = x:remDup1(remove x xs)
remove x []=[]
remove x (y:ys)
  | x==y = remove x ys
  | x/=y = y:(remove x ys)
If you care about efficiency, you should think about using HashSet as an auxiliary data structure. Doing that, we can get an average-case complexity of O(n log n) and actually O(n) in practice (source).
import Data.Hashable
import Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet
remDupSet :: (Hashable a, Eq a) => [a] -> [a]
remDupSet l = remDupSetAux HashSet.empty l
  where remDupSetAux :: (Hashable a, Eq a) => HashSet a -> [a] -> [a]
        remDupSetAux _ [] = []
        remDupSetAux s (x:xs) = if x `HashSet.member` s
                                  then remDupSetAux s xs
                                  else x : remDupSetAux (HashSet.insert x s) xs
I just quickly wrote a program to compare the performance of this solution with the top-voted one:
import Data.List
import Data.Hashable
import Data.HashSet (HashSet)
import qualified Data.HashSet as HashSet
import Data.Time.Clock
import Control.DeepSeq
main :: IO ()
main = do
  let a = [1..20000] :: [Int]
  putStrLn "Test1: 20000 different values"
  test "remDup" $ remDup a
  test "remDupSet" $ remDupSet a
  putStrLn ""
  let b = replicate 20000 1 :: [Int]
  putStrLn "Test2: one value repeted 20000 times"
  test "remDup" $ remDup b
  test "remDupSet" $ remDupSet b
test :: (NFData a) => String -> a -> IO ()
test s a = do time1 <- getCurrentTime
              time2 <- a `deepseq` getCurrentTime
              putStrLn $ s ++ ": " ++ show (diffUTCTime time2 time1)
remDup :: (Eq a) => [a] -> [a]
remDup [] = []
remDup (x:xs) = x : remDup (filter (/= x) xs)  -- delete would remove only the first occurrence of x
remDupSet :: (Hashable a, Eq a) => [a] -> [a]
remDupSet l = remDupSetAux HashSet.empty l
  where remDupSetAux :: (Hashable a, Eq a) => HashSet a -> [a] -> [a]
        remDupSetAux _ [] = []
        remDupSetAux s (x:xs) = if x `HashSet.member` s
                                  then remDupSetAux s xs
                                  else x : remDupSetAux (HashSet.insert x s) xs
As expected, there is a huge difference mainly when there are many distinct values:
Test1: 20000 different values
remDup: 15.79859s
remDupSet: 0.007725s
Test2: one value repeted 20000 times
remDup: 0.001084s
remDupSet: 0.00064s
I have created a program to remove the first smallest element, but I don't know how to do it for the second largest:
withoutBiggest (x:xs) =
  withoutBiggestImpl (biggest x xs) [] (x:xs)
  where
    biggest :: (Ord a) => a -> [a] -> a
    biggest big [] = big
    biggest big (x:xs) =
      if x < big then
        biggest x xs
      else
        biggest big xs
    withoutBiggestImpl :: (Eq a) => a -> [a] -> [a] -> [a]
    withoutBiggestImpl big before (x:xs) =
      if big == x then
        before ++ xs
      else
        withoutBiggestImpl big (before ++ [x]) xs
Here is a simple solution.
Prelude> let list = [10,20,100,50,40,80]
Prelude> let secondLargest = maximum $ filter (/= (maximum list)) list
Prelude> let result = filter (/= secondLargest) list
Prelude> result
[10,20,100,50,40]
Prelude>
A possibility, surely not the best one.
import Data.Permute (rank)
x = [4,2,3]
ranks = rank (length x) x -- this gives [2,0,1]; that means 3 (index 1) is the second smallest
Then:
[x !! i | i <- [0 .. length x -1], i /= 1]
Hmm, not very cool. Give me some time to think of something better and I'll edit my post.
EDIT
Moreover my previous solution was wrong. This one should be correct, but again not the best one:
import Data.Permute (rank, elems, inverse)
ranks = elems $ rank (length x) x
iranks = elems $ inverse $ rank (length x) x
>>> [x !! (iranks !! i) | i <- filter (/=1) ranks]
[4,2]
An advantage is that this preserves the order of the list, I think.
Here is a solution that removes the n smallest elements from your list:
import Data.List
deleteN :: Int -> [a] -> [a]
deleteN _ [] = []
deleteN i (a:as)
  | i == 0 = as
  | otherwise = a : deleteN (i-1) as
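ntails :: Int -> [a] -> [(a, Int)] -> [a]
ntails 0 l _ = l
ntails _ l [] = l -- nothing left to delete (n exceeded the list length)
ntails n l s = ntails (n-1) (deleteN (snd $ head s) l) (tail s)
removeNSmallest :: Ord a => Int -> [a] -> [a]
-- Delete from the highest index down, so that earlier deletions do not shift the later indices.
removeNSmallest n l = ntails n l . sortBy (\a b -> compare (snd b) (snd a)) . take n . sort $ zip l [0..]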
EDIT:
If you just want to remove the 2nd smallest element:
deleteN :: Int -> [a] -> [a]
deleteN _ [] = []
deleteN i (a:as)
  | i == 0 = as
  | otherwise = a : deleteN (i-1) as
remove2 :: [a] -> [(a, Int)] -> [a]
remove2 [] _ = []
remove2 [a] _ = []
remove2 l s = deleteN (snd $ head $ tail s) l
remove2Smallest :: Ord a => [a] -> [a]
remove2Smallest l = remove2 l $ sort $ zip l [0..]
It was not clear whether the OP wants to remove the biggest element (as the name withoutBiggest implies) or something else. If it is the biggest, one solution is to combine the filter :: (a->Bool) -> [a] -> [a] and maximum :: Ord a => [a] -> a functions from the Prelude.
withoutBiggest l = filter (/= maximum l) l
You can remove the biggest element by first finding it and then filtering it out:
withoutBiggest :: Ord a => [a] -> [a]
withoutBiggest [] = []
withoutBiggest xs = filter (/= maximum xs) xs
You can then remove the second-biggest element in much the same way:
withoutSecondBiggest :: Ord a => [a] -> [a]
withoutSecondBiggest xs =
  case withoutBiggest xs of
    [] -> xs
    rest -> filter (/= maximum rest) xs
Assumptions made:
You want each occurrence of the second-biggest element removed.
When there is zero/one element in the list, there isn't a second element, so there isn't a second-biggest element. Having the list without an element that isn't there is equivalent to having the list.
When the list contains only values equivalent to maximum xs, there also isn't a second-biggest element even though there may be two or more elements in total.
The Ord type-class instance implies a total ordering. Otherwise you may have multiple maxima that are not equivalent, and which one is picked as the biggest and second-biggest is not well-defined.
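To make these assumptions concrete, here is a small sketch that repeats the two functions and checks one case per assumption. The examples list and its inputs are made up for illustration:

```haskell
withoutBiggest :: Ord a => [a] -> [a]
withoutBiggest [] = []
withoutBiggest xs = filter (/= maximum xs) xs

withoutSecondBiggest :: Ord a => [a] -> [a]
withoutSecondBiggest xs =
  case withoutBiggest xs of
    []   -> xs
    rest -> filter (/= maximum rest) xs

-- Hypothetical checks, one per assumption listed above.
examples :: [Bool]
examples =
  [ withoutSecondBiggest [1, 5, 3, 5, 3] == [1, 5, 5 :: Int]  -- every 3 is removed
  , withoutSecondBiggest [] == ([] :: [Int])                  -- no elements at all
  , withoutSecondBiggest [7] == [7 :: Int]                    -- a single element
  , withoutSecondBiggest [4, 4, 4] == [4, 4, 4 :: Int]        -- only copies of the maximum
  ]
```

Evaluating and examples in GHCi should give True if the functions behave as described.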
Convert a string to a multiset, as in the example below:
"bacaba" --> [(b,2),(a,3),(c,1)]
type MSet a = [(a,Int)]
convert :: Eq a => [a] -> MSet
What is wrong with my code, and what is a better way to do it? Thanks.
convert :: Eq a => [a] -> MSet a
convert [] = []
convert (x:xs) = ((x,1+count x xs) : converte xs)
  where count x [] = 0
        count x (y:ys) = if (x == y) then 1 + count x ys else count x ys
what is the better way to do it?
Your code runs in O(n^2) time. If the type is an instance of the Ord type class (as in your example), you can get O(n log n) performance using Data.Map:
import Data.Map (toList, fromListWith)
convert :: (Ord a) => [a] -> [(a, Int)]
convert xs = toList . fromListWith (+) . zip xs $ repeat 1
This would result in the right counts but the list would be sorted by the keys:
> convert "bacaba"
[('a',3),('b',2),('c',1)]
If you need to preserve the order, then
import qualified Data.Map as M
import Data.Map (delete, fromListWith)
convert :: (Ord a) => [a] -> [(a, Int)]
convert xs = foldr go (const []) xs . fromListWith (+) . zip xs $ repeat 1
  where
    go x f cnt = case M.lookup x cnt of
      Just i -> (x, i): f (x `delete` cnt)
      Nothing -> f cnt
which would output:
> convert "bacaba"
[('b',2),('a',3),('c',1)]
You are almost there.
The problem is with your recursive call to the convert function. Since you have already counted the occurrences of a particular character, you don't need to count them again. Just use filter to remove that character from the rest of the list when calling convert recursively:
convert :: Eq a => [a] -> MSet a
convert [] = []
convert (x:xs) = (x,1+count x xs) : convert (filter (\y -> y /= x) xs)
  where count x [] = 0
        count x (y:ys) = if (x == y) then 1 + count x ys else count x ys
Or written more concisely:
convert :: Eq a => [a] -> MSet a
convert [] = []
convert (x:xs) = (x,1+count x xs) : convert (filter (/= x) xs)
  where count x [] = 0
        count x (y:ys) = if (x == y) then 1 + count x ys else count x ys
Demo in ghci:
ghci> convert "bacaba"
[('b',2),('a',3),('c',1)]
here's my question:
How to extract the same elements from two equal length lists to another list?
For example: given two lists [2,4,6,3,2,1,3,5] and [7,3,3,2,8,8,9,1] the answer should be [1,2,3,3]. Note that the order is immaterial. I'm actually using the length of the return list.
I tried this:
sameElem as bs = length (nub (intersect as bs))
but the problem is that nub removes all the duplicates. Applied to the example above, my function returns 3, the length of [1,3,2], instead of 4, the length of [1,3,3,2]. Is there a solution? Thank you.
Since the position seems to be irrelevant, you can simply sort the lists beforehand and then traverse both lists:
import Data.List (sort)
intersectSorted :: Ord a => [a] -> [a] -> [a]
intersectSorted (x:xs) (y:ys)
  | x == y = x : intersectSorted xs ys
  | x < y = intersectSorted xs (y:ys)
  | x > y = intersectSorted (x:xs) ys
intersectSorted _ _ = []
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = intersectSorted (sort xs) (sort ys)
Note that it's also possible to achieve this with a Map:
import Data.Map.Strict (fromListWith, assocs, intersectionWith, Map)
type Counter a = Map a Int
toCounter :: Ord a => [a] -> Counter a
toCounter = fromListWith (+) . flip zip (repeat 1)
intersectCounter :: Ord a => Counter a -> Counter a -> Counter a
intersectCounter = intersectionWith min
toList :: Counter a -> [a]
toList = concatMap (\(k,c) -> replicate c k) . assocs
intersect :: Ord a => [a] -> [a] -> [a]
intersect xs ys = toList $ intersectCounter (toCounter xs) (toCounter ys)
You could write a function for this. There is probably a more elegant version involving lambdas or folds, but this works for your example:
import Data.List
same (x:xs) ys = if x `elem` ys
                   then x:same xs (delete x ys)
                   else same xs ys
same [] _ = []
same _ [] = []
The delete x ys in the then-clause is important: without it, items from the first list that occur at least once in the second list will be counted every time they're encountered.
Note that the output is not sorted, since you were only interested in the length of the resulting list.
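To illustrate the role of delete, here is a small sketch that puts the function next to a variant without it. sameNoDelete is a hypothetical name used only for this comparison:

```haskell
import Data.List (delete)

-- As in the answer: each match consumes one element of ys.
same :: Eq a => [a] -> [a] -> [a]
same [] _ = []
same (x:xs) ys
  | x `elem` ys = x : same xs (delete x ys)
  | otherwise   = same xs ys

-- Hypothetical variant without delete, for comparison only:
-- every x that occurs anywhere in ys is kept, however often it repeats in xs.
sameNoDelete :: Eq a => [a] -> [a] -> [a]
sameNoDelete xs ys = [x | x <- xs, x `elem` ys]
```

For the lists in the question, same returns four elements ([2,3,1,3]), while sameNoDelete returns five ([2,3,2,1,3]) because the second 2 is matched against the same single 2 again.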
import Data.List (delete)
mutuals :: Eq a => [a] -> [a] -> [a]
mutuals [] _ = []
mutuals (x : xs) ys | x `elem` ys = x : mutuals xs (delete x ys)
                    | otherwise = mutuals xs ys
gives
mutuals [2,4,6,3,2,1,3,5] [7,3,3,2,8,8,9,1] == [2,3,1,3]
I want to filter a string with a string.
What I want is to delete, for every char of the first string, its first occurrence in the second string.
myFunc :: String -> String -> String
Like:
myFunc "dddog" "bigdddddog" = "biddg"
In "dddog": 3x d, 1x o, 1x g
In the second string it removed 3x d, 1x o and 1x g
So the output: biddg
I can't use filter for it, because it will delete all occurring chars.
And I struggled a long time with it.
Thanks in advance:)
How about
Prelude> :m +Data.List
Prelude Data.List> "bigdddddog" \\ "dddog"
"biddg"
Not the nicest solution, but it makes it easier to understand what's going on:
myfunc :: String -> String -> String
myfunc [] xs = xs
myfunc (x:xs) ys = myfunc xs $ remove x ys
  where
    remove _ [] = []
    remove x (y:ys) = if x == y then ys else y : remove x ys
As you commented, you want to use guards. Do you mean this?
myfunc :: String -> String -> String
myfunc [] xs = xs
myfunc (x:xs) ys = myfunc xs $ remove x ys
remove :: Char -> String -> String
remove _ [] = []
remove x (y:ys)
  | x == y = ys
  | otherwise = y : remove x ys
Some of the other solutions don't seem to produce the same result you posted. I think I have a simple solution that does what you asked for, but I may be misunderstanding what you want. All I do in the following code is go through the first list and apply 'delete' with each of its elements to the second list. It's not exactly efficient but it gets the job done.
import Data.List
myFunc (x:xs) ys = myFunc xs (delete x ys)
myFunc [] ys = ys
There are perhaps more efficient solutions, such as storing the "to remove" list in a tree with the number of occurrences as the value, then traversing the main list and testing whether the count at that key is still greater than zero. I think that would give you O(n*lg(m)) (where n is the size of the list to be removed from and m is the size of the "to remove" list) rather than O(n*m) as is the case above. This version could also be made lazy, I think.
edit:
Here is the tree version I was talking about, using Data.Map. It's a bit complex but should be more efficient for large lists, and it is somewhat lazy.
import qualified Data.Map as Map
myFunc l ys = myFunc' (makeCount l) ys
  where makeCount xs = foldr increment (Map.fromList []) xs
        increment x a = Map.insertWith (+) x 1 a
        decrement x a = Map.insertWith (flip (-)) x 1 a
        getCount x a = case Map.lookup x a of
            Just c -> c
            Nothing -> 0
        myFunc' counts (x:xs) = if (getCount x counts) > 0
                                  then myFunc' (decrement x counts) xs
                                  else x : myFunc' counts xs
        myFunc' _ [] = []
I am not quite sure about how you want your function to behave, how about this?
import Data.List (isPrefixOf)
myFunc :: String -> String -> String
myFunc _ [] = []
myFunc y x'@(x:xs) | y `isPrefixOf` x' = drop (length y) x'
                   | otherwise = x : myFunc y xs
This gives the following output in GHCi:
> myFunc "dddog" "bigdddddog"
"bigdd"
If this is not what you had in mind, please give another input/output example.
I like kaan's elegant solution. In case you meant this...here's one where the "ddd" would only be removed if matched as a whole:
import Data.List (group,isPrefixOf,delete)
f needles str = g (group needles) str where
  g needles [] = []
  g needles xxs@(x:xs)
    | null needle' = [x] ++ g needles xs
    | otherwise = let needle = head needle'
                  in g (delete needle needles) (drop (length needle) xxs)
    where needle' = dropWhile (not . flip isPrefixOf xxs) needles
Output:
*Main> f "dddog" "bigdddddog"
"biddg"
*Main> f "dddog" "bdigdogd"
"bdidgd"
No monadic solution yet, there you go:
import Control.Monad.State
myFunc :: String -> State String String
myFunc [] = return ""
myFunc (x:xs) = get >>= f where
  f [] = return (x:xs)
  f (y:ys) = if y == x then put ys >> myFunc xs
             else myFunc xs >>= return . (x:)
main = do
  let (a,b) = runState (myFunc "bigdddddog") "dddog" in
    putStr a
Using predefined functions from Data.List,
-- mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])
-- lookup :: (Eq a) => a -> [(a, b)] -> Maybe b
{-# LANGUAGE PatternGuards #-}
import Data.List
picks [] = [] -- http://stackoverflow.com/a/9889702/849891
picks (x:xs) = (x,xs) : [ (y,x:ys) | (y,ys) <- picks xs]
myFunc a b = concat . snd $ mapAccumL f (picks a) b
  where
    f acc x | Just r <- lookup x acc = (picks r,[])
    f acc x = (acc,[x])
Testing:
Prelude Data.List> myFunc "dddog" "bigdddddog"
"biddg"
edit: this is of course a bit more complex than (\\). I'll let it stand as an illustration. There could be some merit to it still, as it doesn't copy the 2nd (longer?) string over and over, for each non-matching character from the 1st (shorter) string, as delete apparently does, used in (\\) = foldl (flip delete).
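For reference, the definition quoted above can be written out as an ordinary function. This is only a sketch: listDifference and agreesWithLibrary are hypothetical names, and the library's actual source may differ in details while behaving the same way:

```haskell
import Data.List (delete, (\\))

-- Mirrors the quoted definition, (\\) = foldl (flip delete):
-- for each element of the second list, delete its first occurrence from the first.
listDifference :: Eq a => [a] -> [a] -> [a]
listDifference = foldl (flip delete)

-- True when the sketch and the library operator agree on the given inputs.
agreesWithLibrary :: String -> String -> Bool
agreesWithLibrary xs ys = listDifference xs ys == xs \\ ys
```

Each delete walks the longer string from the start, which is where the repeated copying mentioned above comes from.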
I am very new to Haskell. I am trying to write code that finds the first duplicate element in a list and, if there are no duplicates, prints the message "no duplicates". I know I can do it with the nub function, but I am trying to do it without it.
This is one way to do it:
import qualified Data.Set as Set
dup :: Ord a => [a] -> Maybe a
dup xs = dup' xs Set.empty
  where dup' [] _ = Nothing
        dup' (x:xs) s = if Set.member x s
                          then Just x
                          else dup' xs (Set.insert x s)
dupString :: (Ord a, Show a) => [a] -> [Char]
dupString x = case dup x of
  Just x -> "First duplicate: " ++ (show x)
  Nothing -> "No duplicates"
main :: IO ()
main = do
  putStrLn $ dupString [1,2,3,4,5]
  putStrLn $ dupString [1,2,1,2,3]
  putStrLn $ dupString "HELLO WORLD"
Here is how it works:
*Main> main
No duplicates
First duplicate: 1
First duplicate: 'L'
This is not your final answer, because it does unnecessary work when an element is duplicated multiple times instead of returning right away, but it illustrates how you might go about systematically running through all the possibilities (i.e. "does this element of the list have duplicates further down the list?").
dupwonub :: Eq a => [a] -> [a]
dupwonub [] = []
dupwonub (x:xs) = case [ y | y <- xs, y == x ] of
  (y:ys) -> [y]
  [] -> dupwonub xs
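The same "does this element occur again further down the list?" check can also be sketched with elem and a Maybe result. firstRepeated is a hypothetical name, and this is one possible variant rather than the answer's own code:

```haskell
-- Returns the first element that occurs again later in the list, if any.
firstRepeated :: Eq a => [a] -> Maybe a
firstRepeated [] = Nothing
firstRepeated (x:xs)
  | x `elem` xs = Just x          -- elem stops at the first match
  | otherwise   = firstRepeated xs
```

Note that this picks the earliest element that has a later copy, so for [1,2,2,1] it returns Just 1, whereas a version that tracks already-seen elements in a set would report 2 for that input.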
In case you are still looking into Haskell, I thought you might like a faster, but more complicated, solution. This runs in O(n) (I think), but has a slightly harsher restriction on the element type of your list, namely that it has to be an instance of Ix.
accumArray is an incredibly useful function; I really recommend looking into it if you haven't already.
import Data.Array
data Occurances = None | First | Duplicated
  deriving Eq
update :: Occurances -> a -> Occurances
update None _ = First
update First _ = Duplicated
update Duplicated _ = Duplicated
firstDup :: (Ix a) => [a] -> a
firstDup xs = fst . first ((== Duplicated).snd) $ (map g xs)
  where dupChecker = accumArray update None (minimum xs,maximum xs) (zip xs (repeat ()))
        g x = (x, dupChecker ! x)
first :: (a -> Bool) -> [a] -> a
first _ [] = error "No duplicates master"
first f (x:xs) = if f x
                   then x
                   else first f xs
Watch out though: an array with bounds (minimum xs, maximum xs) could really blow up your space requirements.
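If the value range is a concern, one possible alternative is to count occurrences in a Data.Map instead of an array. This is a different technique from the accumArray version above (O(n log n) rather than O(n), and it needs Ord instead of Ix), and firstDupMap is a hypothetical name:

```haskell
import qualified Data.Map.Strict as Map

-- Counts occurrences in a Map, so memory depends on the number of distinct
-- elements rather than on the distance between minimum and maximum.
firstDupMap :: Ord a => [a] -> Maybe a
firstDupMap xs = case filter isDuplicated xs of
    (x:_) -> Just x
    []    -> Nothing
  where
    counts = Map.fromListWith (+) [(x, 1 :: Int) | x <- xs]
    isDuplicated x = Map.findWithDefault 0 x counts > 1
```

Like firstDup, it returns the first element of the list that occurs more than once, but it reports the no-duplicates case with Nothing instead of calling error.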