Print elements of list that are repeated in Haskell

I want to print those elements that appear more than once in the list. Can you please tell me how I can do that? I am new to Haskell.
For example, if I have [1,2,3,3,2,4,5,6,5], I want to get only [2,3,5], because these are the repeated elements in the list.

Another solution: First sort the list, then group equal elements and take only the ones that appear multiple times:
>>> :m + Data.Maybe Data.List
>>> let xs = [1..100000] ++ [8,18..100] ++ [10,132,235]
>>> let safeSnd = listToMaybe . drop 1
>>> mapMaybe safeSnd $ group $ sort xs
[8,10,18,28,38,48,58,68,78,88,98,132,235]
group $ sort xs is a list of lists, where each inner list contains all occurrences of one value.
mapMaybe safeSnd keeps only those lists that have a second element (that is, the original element occurred more than once in the original list).
This method should be faster than the one using nub, especially for large lists.
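The same sort-then-group idea can be packaged as a function. This is a minimal sketch; the name duplicates is made up here, and it uses a pattern match in place of safeSnd to keep groups with at least two elements.

```haskell
import Data.List (group, sort)

-- | Elements that occur more than once, each reported once, in ascending order.
-- The pattern (x:_:_) only matches groups with at least two elements.
duplicates :: Ord a => [a] -> [a]
duplicates xs = [x | (x:_:_) <- group (sort xs)]
```

For the list from the question, duplicates [1,2,3,3,2,4,5,6,5] evaluates to [2,3,5].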

Data.Map.Lazy and Data.Map.Strict are host to a bunch of interesting functions for constructing maps (association maps, dictionaries, whatever you want to call them). One of them is fromListWith
fromListWith :: Ord k => (a -> a -> a) -> [(k, a)] -> Map k a
What you want to build is a map that tells you, for each value in your input list, how often it occurs. The values would be the keys of the map (type k), their counts would be the values associated with the keys (type a). You could use the following expression for that:
fromListWith (+) . map (\x -> (x, 1))
First, all values in the list are put into a tuple, together with a count of one. Then, fromListWith builds a map from the list; if a key already exists, it computes a new count using (+).
Once you've done this, you're only interested in the elements that occur more than once. For this, you can use filter (> 1) from Data.Map.
Finally, you just want to know all keys that remain in the map. Use the function keys for this.
In the end, you get the following module:
import qualified Data.Map.Strict as M
findDuplicates :: (Ord a) => [a] -> [a]
findDuplicates
    = M.keys
    . M.filter (> 1)
    . M.fromListWith (+)
    . map (\x -> (x, 1 :: Integer))
It's common practice to import certain packages like Data.Map qualified, to avoid name conflicts between modules (e.g. filter from Data.Map and the one from Prelude are very different). In this situation, it's best to choose Data.Map.Strict; see the explanation at the top of Data.Map.
The complexity of this method should be O(n log n).
I thought it could be optimized by using a boolean flag to indicate that the value is a duplicate. However, this turned out to be about 20% slower.
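The flag-based variant is not shown in the answer, so the following is only a guess at what it might look like: the map stores False on first insertion and the combining function flips it to True once a key repeats. The name findDuplicatesFlag is hypothetical.

```haskell
import qualified Data.Map.Strict as M

-- | Hypothetical flag-based variant: the value records only whether the key
-- has been inserted more than once, instead of keeping a full count.
findDuplicatesFlag :: Ord a => [a] -> [a]
findDuplicatesFlag =
    M.keys
  . M.filter id                    -- keep keys whose flag is True
  . M.fromListWith (\_ _ -> True)  -- a second insertion marks the key as duplicate
  . map (\x -> (x, False))
```

It returns the same result as findDuplicates; whether it is faster or slower would have to be measured.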

You're basically looking for the list of elements that are not unique, or in other words, the difference between the original list and the list of unique elements. In code:
xs \\ (nub xs)
If you don't want to have duplicates in the result list, you'll want to call nub again:
nub $ xs \\ (nub xs)
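Wrapped as a function (the name repeated is made up for this sketch), the expression above looks as follows. Note that it only needs Eq, not Ord, and that the result is ordered by where the second occurrences appear, so it is not sorted.

```haskell
import Data.List (nub, (\\))

-- | Repeated elements, each reported once. (\\) removes the first occurrence
-- of every distinct value, so what remains are the extra occurrences.
repeated :: Eq a => [a] -> [a]
repeated xs = nub (xs \\ nub xs)
```

For [1,2,3,3,2,4,5,6,5] this gives [3,2,5] rather than [2,3,5]. Both nub and (\\) are quadratic in the worst case, so this suits short lists best.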

Related

Reordering the search space

I am trying to find the first element of a search space, ordered descending, that satisfies some predicate.
This strategy was chosen because calculating the predicate can be quite expensive, and the probability of finding a solution among the first elements is quite high.
Here is my solution, which first builds a list of all possible candidates, then sorts it and performs a linear search.
import Data.Ord
import Data.List
search :: (Ord a, Num a) => ([a] -> Bool) -> [[a]] -> Maybe [a]
search p = find p . sortOn (Down . sum) . sequence
Example
main = print $ search ((<25) . sum) [[10,2], [10,8,6], [8]]
Output
Just [10,6,8]
Question
Is there a way to generate elements of this space in descending order without sorting?
The exact case described
In this exact case, there is a clear best element in the space, and if any element matches the predicate then the best one does:
-- I have, over the years, found many uses for ensure
import Control.Monad (guard)
ensure p x = x <$ guard (p x)
search p = ensure p . map minimum
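For readers who have not met ensure before, here is a small sketch with an explicit type signature. Specialising it to Maybe is an assumption made for readability; the definition itself works for any Alternative.

```haskell
import Control.Monad (guard)

-- | Wrap a value in Just when it satisfies the predicate, Nothing otherwise.
-- guard (p x) is Just () when p x holds, and (<$) swaps the () for x.
ensure :: (a -> Bool) -> a -> Maybe a
ensure p x = x <$ guard (p x)
```

So ensure even 4 is Just 4, while ensure even 3 is Nothing.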
(<25) . sum is a placeholder, but Down . sum is exact
If your predicate is just an example, but your heuristic is really summing, you can use a priority queue to search through the space. For simplicity, I'll use [(b,a)] as my priority queue with priorities b and values a, maintaining the invariant that the list is sorted by b. Of course you should use a better implementation if you want efficiency.
And now we basically just reimplement sequence to produce its elements in priority order and maintain the sum of the lists it produces as their priority. Introducing the priority queue invariant is a small one-time cost up front.
import Data.List
import Data.Ord
increasingSums :: (Ord a, Num a) => [[a]] -> [[a]]
increasingSums = map snd . go . map sort where
    go [] = [(0,[])]
    go (xs:xss) = let recurse = go xss in mergeOn fst
        [ [ (total+h, h:ts)
          | (total, ts) <- recurse
          ]
        | h <- xs
        ]
The only thing missing is mergeOn, which flattens a collection of priority queues into a single one:
mergeOn :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOn f = go . sortOn (f . head) . filter (not . null) where
    go [] = []
    go ([x]:xss) = x : go xss
    go ((x:xs):xss) = x : go (insertBy (comparing (f . head)) xs xss)
Testing in ghci, we can see that this expression finishes in a non-stupid amount of time:
> take 10 . increasingSums . replicate 4 $ [1..1000]
[[1,1,1,1],[2,1,1,1],[1,2,1,1],[1,1,2,1],[1,1,1,2],[2,1,1,2],[1,2,1,2],[1,1,2,2],[1,1,1,3],[2,1,2,1]]
Whereas this expression does not:
> take 10 . sortOn sum . sequence . replicate 4 $ [1..1000]
^C^C^C^COMG how do I quit
Meanwhile it is also competitive for producing the complete list of sums in sorted order (at least before compilation, I didn't test whether the optimized versions are also about equal):
> :set +s
> sum . map sum . increasingSums . replicate 4 $ [1..30]
50220000
(1.99 secs, 1,066,135,432 bytes)
> sum . map sum . sortOn sum . sequence . replicate 4 $ [1..30]
50220000
(2.60 secs, 2,226,497,344 bytes)
Down . sum is a placeholder
Finally, if your heuristic is just an example, and you want a fully general solution that will work for all heuristics, you're out of luck. Doing a structured walk through your search space requires knowing something special about that structure to exploit. (For example, above we know that if x<y then total+x<total+y, and we exploit this to cheaply maintain our priority queue.)
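Coming back to the original question (descending order without sorting the whole space first), one possible way to reuse increasingSums is to negate the inputs, enumerate, and negate back. This is a sketch, not part of the answer: the names decreasingSums and searchLazy are made up, and mergeOn and increasingSums are repeated from above only so that the block loads on its own.

```haskell
import Data.List (find, insertBy, sort, sortOn)
import Data.Ord (comparing)

-- Copied from the answer above: merge lists that are each sorted by f.
mergeOn :: Ord b => (a -> b) -> [[a]] -> [a]
mergeOn f = go . sortOn (f . head) . filter (not . null)
  where
    go [] = []
    go ([x] : xss) = x : go xss
    go ((x : xs) : xss) = x : go (insertBy (comparing (f . head)) xs xss)

-- Copied from the answer above: all selections, in order of increasing sum.
increasingSums :: (Ord a, Num a) => [[a]] -> [[a]]
increasingSums = map snd . go . map sort
  where
    go [] = [(0, [])]
    go (xs : xss) =
      let recurse = go xss
      in mergeOn fst [[(total + h, h : ts) | (total, ts) <- recurse] | h <- xs]

-- Hypothetical helper: descending sums, obtained by negating every element,
-- enumerating in increasing order, and negating back.
decreasingSums :: (Ord a, Num a) => [[a]] -> [[a]]
decreasingSums = map (map negate) . increasingSums . map (map negate)

-- Hypothetical lazy counterpart of the question's search.
searchLazy :: (Ord a, Num a) => ([a] -> Bool) -> [[a]] -> Maybe [a]
searchLazy p = find p . decreasingSums
```

With the example from the question, searchLazy ((<25) . sum) [[10,2], [10,8,6], [8]] yields Just [10,6,8]. When several candidates have the same sum, the order among them may differ from the sortOn-based version.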

Grouping by function value into Multimap

Assuming I have a list of values like this:
["abc","abd","aab","123"]
I want to group those values into a MultiMap (conceptually, not limited to a specific data structure) in Haskell by using a function that maps any element to a key.
For this example, we shall use take 2 as a mapper.
The result I intend to get is (conceptually, as JSON):
{"ab":["abc","abd"], "aa":["aab"], "12":["123"]}
In this example I will use [(String, [String])] as a Multimap data structure.
My basic idea (conceptually):
let datalist = ["abc","abd","aab","123"]
let mapfn = take 2
let keys = nub $ map mapfn datalist
let valuesForKey key = filter ((==key).mapfn) datalist
let resultMultimap = zip keys $ map valuesForKey keys
My question:
1. Is there any better way (in base or external packages) to do this? I want to avoid custom code.
2. If 1) is not applicable, is there any guarantee that GHC will optimize this so one pass over the data list is sufficient to generate the full multimap (as opposed to one filter run per key)?
Conceptually, this question is similar to the SQL GROUP BY statement.
Using fromListWith from Data.Map:
> let xs = ["abc","abd","aab","123"]
> let f = take 2
> Data.Map.fromListWith (++) [(f x, [x]) | x <- xs]
fromList [("12",["123"]),("aa",["aab"]),("ab",["abd","abc"])]
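Note that the values for "ab" come out as ["abd","abc"], the reverse of the input order, because fromListWith calls the combining function with the new value first. If input order matters, one option is to flip the append. This is a sketch; the name groupInOrder is made up.

```haskell
import qualified Data.Map as Map

-- | Group values by key, keeping the values of each group in input order.
-- fromListWith calls its function as (f new old), so flip (++) yields old ++ new.
groupInOrder :: Ord k => (a -> k) -> [a] -> Map.Map k [a]
groupInOrder f xs = Map.fromListWith (flip (++)) [(f x, [x]) | x <- xs]
```

Appending to the end of a list costs time proportional to its length, so for very large groups it may be cheaper to keep (++) and reverse each group once at the end.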
Edit 2014-03-28: My functions have now been published on Hackage, see group-with
Pull requests are welcome!
Based on hammar's excellent answer I put together two reusable functions to solve this problem.
groupWith solves exactly what I asked for. groupWithMulti generalizes the concept by allowing the identifier-generating function (e.g. take 2 in my example) to return multiple identifiers for a single value (where the value is, in my example, one of ["abc","abd","aab","123"]), or none at all.
The value will be added to the Map value for any identifier generated by f.
import Data.Map (Map)
import qualified Data.Map as Map
-- | Group values in a list by their identifier, being returned
-- by a given function. The resulting map contains,
-- for each generated identifier the values (from the original list)
-- that yielded said identifier by using the function
groupWith :: (Ord b) => (a -> b) -> [a] -> (Map b [a])
groupWith f xs = Map.fromListWith (++) [(f x, [x]) | x <- xs]
-- | Like groupWith, but the identifier-generating function
-- may generate multiple outputs (or even none).
-- The corresponding value from the original list will be placed
-- in the identifier-corresponding map entry for each generated
-- identifier
groupWithMulti :: (Ord b) => (a -> [b]) -> [a] -> (Map b [a])
groupWithMulti f xs =
    let identifiers x = [(val, [x]) | val <- f x]
    in Map.fromListWith (++) $ concat [identifiers x | x <- xs]
Simply use Map.toList to convert the results of these functions back to a tuple list.
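To illustrate groupWithMulti, here is a small usage sketch. The key function prefixes is made up for the example, and groupWithMulti is restated in a compact but equivalent form so the block loads on its own.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Restated from above; one comprehension replaces the concat of per-value lists.
groupWithMulti :: Ord b => (a -> [b]) -> [a] -> Map b [a]
groupWithMulti f xs = Map.fromListWith (++) [(k, [x]) | x <- xs, k <- f x]

-- Hypothetical key function: file each string under its 1- and 2-character prefix.
prefixes :: String -> [String]
prefixes s = [take 1 s, take 2 s]

example :: [(String, [String])]
example = Map.toList (groupWithMulti prefixes ["abc", "abd", "aab"])
```

Here example is [("a",["aab","abd","abc"]),("aa",["aab"]),("ab",["abd","abc"])]: every string lands under both of its prefixes, and a key function that returns [] would drop the value entirely.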
When I have some spare time, I will attempt to create a generalized library on Hackage out of this approach on in-memory data grouping.

Working with list of tuples

I've been trying to solve this, but I just can't figure it out. So, I've a list with tuples, for example:
[("Mary", 10), ("John", 45), ("Bradley", 30), ("Mary", 15), ("John", 10)]
What I want is another list of tuples in which entries with the same name are merged by adding their numbers, while names that occur only once are kept as they are. For example:
[("Mary",25), ("John", 55), ("Bradley", 30)]
I don't know if I explained myself really well, but I think you'll probably understand with the examples.
I've tried this, but it doesn't work:
test ((a,b):[]) = [(a,b)]
test ((a,b):(c,d):xs) | a == c = (a,b+d):test((a,b):xs)
                      | otherwise = (c,d):test((a,b):xs)
Doing this sort of thing is always awkward with lists, because of their sequential nature--they don't really lend themselves to operations like "find matching items" or "compute a new list by combining specific combinations of list elements" or other things that are by nature non-sequential.
If you step back for a moment, what you really want to do here is, for each distinct String in the list, find all the numbers associated to it and add them up. This sounds more suited to a key-value style data structure, for which the most standard in Haskell is found in Data.Map, which gives you a key-value map for any value type and any ordered key type (that is, an instance of Ord).
So, to build a Map from your list, you can use the fromList function in Data.Map... which, conveniently, expects input in the form of a list of key-value tuples. So you could do this...
import qualified Data.Map as M
nameMap = M.fromList [("Mary", 10), ("John", 45), ("Bradley", 30), ("Mary", 15), ("John", 10)]
...but that's no good, because inserting them directly will overwrite the numbers instead of adding them. You can use M.fromListWith to specify how to combine values when inserting a duplicate key--in the general case, it's common to use this to build a list of values for each key, or similar things.
But in your case we can skip straight to the desired result:
nameMap = M.fromListWith (+) [("Mary", 10), ("John", 45), ("Bradley", 30), ("Mary", 15), ("John", 10)]
This will insert directly if it finds a new name, otherwise it will add the values (the numbers) on a duplicate. You can turn it back into a list of tuples if you like, using M.toList:
namesList = M.toList $ M.fromListWith (+) [("Mary", 10), ("John", 45), ("Bradley", 30), ("Mary", 15), ("John", 10)]
Which gives us a final result of [("Bradley",30),("John",55),("Mary",25)].
But if you want to do more stuff with the collection of names/numbers, it might make more sense to keep it as a Map until you're done.
Here's another way using lists:
import Data.List
answer :: [(String, Int)] -> [(String, Int)]
answer = map (foo . unzip) . groupBy (\x y -> fst x == fst y) . sort
  where foo (names, vals) = (head names, sum vals)
It's a fairly straightforward approach.
First, the dot (.) represents function composition which allows us to pass values from one function to the next, that is, the output of one becomes the input of the next, and so on. We start by applying sort which will automatically move the names next to one another in the list. Next we use groupBy to put all pairs with the same name into a single list. We end up with a list of lists, each containing pairs with the same name:
[[("Bradley",30)], [("John",10),("John",45)], [("Mary",10),("Mary", 15)]]
Given such a list, how would you handle each sublist?
That is, how would you handle a list containing all the same names?
Obviously we wish to shrink them down into a single pair, which contains the name and the sum of the values. To accomplish this, I chose the function (foo . unzip), but there are many other ways to go about it. unzip takes a list of pairs and creates a single pair. The pair contains 2 lists, the first with all the names, the second with all the values. This pair is then passed to foo by way of function composition, as discussed earlier. foo picks it apart using a pattern, and then applies head to the names, returning only a single name (they're all the same), and applying sum to the list of values. sum is another standard list function that sums the values in a list, naturally.
However, this (foo . unzip) only applies to a single list of pairs, yet we have a list of lists. This is where map comes in. map will apply our (foo . unzip) function to each list in the list, or more generally, each element in the list. We end up with a list containing the results of applying (foo . unzip) to each sublist.
I would recommend looking at all the list functions used in Data.List.
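As one of the "many other ways" mentioned above, here is a variation on the same sort-and-group approach. It is a sketch, not the answer's code: the name sumByName is made up, it sorts by name only, and it pattern matches on each group instead of calling head.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- | Sum the values of all pairs that share a key.
-- sortOn fst brings equal keys together; groupBy then splits on key changes.
sumByName :: (Ord k, Num v) => [(k, v)] -> [(k, v)]
sumByName pairs =
  [ (k, sum (map snd grp))
  | grp@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst pairs)
  ]
```

Applied to the list from the question, this gives [("Bradley",30),("John",55),("Mary",25)].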
I think the reason your potential solution did not work, is that it will only group elements together if they occur sequentially with the same key in the list. So instead, I'm going to use a map (often called a dictionary if you've used other languages) to remember which keys we've seen and keep the totals. First we need to import the functions we need.
import Data.Map hiding (foldl, foldl', foldr)
import Data.List (foldl')
Now we can just fold along the list, and for each key value pair update our map accordingly.
sumGroups :: (Ord k, Num n) => [(k, n)] -> Map k n
sumGroups list = foldl' (\m (k, n) -> alter (Just . maybe n (+ n)) k m) empty list
So, foldl' walks along the list with a function. It calls the function with each element (here the pair (k, n)), and another argument, the accumulator. This is our map, which starts out as empty. For each element, we alter the map, using a function from Maybe n -> Maybe n. This reflects the fact the map may not already have anything in it under the key k - so we deal with both cases. If there's no previous value, we just return n, otherwise we add n to the previous value. This gives us a map at the end which should contain the sums of the groups. Calling the toList function on the result should give you the list you want.
Testing this in ghci gives:
$ ghci
GHCi, version 7.6.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> import Data.Map hiding (foldl, foldl', foldr)
Prelude Data.Map> import Data.List (foldl')
Prelude Data.Map Data.List> let sumGroups list = foldl' (\m (k, n) -> alter (Just . maybe n (+ n)) k m) empty list
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package containers-0.5.0.0 ... linking ... done.
Prelude Data.Map Data.List> toList $ sumGroups $ [("Mary", 10), ("John", 45), ("Bradley", 30), ("Mary", 15), ("John", 10)]
[("Bradley",30),("John",55),("Mary",25)]
Prelude Data.Map Data.List>
The groups come out in sorted order as a bonus, because internally Data.Map uses a form of balanced binary tree, and so it's relatively trivial to traverse in order and output a sorted (well, sorted by key anyway) list.
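The alter-based step can also be written with insertWith, which covers the "insert if new, otherwise combine" case directly. This is a sketch using a qualified import of Data.Map.Strict; the primed name sumGroups' is only there to avoid clashing with the definition above.

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- | Same fold as sumGroups, with insertWith in place of alter:
-- a new key is stored as is, an existing key has n added to its value.
sumGroups' :: (Ord k, Num n) => [(k, n)] -> M.Map k n
sumGroups' = foldl' (\m (k, n) -> M.insertWith (+) k n m) M.empty
```

M.toList (sumGroups' ...) on the example list again gives [("Bradley",30),("John",55),("Mary",25)].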
Here are my two cents. Using just the Haskell Prelude.
test tup = sumAll
  where
    collect ys [] = ys
    collect ys (x:xs) =
      if (fst x) `notElem` ys
        then collect (fst x : ys) xs
        else collect ys xs
    collectAllNames = collect [] tup
    sumOne [] n x = (x, n)
    sumOne (y:ys) n x =
      if fst y == x
        then sumOne ys (n + snd y) x
        else sumOne ys n x
    sumAll = map (sumOne tup 0) collectAllNames
This method traverses the original list several times.
collect builds a temporary list holding just the names, skipping repeated names.
sumOne takes a name, finds the entries in the list whose name matches, and adds up their numbers. It returns the name together with the sum.

What is wrong with this list comprehension code?

My aim is to list all elements of the array a whose values are greater than their index positions. I wrote the following Haskell code.
[a|i<-[0..2],a<-[1..3],a!!i>i]
When tested on ghci prelude prompt, I get the following error message which I am unable to understand.
No instance for (Num [a]) arising from the literal 3 at <interactive>:1:20 Possible fix: add an instance declaration for (Num [a])
Given the expression a!!i, Haskell will infer that a is a list (i.e. a::[a]). Given the expression a<-[1..3], Haskell will infer that a will have type Num a => a (because you are drawing a from a list of Num a => a values). Trying to unify these types, Haskell concludes that a must actually be of type Num a => [a].
The bottom line is that it doesn't make sense to treat a as a list in one context and as an element from a list of numbers in another context.
EDIT
I'm thinking you could do what you want with something like this:
f xs = map fst . filter (uncurry (>)) $ (xs `zip` [0..])
The expression xs `zip` [0..] creates a list of pairs, where the first value in each pair is drawn from xs and the second value from [0..] (an infinite list starting from 0). This serves to associate an index to each value in xs. The expression uncurry (>) converts the > operator into a function that works on pairs. So the expression filter (uncurry (>)) filters a list of pairs to only those elements where the first value is greater than the second. Finally, map fst applies the fst function to each pair of values and returns the result as a list (the fst function returns the first value of a pair).
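The same pairing idea also works as a list comprehension, which stays closer to the original attempt. A minimal sketch, with the made-up name aboveIndex and the element type fixed to Int for simplicity:

```haskell
-- | Elements that are greater than their own (zero-based) index.
-- zip [0 ..] xs pairs every element with its position.
aboveIndex :: [Int] -> [Int]
aboveIndex xs = [x | (i, x) <- zip [0 ..] xs, x > i]
```

For example, aboveIndex [0,5,1,7] is [5,7], since only 5 (index 1) and 7 (index 3) exceed their positions.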
EDIT 2
Writing pointless code is fun, and so I give you:
f = map snd . filter (uncurry (<)) . zip [0..]
import Data.Maybe
import Control.Monad
f = catMaybes . zipWith (mfilter.(<)) [0..] . map Just
Disclaimer: The given code was not proof read and may have been made outside of sobriety. The author has little recollection of what it is about.

Haskell replace element in list

Is there any built-in function to replace an element at a given index in Haskell?
Example:
replaceAtIndex(2,"foo",["bar","bar","bar"])
Should give:
["bar", "bar", "foo"]
I know I could make my own function, but it just seems it should be built-in.
If you need to update elements at a specific index, lists aren't the most efficient data structure for that. You might want to consider using Seq from Data.Sequence instead, in which case the function you're looking for is update :: Int -> a -> Seq a -> Seq a.
> import Data.Sequence
> update 2 "foo" $ fromList ["bar", "bar", "bar"]
fromList ["bar","bar","foo"]
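In a source file it is usually safer to import Data.Sequence qualified, since it exports names such as length and filter that clash with the Prelude. The sketch below does that and also shows what update does with an index that is out of range; the names demo and outOfRange are made up.

```haskell
import Data.Foldable (toList)
import qualified Data.Sequence as Seq

-- Replace the element at index 2 and convert back to a list for display.
demo :: [String]
demo = toList (Seq.update 2 "foo" (Seq.fromList ["bar", "bar", "bar"]))

-- An out-of-range index leaves the sequence unchanged instead of failing.
outOfRange :: [String]
outOfRange = toList (Seq.update 5 "foo" (Seq.fromList ["bar", "bar", "bar"]))
```

Converting a list to a Seq and back on every update would defeat the purpose; the benefit comes from keeping the data in a Seq across many updates.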
As far as I know (and can find) it does not exist by default. However, there exists splitAt in Data.List so:
replaceAtIndex n item ls = a ++ (item:b) where (a, (_:b)) = splitAt n ls
This is O(N) though. If you find yourself doing this a lot, look at another datatype such as array.
There are actual arrays, but lists are really singly linked lists, so the notion of replacing an element is not quite as obvious (and accessing an element at a given index may indicate that you shouldn't be using a list, so operations that might encourage it are avoided).
Try this solution:
import Data.List
replaceAtIndex :: Int -> a -> [a] -> [a]
replaceAtIndex i x xs = take i xs ++ [x] ++ drop (i+1) xs
It works as follows:
Take the first i items, append the value x, then append everything after index i (that is, drop (i+1) xs).
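Neither list version above checks the index: the splitAt version fails with a pattern-match error when the index is past the end, and the take/drop version silently inserts the value instead. A bounds-checked sketch, under the assumption that returning Maybe is acceptable (the name replaceAtIndexSafe is made up):

```haskell
-- | Replace the element at index i, or return Nothing when i is out of range.
replaceAtIndexSafe :: Int -> a -> [a] -> Maybe [a]
replaceAtIndexSafe i x xs
  | i < 0 = Nothing
  | otherwise = case splitAt i xs of
      (before, _ : after) -> Just (before ++ x : after)
      _ -> Nothing  -- i is at or past the end of the list
```

replaceAtIndexSafe 2 "foo" ["bar","bar","bar"] gives Just ["bar","bar","foo"], while index 3 or -1 gives Nothing.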
