mapping over entire data set to get results - haskell

Suppose I have the arrays:
A = "ABACUS"
B = "YELLOW"
And they are zipped so: Pairing = zip A B
I also have a function Connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
What I want to do is given a char such as A, find the indices of where it is present in the first string and return the character in the same positions in the second string, as well as the position e.g. if I did Connect 'A' Pairing I'd want (A,Y,0) and (A,L,2) as results.
I know I can do
pos = x!!map fst pairing
to retrieve the positions. And fnd = findIndices (==pos) map snd pairing to get what's in this position in the second string but in Haskell how would I do this over the whole set of data (as if I were using a for loop) and how would I get my outputs?

To do exactly as you asked (but correct the initial letter of function names to be lowercase), I could define
connect :: Char -> [(Char,Char)] -> [(Char,Char,Int)]
connect c pairs = [(a,b,n)|((a,b),n) <- zip pairs [0..], a == c]
so if
pairing = zip "ABACUS" "YELLOW"
we get
ghci> connect 'A' pairing
[('A','Y',0),('A','L',2)]
However, I think it'd be neater to zip once, not twice, using zip3:
connect3 :: Char -> String -> String -> [(Char,Char,Int)]
connect3 c xs ys = filter (\(a,_,_) -> a==c) (zip3 xs ys [0..])
which is equivalent to
connect3' c xs ys = [(a,b,n)| (a,b,n) <- zip3 xs ys [0..], a==c]
they all work as you wanted:
ghci> connect3 'A' "ABACUS" "YELLOW"
[('A','Y',0),('A','L',2)]
ghci> connect3' 'A' "ABACUS" "AQUAMARINE"
[('A','A',0),('A','U',2)]
In comments, you said you'd like to get pairs for matches the other way round.
This time, it'd be most convenient to use the monadic do notation, since lists are an example of a monad.
connectEither :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectEither (c1,c2) xs ys = do
(a,b,n) <- zip3 xs ys [0..]
if a == c1 then return (a,b,n) else
if b == c2 then return (b,a,n) else
fail "Doesn't match - leave it out"
I've used the fail function to leave out ones that don't match. The three lines starting if, if and fail are increasingly indented because they're actually one line from Haskell's point of view.
ghci> connectEither ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','u',4)]
In this case, it hasn't included ('n','a',2) because it's only checking one way.
We can allow both ways by reusing existing functions:
connectBoth :: (Char,Char) -> String -> String -> [(Char,Char,Int)]
connectBoth (c1,c2) xs ys = lefts ++ rights where
lefts = connect3 c1 xs ys
rights = connect3 c2 ys xs
which gives us everything we want to get:
ghci> connectBoth ('a','n') "abacus" "banana"
[('a','b',0),('a','n',2),('n','a',2),('n','u',4)]
but unfortunately things more than once:
ghci> connectBoth ('A','A') "Austria" "Antwerp"
[('A','A',0),('A','A',0)]
So we can get rid of that using nub from Data.List. (Add import Data.List at the top of your file.)
connectBothOnce (c1,c2) xs ys = nub $ connectBoth (c1,c2) xs ys
giving
ghci> connectBothOnce ('A','A') "ABACUS" "Antwerp"
[('A','A',0),('A','t',2)]

I would recommend not zipping the lists together, since that'd just make it more difficult to use the function elemIndices from Data.List. You then have a list of the indices that you can use directly to get the values out of the second list.

You can add indices with another zip, then filter on the given character and convert tuples to triples. Especially because of this repackaging, a list comprehension seems appropriate:
connect c pairs = [(a, b, idx) | ((a, b), idx) <- zip pairs [0..], a == c]

Related

Take symmetrical pairs, of different numbers from list

I have a list like this:
[(2,3),(2,5),(2,7),(3,2),(3,4),(3,6),(4,3),(4,5),(4,7),(5,2),(5,4),(5,6),(6,3),(6,5),(6,7),(7,2),(7,4),(7,6)]
The digits are from [2..7]. I want to take a set where there are any symmetrical pairs. e.g. [(1,2),(2,1)], but those two numbers aren't used again in the set. An example would be:
[(3,6),(6,3),(2,5),(5,2),(4,7),(7,4)]
I wanted to first put symmetric pairs together as I thought it might be easier to work with so i created this function, which actually creates the pairs and puts them in another list
g xs = [ (y,x):(x,y):[] | (x,y) <- xs ]
with which the list turns to this:
[[(3,2),(2,3)],[(5,2),(2,5)],[(7,2),(2,7)],[(2,3),(3,2)],[(4,3),(3,4)],[(6,3),(3,6)],[(3,4),(4,3)],[(5,4),(4,5)],[(7,4),(4,7)],[(2,5),(5,2)],[(4,5),(5,4)],[(6,5),(5,6)],[(3,6),(6,3)],[(5,6),(6,5)],[(7,6),(6,7)],[(2,7),(7,2)],[(4,7),(7,4)],[(6,7),(7,6)]]
Then from here I was hoping to somehow remove duplicates.
I made a function that will look at all of the fst elements of all of the pairs:
flatList xss = [ x | xs <- xss, (x,y) <- xs ]
to use with another function to remove the duplicates.
h (x:xs) | (fst (head x)) `elem` (flatList xs) = h xs
| otherwise = (head x):(last x):(h xs)
which gives me the list
[(3,6),(6,3),(5,6),(6,5),(2,7),(7,2),(4,7),(7,4),(6,7),(7,6)]
which has duplicate numbers. That function only takes into account the first element of the first pair in the list of lists,the problem is when I also take into account the first element of the second pair (or the second element of the first pair):
h (x:xs) | (fst (head x)) `elem` (flatList xs) || (fst (last x)) `elem` (flatList xs) = h xs
| otherwise = (head x):(last x):(h xs)
I only get these two pairs:
[(6,7),(7,6)]
I see that the problem is that this method of deleting duplicates grabs the last repeated element, and would work with a list of digits, but not a list of pairs, as it misses pairs it needs to take.
Is there another way to solve this, or an alteration I could make?
It probably makes more sense to use a 2-tuple of 2-tuples in your list comprehension, since that makes it more easy to do pattern matching, and thus "by contract" enforces the fact that there are two items. We thus can construct 2-tuples that contain the 2-tuples with:
g :: Eq a => [(a, a)] -> [((a, a), (a, a))]
g xs = [ (t, s) | (t#(x,y):ts) <- tails xs, let s = (y, x), elem s ts ]
Here the elem s ts checks if the "swapped" 2-tuple occurs in the rest of the list.
Then we still need to filter the elements. We can make use of a function that uses an accumulator for the thus far obtained items:
h :: Eq a => [((a, a), (a, a))] -> [(a, a)]
h = go []
where go _ [] = []
go seen ((t#(x, y), s):xs)
| notElem x seen && notElem y seen = t : s : go (x:y:seen) xs
| otherwise = go seen xs
For the given sample input, we thus get:
Prelude Data.List> (h . g) [(2,3),(2,5),(2,7),(3,2),(3,4),(3,6),(4,3),(4,5),(4,7),(5,2),(5,4),(5,6),(6,3),(6,5),(6,7),(7,2),(7,4),(7,6)]
[(2,3),(3,2),(4,5),(5,4),(6,7),(7,6)]
after reading a few times your question, I got an elegant solution to your problem. Thinking that if you have a list of pairs without any repeated number, you can get the list of swapped pairs easily, solving your problem. So your problem can be reduce to given a list, get the list of all pairs using each number just one.
For a given list, there are many solutions to this, ex: for [1,2,3,4] valid solutions are: [(2,4),(4,2),(1,3),(3,1)] and [(2,3),(3,2),(1,4),(4,1)], etc... The approach here is:
take a permutation if the original list (say [1,4,3,2])
pick one element for each half and pair them together (for simplicity, you can pick consecutive elements too)
for each pair, create a the swapped pair and put all together
By doing so you end up with a list of non repeating numbers of pairs and its symmetric. More over, looping around all permutaitons, you can get all the solutions to your problem.
import Data.List (permutations, splitAt)
import Data.Tuple (swap)
-- This function splits a list by the half of the length
splitHalf :: [a] -> ([a], [a])
splitHalf xs = splitAt (length xs `quot` 2) xs
-- This zip a pair of list into a list of pairs
zipHalfs :: ([a], [a]) -> [(a,a)]
zipHalfs (xs, ys) = zip xs ys
-- Given a list of tuples, creates a larger list with all tuples and all swapped tuples
makeSymetrics :: [(a,a)] -> [(a,a)]
makeSymetrics xs = foldr (\t l -> t:(swap t):l) [] xs
-- This chain all of the above.
-- Take all permutations of xs >>> for each permutations >>> split it in two >>> zip the result >>> make swapped pairs
getPairs :: [a] -> [[(a,a)]]
getPairs xs = map (makeSymetrics . zipHalfs . splitHalf) $ permutations xs
>>> getPairs [1,2,3,4]
[[(1,3),(3,1),(2,4),(4,2)],[(2,3),(3,2),(1,4),(4,1)] ....

Haskell - Exclude lists based on a test in a nested list comprehension

I want to create a series of possible equations based on a general specification:
test = ["12", "34=", "56=", "78"]
Each string (e.g. "12") represents a possible character at that location, in this case '1' or '2'.)
So possible equations from test would be "13=7" or "1=68".
I know the examples I give are not balanced but that's because I'm deliberately giving a simplified short string.
(I also know that I could use 'sequence' to search all possibilities but I want to be more intelligent so I need a different approach explained below.)
What I want is to try fixing each of the equals in turn and then removing all other equals in the equation. So I want:
[["12","=","56","78"],["12","34","=","78”]]
I've written this nested list comprehension:
(it needs: {-# LANGUAGE ParallelListComp #-} )
fixEquals :: [String] -> [[String]]
fixEquals re
= [
[
if index == outerIndex then equals else remain
| equals <- map (filter (== '=')) re
| remain <- map (filter (/= '=')) re
| index <- [1..]
]
| outerIndex <- [1..length re]
]
This produces:
[["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56","”]]
but I want to filter out any with empty lists within them. i.e. in this case, the first and last.
I can do:
countOfEmpty :: (Eq a) => [[a]] -> Int
countOfEmpty = length . filter (== [])
fixEqualsFiltered :: [String] -> [[String]]
fixEqualsFiltered re = filter (\x -> countOfEmpty x == 0) (fixEquals re)
so that "fixEqualsFiltered test" gives:
[["12","=","56","78"],["12","34","=","78”]]
which is what I want but it doesn’t seem elegant.
I can’t help thinking there’s another way to filter these out.
After all, it’s whenever "equals" is used in the if statement and is empty that we want to drop the equals so it seems a waste to build the list (e.g. ["","34","56","78”] and then ditch it.)
Any thoughts appreciated.
I don't know if this is any cleaner than your code, but it might be a bit more clear and maybe more efficient using a recursion:
fixEquals = init . f
f :: [String] -> [[String]]
f [] = [[]]
f (x:xs) | '=' `elem` x = ("=":removeEq xs) : map (removeEq [x] ++) (f xs)
| otherwise = map (x:) (f xs)
removeEq :: [String] -> [String]
removeEq = map (filter (/= '='))
The way it works is that, if there's an '=' in the current string, then it splits the return into two, if not just calls recursively. The init is needed as in the last element returned there's no equal in any string.
Finally, I believe you can probably find a better data structure to do what you need to achieve instead of using list of strings
Let
xs = [["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
in
filter (not . any null) xs
will give
[["12","=","56","78"],["12","34","=","78"]]
If you want list comprehension then do
[x | x <- xs, and [not $ null y | y <- x]]
I think I'd probably do it this way. First, a preliminary that I've written so many times it's practically burned into my fingers by now:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
go _ [] = []
go b (h:e) = (b,h,e):go (h:b) e
Probably running it once or twice in ghci will be a more clear explanation of what this does than any English writing I could do:
> zippers "abcd"
[("",'a',"bcd"),("a",'b',"cd"),("ba",'c',"d"),("cba",'d',"")]
In other words, it gives a way of selecting each element of a list in turn, giving the "leftovers" of what was before and after the selection point. Given that tool, here's our plan: we'll nondeterministically choose a String to serve as our equals sign, double-check that we've got an equals sign in the first place, and then clear out the equals from the others. So:
fixEquals ss = do
(prefix, s, suffix) <- zippers ss
guard ('=' `elem` s)
return (reverse (deleteEquals prefix) ++ ["="] ++ deleteEquals suffix)
deleteEquals = map (filter ('='/=))
Let's try it:
> fixEquals ["12", "34=", "56=", "78"]
[["12","=","56","78"],["12","34","=","78"]]
Perfect! But this is just a stepping-stone to actually generating the equations, right? It turns out to be not that hard to go all the way in one step, skipping this intermediate. Let's do that:
equations ss = do
(prefixes, s, suffixes) <- zippers ss
guard ('=' `elem` s)
prefix <- mapM (filter ('='/=)) (reverse prefixes)
suffix <- mapM (filter ('='/=)) suffixes
return (prefix ++ "=" ++ suffix)
And we can try it in ghci:
> equations ["12", "34=", "56=", "78"]
["1=57","1=58","1=67","1=68","2=57","2=58","2=67","2=68","13=7","13=8","14=7","14=8","23=7","23=8","24=7","24=8"]
The easiest waty to achieve what you want is to create all the combinations and to filter the ones that have a meaning:
Prelude> test = ["12", "34=", "56=", "78"]
Prelude> sequence test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8"
Prelude> filter ((1==).length.filter('='==)) $ sequence test
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
You pointed the drawback: imagine we have the followig list of strings: ["=", "=", "0123456789", "0123456789"]. We will generate 100 combinations and drop them all.
You can look at the combinations as a tree. For the ["12", "34"], you have:
/ \
1 2
/ \ / \
3 4 3 4
You can prune the tree: just ignore the subtrees when you have two = on the path.
Let's try to do it. First, a simple combinations function:
Prelude> :set +m
Prelude> let combinations :: [String] -> [String]
Prelude| combinations [] = [""]
Prelude| combinations (cs:ts) = [c:t | c<-cs, t<-combinations ts]
Prelude|
Prelude> combinations test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8", ...]
Second, we need a variable to store the current number of = signs met:
if we find a second = sign, just drop the subtree
if we reach the end of a combination with no =, drop the combination
That is:
Prelude> let combinations' :: [String] -> Int -> [String]
Prelude| combinations' [] n= if n==1 then [""] else []
Prelude| combinations' (cs:ts) n = [c:t | c<-cs, let p = n+(fromEnum $ c=='='), p <= 1, t<-combinations' ts p]
Prelude|
Prelude> combinations' test 0
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
We use p as the new number of = sign on the path: if p>1, drop the subtree.
If n is zero, we don't have any = sign in the path, drop the combination.
You may use the variable n to store more information, eg type of the last char (to avoid +* sequences).

How do I split a list into sublists at certain points?

How do I manually split [1,2,4,5,6,7] into [[1],[2],[3],[4],[5],[6],[7]]? Manually means without using break.
Then, how do I split a list into sublists according to a predicate? Like so
f even [[1],[2],[3],[4],[5],[6],[7]] == [[1],[2,3],[4,5],[6,7]]
PS: this is not homework, and I've tried for hours to figure it out on my own.
To answer your first question, this is rather an element-wise transformation than a split. The appropriate function to do this is
map :: (a -> b) -> [a] -> [b]
Now, you need a function (a -> b) where b is [a], as you want to transform an element into a singleton list containing the same type. Here it is:
mkList :: a -> [a]
mkList a = [a]
so
map mkList [1,2,3,4,5,6,7] == [[1],[2],...]
As for your second question: If you are not allowed (homework?) to use break, are you then allowed to use takeWhile and dropWhile which form both halves of the result of break.
Anyway, for a solution without them ("manually"), just use simple recursion with an accumulator:
f p [] = []
f p (x:xs) = go [x] xs
where go acc [] = [acc]
go acc (y:ys) | p y = acc : go [y] ys
| otherwise = go (acc++[y]) ys
This will traverse your entire list tail recursively, always remembering what the current sublist is, and when you reach an element where p applies, outputting the current sublist and starting a new one.
Note that go first receives [x] instead of [] to provide for the case where the first element already satisfies p x and we don't want an empty first sublist to be output.
Also, this operates on the original list ([1..7]) instead of [[1],[2]...]. But you can use it on the transformed one as well:
> map concat $ f (odd . head) [[1],[2],[3],[4],[5],[6],[7]]
[[1,2],[3,4],[5,6],[7]]
For the first, you can use a list comprehension:
>>> [[x] | x <- [1,2,3,4,5,6]]
[[1], [2], [3], [4], [5], [6]]
For the second problem, you can use the Data.List.Split module provided by the split package:
import Data.List.Split
f :: (a -> Bool) -> [[a]] -> [[a]]
f predicate = split (keepDelimsL $ whenElt predicate) . concat
This first concats the list, because the functions from split work on lists and not list of lists. The resulting single list is the split again using functions from the split package.
First:
map (: [])
Second:
f p xs =
let rs = foldr (\[x] ~(a:r) -> if (p x) then ([]:(x:a):r) else ((x:a):r))
[[]] xs
in case rs of ([]:r) -> r ; _ -> rs
foldr's operation is easy enough to visualize:
foldr g z [a,b,c, ...,x] = g a (g b (g c (.... (g x z) ....)))
So when writing the combining function, it is expecting two arguments, 1st of which is "current element" of a list, and 2nd is "result of processing the rest". Here,
g [x] ~(a:r) | p x = ([]:(x:a):r)
| otherwise = ((x:a):r)
So visualizing it working from the right, it just adds into the most recent sublist, and opens up a new sublist if it must. But since lists are actually accessed from the left, we keep it lazy with the lazy pattern, ~(a:r). Now it works even on infinite lists:
Prelude> take 9 $ f odd $ map (:[]) [1..]
[[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18]]
The pattern for the 1st argument reflects the peculiar structure of your expected input lists.

iterating through a list in haskell

I have a list of list of characters ::[[Char]].
I need to iterate both over the list of strings and also over each character in each string.
Say, my list is present in this variable.
let xs
Please suggest an easy way to iterate.
If you want to apply a function f to every element of a list like this:
[a, b, c, d] → [f a, f b, f c, f d]
then map f xs does the trick. map turns a function on elements to a function on lists. So, we can nest it to operate on lists of lists: if f transforms as into bs, map (map f) transforms [[a]]s into [[b]]s.
If you instead want to perform some IO action for every element of a list (which is more like traditional iteration), then you're probably looking for forM_:1
forM_ :: [a] -> (a -> IO b) -> IO ()
You give it a function, and it calls it with each element of the list in order. For instance, forM_ xs putStrLn is an IO action that will print out every string in xs on its own line. Here's an example of a more involved use of forM_:
main = do
...
forM_ xs $ \s -> do
putStrLn "Here's a string:"
forM_ s print
putStrLn "Now it's done."
If xs contains ["hello", "world"], then this will print out:
Here's a string:
'h'
'e'
'l'
'l'
'o'
Now it's done.
Here's a string:
'w'
'o'
'r'
'l'
'd'
Now it's done.
1 forM_ actually has a more general type, but the simpler version I've shown is more relevant here.
Just that:
[c | x <- xs, c <- x]
The "correct" way to iterate is actually fold. Anything you might ever want to do with a list can be done with a fold. Let's consider what you want to do. You're probably thinking of something like this:
for (row in xs):
for (c in row):
doSomething
The problem is, you're probably making use of mutable variables in doSomething. That's ok, we can deal with that. So suppose you have this.
def iter2d(xs):
outerVar = outerInit
for (row in xs):
innerVar = innerInit(row)
outerVar.adjust1(row)
for (c in row):
innerVar.adjust2(c)
outerVar.adjust3(c, innerVar)
return outerVar
Let's translate that to folds. And immutability.
iter2d :: [[Char]] -> Something
iter2d xs = foldl' outerStep outerInit xs
where outerInit = ... -- same as outerInit above
outerStep acc row = fst $ foldl' innerStep innerInit' row)
where innerInit' = ((adjust1 acc row), innerInit row)
innerInit row = ... -- same as innerInit above
innerStep (outAcc, inAcc) c = (outAcc', inAcc')
where inAcc' = adjust2 inAcc c
outAcc' = adjust3 outAcc c inAcc'
Notice with immutability, we are forced to indicate that outAc' depends on inAcc', rather than inAcc, meaning, the "state" of innerVar after it is updated.
Now you might say "wow that Haskell looks way ugly, why would I ever want to use Haskell". Yes, it does look ugly, but only because I tailored it to be a direct translation of imperative code. Once you get used to using folds instead of "iterating through a list", then you will find that folding is a very powerful technique that lets you do a lot of things in a more elegant way than for loops allow.
map (map f) l
where f :: Char -> Foo is a function to apply to each Char and l :: [[Char]]
returns l' :: [[Foo]]

Algorithm - How to delete duplicate elements in a Haskell list

I'm having a problem creating an function similar to the nub function.
I need this func to remove duplicated elements form a list.
An element is duplicated when 2 elements have the same email, and it should keep the newer one (is closer to the end of the list).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
| otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Since the definition is recursive, you need a base case and a recursive case.
In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing all elements a' such that f a a' = True from the list as. This is simply the function filter (\a' -> not $ f a a') applied to the list as. Them rem f (a:as) is the result of recursively calling rem f on as', that is, a : rem f as':
rem f (a:as) = a : rem f $ filter (\a' -> not $ f a a') as
Replace f be a function comparing your list elements for the appropriate equality (e-mail addresses).
While nubBy with two reverse's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse's will also take their toll (in the form of memory allocation).
For more efficient implementation Data.Map (O(logN) on inserts) can be used as an intermediate "latest non duplicating element" holder (Set.insert replaces older element with newer if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
(==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
where
set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
While nubBy implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
on 10M elements list (with lots of duplication - nub should play nice here) there is 3 times difference in terms of running time and 700 times difference in memory usage. Compiled with GHC with -O2 :
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.
Why not putting everything in a Map from email to Regist (of course respecting your "keep the newest" rule), and then transform the values of the map back in the list? That's the most efficient way I can think of.
I used Alexei Polkhanov's answer and came to the following, so you can remove duplicates from lists with a type that extends Eq class.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]

Resources