Haskell: Transposition on a list of strings - haskell

New to Haskell and the language has been fun so far. I am hoping for a good hint rather than an answer as I am enjoying the mind-altering that is Haskell.
Question: I have a list of strings and I would like to transpose them.
let x = ["hello", "world"]
would become
["hw", "eo", "lr", "ll", "od"]
What I have so far is this:
transposeString :: [a] -> [a]
transposeString ([]:_) = []
transposeString x = (map head x) : transposeString (map tail x)
I definitely know there is something wrong with the type signature. My rationale is that
let y = ["wow", "top"]
map head y
returns "wt" so recursing this on the rest of the list would work?
Thank you in advance for any hints.

Mind that you do not have to provide a type signature: the Haskell compiler can derive one. If you put your implementation in a file:
transposeString ([]:_) = []
transposeString x = (map head x) : transposeString (map tail x)
and query the type with :t in ghci, it returns:
*Main> :t transposeString
transposeString :: [[b]] -> [[b]]
This makes perfect sense:
you transpose a matrix, which is a list of lists. [[b]] is a list of lists of b elements; and
you can derive it from the implementation yourself: map head x applies head to every element of x, so each element must itself be a list ([b]), and x is therefore a list of such lists ([[b]]). The same reasoning applies to tail.
As far as I know, your implementation is correct for a non-empty list of equal-length rows. You can specialize it by saying that [b] ~ String, thus adding a type signature for Strings:
transposeString :: [String] -> [String]
transposeString ([]:_) = []
transposeString x = (map head x) : transposeString (map tail x)
which again makes sense because String ~ [Char] and thus b ~ Char. But there is not much point in specializing a function's type: it is usually better to use the most generic type signature, in this case [[b]] -> [[b]].

One point of note. Your type signature for transposeString, [a] -> [a], allows a flat list as an argument. Transposing a [String] works because a [String] is really just a [[Char]], but what happens when you try to call transposeString on an [Int]? After all, the type signature allows for it.
On a side note, I ask you: given your current function, what would happen if you called transposeString []?
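One way to explore both questions is a variant that drops exhausted rows before taking heads. This is only a sketch: the name transposeSafe is made up here and does not appear in the thread, and dropping empty rows is one possible policy among several.

```haskell
-- Hypothetical total variant of the asker's function (not from the thread).
-- Rows that have run out of elements are dropped before head/tail are applied,
-- so neither the empty list nor ragged rows cause a crash or an endless loop.
transposeSafe :: [[a]] -> [[a]]
transposeSafe rows
  | null nonEmpty = []
  | otherwise     = map head nonEmpty : transposeSafe (map tail nonEmpty)
  where
    nonEmpty = filter (not . null) rows
```

With this definition, transposeSafe ["hello", "world"] gives ["hw","eo","lr","ll","od"], and transposeSafe ["ab", "c"] gives ["ac","b"].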

Do remember String is [Char]
"abc" == 'a' : 'b' : 'c' : []
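From here you can use the traversable nature of lists (ZipList and getZipList come from Control.Applicative):
import Control.Applicative (ZipList(..))
transpose :: [[a]] -> [[a]]
transpose mat = getZipList $ sequenceA $ map ZipList mat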
test' = transpose [[1,2,3],[4,5,6],[7,8,9]] -- == [[1,4,7],[2,5,8],[3,6,9]]
test'' = transpose ["abc", "deg", "klm"] -- == ["adk","bel","cgm"]
You can also check the default implementation of transpose in the Haskell docs
https://hackage.haskell.org/package/base-4.12.0.0/docs/src/Data.OldList.html#transpose
transpose :: [[a]] -> [[a]]
transpose [] = []
transpose ([] : xss) = transpose xss
transpose ((x:xs) : xss) = (x : [h | (h:_) <- xss]) : transpose (xs : [ t | (_:t) <- xss])
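The two definitions in this answer differ on ragged input. The sketch below puts them side by side; zipTranspose is a hypothetical name for the ZipList version, chosen only to avoid clashing with Data.List.transpose.

```haskell
import Control.Applicative (ZipList (..))
import qualified Data.List as L

-- The ZipList-based version: zipping stops at the shortest row.
zipTranspose :: [[a]] -> [[a]]
zipTranspose = getZipList . sequenceA . map ZipList

-- Ragged input used for the comparison.
ragged :: [String]
ragged = ["abc", "de"]

-- zipTranspose ragged == ["ad","be"]       (the 'c' is dropped)
-- L.transpose  ragged == ["ad","be","c"]   (short rows are skipped instead)
```

Note also that zipTranspose [] is an infinite list of empty lists, because pure for ZipList repeats its argument forever, whereas L.transpose [] is [].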

Related

How do I fix ‘Eq a’ has kind ‘GHC.Prim.Constraint’ error in haskell?

I am writing a small function in Haskell to check if a list is a palindrome by comparing it with its reverse.
checkPalindrome :: [Eq a] -> Bool
checkPalindrome l = (l == reverse l)
  where
    reverse :: [a] -> [a]
    reverse xs
      | null xs = []
      | otherwise = (last xs) : reverse newxs
      where
        before = (length xs) - 1
        newxs = take before xs
I understand that I should use [Eq a] in the function definition because I use the equality operator later on, but I get this error when I compile:
Expected kind ‘*’, but ‘Eq a’ has kind ‘GHC.Prim.Constraint’
In the type signature for ‘checkPalindrome’:
checkPalindrome :: [Eq a] -> Bool
P.s Feel free to correct me if I am doing something wrong with my indentation, I'm very new to the language.
Unless Haskell adopted a new syntax, your type signature should be:
checkPalindrome :: Eq a => [a] -> Bool
Declare the constraint on the left hand side of a fat-arrow, then use it on the right hand side.
Unlike OO languages, Haskell makes a quite fundamental distinction between
Constraints – typeclasses like Eq.
Types – concrete types like Bool or lists of some type.
In OO languages, both of these would be represented by classes†, but a Haskell type class is completely different. You never have “values of class C”, only “types of class C”. (These concrete types may then contain values, but the classes don't.)
This distinction may seem pedantic, but it's actually very useful. What you wrote, [Eq a] -> Bool, would supposedly mean: each element of the list must be comparable... but comparable to what? You could have elements of different type in the list, how do you know that these elements are comparable to each other? In Haskell, that's no issue, because whenever the function is used you first settle on one type a. This type must be in the Eq class. The list then must have all elements from the same type a. This way you ensure that each element of the list is comparable to all of the others, not just, like, comparable to itself. Hence the signature
checkPalindrome :: Eq a => [a] -> Bool
This is the usual distinction on the syntax level: constraints must always‡ be written on the left of an => (implication arrow).
The constraints before the => are “implicit arguments”: you don't explicitly “pass Eq a to the function” when you call it, instead you just pass the stuff after the =>, i.e. in your example a list of some concrete type. The compiler will then look at the type and automatically look up its Eq typeclass instance (or raise a compile-time error if the type does not have such an instance). Hence,
GHCi, version 7.10.2: http://www.haskell.org/ghc/ :? for help
Prelude> let palin :: Eq a => [a] -> Bool; palin l = l==reverse l
Prelude> palin [1,2,3,2,1]
True
Prelude> palin [1,2,3,4,5]
False
Prelude> palin [sin, cos, tan]
<interactive>:5:1:
No instance for (Eq (a0 -> a0))
(maybe you haven't applied enough arguments to a function?)
arising from a use of ‘palin’
In the expression: palin [sin, cos, tan]
In an equation for ‘it’: it = palin [sin, cos, tan]
...because functions can't be equality-compared.
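The same instance lookup works for user-defined types. In the sketch below, Colour is a made-up type used only for illustration; deriving Eq provides the instance that the compiler passes to palin implicitly.

```haskell
-- Hypothetical example type; deriving Eq generates the Eq Colour instance.
data Colour = Red | Green | Blue
  deriving (Eq, Show)

-- Same definition as in the GHCi session above.
palin :: Eq a => [a] -> Bool
palin l = l == reverse l

-- palin [Red, Green, Red] == True   (uses the derived Eq Colour instance)
-- palin "abca"            == False  (uses the Eq Char instance)
```

Removing the deriving clause would make palin [Red, Green, Red] fail to compile, with the same kind of "No instance for (Eq ...)" error as in the function example.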
†Constraints may in OO also be interfaces / abstract base classes, which aren't “quite proper classes” but are still in many ways treated the same way as OO value-classes. Most modern OO languages now also support Haskell-style parametric polymorphism in addition to “element-wise”/covariant/existential polymorphism, but they require somewhat awkward extends trait-mechanisms because this was only implemented as an afterthought.
‡There are also functions which have “constraints in the arguments”, but that's a more advanced concept called rank-n polymorphism.
This is really an extended comment. Aside from your little type error, your function has another problem: it's extremely inefficient. The main problem is your definition of reverse.
reverse :: [a] -> [a]
reverse xs
  | null xs = []
  | otherwise = (last xs) : reverse newxs
  where
    before = (length xs) - 1
    newxs = take before xs
last is O(n), where n is the length of the list. length is also O(n), where n is the length of the list. And take is O(k), where k is the length of the result. So your reverse will end up taking O(n^2) time. One fix is to just use the standard reverse function instead of writing your own. Another is to build up the result recursively, accumulating the result as you go:
reverse :: [a] -> [a]
reverse xs0 = go [] xs0
  where
    -- acc holds the elements seen so far, already reversed
    go acc [] = acc
    go acc (x : xs) = go (x : acc) xs
This version is O(n).
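Pushing each element onto an accumulator is exactly what a left fold does, so the same linear-time reverse can be sketched with foldl. The name reverseFold is hypothetical and is used here only to avoid clashing with the Prelude's reverse.

```haskell
-- A left fold that conses each element onto the accumulator.
-- Each step is O(1), so the whole traversal is O(n).
reverseFold :: [a] -> [a]
reverseFold = foldl (flip (:)) []

-- reverseFold [1, 2, 3] == [3, 2, 1]
```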
There's another source of inefficiency in your implementation:
checkPalindrome l = (l == reverse l)
This isn't nearly as bad, but let's look at what it does. Suppose we have the string "abcdefedcba". Then we test whether "abcdefedcba" == "abcdefedcba". By the time we've checked half the list, we already know the answer. So we'd like to stop there! There are several ways to accomplish this. The simplest efficient one is probably to calculate the length of the list as part of the process of reversing it so we know how much we'll need to check:
reverseCount :: [a] -> (Int, [a])
reverseCount xs0 = go 0 [] xs0 where
  go len acc [] = (len, acc)
  go len acc (x : xs) = len `seq`
    go (len + 1) (x : acc) xs
Don't worry about the len `seq` bit too much; that's just a bit of defensive programming to make sure laziness doesn't make things inefficient; it's probably not even necessary if optimizations are enabled. Now you can write a version of == that only looks at the first n elements of the lists:
eqTo :: Eq a => Int -> [a] -> [a] -> Bool
eqTo 0 _ _ = True
eqTo _ [] [] = True
eqTo n (x : xs) (y : ys) =
  x == y && eqTo (n - 1) xs ys
eqTo _ _ _ = False
So now
isPalindrome xs = eqTo ((len + 1) `quot` 2) xs rev_xs
  where
    (len, rev_xs) = reverseCount xs
Here's another way, that's more efficient and arguably more elegant, but a bit tricky. We don't actually need to reverse the whole list; we only need to reverse half of it. This saves memory allocation. We can use a tortoise and hare trick:
splitReverse ::
  [a] ->
  ( [a]     -- the first half, reversed
  , Maybe a -- the middle element
  , [a] )   -- the second half, in order
splitReverse xs0 = go [] xs0 xs0 where
  go front rear [] = (front, Nothing, rear)
  go front (r : rs) [_] = (front, Just r, rs)
  go front (r : rs) (_ : _ : xs) =
    go (r : front) rs xs
Now
isPalindrome xs = front == rear
  where
    (front, _, rear) = splitReverse xs
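To see how the tortoise and hare interact, it may help to trace a small input by hand. The block below repeats the definitions so that it is self-contained; the added type signature for isPalindrome and the trace comments are illustrative additions, not part of the original answer.

```haskell
splitReverse :: [a] -> ([a], Maybe a, [a])
splitReverse xs0 = go [] xs0 xs0
  where
    -- The third argument (the hare) advances two steps per call,
    -- the second (the tortoise) advances one step.
    go front rear []               = (front, Nothing, rear)
    go front (r : rs) [_]          = (front, Just r, rs)
    go front (r : rs) (_ : _ : xs) = go (r : front) rs xs
    go front [] _                  = (front, Nothing, [])  -- unreachable; keeps the match total

isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = front == rear
  where
    (front, _, rear) = splitReverse xs

-- Trace for [1,2,3,2,1]:
--   go []    [1,2,3,2,1] [1,2,3,2,1]
--   go [1]   [2,3,2,1]   [3,2,1]
--   go [2,1] [3,2,1]     [1]
--   result: ([2,1], Just 3, [2,1])
```

For an even-length list such as [1,2,2,1] the hare runs out exactly, and the result is ([2,1], Nothing, [2,1]).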
Now for some numbers, using the test case
somePalindrome :: [Int]
somePalindrome = [1..10000] ++ [10000,9999..1]
Your original implementation takes 7.523s (2.316 mutator; 5.204 GC) and allocates 11 gigabytes to build the test list and check if it's a palindrome. My counting implementation takes less than 0.01s and allocates 2.3 megabytes. My tortoise and hare implementation takes less than 0.01s and allocates 1.7 megabytes.

Learning haskell: a recursive function for creating skip-bigrams

I'm working my way through the NLPWP Book, and I'm at the chapter that deals with recursive functions. A recursive function for computing bigrams looks like this:
bigram :: [a] -> [[a]]
bigram [] = []
bigram [_] = []
bigram xs = take 2 xs : bigram (tail xs)
And if I run it on chomsky = ["colorless", "green", "ideas", "sleep", "furiously"] I get this:
bigram chomsky
[["colorless","green"],["green","ideas"],["ideas","sleep"],["sleep","furiously"]]
The exercise says:
A skip-bigram is any pair of words in sentence order. Write a function skipBigrams that extracts skip-bigrams from a sentence as a list of binary tuples, using explicit recursion. Running your function on ["Colorless", "green", "ideas", "sleep", "furiously"] should give the following output:
Prelude> skipBigrams ["Colorless", "green", "ideas", "sleep", "furiously"]
[("Colorless","green"),("Colorless","ideas"),("Colorless","sleep"),("Colorless","furiously"),("green","ideas"),("green","sleep"),("green","furiously"),("ideas","sleep"),("ideas","furiously"),("sleep","furiously")]
Here is the definition I've tried:
skipBigram [] = []
skipBigram [_] = []
skipBigram (x:xs) = [(x, (head xs)), (x, skipBigram xs)]
But I'm getting the following error:
Occurs check: cannot construct the infinite type: t ~ [(t, t)]
Relevant bindings include
xs :: [t] (bound at :3:15)
x :: t (bound at :3:13)
skipBigram :: [t] -> [(t, t)] (bound at :1:1)
In the expression: interactive:IHaskell384.skipBigram xs
In the expression: (x, interactive:IHaskell384.skipBigram xs)
Which, new to Haskell as I am, I don't understand in the slightest. What is an infinite type? A relevant binding?
How should I define skipBigram to resolve this compile-time error?
You get this because your result is a list of pairs in which the second component of the first pair is a list element, while the second component of the second pair is whatever the function itself returns (you use recursion here, so it has the function's result type). In effect you are saying:
my result is a list of tuples, but part of those tuples is the result type itself
That is what the error tells you.
here are some details:
look at your last line
skipBigram (x:xs) = [(x, (head xs)), (x, skipBigram xs)]
you have a list of tuples on the right side so its type will be like (based on the first element of the result list):
skipBigram :: [a] -> [(a,a)]
but in the second-item you have (x, skipBigram xs) meaning it will have the type (a, [(a,a)]) (remember the type of skipBigram xs is the above part).
and so - comparing the second parts of the tuples - you have a ~ [(a,a)] which produces your error because somehow the type a should be the same as [(a,a)] which you could expand in all eternity ;)
now to the algorithm itself:
It will not work like this - you somehow have to get all combinations and to do this you have to work with the items in the list.
Usually you either do this with list-comprehensions or with the do-notation of the list-monad.
To get going think about this:
f [] = [[]]
f (x:xs) =
  let xss = f xs
  in [ x:xs | xs <- xss ] ++ xss
test it and play with it in ghci - you will have to combine this with what you got somehow
(ok recursion.ninja ^^ spoiled your fun - I'll let this here anyway if you don't mind)
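For readers who want to see where the hint leads: f enumerates every sub-list that preserves the original order, and the skip-bigrams are exactly the sub-lists of length two. The sketch below uses the hypothetical names subsets and pairsViaSubsets. It is exponential in the list length, so it illustrates the connection rather than serving as a practical solution.

```haskell
-- The hint's f under a descriptive (hypothetical) name.
subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) =
  let rest = subsets xs
  in [x : ys | ys <- rest] ++ rest

-- Keep only the two-element sub-lists; the pattern [a, b] silently
-- skips every sub-list of a different length.
pairsViaSubsets :: [a] -> [(a, a)]
pairsViaSubsets xs = [(a, b) | [a, b] <- subsets xs]

-- subsets [1,2,3]         == [[1,2,3],[1,2],[1,3],[1],[2,3],[2],[3],[]]
-- pairsViaSubsets [1,2,3] == [(1,2),(1,3),(2,3)]
```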
Try this definition:
skipBigram :: [a] -> [(a,a)]
skipBigram [] = [] -- nothing to do with an empty list
skipBigram (x:xs) = [(x,y) | y <- xs] ++ skipBigram xs
Your skipBigram function is generating all the "2-tuple left-to-right combinations" of words in the list. We can capture this concept with a simple list comprehension in the recursive definition. By recursively concatenating the simple list comprehensions, we gain the desired result list.
The infinite type error is complaining about the second tuple. Your function should have the type [a] -> [(a, a)], but when GHC tries to infer your function's type, it finds that a ~ [(a, a)], an infinite type. Relevant bindings are just the types of other variables which may be involved in the error.
However, even ignoring the type errors, your function will not do what you want at all. Firstly, your function would always return a list of length two, because you have explicitly constructed a two-element list. Also, (x, head xs) only pairs x with the element immediately after it, so pairs such as ("Colorless", "ideas") would never be produced.
Instead, try this solution
skipBigram :: [a] -> [(a, a)]
skipBigram [] = []
skipBigram (x:xs) = map (x,) xs ++ skipBigram xs
For this function to work, you will need to put the line
{-# LANGUAGE TupleSections #-}
at the beginning of your file.
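As a cross-check for either recursive definition, the same pairs can be produced without explicit recursion by walking over the suffixes of the list. This is an alternative sketch, not what the exercise asks for; skipBigramsTails is a hypothetical name, and tails comes from Data.List.

```haskell
import Data.List (tails)

-- For every non-empty suffix (x:ys), pair x with each later element y.
-- The empty suffix fails the pattern match and is skipped.
skipBigramsTails :: [a] -> [(a, a)]
skipBigramsTails ws = [(x, y) | (x:ys) <- tails ws, y <- ys]

-- skipBigramsTails ["Colorless", "green", "ideas"]
--   == [("Colorless","green"),("Colorless","ideas"),("green","ideas")]
```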

How do I split a list into sublists at certain points?

How do I manually split [1,2,3,4,5,6,7] into [[1],[2],[3],[4],[5],[6],[7]]? Manually means without using break.
Then, how do I split a list into sublists according to a predicate? Like so
f even [[1],[2],[3],[4],[5],[6],[7]] == [[1],[2,3],[4,5],[6,7]]
PS: this is not homework, and I've tried for hours to figure it out on my own.
To answer your first question, this is rather an element-wise transformation than a split. The appropriate function to do this is
map :: (a -> b) -> [a] -> [b]
Now, you need a function (a -> b) where b is [a], as you want to transform an element into a singleton list containing the same type. Here it is:
mkList :: a -> [a]
mkList a = [a]
so
map mkList [1,2,3,4,5,6,7] == [[1],[2],...]
As for your second question: if you are not allowed (homework?) to use break, are you then allowed to use takeWhile and dropWhile, which together produce the two halves of the result of break?
Anyway, for a solution without them ("manually"), just use simple recursion with an accumulator:
f p [] = []
f p (x:xs) = go [x] xs
  where go acc [] = [acc]
        go acc (y:ys) | p y = acc : go [y] ys
                      | otherwise = go (acc++[y]) ys
This traverses your entire list recursively, always remembering what the current sublist is; when it reaches an element for which p holds, it outputs the current sublist and starts a new one.
Note that go first receives [x] instead of [] to provide for the case where the first element already satisfies p x and we don't want an empty first sublist to be output.
Also, this operates on the original list ([1..7]) instead of [[1],[2]...]. But you can use it on the transformed one as well:
> map concat $ f (odd . head) [[1],[2],[3],[4],[5],[6],[7]]
[[1,2],[3,4],[5,6],[7]]
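If library functions are allowed after all, the same "start a new sublist whenever p holds" behaviour can be sketched with groupBy from Data.List. The name splitBefore is hypothetical. The sketch relies on groupBy comparing each candidate element against the first element of the current group, which is how base implements it; the first argument of the lambda is ignored for that reason.

```haskell
import Data.List (groupBy)

-- An element y joins the current group as long as p y is False;
-- the first element for which p holds opens a new group.
splitBefore :: (a -> Bool) -> [a] -> [[a]]
splitBefore p = groupBy (\_ y -> not (p y))

-- splitBefore even [1..7] == [[1],[2,3],[4,5],[6,7]]
-- splitBefore odd  [1..7] == [[1,2],[3,4],[5,6],[7]]
```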
For the first, you can use a list comprehension:
>>> [[x] | x <- [1,2,3,4,5,6]]
[[1], [2], [3], [4], [5], [6]]
For the second problem, you can use the Data.List.Split module provided by the split package:
import Data.List.Split
f :: (a -> Bool) -> [[a]] -> [[a]]
f predicate = split (keepDelimsL $ whenElt predicate) . concat
This first concats the list, because the functions from split work on lists and not lists of lists. The resulting single list is then split again using functions from the split package.
First:
map (: [])
Second:
f p xs =
  let rs = foldr (\[x] ~(a:r) -> if (p x) then ([]:(x:a):r) else ((x:a):r))
                 [[]] xs
  in case rs of ([]:r) -> r ; _ -> rs
foldr's operation is easy enough to visualize:
foldr g z [a,b,c, ...,x] = g a (g b (g c (.... (g x z) ....)))
So when writing the combining function, it is expecting two arguments, 1st of which is "current element" of a list, and 2nd is "result of processing the rest". Here,
g [x] ~(a:r) | p x = ([]:(x:a):r)
             | otherwise = ((x:a):r)
So visualizing it working from the right, it just adds into the most recent sublist, and opens up a new sublist if it must. But since lists are actually accessed from the left, we keep it lazy with the lazy pattern, ~(a:r). Now it works even on infinite lists:
Prelude> take 9 $ f odd $ map (:[]) [1..]
[[1,2],[3,4],[5,6],[7,8],[9,10],[11,12],[13,14],[15,16],[17,18]]
The pattern for the 1st argument reflects the peculiar structure of your expected input lists.

Haskell Split List Function Infinite Type Error

I am working on a function in Haskell that will take one list and divide it into two evenly sized lists. Here is what I have:
split (x:y:xs) = split2 ([((length(x:y:xs) `div` 2)-2) : x ++ y] : [xs])
split2 (x:xs:[y:ys]) = split2 ((x-1) : [xs] ++ y : [ys])
split2 (0:xs:[y:ys]) = (xs:[y:ys])
The function takes the first two elements of a list, and puts them together into list #2 and appends the first list as a second element. It then gets the length of the list, and divides it by two to find out how many times to run taking into account the fact that it already removed two elements from the first list. It then takes those two pieces of information and puts it into split2, which takes another element from the first list and appends it to the second list in the first element, also it counts down 1 from the number of runs and then runs again.
Problem is, when I run it I get this:
Functions.hs:19:49:
Occurs check: cannot construct the infinite type: t0 = [t0]
In the first argument of `(:)', namely `(y)'
19 refers to line 2, the first split2 function. Not exactly sure how to go about fixing this error. Any ideas?
It's hard to know where to start...
Let's define functions from ever larger chunks of the expression in split.
f1 (x:y:xs) = (length(x:y:xs) `div` 2)-2
f1 :: [a] -> Int
Ok, so the argument is a list of something, and it returns an Int
f2 (x:y:xs) = ((length(x:y:xs) `div` 2)-2) : x
f2 :: [[Int]] -> [Int]
Here, the length Int is being cons'd with x, so x must be [Int], so (x:y:xs) must be [[Int]]. We can also infer that y has the same type as x, and xs is a list of things of the same type; [[Int]]. So the x ++ y will also be [Int].
So, [xs] will have type [[[Int]]]. Now, we wrap the result in a list constructor, and cons it with [xs]:
f3 (x:y:xs) = [((length(x:y:xs) `div` 2)-2) : x ++ y] : [xs]
f3 :: [[Int]] -> [[[Int]]]
I'm guessing you didn't expect the argument to be a list of lists of Ints.
Now, if we look at split2, the argument pattern (x:xs:[y:ys]) implies that it is of type:
split2 :: [[a]] -> b
x :: [a]
xs :: [a]
y :: a
ys :: [a]
The rhs of the first definition of split2 tries to construct a new list by concatenating (x-1) : [xs] and y : [ys]. However, if we substitute the types into the y : [ys], we find:
y : [ys] :: a : [[a]]
But since (:) :: a -> [a] -> [a], this means that [[a]] must be the same type as [a], or a must be a list of itself, which is not possible.
The (x-1) is also badly typed, because it attempts to subtract one from a list.
I can't tell if you want to split the lists into even and odd elements, or into first and second halves.
Here are two versions that split into the first and second halves, rounding down (RD) or up (RU) if the length is odd:
splitRD xs = splitAt (length xs `div` 2) xs
splitRU xs = splitAt ((length xs + 1) `div` 2) xs
Here's a version that splits the list into even and odd elements:
splitEO [] = ([], [])
splitEO [e] = ([e], [])
splitEO (e:o:xs) = (e:es, o:os) where (es, os) = splitEO xs
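To make the difference between the three variants concrete, here they are again with type signatures added (the signatures are not in the original answer) and the results for a five-element list.

```haskell
-- First and second halves; the first half is the shorter one for odd lengths.
splitRD :: [a] -> ([a], [a])
splitRD xs = splitAt (length xs `div` 2) xs

-- First and second halves; the first half is the longer one for odd lengths.
splitRU :: [a] -> ([a], [a])
splitRU xs = splitAt ((length xs + 1) `div` 2) xs

-- Elements at even positions (0, 2, ...) and at odd positions (1, 3, ...).
splitEO :: [a] -> ([a], [a])
splitEO []       = ([], [])
splitEO [e]      = ([e], [])
splitEO (e:o:xs) = (e : es, o : os)
  where (es, os) = splitEO xs

-- splitRD [1,2,3,4,5] == ([1,2],[3,4,5])
-- splitRU [1,2,3,4,5] == ([1,2,3],[4,5])
-- splitEO [1,2,3,4,5] == ([1,3,5],[2,4])
```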
Few suggestions
Write types to all the functions. It makes the code more readable and also helps catching errors.
The type of ++ is [a] -> [a] -> [a], and you are consing the length of a list onto its elements. Since a list has to be of uniform type and length returns an Int, the compiler infers the type of split as [[Int]] -> t (assuming split2 returns type t).
You pass ([((length(x:y:xs) `div` 2)-2) : x ++ y] : [xs]) to split2. Here xs is of type [[Int]], which means [xs] is of type [[[Int]]], so the compiler infers the type of split2 to be [[[Int]]] -> t.
Now in the definition of split2
split2 (x:xs:[y:ys]) = split2 ((x-1) : [xs] ++ y : [ys])
ys is of type [[Int]], so y is of type [Int]. xs is of type [[Int]], but you are doing [xs] ++ y, which means both [xs] and y should be of same type ( [a] for some a).
Since you have not provided any types compiler is totally confused how to infer such type.
If you simply want to split the list into two equal parts, why not do something simpler like
split3 :: [a] -> ([a], [a])
split3 [] = ([],[])
split3 [x] = ([x],[])
split3 (x:y:xs) = let (xs',ys') = split3 xs in (x:xs',y:ys')
You seem to be passing state around in a list instead of as values to a function, which creates problems: to the compiler it looks as though you're creating a list of heterogeneous values, whereas lists in Haskell must be of homogeneous type.
Instead of
split2 (0:xs:[y:ys])
you should pass the different arguments/values to the function separately like this
split2 n xs (y:ys)
The functionality you're looking for is also reproduced in standard library functions.
halveList xs = splitAt (length xs `div` 2) xs
In Haskell, the elements of a list need to be all of the same type. In your function the lists contain a mixture of Ints, elements of the original list, and sublists of the original list, all of which are probably different types.
You also have some confusion about how to append lists and elements. x ++ y can only be used when x and y are themselves lists, while x : y can only be used when y is a list and x is an element of a list; to make a new list containing x and y as elements instead use [x, y] (although x:[y] also works). Similarly [xs] ++ y needs to be xs ++ [y] instead.
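A few one-liners may make the distinction easier to remember. The names below (consed, appended, snoc) are made up for this illustration.

```haskell
-- (:) takes an element on the left and a list on the right.
consed :: [Int]
consed = 1 : [2, 3]

-- (++) takes a list on both sides.
appended :: [Int]
appended = [1, 2] ++ [3]

-- To add a single element at the end, wrap it in a list first.
snoc :: [a] -> a -> [a]
snoc xs y = xs ++ [y]

-- consed == [1,2,3], appended == [1,2,3], snoc [1,2] 3 == [1,2,3]
```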
Without changing your basic algorithm, the simplest solution is probably to let split2 take 3 separate arguments.
split (x:y:xs) = split2 ((length(x:y:xs) `div` 2)-2) [x,y] xs
-- the 0 case must come first, otherwise the general case below always matches
split2 0 xs ys = [xs,ys]
split2 n xs (y:ys) = split2 (n-1) (xs++[y]) ys

Algorithm - How to delete duplicate elements in a Haskell list

I'm having a problem creating a function similar to the nub function.
I need this function to remove duplicated elements from a list.
An element is duplicated when two elements have the same email, and the function should keep the newer one (the one closer to the end of the list).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
            | otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.
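A minimal sketch of that suggestion follows. It assumes, as the asker's isDup does, that a record is a list of strings whose second field is the e-mail address; the names email and rmDupKeepLast are hypothetical.

```haskell
import Data.Function (on)
import Data.List (nubBy)

type Regist = [String]

-- Assumption: the e-mail address is the second field (index 1).
-- This is partial and fails on records with fewer than two fields.
email :: Regist -> String
email r = r !! 1

-- nubBy keeps the first of each group of equal elements, so reversing
-- before and after makes it keep the last occurrence instead.
rmDupKeepLast :: [Regist] -> [Regist]
rmDupKeepLast = reverse . nubBy ((==) `on` email) . reverse
```

For example, rmDupKeepLast [["ann","a@x","1"],["bob","b@x","2"],["ann2","a@x","3"]] yields [["bob","b@x","2"],["ann2","a@x","3"]].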
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Note that this name clashes with rem from the Prelude, so either hide that one (import Prelude hiding (rem)) or choose another name. Since the definition is recursive, you need a base case and a recursive case.
In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing all elements a' such that f a a' = True from the list as. This is simply the function filter (\a' -> not $ f a a') applied to the list as. Then rem f (a:as) is a followed by the result of recursively calling rem f on as', that is, a : rem f as':
rem f (a:as) = a : rem f (filter (\a' -> not $ f a a') as)
Replace f with a function comparing your list elements for the appropriate equality (e-mail addresses).
While nubBy with two reverse's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse's will also take their toll (in the form of memory allocation).
For a more efficient implementation, Data.Set (O(log n) per insert) can be used as an intermediate holder of the latest non-duplicate elements (S.insert replaces the older element with the newer one if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
  (==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
  compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
  where
    set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
While the nubBy implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
on a list of 10M elements (with lots of duplication, so nub should play nice here) there is a threefold difference in running time and a 700-fold difference in memory usage. Compiled with GHC with -O2:
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.
Why not put everything in a Map from email to Regist (of course respecting your "keep the newest" rule), and then transform the values of the map back into a list? That's the most efficient way I can think of.
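A rough sketch of the Map idea, under the same assumption as the question's isDup that the e-mail address is the second field of a record. The names email and rmDupMap are hypothetical, and Data.Map.Strict comes from the containers package. Each record is stored together with its position so that the original order can be restored at the end.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M

type Regist = [String]

-- Assumption: the e-mail address is the second field (index 1).
email :: Regist -> String
email r = r !! 1

rmDupMap :: [Regist] -> [Regist]
rmDupMap rs = map snd (sortOn fst (M.elems latest))
  where
    -- M.fromList keeps the last value for a repeated key, so a later
    -- record overwrites an earlier one with the same e-mail address.
    latest = M.fromList [(email r, (i, r)) | (i, r) <- zip [0 :: Int ..] rs]
```

Building the map costs O(n log n), compared with the quadratic worst case of nubBy. The surviving records come back ordered by the position of their last occurrence.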
I used Alexei Polkhanov's answer and came up with the following, so you can remove duplicates from lists of any element type that has an Eq instance.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]

Resources