I wrote an indexOf function in Haskell. Is there a better way to write it?
My second question is: in my function, is the tails function lazily evaluated?
The following is my indexof function:
import Data.List
indexof :: String -> String -> Int
indexof lat patt = helper (tails lat) 0
  where helper [] _ = -1
        helper (x:xs) a = if prefix x patt then a else helper xs (a + 1)
prefix :: String -> String -> Bool
prefix _ [] = True
prefix [] _ = False
prefix (x:xs) (y:ys) = if x == y then prefix xs ys else False
This works as expected.
It is more idiomatic to take the pattern as the first parameter. Failure is also usually not signalled with -1 or some other sentinel value, but with Nothing, which makes Maybe Int the return type.
We can use a foldr pattern here, which makes it more elegant, and Data.List has an isPrefixOf :: Eq a => [a] -> [a] -> Bool:
import Data.List(isPrefixOf, tails)
indexof :: Eq a => [a] -> [a] -> Maybe Int
indexof patt = foldr helper Nothing . tails
  where helper cur rec | isPrefixOf patt cur = Just 0
                       | otherwise = fmap (1+) rec
It might, however, be better to implement the Knuth-Morris-Pratt algorithm, since it searches in O(m + n), with m the length of the pattern and n the length of the string. The current approach requires O(m×n) in the worst case.
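As a rough illustration of that suggestion, here is a minimal sketch of a Knuth-Morris-Pratt search over lists. The name kmpIndexOf, the use of Data.Array for the border table, and the choice to return Just 0 for an empty pattern are all assumptions made for this example, not part of the answer above.

```haskell
import Data.Array (listArray, (!))

-- Hypothetical helper: index of the first occurrence of pat in text,
-- found with a Knuth-Morris-Pratt border (failure) table.
kmpIndexOf :: Eq a => [a] -> [a] -> Maybe Int
kmpIndexOf [] _ = Just 0 -- assumption: the empty pattern matches at index 0
kmpIndexOf pat text = go 0 0 text
  where
    m = length pat
    p = listArray (0, m - 1) pat

    -- border ! i is the length of the longest proper prefix of pat[0..i]
    -- that is also a suffix of it. Each entry only refers to smaller
    -- indices, so the lazily built array is well defined.
    border = listArray (0, m - 1) (map calc [0 .. m - 1])
    calc 0 = 0
    calc i = extend (border ! (i - 1))
      where
        extend k
          | p ! i == p ! k = k + 1
          | k == 0         = 0
          | otherwise      = extend (border ! (k - 1))

    -- i: position in the text, k: number of pattern elements matched so far
    go _ _ [] = Nothing
    go i k (c : cs)
      | c == p ! k = if k + 1 == m then Just (i - m + 1) else go (i + 1) (k + 1) cs
      | k == 0     = go (i + 1) 0 cs
      | otherwise  = go i (border ! (k - 1)) (c : cs)
```

For example, kmpIndexOf "ababc" "abababc" evaluates to Just 2. On a mismatch the text position is never moved backwards; only the number of matched pattern elements falls back via the border table.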
My second question is: in my function, is the tails function lazily evaluated?
tails is indeed lazily evaluated. The bottleneck is probably not in tails :: [a] -> [[a]], however: on an already evaluated list, tails runs in O(n) and requires only O(n) memory, since the tails are shared. It does not construct a new list per item; each suffix simply points to the tail of the previous one.
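One way to convince yourself of the laziness is to apply tails to an infinite list and demand only a small part of the result. This is a small sketch; the name firstSuffixes is made up for the example.

```haskell
import Data.List (tails)

-- Only the first three suffixes of an infinite list are demanded,
-- and only two elements of each, so this terminates.
firstSuffixes :: [[Int]]
firstSuffixes = map (take 2) (take 3 (tails [1 ..]))
-- [[1,2],[2,3],[3,4]]
```

If tails were strict, this expression would never finish.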
Building on Willem's answer: usually keeping track of indices is done by zipping with [0..]. The approach here is to find a list [Maybe Int] of possible matches, and then take the first one (which is all done lazily, of course, so we never actually compute the list of matches past the first Just occurrence).
import Data.List (isPrefixOf, tails)
import Data.Monoid (First(..))

indexOf :: (Eq a) => [a] -> [a] -> Maybe Int
indexOf needle haystack = firstJust $ zipWith findmatch [0..] (tails haystack)
  where
    findmatch ix suffix
      | needle `isPrefixOf` suffix = Just ix
      | otherwise = Nothing
firstJust :: [Maybe a] -> Maybe a
firstJust = getFirst . mconcat . map First
-- N.B. should really use `coerce` instead of `map First`
I find this fairly "direct", which I like. We can cut the code size by being a bit cleverer:
indexOf needle haystack = listToMaybe . concat $ zipWith findmatch [0..] (tails haystack)
  where
    findmatch ix suffix = [ ix | needle `isPrefixOf` suffix ]
Essentially we are using zero- or one-element lists to simulate Maybe, and then using the slightly better library and notational support for lists to our advantage. This also might fuse well... (I don't have a good intuition for that)
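For comparison, the same naive search can probably be phrased even more compactly with findIndex from Data.List, which already does the index bookkeeping. This is only a sketch; the name indexOfFI is hypothetical and chosen to avoid clashing with the definitions above.

```haskell
import Data.List (findIndex, isPrefixOf, tails)

-- Index of the first suffix of the haystack that starts with the needle.
indexOfFI :: Eq a => [a] -> [a] -> Maybe Int
indexOfFI needle = findIndex (needle `isPrefixOf`) . tails
```

Because both tails and findIndex are lazy, this also works on an infinite haystack as long as a match exists, e.g. indexOfFI [5, 6] [1 ..] gives Just 4.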
But yes, if you want it to be fast, you should use KMP (on Texts instead of Strings). It's much more involved, though.
Related
I'm writing a function that gets the index of the first even number from a list. The list I get may or may not contain even numbers, and I'd like to return -1 if there are no even numbers in the list. The list can be infinite.
I wrote this
posicPrimerPar'' :: [Int] -> Int
posicPrimerPar'' a = fromJust (elemIndex (head (filter (even) a)) a)
I could do something like:
posicPrimerPar' :: [Int] -> Int
posicPrimerPar' a = case length evens of
    0 -> -1
    _ -> fromJust (elemIndex (head evens) a)
  where evens = filter (even) a
But as you can see, this is not the most efficient way of doing it. A list [1..100000] contains a lot of even numbers, and I just need the first one. I need Haskell's laziness, so I need to ask for the head right there, but head throws an empty list exception when the list is empty (i.e. there are no even numbers in the list). I cannot find the Haskell equivalent of Python's try: ... except: .... All I could find regarding exceptions were IO related. What I need is except Prelude.head = -1 or something like that.
Haskell is lazy, so evens will not be fully evaluated. The problematic part is the length evens which is not necessary. You can check with null :: Foldable f => f a -> Bool, or with pattern matching. For example:
import Data.List(findIndex)
posicPrimerPar' :: [Int] -> Maybe Int
posicPrimerPar' [] = Nothing
posicPrimerPar' xs = findIndex even xs
With findIndex :: (a -> Bool) -> [a] -> Maybe Int, however, you do not need to handle the empty list separately, since findIndex already returns Nothing for it.
Or we can return -1 in case there is no such item:
import Data.List(findIndex)
import Data.Maybe(fromMaybe)
posicPrimerPar' :: [Int] -> Int
posicPrimerPar' = fromMaybe (-1) . findIndex even
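A few sample calls make the behaviour concrete, including an infinite list. This is a sketch under the assumption that the function is renamed firstEvenIndex (a hypothetical name) so the block is self-contained.

```haskell
import Data.List (findIndex)
import Data.Maybe (fromMaybe)

-- Same definition as above, under a hypothetical name.
firstEvenIndex :: [Int] -> Int
firstEvenIndex = fromMaybe (-1) . findIndex even

examples :: [Int]
examples =
  [ firstEvenIndex [1, 3, 5]              -- no even number: -1
  , firstEvenIndex [1, 3, 4, 6]           -- first even number at index 2
  , firstEvenIndex ([1, 3, 5] ++ [6 ..])  -- infinite list, first even at index 3
  ]
```

Note that on an infinite list without any even number, such as [1, 3 ..], no function can return -1: the search simply never terminates.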
I have to write a function that takes two parameters (Strings). The function should check whether the first parameter is a substring of the second parameter. If that is the case, it should return a tuple for each occurrence, consisting of the start index and the end index of the substring.
For example:
f :: String -> String -> [(Int,Int)]
f "oo" "foobar" = [(1,2)]
f "oo" "fooboor" = [(1,2),(4,5)]
f "ooo" "fooobar" = [(1,3)]
We are not allowed to import anything, but I have an isPrefix function. It checks whether the first parameter is a prefix of the second parameter.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (x:xs) (y:ys) | x == y = isPrefix xs ys
                       | otherwise = False
I was thinking a solution may be to run the function "isPrefix" first on x, and if it returns False, I run it on the tail (xs) and so on. However, I am struggling to implement it and struggling to understand how to return the index of the string if it exists. Perhaps using "!!"? Do you think I am onto something? As I am new to Haskell the syntax is a bit of a struggle :)
We can make a function that will check if the first list is a prefix of the second list. If that is the case, we prepend (0, length firstlist - 1) to the recursive call where we increment both indexes by one.
This means that such a function looks like:
f :: Eq a => [a] -> [a] -> [(Int, Int)]
f needle = go
  where go [] = []
        go haystack@(_:xs)
          | isPrefix needle haystack = (…, …) : tl  -- (1)
          | otherwise = tl
          where tl = … (go xs)  -- (2)
        n = length needle
Here (1) thus prepends (…, …) to the list; and for (2) tl makes a recursive call and needs to post-process the list by incrementing both items of the 2-tuple by one.
There is a more efficient algorithm where you pass the current index in the recursive call, or you can implement the Knuth-Morris-Pratt algorithm. I leave these as an exercise.
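One possible shape for the index-passing variant is sketched below. It is not the answerer's solution: the name occurrences is hypothetical, isPrefix is restated from the question so the block is self-contained, and overlapping matches are all reported.

```haskell
-- Restated from the question so the block compiles on its own.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (x:xs) (y:ys) = x == y && isPrefix xs ys

-- Hypothetical name: (start, end) indices of every occurrence of needle.
occurrences :: Eq a => [a] -> [a] -> [(Int, Int)]
occurrences needle = go 0
  where
    n = length needle
    -- i is the index of the first element of the current suffix
    go _ [] = []
    go i haystack@(_:xs)
      | isPrefix needle haystack = (i, i + n - 1) : go (i + 1) xs
      | otherwise                = go (i + 1) xs
```

With this sketch, occurrences "oo" "fooboor" yields [(1,2),(4,5)], matching the example in the question. No imports are needed, and the result list is never post-processed because the index travels with the recursion.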
Given an integer n, how can I build the list containing all lists of length n^2 containing exactly n copies of each integer x < n? For example, for n = 2, we have:
[0,0,1,1], [0,1,0,1], [1,0,0,1], [0,1,1,0], [1,0,1,0], [1,1,0,0]
This can be easily done combining permutations and nub:
f :: Int -> [[Int]]
f n = nub . permutations $ concatMap (replicate n) [0..n-1]
But that is way too inefficient. Is there any simple way to encode the efficient/direct algorithm?
Sure, it's not too hard. We'll start with a list of n copies of each number less than n, and repeatedly choose one to start our result with. First, a function for choosing an element from a list:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
  go l (h:r) = (l,h,r) : go (h:l) r
  go _ [] = []
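To see what zippers produces, it may help to run it on a short list. The definition is repeated here only so that the block compiles on its own; the name zippersExample is made up.

```haskell
zippers :: [a] -> [([a], a, [a])]
zippers = go []
  where
    go l (h:r) = (l, h, r) : go (h : l) r
    go _ []    = []

-- Every element is picked once, together with what lies to its left
-- (in reverse order) and what lies to its right.
zippersExample :: [(String, Char, String)]
zippersExample = zippers "abc"
-- [("",'a',"bc"),("a",'b',"c"),("ba",'c',"")]
```

The left part comes out reversed, which does not matter here because only the set of remaining lists is used afterwards.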
Now we'll write a function that produces all possible interleavings of some input lists. Internally we'll maintain the invariant that each [a] is non-empty; hence we'll have to establish that invariant before we start recursing. In fact, this will be wasted work in the way we intend to call this function, but for good abstraction we might as well handle all inputs correctly, right?
interleavings :: [[a]] -> [[a]]
interleavings = go . filter (not . null) where
  go [] = [[]]
  go xss = do
    (xssl, x:xs, xssr) <- zippers xss
    (x:) <$> interleavings ([xs | not (null xs)] ++ xssl ++ xssr)
And now we're basically done. All we have to do is feed in an appropriate starting list.
f :: Int -> [[Int]]
f n = interleavings (replicate n <$> [1..n])
Try it in ghci:
> f 2
[[1,1,2,2],[1,2,2,1],[1,2,1,2],[2,2,1,1],[2,1,1,2],[2,1,2,1]]
isTogether' :: String -> Bool
isTogether' (x:xs) = isTogether (head xs) (head (tail xs))
For the above code, I want to go through every character in the string. I am not allowed to use recursion.
isTogether' (x:xs) = isTogether (head xs) (head (tail xs))
If I've got it right, you are interested in getting consecutive char pairs from some string. So, for example, for abcd you need to test (a,b), (b,c), (c,d) with some (Char,Char) -> Bool or Char -> Char -> Bool function.
Zip could be helpful here:
> let x = "abcd"
> let pairs = zip x (tail x)
it :: [(Char, Char)]
And for some f :: Char -> Char -> Bool function we can get uncurry f :: (Char, Char) -> Bool.
And then it's easy to get [Bool] value of results with map (uncurry f) pairs :: [Bool].
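Put together, the idea might look like the following sketch. The name pairResults is hypothetical, and drop 1 is used in place of tail only so that the function stays total on the empty string.

```haskell
-- Apply f to every pair of neighbouring characters, without explicit recursion.
pairResults :: (Char -> Char -> Bool) -> String -> [Bool]
pairResults f s = map (uncurry f) (zip s (drop 1 s))
```

For instance, pairResults (<) "abca" gives [True,True,False]. Wrapping the result in and or or then answers "all pairs" or "some pair" questions.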
In Haskell, a String is just a list of characters ([Char]). Thus, all of the normal higher-order list functions like map work on strings. So you can use whichever higher-order function is most applicable to your problem.
Note that these functions themselves are defined recursively; in fact, there is no way to go through the entire list in Haskell without either recursing explicitly or using a function that directly or indirectly recurses.
To do this without recursion, you will need to use a higher order function or a list comprehension. I don't understand what you're trying to accomplish so I can only give generic advice. You probably will want one of these:
map :: (a -> b) -> [a] -> [b]
Map converts a list of one type into another. Using map lets you perform the same action on every element of the list, given a function that operates on the kinds of things you have in the list.
filter :: (a -> Bool) -> [a] -> [a]
Filter takes a list and a predicate, and gives you a new list with only the elements that satisfy the predicate. Just with these two tools, you can do some pretty interesting things:
import Data.Char
map toUpper (filter isLower "A quick test") -- => "QUICKTEST"
Then you have folds of various sorts. A fold is really a generic higher order function for doing recursion on some type, so using it takes a bit of getting used to, but you can accomplish pretty much any recursive function on a list with a fold instead. The basic type of foldr looks like this:
foldr :: (a -> b -> b) -> b -> [a] -> b
It takes three arguments: an inductive step, a base case and a value you want to fold. Or, in less mathematical terms, you could think of it as taking an initial state, a function to take the next item and the previous state to produce the next state, and the list of values. It then returns the final state it arrived at. You can do some pretty surprising things with fold, but let's say you want to detect if a list has a run of two or more of the same item. This would be hard to express with map and filter (impossible?), but it's easy with recursion:
hasTwins :: (Eq a) => [a] -> Bool
hasTwins (x:y:xs) | x == y = True
hasTwins (x:y:xs) | otherwise = hasTwins (y:xs)
hasTwins _ = False
Well, you can express this with a fold like so:
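hasTwins :: (Eq a) => [a] -> Bool
hasTwins xs = snd $ foldr step (Nothing, False) xs
  where
    -- foldr works from the right, so prev is the element that follows x in the list;
    -- it starts as Nothing so that the last element is not compared with anything
    step x (prev, seenTwins) = (Just x, Just x == prev || seenTwins)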
So my "state" in this fold is the previous value and whether we've already seen a pair of identical values. The function has no explicit recursion, but my step function passes the current x value along to the next invocation through the state as the previous value. But you don't have to be happy with the last state you have; this function takes the second value out of the state and returns that as the overall return value—which is the boolean whether or not we've seen two identical values next to each other.
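As a point of comparison, the same check can probably be written without any state at all by comparing the list with itself shifted by one position. This is a sketch, and the name hasTwinsZip is hypothetical.

```haskell
-- True if some element is immediately followed by an equal element.
hasTwinsZip :: Eq a => [a] -> Bool
hasTwinsZip xs = or (zipWith (==) xs (drop 1 xs))
```

Here hasTwinsZip [1, 2, 2, 3] is True and hasTwinsZip [1, 2, 1] is False. Since or stops at the first True, this version also short-circuits on long or infinite lists that contain a pair.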
I'm having a problem creating a function similar to the nub function.
I need this function to remove duplicated elements from a list.
An element is duplicated when two elements have the same email, and the function should keep the newer one (the one closer to the end of the list).
type Regist = [name,email,,...,date]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup [a] = [a]
rmDup (h:t) | isDup h (head t) = rmDup t
            | otherwise = h : rmDup t
isDup :: Regist -> Regist -> Bool
isDup (a:b:c:xs) (d:e:f:ts) = b==e
The problem is that the function doesn't delete duplicated elements unless they are together in the list.
Just use nubBy, and specify an equality function that compares things the way you want.
And I guess reverse the list a couple of times if you want to keep the last element instead of the first.
Slightly doctored version of your original code to make it run:
type Regist = [String]
type ListRe = [Regist]
rmDup :: ListRe -> ListRe
rmDup [] = []
rmDup (x:xs) = x : rmDup (filter (\y -> not(x == y)) xs)
Result:
*Main> rmDup [["a", "b"], ["a", "d"], ["a", "b"]]
[["a","b"],["a","d"]]
Anon is correct: nubBy is the function you are looking for, and can be found in Data.List.
That said, you want a function rem which accepts a list xs and a function f :: a -> a -> Bool (on which elements are compared for removal from xs). Since the definition is recursive, you need a base case and a recursive case.
In the base case xs = [] and rem f xs = [], since the result of removing all duplicate elements from [] is []:
rem :: Eq a => (a -> a -> Bool) -> [a] -> [a]
rem f [] = []
In the recursive case, xs = (a:as). Let as' be the list obtained by removing from as all elements a' such that f a a' = True. This is simply filter (\a' -> not $ f a a') applied to as. Then rem f (a:as) is a followed by the result of recursively calling rem f on as', that is, a : rem f as':
rem f (a:as) = a : rem f (filter (\a' -> not $ f a a') as)
Replace f with a function that compares your list elements for the appropriate equality (here, e-mail addresses). Note that the name rem clashes with Prelude's rem, so either rename the function or hide the Prelude one with import Prelude hiding (rem).
While nubBy with two reverse's is probably the best among simple solutions (and probably exactly what Justin needs for his task), one should not forget that it isn't the ideal solution in terms of efficiency - after all nubBy is O(n^2) (in the "worst case" - when there are no duplicates). Two reverse's will also take their toll (in the form of memory allocation).
For a more efficient implementation, Data.Set (O(log n) per insert) can be used as an intermediate holder of the latest non-duplicate elements (Set.insert replaces the older element with the newer one if there is a collision):
import Data.List
import Data.Function
import qualified Data.Set as S
newtype Regis i e = Regis { toTuple :: (i,[e]) }
selector (Regis (_,(_:a:_))) = a
instance Eq e => Eq (Regis i e) where
  (==) = (==) `on` selector
instance Ord e => Ord (Regis i e) where
  compare = compare `on` selector
rmSet xs = map snd . sortBy (compare `on` fst) . map toTuple . S.toList $ set
  where
    set = foldl' (flip (S.insert . Regis)) S.empty (zip [1..] xs)
While nubBy implementation is definitely much simpler:
rmNub xs = reverse . nubBy ((==) `on` (!!1)) . reverse $ xs
On a list of 10M elements (with lots of duplication, so nub should do well here) there is a 3x difference in running time and a 700x difference in memory usage. Compiled with GHC with -O2:
input = take 10000000 $ map (take 10) $ permutations [1..]
test1 = rmNub input
test2 = rmSet input
Not sure about the nature of the author's data though (the real data might change the picture).
(Assuming you want to figure out an answer, not just call a library function that does this job for you.)
You get what you ask for. What if h is not equal to head t but is instead equal to the 3rd element of t? You need to write an algorithm that compares h with every element of t, not just the first element.
Why not put everything in a Map from email to Regist (respecting your "keep the newest" rule, of course), and then transform the values of the map back into a list? That's the most efficient way I can think of.
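A minimal sketch of that idea, assuming Regist is a list of Strings whose second field is the email (as in the other answers) and that the surviving records should keep their original relative order. The name rmDupMap is hypothetical, and records with fewer than two fields are silently dropped by the pattern match.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as M

type Regist = [String]

-- Hypothetical name: keep only the last record for each email.
rmDupMap :: [Regist] -> [Regist]
rmDupMap regs = map snd (sortOn fst (M.elems latest))
  where
    -- key: email; value: (original position, record).
    -- M.fromList keeps the last value for a repeated key,
    -- which implements the "keep the newest" rule.
    latest =
      M.fromList
        [ (email, (i, r))
        | (i, r@(_ : email : _)) <- zip [0 :: Int ..] regs
        ]
```

For example, rmDupMap [["a","x@e"],["b","y@e"],["c","x@e"]] gives [["b","y@e"],["c","x@e"]]. Storing the position next to each record is what allows the original order to be restored after the map has reordered entries by email.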
I used Alexei Polkhanov's answer and arrived at the following, which removes duplicates from a list of lists whose elements have an Eq instance.
removeDuplicates :: Eq a => [[a]] -> [[a]]
removeDuplicates [] = []
removeDuplicates (x:xs) = x : removeDuplicates (filter (\y -> not (x == y)) xs)
Examples:
*Verdieping> removeDuplicates [[1],[2],[1],[1,2],[1,2]]
[[1],[2],[1,2]]
*Verdieping> removeDuplicates [["a","b"],["a"],["a","b"],["c"],["c"]]
[["a","b"],["a"],["c"]]