Find index of substring in another string Haskell

I am to make a function which takes two parameters (Strings). The function shall check whether the first parameter is a substring of the second. If so, it shall return a list of tuples, one per occurrence, each consisting of the start index and the end index of the substring.
For example:
f :: String -> String -> [(Int,Int)]
f "oo" "foobar" = [(1,2)]
f "oo" "fooboor" = [(1,2),(4,5)]
f "ooo" "fooobar" = [(1,3)]
We are not allowed to import anything, but I have an isPrefix function, which checks whether the first parameter is a prefix of the second.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (x:xs) (y:ys)
  | x == y    = isPrefix xs ys
  | otherwise = False
I was thinking a solution may be to run the function "isPrefix" first on x, and if it returns False, I run it on the tail (xs) and so on. However, I am struggling to implement it and struggling to understand how to return the index of the string if it exists. Perhaps using "!!"? Do you think I am onto something? As I am new to Haskell the syntax is a bit of a struggle :)

We can make a function that will check if the first list is a prefix of the second list. If that is the case, we prepend (0, length firstlist - 1) to the recursive call where we increment both indexes by one.
This means that such a function looks like:
f :: Eq a => [a] -> [a] -> [(Int, Int)]
f needle = go
  where go [] = []
        go haystack@(_:xs)
          | isPrefix needle haystack = (…, …) : tl  -- (1)
          | otherwise = tl
          where tl = … (go xs)  -- (2)
        n = length needle
Here (1) thus prepends (…, …) to the list; and for (2) tl makes a recursive call and needs to post-process the list by incrementing both items of the 2-tuple by one.
There is a more efficient approach where you pass the current index along in the recursive call, or you can implement the Knuth-Morris-Pratt algorithm. I leave these as an exercise.
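For readers who want to compare against their own attempt at the first exercise, here is a minimal sketch of the index-passing variant. The name occurrences is hypothetical (it is not from the question), and isPrefix is the asker's helper, repeated so the block stands alone. Like the skeleton above, it reports overlapping matches.

```haskell
-- The asker's helper, repeated so that this block is self-contained.
isPrefix :: Eq a => [a] -> [a] -> Bool
isPrefix [] _ = True
isPrefix _ [] = False
isPrefix (x:xs) (y:ys) = x == y && isPrefix xs ys

-- Hypothetical name: carries the current index i instead of
-- post-processing the result of the recursive call.
occurrences :: Eq a => [a] -> [a] -> [(Int, Int)]
occurrences needle = go 0
  where
    n = length needle
    go _ [] = []
    go i haystack@(_:xs)
      | isPrefix needle haystack = (i, i + n - 1) : go (i + 1) xs
      | otherwise                = go (i + 1) xs
```

Because the index travels with the recursion, each tuple is built once and never rewritten, which avoids incrementing every element of the tail at every step.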

Related

Understanding non-strictness in Haskell with a recursive example

What is the difference between these two, in terms of evaluation?
Why does this "obey" (how to say?) non-strictness
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []
recFilter p (h:tl) = if (p h)
                       then h : recFilter p tl
                       else recFilter p tl
while this doesn't?
recFilter :: (a -> Bool) -> [a] -> Int -> [a]
recFilter _ xs 0 = xs
recFilter p (h:tl) len
  | p(h)      = recFilter p (tl ++ [h]) (len-1)
  | otherwise = recFilter p tl (len-1)
Is it possible to write a tail-recursive function non-strictly?
To be honest I also don't understand the call stack of the first example, because I can't see where that h: goes. Is there a way to see this in ghci?

The non-tail recursive function roughly consumes a portion of the input (the first element) to produce a portion of the output (well, if it's not filtered out at least). Then recursion handles the next portion of the input, and so on.
Your tail recursive function will recurse until len becomes zero, and only at that point it will output the whole result.
Consider this pseudocode:
def rec1(p,xs):
    case xs:
        []     -> []
        (y:ys) -> if p(y): print y
                  rec1(p,ys)
and compare it with this accumulator-based variant. I'm not using len since I use a separate accumulator argument, which I assume to be initially empty.
def rec2(p,xs,acc):
    case xs:
        []     -> print acc
        (y:ys) -> if p(y):
                      rec2(p,ys,acc++[y])
                  else:
                      rec2(p,ys,acc)
rec1 prints before recursing: it does not need to inspect the whole input list to start printing its output. It works in a "streaming" fashion, in a sense. By contrast, rec2 only starts to print at the very end, after the input list has been completely scanned.
In your Haskell code there are no prints, of course, but you can think of returning x : function call as "printing x", since x is made available to the caller of our function before the function call is actually made. (Well, to be pedantic this depends on how the caller will consume the output list, but I'll neglect this.)
Hence the non-tail recursive code can also work on infinite lists. Even on finite inputs, performance is improved: if we call head (rec1 p xs), we only evaluate xs until the first non-discarded element. By contrast, head (rec2 p xs) would filter the whole list xs, even though we don't need that.
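The streaming behaviour can be observed directly. The sketch below reuses the first recFilter from the question; the names firstThreeEvens and firstBig are hypothetical and exist only for this illustration.

```haskell
-- The non-tail-recursive filter from the question.
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []
recFilter p (h:tl)
  | p h       = h : recFilter p tl
  | otherwise = recFilter p tl

-- Works on an infinite list: only as much input is consumed
-- as is needed to produce three results.
firstThreeEvens :: [Int]
firstThreeEvens = take 3 (recFilter even [1..])

-- The list ends in undefined, but head stops at the first
-- element that passes the test, so the bad tail is never forced.
firstBig :: Int
firstBig = head (recFilter (> 10) ([1..20] ++ undefined))
```

Here firstThreeEvens evaluates to [2,4,6] and firstBig to 11. An accumulator-based version would have to walk the whole input first, so it would loop forever on [1..] and hit undefined in the second case.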

The second implementation does not make much sense: nothing guarantees that a variable named len actually contains the length of the list. You thus need to pass the length in yourself, and for infinite lists this cannot work, since there is no length at all.
You likely want to produce something like:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter p = go []
  where go ys [] = ys  -- (1)
        go ys (x:xs) | p x = go (ys ++ [x]) xs
                     | otherwise = go ys xs
where we thus have an accumulator to which we append the items in the list, and then eventually return the accumulator.
The problem with the second approach is that, until the accumulator is returned, Haskell has to keep recursing before the result reaches weak head normal form (WHNF). This means that if we pattern match the result against [] or (_:_), we have to recurse all the way to case (1), since the other cases only produce a new call to go and thus never yield a data constructor on which we can pattern match.
This is in contrast to the first filter, where if we pattern match on [] or (_:_) it is sufficient to stop at case (1), or at case (2), where the expression produces a value with a list data constructor. Only if we need more elements to pattern match, for example (_:_:_), does it have to evaluate the recFilter p tl in case (2) of the first implementation:
recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = [] -- (1)
recFilter p (h:tl) = if (p h)
                       then h : recFilter p tl  -- (2)
                       else recFilter p tl
For more information, see the Laziness section of the Wikibook on Haskell that describes how laziness works with thunks.

Is there a better way of writing indexof function?

I wrote an indexOf function in Haskell. Is there a better way to write it?
My second question is
In my function is the tails function lazily evaluated?
Following is my indexof function
import Data.List
indexof :: String -> String -> Int
indexof lat patt = helper (tails lat) 0
  where helper [] _ = -1
        helper (x:xs) a = if prefix x patt then a else helper xs (a + 1)
prefix :: String -> String -> Bool
prefix _ [] = True
prefix [] _ = False
prefix (x:xs) (y:ys) = if x == y then prefix xs ys else False
This works as expected.

It is more idiomatic to take the pattern as the first parameter. Failure is also usually not signalled with -1 or some other sentinel value, but with Nothing, using Maybe Int as the return type.
We can use a foldr pattern here, which makes it more elegant, and Data.List has an isPrefixOf :: Eq a => [a] -> [a] -> Bool:
import Data.List(isPrefixOf, tails)
indexof :: Eq a => [a] -> [a] -> Maybe Int
indexof patt = foldr helper Nothing . tails
  where helper cur rec | isPrefixOf patt cur = Just 0
                       | otherwise = fmap (1+) rec
It might however be better to implement the Knuth-Morris-Pratt algorithm, since this will result in searching in O(m + n), with m the length of the pattern and n the length of the string. The current approach requires O(m×n).
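As a quick check of the foldr version, the sketch below repeats it and adds a few sample calls. The names found, missing and inInfinite are hypothetical and only serve the illustration.

```haskell
import Data.List (isPrefixOf, tails)

-- The foldr-based search from this answer.
indexof :: Eq a => [a] -> [a] -> Maybe Int
indexof patt = foldr helper Nothing . tails
  where
    helper cur rec
      | isPrefixOf patt cur = Just 0
      | otherwise           = fmap (1+) rec

found, missing :: Maybe Int
found   = indexof "oo" "foobar"   -- pattern starts at index 1
missing = indexof "x" "foobar"    -- no match at all

-- foldr only forces rec when the current suffix does not match,
-- so the search stops at the first hit, even in an infinite list.
inInfinite :: Maybe Int
inInfinite = indexof [3, 4] [1 :: Int ..]
```

With these definitions found is Just 1, missing is Nothing and inInfinite is Just 2. A pattern that never occurs in an infinite list would of course still run forever.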
My second question is In my function is the tails function lazily evaluated?
tails is indeed lazily evaluated. The bottleneck is probably not in tails :: [a] -> [[a]] however: tails runs in O(n) on an evaluated list and requires only O(n) memory, since the tails are shared. It thus does not construct a new list per item; each element simply points to the tail of the previous one.
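To make the sharing concrete, here is a sketch of how such a function can be written. The name tails' is hypothetical, and this is not claimed to be the actual Data.List source, only a definition with the same results.

```haskell
-- Each element of the result is the input list itself or one of
-- its existing tails; no list cells are copied.
tails' :: [a] -> [[a]]
tails' []            = [[]]
tails' xs@(_ : rest) = xs : tails' rest

-- Laziness: only the first three suffixes of an infinite list
-- are ever produced here.
firstHeads :: [Int]
firstHeads = map head (take 3 (tails' [1 ..]))
```

For example, tails' "abc" gives ["abc","bc","c",""], and firstHeads is [1,2,3].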

Building on Willem's answer: usually keeping track of indices is done by zipping with [0..]. The approach here is to find a list [Maybe Int] of possible matches, and then take the first one (which is all done lazily, of course, so we never actually compute the list of matches past the first Just occurrence).
import Data.List (isPrefixOf, tails)
import Data.Monoid (First(..))
indexOf :: (Eq a) => [a] -> [a] -> Maybe Int
indexOf needle haystack = firstJust $ zipWith findmatch [0..] (tails haystack)
  where
    findmatch ix suffix
      | needle `isPrefixOf` suffix = Just ix
      | otherwise = Nothing
firstJust :: [Maybe a] -> Maybe a
firstJust = getFirst . mconcat . map First
-- N.B. should really use `coerce` instead of `map First`
I find this fairly "direct", which I like. We can cut the code size by being a bit cleverer:
import Data.Maybe (listToMaybe)
indexOf needle haystack = listToMaybe . concat $ zipWith findmatch [0..] (tails haystack)
  where
    findmatch ix suffix = [ ix | needle `isPrefixOf` suffix ]
Essentially we are using zero- or one-element lists to simulate Maybe, and then using the slightly better library and notational support for lists to our advantage. This also might fuse well... (I don't have a good intuition for that)
But yes, if you want it to be fast, you should use KMP (on Texts instead of Strings). It's much more involved, though.

Haskell filter out circular permutations

You have a list with N elements.
You only want to print elements that are not circular permutations of other elements of the same list.
To check if two strings are the circular permutations of each other I do this, which works fine :
string1 = "abc"
string2 = "cab"
stringconc = string1 ++ string1
if string2 `isInfixOf` stringconc
  then -- it's a circular permutation
  else -- it's not
Edit: as one comment pointed out, this test only works for strings of the same length.
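One way to cover the caveat from the edit is to compare lengths first. This is a minimal sketch; the function name isCircularPermutation is hypothetical and does not appear in the question.

```haskell
import Data.List (isInfixOf)

-- Hypothetical helper: b is a rotation of a exactly when both
-- have the same length and b occurs somewhere inside a ++ a.
isCircularPermutation :: String -> String -> Bool
isCircularPermutation a b =
  length a == length b && b `isInfixOf` (a ++ a)
```

With this check, "abc" and "cab" are reported as circular permutations of each other, while "abc" and "ab" are not, even though "ab" is an infix of "abcabc".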
Back to the real use case :
checkClean :: [String] -> [String] -> IO String
checkClean [] list = return ""
checkClean (x:xs) list = do
  let sequence = cleanInfix x list
  if sequence /= "abortmath"
    then putStr sequence
    else return ()
  checkClean xs list
cleanInfix :
cleanInfix :: String -> [String] -> String
cleanInfix seq [] = seq
cleanInfix seq (x:xs) = do
  let seqconc = x ++ x
  if seq `isInfixOf` seqconc && seq /= x
    then "abortmath"
    else cleanInfix seq xs
However, this just outputs... nothing.
With some research I found out that sequence in checkClean is always "abortmath".
Also, I'm not quite comfortable with this "abortmath" flag, because if by any chance one element of the list is "abortmath", well...
For example :
if I have a list composed of :
NUUNNFFUF
FFUFNUUNN
I should write
NUUNNFFUF

I guess you call your initial code (question) with something like that:
result = ["NUUNNFFUF", "FFUFNUUNN"]
main = do
  checkClean result result
It won't print anything because:
the first call of cleanInfix has the following arguments: "NUUNNFFUF" and ["NUUNNFFUF", "FFUFNUUNN"]
in cleanInfix, since seq == x you have a recursive call with the following arguments: "NUUNNFFUF" and ["FFUFNUUNN"]
now, "NUUNNFFUF" is a real permutation of "FFUFNUUNN": cleanInfix returns "abortmath", and checkClean returns ()
then you have a recursive call of checkClean with following arguments: "FFUFNUUNN" and ["NUUNNFFUF", "FFUFNUUNN"]
again, "FFUFNUUNN" is a real permutation of "NUUNNFFUF": cleanInfix returns "abortmath", and checkClean returns ()
this is the end.
Basically, x is a permutation of y and y is a permutation of x, thus x and y are discarded.
Your answer works, but it is horribly complicated.
I won't try to improve either version of your code, but I will make a general comment: you should (you really should) avoid returning a monad when you don't need to. In the question, checkClean just needs to remove duplicates (or "circular duplicates") from a list. That's totally functional: you have all the information you need. Thus, remove those dos, lets and returns!
Now, let's try to focus on this:
You have a list with N elements. You only want to print elements that are not circular permutations of other elements of the same list.
Why don't you use your initial knowledge on circular permutations?
isCircPermOf x y = x `isInfixOf` (y ++ y)
Now, you need a function that takes a sequence and a list of sequences, and returns only the elements of the second that are not circular permutations of the first:
filterCircDuplicates :: String -> [String] -> [String]
filterCircDuplicates seq [] = []
filterCircDuplicates seq (x:xs) =
  if seq `isCircPermOf` x
    then filterCircDuplicates seq xs
    else x:filterCircDuplicates seq xs
This pattern is well known, and you can use filter to simplify it:
filterCircDuplicates seq l = filter (\x -> not (seq `isCircPermOf` x)) l
Or better:
filterCircDuplicates seq = filter (not.isCircPermOf seq)
Note the signature: not.isCircPermOf seq :: String -> Bool. It returns True if the current element is not a circular permutation of seq. (You don't have to add the list argument.)
Final step: you need a function that takes a list and returns this list without (circular) duplicates.
removeCircDuplicates :: [String] -> [String]
removeCircDuplicates [] = []
removeCircDuplicates (x:xs) = x:filterCircDuplicates x (removeCircDuplicates xs)
When your list has a head and a tail, you clean the tail, then remove the circular duplicates of the head from that cleaned tail, and keep the head.
Again, you have a well known pattern, a fold:
removeCircDuplicates = foldr (\x acc -> x:filterCircDuplicates x acc) []
It removes the duplicates from right to left.
And if you want a one-liner:
Prelude Data.List> foldr (\x -> ((:) x).filter(not.(flip isInfixOf (x++x)))) [] ["abcd", "acbd", "cdab", "abdc", "dcab"]
["abcd","acbd","abdc"]
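Put together, the pieces of this answer form a small pure pipeline, with printing left as a thin wrapper. This is a sketch assembled from the definitions above; the argument is named s here to avoid shadowing Prelude's seq, and printUnique is a hypothetical helper added for the illustration.

```haskell
import Data.List (isInfixOf)

-- x is a circular permutation of y (for strings of equal length).
isCircPermOf :: String -> String -> Bool
isCircPermOf x y = x `isInfixOf` (y ++ y)

-- Drop every element that is a circular permutation of s.
filterCircDuplicates :: String -> [String] -> [String]
filterCircDuplicates s = filter (not . isCircPermOf s)

-- Keep each element, then remove its circular duplicates from
-- the already cleaned remainder (works from right to left).
removeCircDuplicates :: [String] -> [String]
removeCircDuplicates = foldr (\x acc -> x : filterCircDuplicates x acc) []

-- Hypothetical wrapper: the only place where IO is needed.
printUnique :: [String] -> IO ()
printUnique = mapM_ putStrLn . removeCircDuplicates
```

On the asker's example, removeCircDuplicates ["NUUNNFFUF", "FFUFNUUNN"] yields ["NUUNNFFUF"], so printUnique prints just that one line.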

The wonders you can make with a pen and some paper...
So if anyone is interested, here is how I solved it. It's probably badly optimised, but at least it works (I'm just trying to learn Haskell, so it's good enough for now).
-- cleanInfix function
cleanInfix :: String -> [String] -> [String] -> [String]
cleanInfix sequence [] cleanlist = cleanlist
cleanInfix sequence (x:xs) cleanlist = do
  -- this is where I check for the circular permutation
  let sequenceconc = x ++ x
  if sequence `isInfixOf` sequenceconc
    then cleanInfix sequence xs (delete x cleanlist)
    else cleanInfix sequence xs cleanlist
-- checkClean function
checkClean :: [String] -> [String] -> [String] -> [String]
checkClean [] listesend cleanlist = cleanlist
checkClean (x:xs) listesend cleanlist = do
  -- The first delete is to avoid checking if an element is the circular permutation of... itself, because it obviously is... in some way
  let liste2 = cleanInfix x (delete x listesend) cleanlist
  checkClean xs (delete x listesend) liste2
-- Clean function; first, second and third are the command line arguments, don't worry about them
clean first second third = do
  -- creation of the result list by asking the user for input
  let printlist = checkClean result result result -- yes, it's the same list, three times
  print printlist -- print the list

Remove the First Value in a List that Meets a Criterion

I'm trying to solve this problem. This function takes two parameters. The first is a function that returns a boolean value, and the second is a list of numbers. The function is supposed to remove the first value in the second parameter that returns true when run with the first parameter.
There's a second function, which does the same thing, except it removes the last value that satisfies it, instead of the first.
I'm fairly certain I have the logic down, as I tested it in another language and it worked, my only problem is translating it into Haskell syntax. Here's what I have:
removeFirst :: (t -> Bool) -> [t] -> [t]
removeFirst p xs = []
removeFirst p xs
  | p y = ys
  | otherwise = y:removeFirst p ys
  where
    y:ys = xs
removeLast :: (t -> Bool) -> [t] -> [t]
removeLast p xs = []
removeLast p xs = reverse ( removeFirst p ( reverse xs ) )
I ran:
removeFirst even [1..10]
But instead of getting [1,3,4,5,6,7,8,9,10] as expected, I get [].
What am I doing wrong?

removeFirst p xs = []
This always returns the empty list and it matches all arguments. I think you mean this.
removeFirst _ [] = []

Your first equation,
removeFirst p xs = []
says "Whatever my arguments are, just return []", and the rest of the code is ignored.
You probably mean
removeFirst p [] = []
saying "When the list is already empty, return the empty list."
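With that fix applied, the whole thing might look like the sketch below. Matching on (y:ys) directly in the second equation replaces the where binding, and removeLast drops its catch-all first equation for the same reason; otherwise the logic is the asker's.

```haskell
-- Remove the first element that satisfies the predicate.
removeFirst :: (t -> Bool) -> [t] -> [t]
removeFirst _ [] = []
removeFirst p (y:ys)
  | p y       = ys
  | otherwise = y : removeFirst p ys

-- Remove the last matching element by working on the reversed list.
removeLast :: (t -> Bool) -> [t] -> [t]
removeLast p = reverse . removeFirst p . reverse
```

Now removeFirst even [1..10] gives [1,3,4,5,6,7,8,9,10], and removeLast even [1..10] gives [1,2,3,4,5,6,7,8,9].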

Creating a function using subset language Core Haskell to remove duplicate items in a list

The language I'm using is a subset of Haskell called Core Haskell which does not allow the use of the built-in functions of Haskell. For example, if I were to create a function which counts the number of times that the item x appears in the list xs, then I would write:
count = \x ->
        \xs -> if null xs
                 then 0
                 else if x == head xs
                        then 1 + count x (tail xs)
                        else count x (tail xs)
I'm trying to create a function which outputs a list xs with its duplicate values removed. E.g. remdups (7:7:7:4:5:7:4:4:[]) => (7:4:5:[])
Can anyone offer any advice?
Thanks!

I'm guessing that you're a student, and this is a homework problem, so I'll give you part of the answer and let you finish it. In order to write remdups, it would be useful to have a function that tells us if a list contains an element. We can do that using recursion. When using recursion, start by asking yourself what the "base case", or simplest possible case is. Well, when the list is empty, then obviously the answer is False (no matter what the character is). So now, what if the list isn't empty? We can check if the first character in the list is a match. If it is, then we know that the answer is True. Otherwise, we need to check the rest of the list -- which we do by calling the function again.
elem _ [] = False
elem x (y:ys) = if x==y
                  then True
                  else elem x ys
The underscore (_) simply means "I'm not going to use this variable, so I won't even bother to give it a name." That can be written more succinctly as:
elem _ [] = False
elem x (y:ys) = x==y || elem x ys
Writing remdups is a little tricky, but I suspect your teacher gave you some hints. One way to approach it is to imagine we're partway through processing the list. We have part of the list that hasn't been processed yet, and part of the list that has been processed (and doesn't contain any duplicates). Suppose we had a function called remdupHelper, which takes those two arguments, called remaining and finished. It would look at the first character in remaining, and return a different result depending on whether or not that character is in finished. (That result could call remdupHelper recursively). Can you write remdupHelper?
remdupHelper = ???
Once you have remdupHelper, you're ready to write remdups. It just invokes remdupHelper in the initial condition, where none of the list has been processed yet:
remdups l = remdupHelper l []

This works with Ints:
removeDuplicates :: [Int] -> [Int]
removeDuplicates = foldr insertIfNotMember []
  where
    insertIfNotMember item list = if (notMember item list)
                                    then item : list
                                    else list
notMember :: Int -> [Int] -> Bool
notMember item [] = True
notMember item (x:xs)
  | item == x = False
  | otherwise = notMember item xs
How it works should be obvious. The only "tricky" part is that the type of foldr is:
(a -> b -> b) -> b -> [a] -> b
but in this case b unifies with [a], so it becomes:
(a -> [a] -> [a]) -> [a] -> [a] -> [a]
and therefore, you can pass the function insertIfNotMember, which is of type:
Int -> [Int] -> [Int] -- a unifies with Int
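Nothing in this code depends on Int beyond equality, so the same definitions also type-check with an Eq a constraint. The sketch below shows that generalisation. Note that because foldr builds the result from the right, it is the last occurrence of each value that survives; removeDuplicatesKeepFirst is a hypothetical variant (not part of this answer) that keeps the first occurrence instead, which is the order the question asks for.

```haskell
-- Same algorithm as above, generalised from Int to any Eq type.
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = foldr insertIfNotMember []
  where
    insertIfNotMember item list
      | notMember item list = item : list
      | otherwise           = list

notMember :: Eq a => a -> [a] -> Bool
notMember _ [] = True
notMember item (x:xs) = item /= x && notMember item xs

-- Hypothetical variant: reversing before and after makes the
-- first occurrence of each value the one that is kept.
removeDuplicatesKeepFirst :: Eq a => [a] -> [a]
removeDuplicatesKeepFirst = reverse . removeDuplicates . reverse
```

For the list 7:7:7:4:5:7:4:4:[] from the question, removeDuplicates returns [5,7,4], while removeDuplicatesKeepFirst returns [7,4,5].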
