Is there a better way to write a "string contains X" method? - haskell

Just stared using Haskell and realized (at far as I can tell) there is no direct way to check a string to see if it contains a smaller string. So I figured I'd just take a shot at it.
Essentially the idea was to check if the two strings were the same size and were equal. If the string being checked was longer, recursively lop of the head and run the check again until the string being checked was the same length.
The rest of the possibilities I used pattern matching to handle them. This is what I came up with:
stringExists "" wordToCheckAgainst = False
stringExists wordToCheckFor "" = False
stringExists wordToCheckFor wordToCheckAgainst | length wordToCheckAgainst < length wordToCheckFor = False
| length wordToCheckAgainst == length wordToCheckFor = wordToCheckAgainst == wordToCheckFor
| take (length wordToCheckFor) wordToCheckAgainst == wordToCheckFor = True
| otherwise = stringExists wordToCheckFor (tail wordToCheckAgainst)

If you search Hoogle for the signature of the function you're looking for (String -> String -> Bool) you should see isInfixOf among the top results.

isInfixOf from Data.List will surely solve the problem, however in case of longer haystacks or perverse¹ needles you should consider more advanced string matching algorithms with a much better average and worst case complexity.
¹ Consider a really long string consisting only of a's and a needle with a lot of a's at the beginning and one b at the end.

Consider using the text package(text on Hackage, now also part of Haskell Platform) for your text-processing needs. It provides a Unicode text type, which is more time- and space-efficient than the built-in list-based String. For string search, the text package implements a Boyer-Moore-based algorithm, which has better complexity than the naïve method used by Data.List.isInfixOf.
Usage example:
Prelude> :s -XOverloadedStrings
Prelude> import qualified Data.Text as T
Prelude Data.Text> T.breakOnAll "abc" "defabcged"
[("def","abcged")]
Prelude Data.Text> T.isInfixOf "abc" "defabcged"
True

Related

Haskell substring testing

When I use sub_string("abberr","habberyry") , it returns True, when obviously it should be False. The point of the function is to search for the first argument within the second one. Any ideas what's wrong?
sub_string :: (String, String) -> Bool
sub_string(_,[]) = False
sub_string([],_) = True
sub_string(a:x,b:y) | a /= b = sub_string(a:x,y)
| otherwise = sub_string(x,y)
Let me give you hints on why it's not working:
your function consumes "abber" and "habber" of the input stings on the initial phase.
Now "r" and "yry" is left.
And "r" is a subset of "yry". So it returns True. To illustrate a more simple example of your problem:
*Main> sub_string("rz","rwzf")
True
First off, you need to switch your first two lines. _ will match [] and this will matter when you're matching, say, substring "abc" "abc". Secondly, it is idiomatic Haskell to write a function with two arguments instead of one with a pair argument. So your code should start out:
substring :: String -> String -> Bool
substring [] _ = True
substring _ [] = False
substring needle (h : aystack)
| ...
Now we get to the tricky case where both of these lists are not empty. Here's the problem with recursing on substring as bs: you'll get results like "abc" being a substring of "axbxcx" (because "abc" will match 'a' first, then will look for "bc" in the rest of the string; the substring algorithm will then skip past the 'x' to look for "bc" in "bxcx", which will match 'b' and look for "c" in "xcx", which will return True.
Instead your condition needs to be more thorough. If you're willing to use functions from Data.List this is:
| isPrefixOf needle (h : aystack) = True
| otherwise = substring needle aystack
Otherwise you need to write your own isPrefixOf, for example:
isPrefixOf needle haystack = needle == take (length needle) haystack
As Sibi already pointed out, your function tests for subsequence. Review the previous exercise, it is probably isPrefixof (hackage documentation), which is just a fancy way of saying startsWith, which looks very similar to the function you wrote.
If that is not the previous exercise, do that now!
Then write sub_string in terms of isPrefixOf:
sub_string (x, b:y) = isPrefixOf ... ?? ???
Fill in the dots and "?" yourself.

How to find maximum overlap between two strings in Scala?

Suppose I have two strings: s and t. I need to write a function f to find a max. t prefix, which is also an s suffix. For example:
s = "abcxyz", t = "xyz123", f(s, t) = "xyz"
s = "abcxxx", t = "xx1234", f(s, t) = "xx"
How would you write it in Scala ?
This first solution is easily the most concise, also it's more efficient than a recursive version as it's using a lazily evaluated iteration
s.tails.find(t.startsWith).get
Now there has been some discussion regarding whether tails would end up copying the whole string over and over. In which case you could use toList on s then mkString the result.
s.toList.tails.find(t.startsWith(_: List[Char])).get.mkString
For some reason the type annotation is required to get it to compile. I've not actually trying seeing which one is faster.
UPDATE - OPTIMIZATION
As som-snytt pointed out, t cannot start with any string that is longer than it, and therefore we could make the following optimization:
s.drop(s.length - t.length).tails.find(t.startsWith).get
Efficient, this is not, but it is a neat (IMO) one-liner.
val s = "abcxyz"
val t ="xyz123"
(s.tails.toSet intersect t.inits.toSet).maxBy(_.size)
//res8: String = xyz
(take all the suffixes of s that are also prefixes of t, and pick the longest)
If we only need to find the common overlapping part, then we can recursively take tail of the first string (which should overlap with the beginning of the second string) until the remaining part will not be the one that second string begins with. This also covers the case when the strings have no overlap, because then the empty string will be returned.
scala> def findOverlap(s:String, t:String):String = {
if (s == t.take(s.size)) s else findOverlap (s.tail, t)
}
findOverlap: (s: String, t: String)String
scala> findOverlap("abcxyz", "xyz123")
res3: String = xyz
scala> findOverlap("one","two")
res1: String = ""
UPDATE: It was pointed out that tail might not be implemented in the most efficient way (i.e. it creates a new string when it is called). If that becomes an issue, then using substring(1) instead of tail (or converting both Strings to Lists, where it's tail / head should have O(1) complexity) might give a better performance. And by the same token, we can replace t.take(s.size) with t.substring(0,s.size).

How to search a pattern from a file/String in Haskell

** old**
Suppose we have a pattern ex. "1101000111001110".
Now I have a pattern to be searched ex. "1101". I am new to Haskell world, I am trying it at my end. I am able to do it in c but need to do it in Haskell.
Given Pattern := "1101000111001110"
Pattern To Be Searched :- "110
Desired Output:-"Pattern Found"`
** New**
import Data.List (isInfixOf)
main = do x <- readFile "read.txt"
putStr x
isSubb :: [Char] -> [Char] -> Bool
isSubb sub str = isInfixOf sub str
This code reads a file named "read", which contains the following string 110100001101. Using isInfixOf you can check the pattern "1101" in the string and result will be True.
But the problem is i am not able to search "1101" in the string present in "read.txt".
I need to compare the "read.txt" string with the user provided string. i.e
one string is their in the file "read.txt"
and second string user will provid (user defined) and we will perform search and find whether user defined string is present in the string present in "read.txt"
Answer to new:
To achieve this, you have to use readLn:
sub <- readLn
readLn accepts input until a \n is encountered and <- binds the result to sub. Watch out that if the input should be a string you have to explicitly type the "s around your string.
Alternatively if you do not feel like typing the quotation marks every time, you can use getLine in place of readLn which has the type IO String which becomes String after being bound to sub
For further information on all functions included in the standard libraries of Haskell see Hoogle. Using Hoogle you can search functions by various criteria and will often find functions which suit your needs.
Answer to old:
Use the isInfixOf function from Data.List to search for the pattern:
import Data.List (isInfixOf)
isInfixOf "1101" "1101000111001110" -- outputs true
It returns true if the first sequence exists in the second and false otherwise.
To read a file and get its contents use readFile:
contents <- readFile "filename.txt"
You will get the whole file as one string, which you can now perform standard functions on.
Outputting "Pattern found" should be trivial then.

Correct way to define a function in Haskell

I'm new to Haskell and I'm trying out a few tutorials.
I wrote this script:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky x = "Bad luck"
I saved this as lucky.hs and ran it in the interpreter and it works fine.
But I am unsure about function definitions. It seems from the little I have read that I could equally define the function lucky as follows (function name is lucky2):
lucky2::(Integral a)=> a-> String
lucky2 x=(if x== 7 then "LUCKY NUMBER 7" else "Bad luck")
Both seem to work equally well. Clearly function lucky is clearer to read but is the lucky2 a correct way to write a function?
They are both correct. Arguably, the first one is more idiomatic Haskell because it uses its very important feature called pattern matching. In this form, it would usually be written as:
lucky::(Integral a)=> a-> String
lucky 7 = "LUCKY NUMBER 7"
lucky _ = "Bad luck"
The underscore signifies the fact that you are ignoring the exact form (value) of your parameter. You only care that it is different than 7, which was the pattern captured by your previous declaration.
The importance of pattern matching is best illustrated by function that operates on more complicated data, such as lists. If you were to write a function that computes a length of list, for example, you would likely start by providing a variant for empty lists:
len [] = 0
The [] clause is a pattern, which is set to match empty lists. Empty lists obviously have length of 0, so that's what we are having our function return.
The other part of len would be the following:
len (x:xs) = 1 + len xs
Here, you are matching on the pattern (x:xs). Colon : is the so-called cons operator: it is appending a value to list. An expression x:xs is therefore a pattern which matches some element (x) being appended to some list (xs). As a whole, it matches a list which has at least one element, since xs can also be an empty list ([]).
This second definition of len is also pretty straightforward. You compute the length of remaining list (len xs) and at 1 to it, which corresponds to the first element (x).
(The usual way to write the above definition would be:
len (_:xs) = 1 + len xs
which again signifies that you do not care what the first element is, only that it exists).
A 3rd way to write this would be using guards:
lucky n
| n == 7 = "lucky"
| otherwise = "unlucky"
There is no reason to be confused about that. There is always more than 1 way to do it. Note that this would be true even if there were no pattern matching or guards and you had to use the if.
All of the forms we've covered so far use so-called syntactic sugar provided by Haskell. Pattern guards are transformed to ordinary case expressions, as well as multiple function clauses and if expressions. Hence the most low-level, unsugared way to write this would be perhaps:
lucky n = case n of
7 -> "lucky"
_ -> "unlucky"
While it is good that you check for idiomatic ways I'd recommend to a beginner that he uses whatever works for him best, whatever he understands best. For example, if one does (not yet) understand points free style, there is no reason to force it. It will come to you sooner or later.

How to verify if some items are in a list?

I'm trying to mess about trying the haskell equivalent of the 'Scala One Liners' thing that has recently popped up on Reddit/Hacker News.
Here's what I've got so far (people could probably do them a lot better than me but these are my beginner level attempts)
https://gist.github.com/1005383
The one I'm stuck on is verifying if items are in a list. Basically the Scala example is this
val wordlist = List("scala", "akka", "play framework", "sbt", "typesafe")
val tweet = "This is an example tweet talking about scala and sbt."
(words.foldLeft(false)( _ || tweet.contains(_) ))
I'm a bit stumped how to do this in Haskell. I know you can do:
any (=="haskell") $ words "haskell is great!"
To verify if one of the words is present, but the Scala example asks if any of the words in the wordlist are present in the test string.
I can't seem to find a contains function or something similar to that. I know you could probably write a function to do it but that defeats the point of doing this task in one line.
Any help would be appreciated.
You can use the elem function from the Prelude which checks if an item is in a list. It is commonly used in infix form:
Prelude> "foo" `elem` ["foo", "bar", "baz"]
True
You can then use it in an operator section just like you did with ==:
Prelude> let wordList = ["scala", "akka", "play framework", "sbt", "types"]
Prelude> let tweet = "This is an example tweet talking about scala and sbt."
Prelude> any (`elem` wordList) $ words tweet
True
When you find yourself needing a function, but you don't know the name, try using Hoogle to search for a function by type.
Here you wanted something that checks if a thing of any type is in a list of things of the same type, i.e. something of a type like a -> [a] -> Bool (you also need an Eq constraint, but let's say you didn't know that). Typing this type signature into Hoogle gives you elem as the top result.
How about using Data.List.intersect?
import Data.List.intersect
not $ null $ intersect (words tweet) wordList
Although there are already good answers I thought it'd be nice to write something in the spirit of your original code using any. That way you get to see how to compose your own complex queries from simple reusable parts rather than using off-the-shelf parts like intersect and elem:
any (\x -> any (\y -> (y == x)) $ words "haskell is great!")
["scala", "is", "tolerable"]
With a little reordering you can sort of read it in English: is there any word x in the sentence such that there is any y in the list such that x == y? It's clear how to extend to more 'axes' than two, perform comparisons other than ==, and even mix it up with all.

Resources