Haskell: Delimit a string by chosen sub-strings and whitespace - haskell

Am still new to Haskell, so apologize if there is an obvious answer to this...
I would like to make a function that splits up the all following lists of strings i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
All into the same string of strings i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)

The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
case lex s of
[(tok,rest)] -> tok : lexStr rest
[] -> error "Failed lex"
To appease the pedants, this code is in terrible form. An explicit call to error instead of returning a reasonable error using Maybe, assuming lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.

I would take a look at parsec and build a simple grammar for parsing your strings.

how about using words .)
words :: String -> [String]
and words wont care for whitespaces..
words "Hello World"
= words "Hello World"
= ["Hello", "World"]

Related

Words from string to list

I'm trying to write a Haskell function which would read a string and return a list with the words from the string saved in it.
Here's how I did it:
toWordList :: String -> [String]
toWordList = do
[ toLower x | x <- str ]
let var = removePunctuation(x)
return (words var)
But I get this error:
Test1.hs:13:17: error: parse error on input 'let'
|
13 | let var = removePunctuation(x)
| ^^^
I'm new to Haskell so I don't have the grasp over its syntax so thanks in advance for the help.
There's quite a few mistakes here, you should spend more time reading over some tutorials (learn you a Haskell, Real World Haskell). You're pretty close though, so I'll try to do a break-down here.
do is special - it doesn't switch Haskell into "imperative mode", it lets you write clearer code when using Monads - if you don't yet know what Monads are, stay away from do! Keywords like return also don't behave the same as in imperative languages. Try to approach Haskell with a completely fresh mind.
Also in Haskell, indentation is important - see this link for a good explanation. Essentially, you want all the lines in the same "block" to have the same indentation.
Okay, let's strip out the do and return keywords, and align the indentation. We'll also name the parameter to the function str - in your original code, you missed this bit out.
toWordList :: String -> [String]
toWordList str =
[toLower x | x <- str]
let var = removePunctuation(x)
words var
The syntax for let is let __ = __ in __. There's different notation when using do, but forget about that for now. We also don't name the result of the list comprehension, so let's do that:
toWordList str =
let lowered = [toLower x | x <- str] in
let var = removePunctuation lowered in
words var
And this works! We just needed to get some syntax right and avoid the monadic syntactic sugar of do/return.
It's possible (and easy) to make it nicer though. Those let blocks are kinda ugly, we can strip those away. We can also replace the list comprehension with map toLower, which is a bit more elegant and is equivalent to your comprehension:
toWordList str = words (removePunctuation (map toLower str))
Nice, that's down to a single line now! But all those brackets are also a bit of an eyesore, how about we use the $ function?
toWordList str = words $ removePunctuation $ map toLower str
Looking good. There's another improvement we can make, which is to convert this into point-free style, where we don't explicitly name our parameter - instead we express this function as the composition of other functions.
toWordList = words . removePunctuation . (map toLower)
And we're done! Hopefully the first two code snippets make it clearer how the Haskell syntax works, and the last few might show you some nice examples of how you can make fairly verbose code much much cleaner.

How can I write a parser using Parsec that only accepts unique elements?

I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:
<"apple", "pear", "pineapple", "orange">
The code that I wrote to do that is:
collection :: Parser [String]
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')
string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'
This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:
collection :: Parser [String]
collection = do
char '<'
value <- (string `sepBy` char ','))
char '>'
return nub value
But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.
The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:
guard :: MonadPlus m => Bool -> m ()
This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.
So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.
To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:
allUnique ls = length (nub ls) == length ls
With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:
unique parser = do res <- parser
guard (allUnique res)
return res
Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.
Here's how we could use it:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error
This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.
unique parser = do res <- parser
guard (allUnique res) <?> "unique elements"
return res
Ahhh, much better:
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements
All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.

Haskell, make single string from integer set?

I'd greatly appreciate if you could tell me how to make a single string from a range between two ints. Like [5..10] i would need to get a "5678910". And then I'd have to calculate how many (zeroes, ones ... nines) there are in a string.
For example: if i have a range from [1..10] i'd need to print out
1 2 1 1 1 1 1 1 1 1
For now i only have a function to search for a element in string.
`countOfElem elem list = length $ filter (\x -> x == elem) list`
But the part how to construct such a string is bugging me out, or maybe there is an easier way? Thank you.
I tried something like this, but it wouldn't work.
let intList = map (read::Int->String) [15..22]
I tried something like this, but it wouldn't work. let intList = map (read::Int->String) [15..22]
Well... the purpose of read is to parse strings to read-able values. Hence it has a type signature String -> a, which obviously doesn't unify with Int -> String. What you want here is the inverse1 of read, it's called show.
Indeed map show [15..22] gives almost the result you asked for – the numbers as decimal-encoded strings – but still each number as a seperate list element, i.e. type [String] while you want only String. Well, how about asking Hoogle? It gives the function you need as the fifth hit: concat.
If you want to get fancy you can then combine the map and concat stages: both the concatMap function and the >>= operator do that. The most compact way to achieve the result: [15..22]>>=show.
1show is only the right inverse of read, to be precise.

How to search a pattern from a file/String in Haskell

** old**
Suppose we have a pattern ex. "1101000111001110".
Now I have a pattern to be searched ex. "1101". I am new to Haskell world, I am trying it at my end. I am able to do it in c but need to do it in Haskell.
Given Pattern := "1101000111001110"
Pattern To Be Searched :- "110
Desired Output:-"Pattern Found"`
** New**
import Data.List (isInfixOf)
main = do x <- readFile "read.txt"
putStr x
isSubb :: [Char] -> [Char] -> Bool
isSubb sub str = isInfixOf sub str
This code reads a file named "read", which contains the following string 110100001101. Using isInfixOf you can check the pattern "1101" in the string and result will be True.
But the problem is i am not able to search "1101" in the string present in "read.txt".
I need to compare the "read.txt" string with the user provided string. i.e
one string is their in the file "read.txt"
and second string user will provid (user defined) and we will perform search and find whether user defined string is present in the string present in "read.txt"
Answer to new:
To achieve this, you have to use readLn:
sub <- readLn
readLn accepts input until a \n is encountered and <- binds the result to sub. Watch out that if the input should be a string you have to explicitly type the "s around your string.
Alternatively if you do not feel like typing the quotation marks every time, you can use getLine in place of readLn which has the type IO String which becomes String after being bound to sub
For further information on all functions included in the standard libraries of Haskell see Hoogle. Using Hoogle you can search functions by various criteria and will often find functions which suit your needs.
Answer to old:
Use the isInfixOf function from Data.List to search for the pattern:
import Data.List (isInfixOf)
isInfixOf "1101" "1101000111001110" -- outputs true
It returns true if the first sequence exists in the second and false otherwise.
To read a file and get its contents use readFile:
contents <- readFile "filename.txt"
You will get the whole file as one string, which you can now perform standard functions on.
Outputting "Pattern found" should be trivial then.

How to verify if some items are in a list?

I'm trying to mess about trying the haskell equivalent of the 'Scala One Liners' thing that has recently popped up on Reddit/Hacker News.
Here's what I've got so far (people could probably do them a lot better than me but these are my beginner level attempts)
https://gist.github.com/1005383
The one I'm stuck on is verifying if items are in a list. Basically the Scala example is this
val wordlist = List("scala", "akka", "play framework", "sbt", "typesafe")
val tweet = "This is an example tweet talking about scala and sbt."
(words.foldLeft(false)( _ || tweet.contains(_) ))
I'm a bit stumped how to do this in Haskell. I know you can do:
any (=="haskell") $ words "haskell is great!"
To verify if one of the words is present, but the Scala example asks if any of the words in the wordlist are present in the test string.
I can't seem to find a contains function or something similar to that. I know you could probably write a function to do it but that defeats the point of doing this task in one line.
Any help would be appreciated.
You can use the elem function from the Prelude which checks if an item is in a list. It is commonly used in infix form:
Prelude> "foo" `elem` ["foo", "bar", "baz"]
True
You can then use it in an operator section just like you did with ==:
Prelude> let wordList = ["scala", "akka", "play framework", "sbt", "types"]
Prelude> let tweet = "This is an example tweet talking about scala and sbt."
Prelude> any (`elem` wordList) $ words tweet
True
When you find yourself needing a function, but you don't know the name, try using Hoogle to search for a function by type.
Here you wanted something that checks if a thing of any type is in a list of things of the same type, i.e. something of a type like a -> [a] -> Bool (you also need an Eq constraint, but let's say you didn't know that). Typing this type signature into Hoogle gives you elem as the top result.
How about using Data.List.intersect?
import Data.List.intersect
not $ null $ intersect (words tweet) wordList
Although there are already good answers I thought it'd be nice to write something in the spirit of your original code using any. That way you get to see how to compose your own complex queries from simple reusable parts rather than using off-the-shelf parts like intersect and elem:
any (\x -> any (\y -> (y == x)) $ words "haskell is great!")
["scala", "is", "tolerable"]
With a little reordering you can sort of read it in English: is there any word x in the sentence such that there is any y in the list such that x == y? It's clear how to extend to more 'axes' than two, perform comparisons other than ==, and even mix it up with all.

Resources