Words from string to list - haskell

I'm trying to write a Haskell function which would read a string and return a list with the words from the string saved in it.
Here's how I did it:
toWordList :: String -> [String]
toWordList = do
[ toLower x | x <- str ]
let var = removePunctuation(x)
return (words var)
But I get this error:
Test1.hs:13:17: error: parse error on input 'let'
|
13 | let var = removePunctuation(x)
| ^^^
I'm new to Haskell so I don't have the grasp over its syntax so thanks in advance for the help.

There's quite a few mistakes here, you should spend more time reading over some tutorials (learn you a Haskell, Real World Haskell). You're pretty close though, so I'll try to do a break-down here.
do is special - it doesn't switch Haskell into "imperative mode", it lets you write clearer code when using Monads - if you don't yet know what Monads are, stay away from do! Keywords like return also don't behave the same as in imperative languages. Try to approach Haskell with a completely fresh mind.
Also in Haskell, indentation is important - see this link for a good explanation. Essentially, you want all the lines in the same "block" to have the same indentation.
Okay, let's strip out the do and return keywords, and align the indentation. We'll also name the parameter to the function str - in your original code, you missed this bit out.
toWordList :: String -> [String]
toWordList str =
[toLower x | x <- str]
let var = removePunctuation(x)
words var
The syntax for let is let __ = __ in __. There's different notation when using do, but forget about that for now. We also don't name the result of the list comprehension, so let's do that:
toWordList str =
let lowered = [toLower x | x <- str] in
let var = removePunctuation lowered in
words var
And this works! We just needed to get some syntax right and avoid the monadic syntactic sugar of do/return.
It's possible (and easy) to make it nicer though. Those let blocks are kinda ugly, we can strip those away. We can also replace the list comprehension with map toLower, which is a bit more elegant and is equivalent to your comprehension:
toWordList str = words (removePunctuation (map toLower str))
Nice, that's down to a single line now! But all those brackets are also a bit of an eyesore, how about we use the $ function?
toWordList str = words $ removePunctuation $ map toLower str
Looking good. There's another improvement we can make, which is to convert this into point-free style, where we don't explicitly name our parameter - instead we express this function as the composition of other functions.
toWordList = words . removePunctuation . (map toLower)
And we're done! Hopefully the first two code snippets make it clearer how the Haskell syntax works, and the last few might show you some nice examples of how you can make fairly verbose code much much cleaner.

Related

Is there a (Template) Haskell library that would allow me to print/dump a few local bindings with their respective names?

For instance:
let x = 1 in putStrLn [dump|x, x+1|]
would print something like
x=1, (x+1)=2
And even if there isn't anything like this currently, would it be possible to write something similar?
TL;DR There is this package which contains a complete solution.
install it via cabal install dump
and/or
read the source code
Example usage:
{-# LANGUAGE QuasiQuotes #-}
import Debug.Dump
main = print [d|a, a+1, map (+a) [1..3]|]
where a = 2
which prints:
(a) = 2 (a+1) = 3 (map (+a) [1..3]) = [3,4,5]
by turnint this String
"a, a+1, map (+a) [1..3]"
into this expression
( "(a) = " ++ show (a) ++ "\t " ++
"(a+1) = " ++ show (a + 1) ++ "\t " ++
"(map (+a) [1..3]) = " ++ show (map (+ a) [1 .. 3])
)
Background
Basically, I found that there are two ways to solve this problem:
Exp -> String The bottleneck here is pretty-printing haskell source code from Exp and cumbersome syntax upon usage.
String -> Exp The bottleneck here is parsing haskell to Exp.
Exp -> String
I started out with what #kqr put together, and tried to write a parser to turn this
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
into this
["not x = False","x = True","x == True = True"]
But after trying for a day, my parsec-debugging-skills have proven insufficient to date, so instead I went with a simple regular expression:
simplify :: String -> String
simplify s = subRegex (mkRegex "_[0-9]+|([a-zA-Z]+\\.)+") s ""
For most cases, the output is greatly improved.
However, I suspect this to likely mistakenly remove things it shouldn't.
For example:
$(dump [|(elem 'a' "a.b.c", True)|])
Would likely return:
["elem 'a' \"c\" = True","True = True"]
But this could be solved with proper parsing.
Here is the version that works with the regex-aided simplification: https://github.com/Wizek/kqr-stackoverflow/blob/master/Th.hs
Here is a list of downsides / unresolved issues I've found with the Exp -> String solution:
As far as I know, not using Quasi Quotation requires cumbersome syntax upon usage, like: $(d [|(a, b)|]) -- as opposed to the more succinct [d|a, b|]. If you know a way to simplify this, please do tell!
As far as I know, [||] needs to contain fully valid Haskell, which pretty much necessitates the use of a tuple inside further exacerbating the syntactic situation. There is some upside to this too, however: at least we don't need to scratch our had where to split the expressions since GHC does that for us.
For some reason, the tuple only seemed to accept Booleans. Weird, I suspect this should be possible to fix somehow.
Pretty pretty-printing Exp is not very straight-forward. A more complete solution does require a parser after all.
Printing an AST scrubs the original formatting for a more uniform looks. I hoped to preserve the expressions letter-by-letter in the output.
The deal-breaker was the syntactic over-head. I knew I could get to a simpler solution like [d|a, a+1|] because I have seen that API provided in other packages. I was trying to remember where I saw that syntax. What is the name...?
String -> Exp
Quasi Quotation is the name, I remember!
I remembered seeing packages with heredocs and interpolated strings, like:
string = [qq|The quick {"brown"} $f {"jumps " ++ o} the $num ...|]
where f = "fox"; o = "over"; num = 3
Which, as far as I knew, during compile-time, turns into
string = "The quick " ++ "brown" ++ " " ++ $f ++ "jumps " ++ o ++ " the" ++ show num ++ " ..."
where f = "fox"; o = "over"; num = 3
And I thought to myself: if they can do it, I should be able to do it too!
A bit of digging in their source code revealed the QuasiQuoter type.
data QuasiQuoter = QuasiQuoter {quoteExp :: String -> Q Exp}
Bingo, this is what I want! Give me the source code as string! Ideally, I wouldn't mind returning string either, but maybe this will work. At this point I still know quite little about Q Exp.
After all, in theory, I would just need to split the string on commas, map over it, duplicate the elements so that first part stays string and the second part becomes Haskell source code, which is passed to show.
Turning this:
[d|a+1|]
into this:
"a+1" ++ " = " ++ show (a+1)
Sounds easy, right?
Well, it turns out that even though GHC most obviously is capable to parse haskell source code, it doesn't expose that function. Or not in any way we know of.
I find it strange that we need a third-party package (which thankfully there is at least one called haskell-src-meta) to parse haskell source code for meta programming. Looks to me such an obvious duplication of logic, and potential source of mismatch -- resulting in bugs.
Reluctantly, I started looking into it. After all, if it is good enough for the interpolated-string folks (those packaged did rely on haskell-src-meta) then maybe it will work okay for me too for the time being.
And alas, it does contain the desired function:
Language.Haskell.Meta.Parse.parseExp :: String -> Either String Exp
Language.Haskell.Meta.Parse
From this point it was rather straightforward, except for splitting on commas.
Right now, I do a very simple split on all commas, but that doesn't account for this case:
[d|(1, 2), 3|]
Which fails unfortunatelly. To handle this, I begun writing a parsec parser (again) which turned out to be more difficult than anticipated (again). At this point, I am open to suggestions. Maybe you know of a simple parser that handles the different edge-cases? If so, tell me in a comment, please! I plan on resolving this issue with or without parsec.
But for the most use-cases: it works.
Update at 2015-06-20
Version 0.2.1 and later correctly parses expressions even if they contain commas inside them. Meaning [d|(1, 2), 3|] and similar expressions are now supported.
You can
install it via cabal install dump
and/or
read the source code
Conclusion
During the last week I've learnt quite a bit of Template Haskell and QuasiQuotation, cabal sandboxes, publishing a package to hackage, building haddock docs and publishing them, and some things about Haskell too.
It's been fun.
And perhaps most importantly, I now am able to use this tool for debugging and development, the absence of which has been bugging me for some time. Peace at last.
Thank you #kqr, your engagement with my original question and attempt at solving it gave me enough spark and motivation to continue writing up a full solution.
I've actually almost solved the problem now. Not exactly what you imagined, but fairly close. Maybe someone else can use this as a basis for a better version. Either way, with
{-# LANGUAGE TemplateHaskell, LambdaCase #-}
import Language.Haskell.TH
dump :: ExpQ -> ExpQ
dump tuple =
listE . map dumpExpr . getElems =<< tuple
where
getElems = \case { TupE xs -> xs; _ -> error "not a tuple in splice!" }
dumpExpr exp = [| $(litE (stringL (pprint exp))) ++ " = " ++ show $(return exp)|]
you get the ability to do something like
λ> let x = True
λ> print $(dump [|(not x, x, x == True)|])
["GHC.Classes.not x_1627412787 = False","x_1627412787 = True","x_1627412787 GHC.Classes.== GHC.Types.True = True"]
which is almost what you wanted. As you see, it's a problem that the pprint function includes module prefixes and such, which makes the result... less than ideally readable. I don't yet know of a fix for that, but other than that I think it is fairly usable.
It's a bit syntactically heavy, but that is because it's using the regular [| quote syntax in Haskell. If one wanted to write their own quasiquoter, as you suggest, I'm pretty sure one would also have to re-implement parsing Haskell, which would suck a bit.

OCaml : need help for List.map function

I need to create a function that basically works like this :
insert_char("string" 'x') outputs "sxtxrxixnxg".
So here is my reasoning :
Create a list with every single character in the string :
let inserer_car(s, c) =
let l = ref [] in
for i = 0 to string.length(s) - 1 do
l := s.[i] :: !l
done;
Then, I want to use List.map to turn it into a list like ['s', 'x', 't', 'x' etc.].
However, I don't really know how to create my function to use with map. Any help would be appreciated!
I'm a beginner in programming and especially in ocaml! so feel free to assume I'm absolutely ignorant.
If you were using Core, you could write it like this:
open Core.Std
let insert_char s c =
String.to_list s
|> (fun l -> List.intersperse l c)
|> String.of_char_list
Or, equivalently:
let insert_char s c =
let chars = String.to_list s in
let interspersed_chars = List.intersperse chars c in
String.of_char_list interspersed_chars
This is just straightforward use of existing librariies. If you want the implementation of List.intersperse, you can find it here. It's quite simple.
A map function creates a copy of a structure with different contents. For lists, this means that List.map f list has the same length as list. So, this won't work for you. Your problem requires the full power of a fold.
(You could also solve the problem imperatively, but in my opinion the reason to study OCaml is to learn about functional programming.)
Let's say you're going to use List.fold_left. Then the call looks like this:
let result = List.fold_left myfun [] !l
Your function myfun has the type char list -> char -> char list. In essence, its first parameter is the result you've built so far and its second parameter is the next character of the input list !l. The result should be what you get when you add the new character to the list you have so far.
At the end you'll need to convert a list of characters back to a string.

Function definition with guards

Classic way to define Haskell functions is
f1 :: String -> Int
f1 ('-' : cs) -> f1 cs + 1
f1 _ = 0
I'm kinda unsatisfied writing function name at every line. Now I usually write in the following way, using pattern guards extension and consider it more readable and modification friendly:
f2 :: String -> Int
f2 s
| '-' : cs <- s = f2 cs + 1
| otherwise = 0
Do you think that second example is more readable, modifiable and elegant? What about generated code? (Haven't time to see desugared output yet, sorry!). What are cons? The only I see is extension usage.
Well, you could always write it like this:
f3 :: String -> Int
f3 s = case s of
('-' : cs) -> f3 cs + 1
_ -> 0
Which means the same thing as the f1 version. If the function has a lengthy or otherwise hard-to-read name, and you want to match against lots of patterns, this probably would be an improvement. For your example here I'd use the conventional syntax.
There's nothing wrong with your f2 version, as such, but it seems a slightly frivolous use of a syntactic GHC extension that's not common enough to assume everyone will be familiar with it. For personal code it's not a big deal, but I'd stick with the case expression for anything you expect other people to be reading.
I prefer writing function name when I am pattern matching on something as is shown in your case. I find it more readable.
I prefer using guards when I have some conditions on the function arguments, which helps avoiding if else, which I would have to use if I was to follow the first pattern.
So to answer your questions
Do you think that second example is more readable, modifiable and elegant?
No, I prefer the first one which is simple and readable. But more or less it depends on your personal taste.
What about generated code?
I dont think there will be any difference in the generated code. Both are just patternmatching.
What are cons?
Well patternguards are useful to patternmatch instead of using let or something more cleanly.
addLookup env var1 var2
| Just val1 <- lookup env var1
, Just val2 <- lookup env var2
= val1 + val2
Well the con is ofcourse you need to use an extension and also it is not Haskell98 (which you might not consider much of a con)
On the other hand for trivial pattern matching on function arguments I will just use the first method, which is simple and readable.

How to verify if some items are in a list?

I'm trying to mess about trying the haskell equivalent of the 'Scala One Liners' thing that has recently popped up on Reddit/Hacker News.
Here's what I've got so far (people could probably do them a lot better than me but these are my beginner level attempts)
https://gist.github.com/1005383
The one I'm stuck on is verifying if items are in a list. Basically the Scala example is this
val wordlist = List("scala", "akka", "play framework", "sbt", "typesafe")
val tweet = "This is an example tweet talking about scala and sbt."
(words.foldLeft(false)( _ || tweet.contains(_) ))
I'm a bit stumped how to do this in Haskell. I know you can do:
any (=="haskell") $ words "haskell is great!"
To verify if one of the words is present, but the Scala example asks if any of the words in the wordlist are present in the test string.
I can't seem to find a contains function or something similar to that. I know you could probably write a function to do it but that defeats the point of doing this task in one line.
Any help would be appreciated.
You can use the elem function from the Prelude which checks if an item is in a list. It is commonly used in infix form:
Prelude> "foo" `elem` ["foo", "bar", "baz"]
True
You can then use it in an operator section just like you did with ==:
Prelude> let wordList = ["scala", "akka", "play framework", "sbt", "types"]
Prelude> let tweet = "This is an example tweet talking about scala and sbt."
Prelude> any (`elem` wordList) $ words tweet
True
When you find yourself needing a function, but you don't know the name, try using Hoogle to search for a function by type.
Here you wanted something that checks if a thing of any type is in a list of things of the same type, i.e. something of a type like a -> [a] -> Bool (you also need an Eq constraint, but let's say you didn't know that). Typing this type signature into Hoogle gives you elem as the top result.
How about using Data.List.intersect?
import Data.List.intersect
not $ null $ intersect (words tweet) wordList
Although there are already good answers I thought it'd be nice to write something in the spirit of your original code using any. That way you get to see how to compose your own complex queries from simple reusable parts rather than using off-the-shelf parts like intersect and elem:
any (\x -> any (\y -> (y == x)) $ words "haskell is great!")
["scala", "is", "tolerable"]
With a little reordering you can sort of read it in English: is there any word x in the sentence such that there is any y in the list such that x == y? It's clear how to extend to more 'axes' than two, perform comparisons other than ==, and even mix it up with all.

Haskell: Delimit a string by chosen sub-strings and whitespace

Am still new to Haskell, so apologize if there is an obvious answer to this...
I would like to make a function that splits up the all following lists of strings i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
All into the same string of strings i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)
The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
case lex s of
[(tok,rest)] -> tok : lexStr rest
[] -> error "Failed lex"
To appease the pedants, this code is in terrible form. An explicit call to error instead of returning a reasonable error using Maybe, assuming lex only returns 1 or 0 elements, etc. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.
I would take a look at parsec and build a simple grammar for parsing your strings.
how about using words .)
words :: String -> [String]
and words wont care for whitespaces..
words "Hello World"
= words "Hello World"
= ["Hello", "World"]

Resources