I'm trying to write a Haskell function which would read a string and return a list with the words from the string saved in it.
Here's how I did it:
toWordList :: String -> [String]
toWordList = do
[ toLower x | x <- str ]
let var = removePunctuation(x)
return (words var)
But I get this error:
Test1.hs:13:17: error: parse error on input 'let'
   |
13 |                 let var = removePunctuation(x)
   |                 ^^^
I'm new to Haskell and don't have a good grasp of its syntax yet, so thanks in advance for the help.
There are quite a few mistakes here, so you should spend more time reading over some tutorials (Learn You a Haskell, Real World Haskell). You're pretty close though, so I'll try to do a breakdown here.
do is special - it doesn't switch Haskell into "imperative mode", it lets you write clearer code when using Monads - if you don't yet know what Monads are, stay away from do! Keywords like return also don't behave the same as in imperative languages. Try to approach Haskell with a completely fresh mind.
Also, indentation matters in Haskell (this is the layout rule). Essentially, you want all the lines in the same "block" to have the same indentation.
Okay, let's strip out the do and return keywords, and align the indentation. We'll also name the parameter to the function str - in your original code, you missed this bit out.
toWordList :: String -> [String]
toWordList str =
  [toLower x | x <- str]
  let var = removePunctuation(x)
  words var
The syntax for let is let __ = __ in __. There's a different notation when using do, but forget about that for now. We also don't name the result of the list comprehension, so let's do that:
toWordList str =
  let lowered = [toLower x | x <- str] in
  let var = removePunctuation lowered in
  words var
And this works! We just needed to get some syntax right and avoid the monadic syntactic sugar of do/return.
It's possible (and easy) to make it nicer though. Those let blocks are kinda ugly, we can strip those away. We can also replace the list comprehension with map toLower, which is a bit more elegant and is equivalent to your comprehension:
toWordList str = words (removePunctuation (map toLower str))
Nice, that's down to a single line now! But all those brackets are also a bit of an eyesore. How about we use the $ operator?
toWordList str = words $ removePunctuation $ map toLower str
Looking good. There's another improvement we can make, which is to convert this into point-free style, where we don't explicitly name our parameter - instead we express this function as the composition of other functions.
toWordList = words . removePunctuation . (map toLower)
And we're done! Hopefully the first two code snippets make it clearer how the Haskell syntax works, and the last few might show you some nice examples of how you can make fairly verbose code much much cleaner.
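For reference, here is a minimal sketch of the final version as a complete module. The question never shows removePunctuation, so the definition below is only an assumption about what it does (dropping every character that Data.Char.isPunctuation recognises); swap in your own if it behaves differently.

```haskell
import Data.Char (isPunctuation, toLower)

-- Assumption: removePunctuation is not shown in the question, so this
-- stand-in simply drops every punctuation character.
removePunctuation :: String -> String
removePunctuation = filter (not . isPunctuation)

-- Lower-case the input, drop punctuation, then split on whitespace.
toWordList :: String -> [String]
toWordList = words . removePunctuation . map toLower
```

With these definitions, toWordList "Hello, World!" evaluates to ["hello","world"].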
I am very bad at wording things, so please bear with me.
I am doing a problem that requires me to generate all possible lists of numbers, in the form of a list of lists, in Haskell.
For example if I have x = 3 and y = 2, I have to generate a list of lists like this:
[[1,1,1], [1,2,1], [2,1,1], [2,2,1], [1,1,2], [1,2,2], [2,1,2], [2,2,2]]
x and y are passed into the function and it has to work with any nonzero positive integers x and y.
I am completely lost and have no idea how to even begin.
For anyone kind enough to help me, please try to keep any math-heavy explanations as easy to understand as possible. I am really not good at math.
Assuming that this is homework, I'll give you part of the answer, and show you how I think through this sort of problem. It's helpful to experiment in GHCi, and build up the pieces we need. One thing we need is to be able to generate a list of numbers from 1 through y. Suppose y is 7. Then:
λ> [1..7]
[1,2,3,4,5,6,7]
But as you'll see in a moment, what we really need is not a simple list, but a list of lists that we can build on. Like this:
λ> map (:[]) [1..7]
[[1],[2],[3],[4],[5],[6],[7]]
This basically says to take each element in the list, and prepend it to the empty list []. So now we can write a function to do this for us.
makeListOfLists y = map (:[]) [1..y]
Next, we need a way to prepend a new element to every element in a list of lists. Something like this:
λ> map (99:) [[1],[2],[3],[4],[5],[6],[7]]
[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]
(I used 99 here instead of, say, 1, so that you can easily see where the numbers come from.) So we could write a function to do that:
prepend x yss = map (x:) yss
Ultimately, we want to be able to take a list and a list of lists, and invoke prepend on every element in the list to every element in the list of lists. We can do that using the map function again. But as it turns out, it will be a little easier to do that if we switch the order of the arguments to prepend, like this:
prepend2 yss x = map (x:) yss
Then we can do something like this:
λ> map (prepend2 [[1],[2],[3],[4],[5],[6],[7]]) [97,98,99]
[[[97,1],[97,2],[97,3],[97,4],[97,5],[97,6],[97,7]],[[98,1],[98,2],[98,3],[98,4],[98,5],[98,6],[98,7]],[[99,1],[99,2],[99,3],[99,4],[99,5],[99,6],[99,7]]]
So now we can write that function:
supermap xs yss = map (prepend2 yss) xs
For example, if x=2 and y=3 (lists of length 2 built from the numbers 1 to 3), then the answer we need is:
λ> let yss = makeListOfLists 3
λ> supermap [1..3] yss
[[[1,1],[1,2],[1,3]],[[2,1],[2,2],[2,3]],[[3,1],[3,2],[3,3]]]
(If that was all we needed, we could have done this more easily using a list comprehension. But since we need to be able to do this for an arbitrary x, a list comprehension won't work.)
Hopefully you can take it from here, and extend it to arbitrary x.
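If you get stuck, here is one possible way to finish, sketched with the helpers from above. The names extend and generateAll are my own, and the sketch assumes x is at least 1. The only new idea is to flatten the result of each round with concatMap, so that the output is again a plain list of lists that the next round can build on.

```haskell
-- From above: wrap every number from 1 to y in its own list.
makeListOfLists :: Int -> [[Int]]
makeListOfLists y = map (:[]) [1..y]

-- From above: put x in front of every list in yss.
prepend2 :: [[Int]] -> Int -> [[Int]]
prepend2 yss x = map (x:) yss

-- Hypothetical helper: like supermap over [1..y], but concatMap
-- flattens the groups into a single list of lists.
extend :: Int -> [[Int]] -> [[Int]]
extend y yss = concatMap (prepend2 yss) [1..y]

-- Hypothetical name: all lists of length x over 1..y, assuming x >= 1.
-- Start with the singleton lists and extend them (x - 1) times.
generateAll :: Int -> Int -> [[Int]]
generateAll x y = iterate (extend y) (makeListOfLists y) !! (x - 1)
```

Calling generateAll 2 3 gives the same nine pairs as the supermap example, just without the extra level of nesting.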
For a specific x, as already mentioned, a list comprehension would do the trick. Assuming that x equals 3, one would write the following:
generate y = [[a,b,c] | a<-[1..y], b<-[1..y], c <-[1..y]]
But life gets much more complicated when x is not predetermined. I don't have much experience of programming in Haskell, I'm not acquainted with library functions and my approach is far from being the most efficient solution, so don't judge it too harshly.
My solution consists of two functions:
strip [] = []
strip (h:t) = h ++ strip t
populate y 2 = strip( map (\a-> map (:a:[]) [1..y]) [1..y])
populate y x = strip( map (\a-> map (:a) [1..y]) ( populate y ( x - 1) ))
strip is defined for the nested lists. By merging the list-items it reduces the hierarchy so to speak. For example calling
strip [[1],[2],[3]]
generates the output:
[1,2,3]
populate is the tricky one.
On the last step of the recursion, when the second argument equals 2, the function combines each item of [1..y] with every element of the same list into a new list. For example
map (\a-> map (:a:[]) [1..2]) [1..2]
generates the output:
[[[1,1],[2,1]],[[1,2],[2,2]]]
and the strip function turns it into:
[[1,1],[2,1],[1,2],[2,2]]
As for the initial step of the recursion, when x is more than 2, populate does almost the same thing, except this time it combines the items of the list with the lists generated by the recursive call. And finally:
populate 2 3
gives us the desired result:
[[1,1,1],[2,1,1],[1,2,1],[2,2,1],[1,1,2],[2,1,2],[1,2,2],[2,2,2]]
As I mentioned above, this approach is neither the most efficient nor the most readable one, but I think it solves the problem. In fact, theoretically the only way of solving this without heavy use of recursion would be building a string with a list comprehension statement in it and then compiling that string dynamically, which, in my short experience as a programmer, is never a good solution.
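For comparison, the recursion can also be left to the standard library. This is a different technique from populate, not a rewrite of it: replicateM from Control.Monad, used in the list monad, produces every way of picking x elements from a list. The name allLists is hypothetical, and the argument order just mirrors populate.

```haskell
import Control.Monad (replicateM)

-- Hypothetical name; arguments in the same order as populate:
-- y is the largest number, x is the length of each list.
allLists :: Int -> Int -> [[Int]]
allLists y x = replicateM x [1..y]
```

Here map reverse (allLists 2 3) gives the same list, in the same order, as populate 2 3 above, and allLists also handles x = 1, which populate does not. Likewise, strip behaves like the Prelude's concat.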
Consider the following problem: given a list of length three of tuples (String,Int), is there a pair of elements having the same "Int" part? (For example, [("bob",5),("gertrude",3),("al",5)] contains such a pair, but [("bob",5),("gertrude",3),("al",1)] does not.)
This is how I would implement such a function:
import Data.List (sortBy)
import Data.Function (on)
hasPair :: [(String, Int)] -> Bool
hasPair = napkin . sortBy (compare `on` snd)
  where
    napkin [(_, a), (_, b), (_, c)]
      | a == b    = True
      | b == c    = True
      | otherwise = False
I've used pattern matching to bind names to the "Int" part of the tuples, but I want to sort first (in order to group like members), so I've put the pattern-matching function inside a where clause. But this brings me to my question: what's a good strategy for picking names for functions that live inside where clauses? I want to be able to think of such names quickly. For this example, "hasPair" seems like a good choice, but it's already taken! I find that pattern comes up a lot - the natural-seeming name for a helper function is already taken by the outer function that calls it. So I have, at times, called such helper functions things like "op", "foo", and even "helper" - here I have chosen "napkin" to emphasize its use-it-once, throw-it-away nature.
So, dear Stackoverflow readers, what would you have called "napkin"? And more importantly, how do you approach this issue in general?
General rules for locally-scoped variable naming.
f, k, g, h for super simple local, semi-anonymous things
go for (tail) recursive helpers (precedent)
n, m, i, j for length and size and other numeric values
v for results of map lookups and other dictionary types
s and t for strings.
a:as and x:xs and y:ys for lists.
(a,b,c,_) for tuple fields.
These generally only apply for arguments to HOFs. For your case, I'd go with something like k or eq3.
Use apostrophes sparingly, for derived values.
I tend to call boolean valued functions p for predicate. pred, unfortunately, is already taken.
In cases like this, where the inner function is basically the same as the outer function, but with different preconditions (requiring that the list is sorted), I sometimes use the same name with a prime, e.g. hasPair'.
However, in this case, I would rather try to break down the problem into parts that are useful by themselves at the top level. That usually also makes naming them easier.
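import Data.List (sort)
-- Only the Int parts matter, so project them out first.
hasPair :: [(String, Int)] -> Bool
hasPair = hasDuplicate . map snd
-- After sorting, duplicates are adjacent, so the list is not strictly increasing.
hasDuplicate :: Ord a => [a] -> Bool
hasDuplicate = not . isStrictlySorted . sort
-- True when every element is smaller than its successor.
isStrictlySorted :: Ord a => [a] -> Bool
isStrictlySorted xs = and $ zipWith (<) xs (tail xs)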
My strategy follows Don's suggestions fairly closely:
If there is an obvious name for it, use that.
Use go if it is the "worker" or otherwise very similar in purpose to the original function.
Follow personal conventions based on context, e.g. step and start for args to a fold.
If all else fails, just go with a generic name, like f
There are two techniques that I personally avoid. One is using the apostrophe version of the original function, e.g. hasPair' in the where clause of hasPair. It's too easy to accidentally write one when you meant the other; I prefer to use go in such cases. But this isn't a huge deal as long as the functions have different types. The other is using names that might connote something, but not anything that has to do with what the function actually does. napkin would fall into this category. When you revisit this code, this naming choice will probably baffle you, as you will have forgotten the original reason that you named it napkin. (Because napkins have 4 corners? Because they are easily folded? Because they clean up messes? They're found at restaurants?) Other offenders are things like bob and myCoolFunc.
If you have given a function a name that is more descriptive than go or h, then you should be able to look at either the context in which it is used, or the body of the function, and in both situations get a pretty good idea of why that name was chosen. This is where my point #3 comes in: personal conventions. Much of Don's advice applies. If you are using Haskell in a collaborative situation, then coordinate with your team and decide on certain conventions for common situations.
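To make the go convention concrete, here is a sketch of how the function from the question could look with it. This is only my illustration, not code from the question: unlike napkin, it accepts lists of any length, and go relies on its input being sorted.

```haskell
import Data.List (sort)

hasPair :: [(String, Int)] -> Bool
hasPair = go . sort . map snd
  where
    -- go expects a sorted list, so equal elements must be adjacent.
    go (a:b:rest) = a == b || go (b:rest)
    go _          = False
```

The name go carries no meaning of its own, which is the point: the reader knows to look at hasPair for the purpose and at go only for the mechanics.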
The problem is quite simple: I have to replace all occurrences of "fooo" and all its substrings with "xyz". In Java, for example, I would do it like this:
someString.replaceAll( "fooo|foo|fo", "xyz" )
and it will do the trick. But in Haskell I've found no efficient way to work with regex. First of all, I've read this: http://www.haskell.org/haskellwiki/Regular_expressions
The only library which actually has a replace function is regex-posix, but it's considered "very slow" in performance, which is not acceptable. Also, I've found that this replace function for some reason doesn't respect the order of the patterns given, so I get output like this:
>replace "boo fooo boo" "xyz"
"boo xyzoo boo"
Other backends don't provide such functionality.
So I decided to write a simple workaround:
replaceFoo input =
    helper input []
  where
    helper ('f':'o':'o':'o':xs) ys = helper xs ("zyx" ++ ys)
    helper ('f':'o':'o':xs) ys = helper xs ("zyx" ++ ys)
    helper ('f':'o':xs) ys = helper xs ("zyx" ++ ys)
    helper (x:xs) ys = helper xs (x:ys)
    helper [] ys = reverse ys
While I don't find this function nice, it works well and is fast. But now I need to add more words to this replacer, and I don't like the idea of extending the helper patterns any further (I actually have 4 words in it in the real app, and that's already odd).
I'd be happy if someone could help me with a fast solution.
cebewee, thanks for the pointer to Data.String.Utils. But I fear this approach is quite slow if there are many words to replace ("fooo" to "xyz", "foo" to "xyz", "fo" to "xyz", "bar" to "quux" and so on), because to get that to work I would need something like foldr (\(from, to) str -> replace from to str) input pairs, and it will take O(n*n). Worse, it may have the unexpected result of replacing a substring of the result of a previous replacement.
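One way to keep the single pass of the hand-written helper while moving the words into a table is sketched below. It uses only stripPrefix from Data.List; the names replaceAll and rules are hypothetical, and the sketch assumes every source word is non-empty. Rules are tried in order, so longer words have to be listed before their prefixes, and text that has already been replaced is never scanned again.

```haskell
import Data.List (stripPrefix)

-- Hypothetical helper: walk the input once, left to right. At each
-- position, apply the first rule whose source word is a prefix of the
-- remaining input; otherwise keep the character and move on.
-- Assumption: no rule has an empty source word.
replaceAll :: [(String, String)] -> String -> String
replaceAll _ [] = []
replaceAll rules input@(c:cs) =
  case [ (to, rest) | (from, to) <- rules, Just rest <- [stripPrefix from input] ] of
    ((to, rest):_) -> to ++ replaceAll rules rest
    []             -> c : replaceAll rules cs
```

For example, replaceAll [("fooo","xyz"),("foo","xyz"),("fo","xyz"),("bar","quux")] "boo fooo bar" evaluates to "boo xyz quux". Each position still tries every rule, so this is a starting point rather than a tuned solution.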
There is Data.String.Utils.replace in the MissingH package. If you only need plain substring replace (and not regular expressions), this might be what you need.
The regex-xmlschema package has a sed function that might be what you're looking for:
http://hackage.haskell.org/package/regex-xmlschema-0.1.3
See particularly:
http://hackage.haskell.org/packages/archive/regex-xmlschema/0.1.3/doc/html/Text-Regex-XMLSchema-String.html#v:sed
There was a discussion of options for string rewriting on Haskell-Cafe last year:
http://www.haskell.org/pipermail/haskell-cafe/2010-May/077943.html
The replace-megaparsec package allows you to search for pattern matches and then edit the found matches. Here is a solution using Replace.Megaparsec.streamEdit.
>>> import Replace.Megaparsec
>>> import Text.Megaparsec
>>> import Text.Megaparsec.Char
>>> streamEdit (chunk "fooo" <|> chunk "foo" <|> chunk "fo") (const "xyz") "boo fooo boo"
"boo xyz boo"
I am still new to Haskell, so I apologize if there is an obvious answer to this...
I would like to make a function that splits up all of the following lists of strings, i.e. [String]:
["int x = 1", "y := x + 123"]
["int x= 1", "y:= x+123"]
["int x=1", "y:=x+123"]
all into the same list of lists of strings, i.e. [[String]]:
[["int", "x", "=", "1"], ["y", ":=", "x", "+", "123"]]
You can use map words.lines for the first [String].
But I do not know any really neat ways to also take into account the others - where you would be using the various sub-strings "=", ":=", "+" etc. to break up the main string.
Thank you for taking the time to enlighten me on Haskell :-)
The Prelude comes with a little-known handy function called lex, which is a lexer for Haskell expressions. These match the form you need.
lex :: String -> [(String,String)]
What a weird type though! The list is there for interfacing with a standard type of parser, but I'm pretty sure lex always returns either 1 or 0 elements (0 indicating a parse failure). The tuple is (token-lexed, rest-of-input), so lex only pulls off one token. So a simple way to lex a whole string would be:
lexStr :: String -> [String]
lexStr "" = []
lexStr s =
  case lex s of
    [("", _)]     -> []                  -- only whitespace was left
    [(tok, rest)] -> tok : lexStr rest
    []            -> error "Failed lex"
To appease the pedants: this code is in terrible form. It makes an explicit call to error instead of returning a reasonable error using Maybe, it assumes lex only returns 1 or 0 elements, and so on. The code that does this reliably is about the same length, but is significantly more abstract, so I spared your beginner eyes.
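For the curious, here is one possible shape of a version that reports failure with Maybe. This is my own sketch rather than the code alluded to above; the names lexAll and tokenizeLines are hypothetical. It relies on the documented behaviour that lex returns a single empty lexeme once only whitespace is left, and treats anything other than exactly one result as a failure.

```haskell
-- Hypothetical name: lex a whole string, or Nothing if some part of it
-- is not a legal lexeme.
lexAll :: String -> Maybe [String]
lexAll s =
  case lex s of
    [("", _)]     -> Just []                  -- end of input reached
    [(tok, rest)] -> (tok :) <$> lexAll rest  -- keep the token, lex the rest
    _             -> Nothing                  -- no lexeme, or an ambiguous result

-- Hypothetical name: tokenize every line; one bad line fails the lot.
tokenizeLines :: [String] -> Maybe [[String]]
tokenizeLines = traverse lexAll
```

All three inputs from the question then give the same result, e.g. tokenizeLines ["int x=1", "y:=x+123"] evaluates to Just [["int","x","=","1"],["y",":=","x","+","123"]].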
I would take a look at parsec and build a simple grammar for parsing your strings.
How about using words? :)
words :: String -> [String]
and words doesn't care how much whitespace separates the words:
words "Hello World"
= words "Hello      World"
= ["Hello", "World"]