I'm just starting out in Haskell and this is like the third thing I'm writing, so, naturally, I'm finding myself a little stumped.
I'm trying to write a bit of code that will take a string, delete the spaces, and capitalize the first letter of every word after the first.
For example, if I input "this is a test", I would like to get back something like "thisIsATest".
import qualified Data.Char as Char
toCaps :: String -> String
toCaps [] = []
toCaps xs = filter (/= ' ') xs
toCaps (_:xs) = map Char.toUpper xs
I think the method I'm using is wrong. With my code in this order, I am able to remove all the spaces using the filter function, but nothing gets capitalized.
When I move the filter line to the very end of the code, the map Char.toUpper line does get used, but it just capitalizes everything ("HISISATEST", for example).
I was trying to use an if expression to say something similar to if ' ' then map Char.toUpper xs else Char.toLower xs, but that didn't work out for me. I haven't used if in Haskell yet, and I don't think I'm doing it correctly. I also know using "xs" there is wrong, but I'm not sure how to fix it.
Can anyone offer any pointers on this particular problem?
I think it might be better to split the problem into smaller subproblems. First, we can make a function that capitalizes the first character of a given word:
import Data.Char(toUpper)
capWord :: String -> String
capWord "" = ""
capWord (c:cs) = toUpper c : cs
We can then use words to obtain the list of words; for camel case, we keep the first word as is and capitalize the rest:
toCaps :: String -> String
toCaps = go . words
  where go [] = ""
        go (w:ws) = concat (w : map capWord ws)
For example:
Prelude Data.Char> toCaps "this is a test"
"thisIsATest"
For Pascal case, we can make use of concatMap instead:
toCaps :: String -> String
toCaps = concatMap capWord . words
Inspired by this answer from Will Ness, here's a way to do it that avoids unnecessary Booleans and comparisons:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = flip (foldr go (const [])) id
  where go ' ' acc _ = acc Char.toUpper
        go x acc f = f x : acc id
Or, more understandably but perhaps slightly less efficiently:
import qualified Data.Char as Char
toCaps :: String -> String
toCaps = go id
  where go _ [] = []
        go _ (' ':xs) = go Char.toUpper xs
        go f (x:xs) = f x : go id xs
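The point-free foldr version and this explicit go encode the same idea: each position receives a function saying how to transform the current character (id normally, toUpper right after a space). As a sketch, here is the foldr version again with the arguments written out and a type signature on the helper; the names toCapsFold and step are mine, not from the answer.

```haskell
import Data.Char (toUpper)

-- Annotated rewrite of: toCaps = flip (foldr go (const [])) id
-- foldr builds a function that still expects "how to transform the
-- current character"; we start it off with id, leaving the first char alone.
toCapsFold :: String -> String
toCapsFold s = foldr step (const []) s id
  where
    step :: Char -> ((Char -> Char) -> String) -> (Char -> Char) -> String
    -- A space is dropped, and the next character will go through toUpper.
    step ' ' rest _ = rest toUpper
    -- Any other character is transformed by f; the one after it gets id again.
    step c rest f = f c : rest id
```

Like the go version, this drops every space, so repeated and trailing spaces are harmless, but a leading space capitalizes the first word.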
There are a number of ways of doing it, but if I were trying to keep it as close as possible to how you've set up your example, I might do something like:
import Data.Char (toUpper)
toCaps :: String -> String
toCaps [] = [] -- base case
toCaps (' ':c:cs) = toUpper c : toCaps cs -- throws out the space and capitalizes next letter
toCaps (c:cs) = c : toCaps cs -- anything else is left as is
This is just using basic recursion, dealing with a character (element of the list) at a time, but if you wanted to use higher-order functions such as map or filter that work on the entire list, then you would probably want to compose them (the way that Willem suggested is one way) and in that case you could probably do without using recursion at all.
It should be noted that this solution is brittle in the sense that it assumes the input string does not contain leading, trailing, or multiple consecutive spaces.
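One way to remove that brittleness while keeping the same one-character-at-a-time recursion is to skip leading spaces up front and swallow each whole run of spaces before capitalizing. A minimal sketch; the name toCapsTolerant is mine:

```haskell
import Data.Char (toUpper)

-- Variant of the recursive toCaps that tolerates leading, trailing,
-- and repeated spaces.
toCapsTolerant :: String -> String
toCapsTolerant = go . dropWhile (== ' ')   -- skip leading spaces entirely
  where
    go [] = []
    go (' ':cs) = case dropWhile (== ' ') cs of  -- swallow the whole run of spaces
      []       -> []                            -- only trailing spaces were left
      (c:rest) -> toUpper c : go rest           -- capitalize the next real character
    go (c:cs) = c : go cs                       -- anything else is left as is
```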
Inspired by Joseph Sible's answer, a coroutines solution:
import Data.Char
toCamelCase :: String -> String
toCamelCase [] = []
toCamelCase (' ': xs) = toPascalCase xs
toCamelCase (x : xs) = x : toCamelCase xs
toPascalCase :: String -> String
toPascalCase [] = []
toPascalCase (' ': xs) = toPascalCase xs
toPascalCase (x : xs) = toUpper x : toCamelCase xs
Be careful not to start the input string with a space, or the first word will be capitalized as well.
I want to create a series of possible equations based on a general specification:
test = ["12", "34=", "56=", "78"]
Each string (e.g. "12") represents the possible characters at that location, in this case '1' or '2'.
So possible equations from test would be "13=7" or "1=68".
I know the examples I give are not balanced but that's because I'm deliberately giving a simplified short string.
(I also know that I could use 'sequence' to search all possibilities but I want to be more intelligent so I need a different approach explained below.)
What I want is to try fixing each of the equals in turn and then removing all other equals in the equation. So I want:
[["12","=","56","78"],["12","34","=","78"]]
I've written this nested list comprehension:
(It needs {-# LANGUAGE ParallelListComp #-} at the top of the file.)
fixEquals :: [String] -> [[String]]
fixEquals re
  = [
      [
        if index == outerIndex then equals else remain
      | equals <- map (filter (== '=')) re
      | remain <- map (filter (/= '=')) re
      | index <- [1..]
      ]
    | outerIndex <- [1..length re]
    ]
This produces:
[["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
but I want to filter out any lists that contain an empty string, i.e. in this case the first and the last.
I can do:
countOfEmpty :: (Eq a) => [[a]] -> Int
countOfEmpty = length . filter (== [])
fixEqualsFiltered :: [String] -> [[String]]
fixEqualsFiltered re = filter (\x -> countOfEmpty x == 0) (fixEquals re)
so that "fixEqualsFiltered test" gives:
[["12","=","56","78"],["12","34","=","78"]]
which is what I want but it doesn’t seem elegant.
I can’t help thinking there’s another way to filter these out.
After all, it's whenever "equals" is used in the if expression and is empty that we want to drop the whole candidate, so it seems a waste to build the list (e.g. ["","34","56","78"]) and then ditch it.
Any thoughts appreciated.
I don't know if this is any cleaner than your code, but it might be a bit clearer, and maybe more efficient, using recursion:
fixEquals = init . f
f :: [String] -> [[String]]
f [] = [[]]
f (x:xs) | '=' `elem` x = ("=":removeEq xs) : map (removeEq [x] ++) (f xs)
         | otherwise = map (x:) (f xs)
removeEq :: [String] -> [String]
removeEq = map (filter (/= '='))
The way it works is that if there's an '=' in the current string, the result splits in two (one branch keeps this '=' and strips all later ones, the other strips this one and recurses); if not, it just recurses. The init is needed because the last element returned has no '=' in any string.
Finally, I believe you could probably find a better data structure for what you need to achieve than a list of strings.
Let
xs = [["","34","56","78"],["12","=","56","78"],["12","34","=","78"],["12","34","56",""]]
in
filter (not . any null) xs
will give
[["12","=","56","78"],["12","34","=","78"]]
If you prefer a list comprehension, you can do:
[x | x <- xs, and [not $ null y | y <- x]]
I think I'd probably do it this way. First, a preliminary that I've written so many times it's practically burned into my fingers by now:
zippers :: [a] -> [([a], a, [a])]
zippers = go [] where
  go _ [] = []
  go b (h:e) = (b,h,e):go (h:b) e
Probably running it once or twice in ghci will be a clearer explanation of what this does than any English I could write:
> zippers "abcd"
[("",'a',"bcd"),("a",'b',"cd"),("ba",'c',"d"),("cba",'d',"")]
In other words, it gives a way of selecting each element of a list in turn, giving the "leftovers" of what was before and after the selection point. Given that tool, here's our plan: we'll nondeterministically choose a String to serve as our equals sign, double-check that we've got an equals sign in the first place, and then clear out the equals from the others. So:
import Control.Monad (guard)
fixEquals ss = do
  (prefix, s, suffix) <- zippers ss
  guard ('=' `elem` s)
  return (reverse (deleteEquals prefix) ++ ["="] ++ deleteEquals suffix)
deleteEquals = map (filter ('='/=))
Let's try it:
> fixEquals ["12", "34=", "56=", "78"]
[["12","=","56","78"],["12","34","=","78"]]
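Since the question is phrased in terms of list comprehensions, note that this do-block translates directly into one: the guard becomes a Boolean filter, so positions without an '=' never produce a candidate at all. A self-contained sketch (fixEqualsLC is my name for it):

```haskell
-- Same helper as above: each element paired with what comes before it
-- (in reverse order) and what comes after it.
zippers :: [a] -> [([a], a, [a])]
zippers = go []
  where
    go _ [] = []
    go b (h:e) = (b, h, e) : go (h:b) e

deleteEquals :: [String] -> [String]
deleteEquals = map (filter ('=' /=))

-- The do-block as a list comprehension; '=' `elem` s plays the role of guard.
fixEqualsLC :: [String] -> [[String]]
fixEqualsLC ss =
  [ reverse (deleteEquals prefix) ++ ["="] ++ deleteEquals suffix
  | (prefix, s, suffix) <- zippers ss
  , '=' `elem` s
  ]
```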
Perfect! But this is just a stepping-stone to actually generating the equations, right? It turns out to be not that hard to go all the way in one step, skipping this intermediate. Let's do that:
equations ss = do
  (prefixes, s, suffixes) <- zippers ss
  guard ('=' `elem` s)
  prefix <- mapM (filter ('='/=)) (reverse prefixes)
  suffix <- mapM (filter ('='/=)) suffixes
  return (prefix ++ "=" ++ suffix)
And we can try it in ghci:
> equations ["12", "34=", "56=", "78"]
["1=57","1=58","1=67","1=68","2=57","2=58","2=67","2=68","13=7","13=8","14=7","14=8","23=7","23=8","24=7","24=8"]
The easiest way to achieve what you want is to create all the combinations and keep only the ones that make sense:
Prelude> test = ["12", "34=", "56=", "78"]
Prelude> sequence test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8","2=57","2=58","2=67","2=68","2==7","2==8"]
Prelude> filter ((1==).length.filter('='==)) $ sequence test
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
You pointed out the drawback: imagine we have the following list of strings: ["=", "=", "0123456789", "0123456789"]. We will generate 100 combinations and drop them all.
You can look at the combinations as a tree. For the ["12", "34"], you have:
       / \
      /   \
     1     2
    / \   / \
   3   4 3   4
You can prune the tree: just ignore a subtree as soon as there are two = signs on the path leading to it.
Let's try to do it. First, a simple combinations function:
Prelude> :set +m
Prelude> let combinations :: [String] -> [String]
Prelude| combinations [] = [""]
Prelude| combinations (cs:ts) = [c:t | c<-cs, t<-combinations ts]
Prelude|
Prelude> combinations test
["1357","1358","1367","1368","13=7","13=8","1457","1458","1467","1468","14=7","14=8","1=57","1=58","1=67","1=68","1==7","1==8","2357","2358","2367","2368","23=7","23=8","2457","2458","2467","2468","24=7","24=8", ...]
Second, we need a variable to store the number of = signs seen so far:
if we find a second = sign, just drop the subtree
if we reach the end of a combination with no =, drop the combination
That is:
Prelude> let combinations' :: [String] -> Int -> [String]
Prelude| combinations' [] n= if n==1 then [""] else []
Prelude| combinations' (cs:ts) n = [c:t | c<-cs, let p = n+(fromEnum $ c=='='), p <= 1, t<-combinations' ts p]
Prelude|
Prelude> combinations' test 0
["13=7","13=8","14=7","14=8","1=57","1=58","1=67","1=68","23=7","23=8","24=7","24=8","2=57","2=58","2=67","2=68"]
We use p as the new number of = signs on the path: if p > 1, drop the subtree.
At the end of a path, if n is zero, there is no = sign on it, so drop the combination.
You could use the variable n to store more information, e.g. the type of the last character (to avoid sequences such as +*).
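A sketch of that idea, carrying a pair (number of = signs so far, whether the previous character was an operator) instead of a single Int. The operator set in isOp and the name combinations'' are assumptions for illustration; the question's example only ever contains '='.

```haskell
-- Hypothetical operator set; only '=' occurs in the question's example.
isOp :: Char -> Bool
isOp c = c `elem` "+-*/="

-- State carried down the tree: (number of '=' signs so far,
-- was the previous character an operator?).
-- Start with (0, True) so an equation cannot begin with an operator.
combinations'' :: [String] -> (Int, Bool) -> [String]
combinations'' [] (n, prevOp) = [ "" | n == 1, not prevOp ]  -- exactly one '=', no trailing operator
combinations'' (cs:ts) (n, prevOp) =
  [ c : t
  | c <- cs
  , let n' = n + fromEnum (c == '=')
  , n' <= 1                          -- prune: a second '=' on this path
  , not (prevOp && isOp c)           -- prune: two operators in a row
  , t <- combinations'' ts (n', isOp c)
  ]
```

On the question's test list, combinations'' test (0, True) yields the same 16 equations as combinations' test 0, since no two operators can ever be adjacent there without also producing a second '='.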
I have a list of strings and want to give each one a number from 1 to length (x:xs), like a book's page numbers. Unfortunately, my attempt numbers them backwards:
numberL :: [String] -> [String]
numberL [] = []
numberL (x:xs) = ([show (length(x:xs)) ++ ": " ++ x] ++ numberL (xs))
Also, how do I remove any \n and \t sequences from the text and replace them with actual newline and tab characters?
Prelude has several built-in functions that are well worth learning. zip and zipWith are two of them, and they are what you want whenever you need to combine two lists into one result list:
[1..] generates the list of indices for you. It's an infinite list, but zipWith stops as soon as the shorter list runs out.
appendIndex :: String -> Int -> String
appendIndex s i = show i ++ ": " ++ s
indexThem :: [String] -> [String]
indexThem l = zipWith appendIndex l [1..]
If you wanted to use zip, which is more basic but a little more verbose:
appendIndex :: (String, Int) -> String
appendIndex (s, i) = show i ++ ": " ++ s
indexThem :: [String] -> [String]
indexThem l = fmap appendIndex $ zip l [1..]
-- if you don't know about Functors yet, `fmap` is the generic way of doing `map`
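The second part of the question, about newlines and tabs, isn't covered above. Assuming the text contains the two-character sequences \n and \t (a backslash followed by n or t) rather than real control characters, a small recursive function can swap them for the real characters. The name unescape is hypothetical:

```haskell
-- Hypothetical helper: turns the two-character sequences "\n" and "\t"
-- (backslash + letter) into real newline and tab characters.
unescape :: String -> String
unescape ('\\' : 'n' : rest) = '\n' : unescape rest
unescape ('\\' : 't' : rest) = '\t' : unescape rest
unescape (c : rest)          = c : unescape rest
unescape []                  = []
```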
To get it right, it's important to understand why you're thinking wrong. Your recursion looks like this:
numberL (x:xs) = ... ++ numberL xs
So you calculate numberL xs and then put something in front of it. If numberL xs were correct, it would be numbered from 1 onwards, like 1: ..., 2: ..., 3: .... So you could never build numberL (x:xs) from numberL xs just by adding new elements at the front: the whole numbering would be wrong. Instead, you'd have to renumber all of numberL xs.
The problem, therefore, is that knowing numberL xs isn't very useful for calculating numberL (x:xs), because numberL always starts numbering from 1.
So lift that restriction. Build a function that numbers starting at n:
numberLFrom :: Int -> [String] -> [String]
numberLFrom n [] = ...
numberLFrom n (x:xs) = ...
Now the question you have to ask yourself is: in order to number (x:xs) starting at n, you need to number xs starting at which number? And then how do you introduce the numbered x into that result?
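If you want to check your answer afterwards, here is one possible completion of the skeleton (a spoiler, so try it yourself first):

```haskell
-- One possible completion of numberLFrom.
numberLFrom :: Int -> [String] -> [String]
numberLFrom _ [] = []
-- Number x with n, then number the rest starting one higher.
numberLFrom n (x:xs) = (show n ++ ": " ++ x) : numberLFrom (n + 1) xs

-- The original function is then just the special case n = 1.
numberL :: [String] -> [String]
numberL = numberLFrom 1
```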
I have two Haskell functions, both of which seem very similar to me. But the first one FAILS against infinite lists, and the second one SUCCEEDS against infinite lists. I have been trying for hours to nail down exactly why that is, but to no avail.
Both snippets are a re-implementation of the "words" function in Prelude. Both work fine against finite lists.
Here's the version that does NOT handle infinite lists:
myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
  where
    step space ([]:xs) | charIsSpace space = []:xs
    step space (x:xs)  | charIsSpace space = []:x:xs
    step space []      | charIsSpace space = []
    step char (x:xs) = (char : x) : xs
    step char [] = [[char]]
Here's the version that DOES handle infinite lists:
myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
  where
    step x result | not . charIsSpace $ x = [x:(head result)]++tail result
                  | otherwise = []:result
Note: "charIsSpace" is merely a renaming of Char.isSpace.
The following interpreter session illustrates that the first one fails against an infinite list while the second one succeeds.
*Main> take 5 (myWords_FailsOnInfiniteList (cycle "why "))
*** Exception: stack overflow
*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]
EDIT: Thanks to the responses below, I believe I understand now. Here are my conclusions and the revised code:
Conclusions:
The biggest culprits in my first attempt were the 2 equations that started with "step space []" and "step char []". Matching the second parameter of the step function against [] is a no-no, because it forces the whole 2nd arg to be evaluated (but with a caveat to be explained below).
At one point, I had thought (++) might evaluate its right-hand argument later than cons would, somehow. So, I thought I might fix the problem by changing " = (char:x):xs" to "= [char : x] ++ xs". But that was incorrect.
At one point, I thought that pattern matching the second arg against (x:xs) would cause the function to fail against infinite lists. I was almost right about this, but not quite! Evaluating the second arg against (x:xs), as I do in a pattern match above, WILL cause some recursion. It will "turn the crank" until it hits a ":" (aka, "cons"). If that never happened, then my function would not succeed against an infinite list. However, in this particular case, everything is OK because my function will eventually encounter a space, at which point a "cons" will occur. And the evaluation triggered by matching against (x:xs) will stop right there, avoiding the infinite recursion. At that point, the "x" will be matched, but the xs will remain a thunk, so there's no problem. (Thanks to Ganesh for really helping me grasp that).
In general, you can mention the second arg all you want, as long as you don't force evaluation of it. If you've matched against x:xs, then you can mention xs all you want, as long as you don't force evaluation of it.
So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
  where
    step space acc | charIsSpace space = "":acc
    step char (x:xs) = (char:x):xs
    step _ [] = error "this should be impossible"
This correctly works against infinite lists. Note there's no head, tail or (++) operator in sight.
Now, for an important caveat:
When I first wrote the corrected code, I did not have the 3rd equation, which matches against "step _ []". As a result, I received the warning about non-exhaustive pattern matches. Obviously, it is a good idea to avoid that warning.
But I thought I was going to have a problem. I already mentioned above that it is not OK to pattern match the second arg against []. But I would have to do so in order to get rid of the warning.
However, when I added the "step _ []" equation, everything was fine! There was still no problem with infinite lists! Why?
Because the 3rd equation in the corrected code IS NEVER REACHED!
In fact, consider the following BROKEN version. It is EXACTLY the SAME as the correct code, except that I have moved the pattern for empty list up above the other patterns:
myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
  where
    step _ [] = error "this should be impossible"
    step space acc | charIsSpace space = "":acc
    step char (x:xs) = (char:x):xs
We're back to stack overflow, because the first thing that happens when step is called is that the interpreter checks to see if equation number one is a match. To do so, it must see if the second arg is []. To do that, it must evaluate the second arg.
Moving the equation down BELOW the other equations ensures that the 3rd equation is never attempted, because either the first or the second pattern always matches. The 3rd equation is merely there to dispense with the non-exhaustive pattern warning.
This has been a great learning experience. Thanks to everyone for your help.
Others have pointed out the problem, which is that step always evaluates its second argument before producing any output at all, yet its second argument will ultimately depend on the result of another invocation of step when the foldr is applied to an infinite list.
It doesn't have to be written this way, but your second version is kind of ugly because it relies on the initial argument to step having a particular format and it's quite hard to see that the head/tail will never go wrong. (I'm not even 100% certain that they won't!)
What you should do is restructure the first version so it produces output without depending on the input list in at least some situations. In particular we can see that when the character is not a space, there's always at least one element in the output list. So delay the pattern-matching on the second argument until after producing that first element. The case where the character is a space will still be dependent on the list, but that's fine because the only way that case can infinitely recurse is if you pass in an infinite list of spaces, in which case not producing any output and going into a loop is the expected behaviour for words (what else could it do?)
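A sketch of that restructuring, using Data.Char.isSpace directly instead of charIsSpace; the name myWordsLazy is mine. The space case still pattern-matches on the rest of the fold, as before, but the non-space case returns a cons cell right away and only looks at the rest, through a lazily bound pair, when the word's tail or the remaining words are demanded:

```haskell
import Data.Char (isSpace)

-- Restructured version of myWords_FailsOnInfiniteList.
myWordsLazy :: String -> [String]
myWordsLazy = foldr step [] . dropWhile isSpace
  where
    step c rest
      -- Spaces still inspect `rest`; only an infinite run of spaces can loop here.
      | isSpace c = case rest of
          []       -> []          -- trailing spaces produce no extra word
          ([] : _) -> rest        -- a word boundary is already open: collapse the run
          _        -> [] : rest   -- start a new, empty word in front
      -- Non-space: the outer cons and the first letter are produced immediately;
      -- `rest` is only examined when firstWord or otherWords is demanded.
      | otherwise = (c : firstWord) : otherWords
      where
        (firstWord, otherWords) = case rest of
          []       -> ([], [])
          (w : ws) -> (w, ws)
```

On finite input it should agree with the Prelude's words, including for leading, trailing, and repeated spaces.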
Try expanding the expression by hand:
take 5 (myWords_FailsOnInfiniteList (cycle "why "))
take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
take 5 (foldr step [] ("why " ++ cycle "why "))
take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))
What's the next expansion? You should see that in order to pattern match for step, you need to know whether it's the empty list or not. In order to find that out, you have to evaluate it, at least a little bit. But that second term happens to be a foldr reduction by the very function you're pattern matching for. In other words, the step function cannot look at its arguments without calling itself, and so you have an infinite recursion.
Contrast that with an expansion of your second function:
myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why "))
let result = foldr step [""] ("hy " ++ cycle "why ") in
['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ")) in
['w':(head result)] ++ tail result
You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.
I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?
The second version does not actually evaluate result until after it has started producing part of its own answer. The first version evaluates result immediately by pattern matching on it.
The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.
(I feel like this explanation is not very clear, but it's the best I can do.)
The library function foldr has this implementation (or similar):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k
The result of myWords_FailsOnInfiniteList depends on the result of foldr, which depends on the result of step, which depends on the result of the inner foldr, which depends on ... and so on. On an infinite list, myWords_FailsOnInfiniteList will use up an infinite amount of space and time before producing its first word.
The step function in myWords_anotherReader does not require the result of the inner foldr until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because while the first word is being produced, the tail thunk keeps growing: tail ([...] ++ tail ([...] ++ tail (...))).
In contrast, consider
import Data.Char (isSpace)
myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
  myWords' [] = []
  myWords' string =
    let (part1, part2) = break isSpace string
    in part1 : myWords part2
using library functions which may be defined as
break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p
span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)
takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs
Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.
Addendum
Regarding the revised code from your edit:
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
  where
    step space acc | charIsSpace space = "":acc
    step char (x:xs) = (char:x):xs
    step _ [] = error "this should be impossible"
(Aside: you may not care, but the library's words "" == [], whereas your myWords "" == [""]. There is a similar issue with trailing spaces.)
Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.
\n -> tail $ myWords $ replicate n 'a' ++ " b"
It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.
Worse,
\n -> head $ head $ myWords $ replicate n 'a' ++ " b"
myWords_anotherReader was O(1) but the new myWords is O(n), because pattern matching (x:xs) requires the further result.
You can work around this with
import Data.Char (isSpace)
myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
  where
    step space acc | isSpace space = "":acc
    step char ~(x:xs) = (char:x):xs
The ~ introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.
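A quick way to see what the ~ buys: with the lazy pattern, the first character of the first word is available even when everything after it is undefined, because step never has to look at the rest of the fold to produce it. With a plain (x:xs) pattern, the same expression would force the recursive foldr and hit undefined. A self-contained copy of the answer's code for experimenting:

```haskell
import Data.Char (isSpace)

-- The answer's foldr-based words, with an irrefutable pattern in step.
myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
  where
    step space acc | isSpace space = "" : acc
    -- ~(x:xs) defers the match until x or xs is actually demanded.
    step char ~(x:xs) = (char : x) : xs
```

For example, head (head (myWords ('a' : undefined))) evaluates to 'a'; replacing ~(x:xs) with (x:xs) makes the same expression throw.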