How do I remove the first space of a string in Haskell?
For example:
removeSpace " hello" = "hello"
removeSpace "  hello" = " hello"
removeSpace "hello" = "hello"
Here are multiple remove-space options, to show off a few functions and ways of doing things.
To remove multiple leading spaces, you can do
removeSpaces = dropWhile (==' ')
This means the same as removeSpaces xs = dropWhile (==' ') xs, but uses partial application (and so does (==' ') in essence).
or for more general removal,
import Data.Char
removeWhitespace = dropWhile isSpace
If you're really sure you just want to take one space (and you certainly seem to be), then pattern matching is clearest:
removeASpace (' ':xs) = xs -- if it starts with a space, miss that out.
removeASpace xs = xs -- otherwise just leave the string alone
This works because in Haskell, String = [Char] and (x:xs) means the list that starts with x and carries on with the list xs.
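To make the cons pattern concrete, here is a small sketch. The names greeting and firstChar are made up for illustration; removeASpace is repeated only so the block compiles on its own.

```haskell
-- A string literal is shorthand for a list of Char built with (:).
greeting :: String
greeting = 'h' : 'i' : []   -- the same value as "hi"

-- (x:_) binds x to the first character; the empty string is a separate case.
firstChar :: String -> Maybe Char
firstChar (x:_) = Just x
firstChar []    = Nothing

-- The one-space remover from above, unchanged.
removeASpace :: String -> String
removeASpace (' ':xs) = xs
removeASpace xs       = xs
```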
To remove one whitespace character, we can use function guards (if statements with very light syntax, if you've not met them):
removeAWhitespace "" = "" -- base case of empty string
removeAWhitespace (x:xs) | isSpace x = xs   -- if it's whitespace, omit it
                         | otherwise = x:xs -- if it's not, keep it.
Simply use pattern matching:
removeSpace (' ':xs) = xs
removeSpace xs = xs
In Haskell, strings are simply lists of characters, i.e., the Prelude defines
type String = [Char]
Furthermore, there are about three ways to write a function:
Completely roll it yourself using the two most fundamental tools you have at your disposal: pattern matching and recursion;
Cleverly combine some already written functions; and, of course
A mix of these.
If you are new to Haskell and to functional programming, I recommend writing most of your functions using the first method and then gradually shifting toward using more and more predefined functions.
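As a rough illustration of those three styles, here is the same small task (counting the spaces in a string) written each way. The task and the names countSpaces1 to countSpaces3 are my own choice for the sketch, not part of the question.

```haskell
-- 1. Hand-rolled: pattern matching and recursion only.
countSpaces1 :: String -> Int
countSpaces1 []       = 0
countSpaces1 (' ':cs) = 1 + countSpaces1 cs
countSpaces1 (_:cs)   = countSpaces1 cs

-- 2. Combining predefined functions.
countSpaces2 :: String -> Int
countSpaces2 = length . filter (== ' ')

-- 3. A mix: the library's foldr does the traversal,
--    a pattern-matching helper decides what to do per character.
countSpaces3 :: String -> Int
countSpaces3 = foldr step 0
  where
    step ' ' n = n + 1
    step _   n = n
```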
For your problem—removing the first space character (' ') in a string—pattern matching and recursion actually make a lot of sense. As said, strings are just lists of characters, so we will end up with nothing but a simple list traversal.
Let us first write a signature for your function:
removeSpace :: [Char] -> [Char]
(I have written [Char] rather than String to make it explicit that we are performing a list traversal here.)
Pattern matching against a list, we need to consider two cases: the list being empty ([]) and the list consisting of a head element followed by a tail (c : cs).
Dealing with the empty list is, as always, simple: there are no characters left, so there is nothing to remove anymore and we simply return the empty list.
removeSpace [] = []
Then the situation in which we have a head element (a character) and a tail list. Here we need to distinguish two cases again: the case in which the head character is a space and the case in which it is any other character.
If the head character is a space, it will be the first space that we encounter and we need to remove it. As we only have to remove the first space, we can return the remainder of the list (i.e., the tail) without further processing:
removeSpace (' ' : cs) = cs
What remains is to deal with the case in which the head character is not a space. Then we need to keep it in the returned list and, moreover, we need to keep looking for the first space in the remainder of the list; that is, we need to recursively apply our function to the tail:
removeSpace (c : cs) = c : removeSpace cs
And that's all. The complete definition of our function now reads
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : removeSpace cs
This is arguably as clear and concise a definition as any clever combining of predefined functions would have given you.
To wrap up, let us test our function:
> removeSpace " hello"
"hello"
> removeSpace "  hello"
" hello"
> removeSpace "hello"
"hello"
If you really want to construct your function out of predefined functions, here is one alternative definition of removeSpace that will do the trick:
removeSpace :: [Char] -> [Char]
removeSpace = uncurry (flip (flip (++) . drop 1)) . break (== ' ')
(You can see why I prefer the one using explicit pattern matching and recursion. ;-))
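For readers who want to check what the point-free definition does, here is a sketch that names the intermediate tuple. The names removeFirstSpace and removeFirstSpacePF are hypothetical; the second is just the definition above, renamed so both can live in one file.

```haskell
-- Same behaviour as the point-free definition, with the tuple named.
removeFirstSpace :: String -> String
removeFirstSpace s =
  let (before, rest) = break (== ' ') s  -- rest is "" or starts with the first space
  in  before ++ drop 1 rest              -- drop 1 skips that space and is safe on ""

-- The point-free version, renamed for side-by-side comparison.
removeFirstSpacePF :: String -> String
removeFirstSpacePF = uncurry (flip (flip (++) . drop 1)) . break (== ' ')
```

Unfolding the second by hand: uncurry f (before, rest) is f before rest, and flip (flip (++) . drop 1) before rest reduces to before ++ drop 1 rest, which is the first definition.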
Note: I have assumed that your objective is indeed to remove the first space in a string, no matter where that first space appears. In the examples you have given, the first space is always the first character in the string. If that's always the case, i.e., if you are only after dropping a leading space, you can leave out the recursion and simply write
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : cs
or, combining the first and last cases,
removeSpace :: [Char] -> [Char]
removeSpace (' ' : cs) = cs
removeSpace cs = cs
or, using predefined functions,
removeSpace :: [Char] -> [Char]
removeSpace = uncurry ((++) . drop 1) . span (== ' ')
To remove the first space anywhere in a string:
removeSpace :: String -> String
removeSpace = (\(xs,ys) -> xs ++ drop 1 ys) . span (/=' ')
Where span grabs characters until it finds a space or reaches the end of the string.
It then splits the result into a tuple, which we take apart and recombine, skipping the first character of the second list (the space). Using drop 1 rather than tail means there is no need to check that the remainder is non-empty first: an empty list has no tail, but drop 1 on an empty list simply returns the empty list.
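A quick sketch of what span hands back for a few inputs, including the case where there is no space at all. The helper names splitAtFirstSpace and removeSpace' are mine, introduced only to expose the intermediate tuple.

```haskell
-- The tuple produced by span: everything before the first space,
-- and everything from the first space onwards (possibly empty).
splitAtFirstSpace :: String -> (String, String)
splitAtFirstSpace = span (/= ' ')

-- Recombine, dropping at most one character from the second part.
-- drop 1 "" is "", so a string without spaces passes through unchanged.
removeSpace' :: String -> String
removeSpace' s = xs ++ drop 1 ys
  where
    (xs, ys) = splitAtFirstSpace s
```

For instance, splitAtFirstSpace "ab cd" is ("ab", " cd"), while splitAtFirstSpace "abcd" is ("abcd", "").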
Related
So basically I want to split my string on two conditions: when there is an empty space, or when a letter is different from the next one.
An example:
if I have the string "AAA ADDD DD", I want to split it into ["AAA","A","DDD","DD"]
So I made this code:
sliceIt :: String -> [String]
sliceIt xs = words xs
But it only splits the initial string where there is an empty space.
How can I also split when a character is next to a different one?
Can this problem be solved more easily with recursion?
So you want to split by words and then group equal elements in each split. You have the functions for doing so,
import Data.List
sliceIt :: String -> [String]
sliceIt s = concatMap group $ words s
sliceItPointFree = concatMap group . words -- Point free notation. Same but cooler
split :: String -> [String]
split [] = []
split (' ':xs) = split xs
split (x:xs) = (takeWhile (== x) (x:xs)) : (split $ dropWhile (== x) (x:xs))
So this is a recursive definition where there are 2 cases:
If head is a space then ignore it.
Otherwise, take as many of the same characters as you can, then call the function on the remaining part of the string.
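The same recursion can be written with span, which returns the run and the remainder together instead of scanning the run twice with takeWhile and dropWhile. This is only a sketch of that variation; the name splitRuns is made up here.

```haskell
-- Variant of split: span yields the run of equal characters and the rest in one pass.
splitRuns :: String -> [String]
splitRuns []       = []
splitRuns (' ':xs) = splitRuns xs            -- ignore spaces
splitRuns (x:xs)   =
  let (run, rest) = span (== x) xs           -- run: further copies of x
  in  (x : run) : splitRuns rest
```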
The question is: Write a function that will delete leading whitespace from a string.
Example: cutWhitespace [" x","y"," z"] Expected answer: ["x","y","z"]
Here's what I have:
cutWhitespace (x:xs) = filter (\xs -> (xs /=' ')) x:xs
This returns ["x", " y"," z"] when the input is [" x"," y", " z"]. Why is it ignoring the space in the second and third string and how do I fix it?
We are allowed to use higher-order functions which is why I implemented filter.
The reason the OP's cutWhitespace function only works on the first string is that, due to operator precedence, it's actually this function:
cutWhitespace (x:xs) = (filter (\xs -> (xs /=' ')) x) : xs
Here, I've put brackets around most of the body to make it clear how it evaluates. The filter is only applied on x, and x is the first element of the input list; in the example input " x".
If you filter " x" as given, you get "x":
Prelude> filter (\xs -> (xs /=' ')) " x"
"x"
The last thing cutWhitespace does, then, is to take the rest of the list ([" y", " z"]) and cons it on "x", so that it returns ["x"," y"," z"].
In order to address the problem, you could write the function with the realisation that a list of strings is a nested list of characters, i.e. [[Char]].
As a word of warning, pattern-matching on (x:xs) without also matching on [] is dangerous, as it'll fail on empty lists.
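One way to act on both points (treating the input as [[Char]] and covering the empty list) is plain recursion at both levels. This is a sketch under my own naming (cutWhitespaceRec, dropLeading), and it only treats ' ' as whitespace.

```haskell
-- Outer recursion over the list of strings; [] and (x:xs) are both covered,
-- so the function cannot fail with a pattern-match error.
cutWhitespaceRec :: [String] -> [String]
cutWhitespaceRec []     = []
cutWhitespaceRec (x:xs) = dropLeading x : cutWhitespaceRec xs
  where
    -- Inner recursion over one string, removing leading ' ' only.
    dropLeading (' ':cs) = dropLeading cs
    dropLeading cs       = cs
```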
Instead of writing a custom function that checks whether a character is whitespace, I would advise using isSpace :: Char -> Bool. This function returns True not only for a space (' '), but also for a new line ('\n'), a carriage return ('\r'), a tab ('\t'), a vertical tab ('\v') and a form feed ('\f'). Usually it is better to work with such functions, since the odds of forgetting certain cases are lower.
We can thus remove the spacing of a single string with:
dropWhile isSpace
Here dropWhile drops characters from the front of the string for as long as isSpace holds, and stops at the first character that is not whitespace.
We can then map this function over the list to strip the leading whitespace from all the strings, like:
import Data.Char(isSpace)
cutWhitespace = map (dropWhile isSpace)
The question you asked, on how to delete leading whitespace from a string, you can do by simply doing dropWhile on a string:
deleteLeadingWhitespace = dropWhile (\c -> c == ' ')
though you should be more clever if you consider other things "whitespace". You could use the "isSpace" function defined in Data.Char for example.
From your sample data, it looks like you are really trying to do this for a list of strings, in which case you can map the dropWhile over your list:
map deleteLeadingWhitespace
The filter approach you are taking is a little bit dangerous, because even if you had it doing what you think it should, it would be deleting all the spaces, not just the leading ones.
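A small sketch of that difference, with two hypothetical helper names, on a string that has a space in the middle:

```haskell
-- filter removes every space, wherever it occurs.
withFilter :: String -> String
withFilter = filter (/= ' ')

-- dropWhile stops at the first character that is not a space.
withDropWhile :: String -> String
withDropWhile = dropWhile (== ' ')
```

On "  a b", withFilter gives "ab" while withDropWhile gives "a b".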
I have just started programming with Haskell and would like to do a String transformation.
I have an arbitrary String e.g.
" abcd \n dad "
I would like to remove the whitespace characters on the left and on the right. And I would like to eliminate multiple whitespaces as well as escape sequences " \n " -> " "
So the String above would look like this
"abcd dad"
I have already written a function that trims the String and removes the whitespace characters (I'm removing the character if isSpace is true):
trim :: [Char] -> [Char]
trim x = dropWhileEnd isSpace (dropWhile isSpace x)
Now my idea is to do a pattern matching on the input String. But how do I apply the trim function directly to the input? So at first I would like to trim the String at both ends and then apply a pattern matching. So the only thing I would have to do is comparing two characters and removing one if both are whitespace characters
--How do I apply trim directly to the input
s :: [Char] -> [Char]
s [x] = [x]
s(x:xx) = ...
Note: Efficiency is not important. I would like to learn the concepts of pattern matching and understand how Haskell works.
Cheers
trim = unwords . words
Examine the source of words in the Prelude.
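To see why this one-liner covers the whole task, here is a sketch under a different name (normalizeSpaces, chosen so it does not clash with the trim in the question):

```haskell
-- words splits on runs of whitespace (including '\n' and '\t') and never
-- returns empty strings; unwords joins the pieces with single spaces.
normalizeSpaces :: String -> String
normalizeSpaces = unwords . words
```

For example, words "  abcd \n  dad  " is ["abcd","dad"], so normalizeSpaces returns "abcd dad" for that input.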
If you want to pattern-match on the output of trim, you have to call trim, of course! For example, if you want cases for lists of length 0, 1, and longer, you could use
s xs = case trim xs of
  [] -> ...
  [x] -> ...
  x:x':xs -> ...
Your first pattern matches a single character and returns it. Surely this is not what you want - it could be whitespace. Your first match should be the empty list.
If you were only removing space chars, you could do something like this:
trim :: [Char] -> [Char]
trim [] = []
trim (' ':xs) = trim xs
...
You should be able to see that this removes all leading spaces. At this point, either the string is empty (and matches the first pattern) or it falls through to... leaving that up to you.
If you want to remove all whitespace, you need a list or set of those characters. That might look like this:
trim :: [Char] -> [Char]
trim = t
  where
    whitespace = [' ', '\t', '\v'] -- There are more than this, of course
    t [] = []
    t (x:xs) | elem x whitespace = t xs
             | otherwise = ...
Again, this has shown how to match the beginning part of the string. I'll leave it up to you to think about getting to the end.
You can also do pattern matching in a nested function:
s str = removeInnerSpaces (trim str)
  where
    removeInnerSpaces [] = []
    removeInnerSpaces (x:xs) = ...
Here removeInnerSpaces is a nested function, local to s.
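For completeness, here is one possible way to fill in the nested function. It is only a sketch and not necessarily what the answer had in mind: it assumes the trim from the question (dropWhileEnd plus dropWhile) and collapses each run of whitespace into a single ' ' by matching on the head and skipping the rest of the run.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Trim both ends, as in the question.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Trim first, then pattern-match on the trimmed result in a local function.
s :: String -> String
s str = removeInnerSpaces (trim str)
  where
    removeInnerSpaces [] = []
    removeInnerSpaces (x:xs)
      -- A whitespace character stands for a whole run: emit one ' ' and skip the rest.
      | isSpace x = ' ' : removeInnerSpaces (dropWhile isSpace xs)
      | otherwise = x : removeInnerSpaces xs
```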
I have a list of strings that looks like this:
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
I would like to find only the strings in the list that have a vowel followed by two b's, followed by any character, followed by a vowel. How are simple matches like this done in Haskell? Is there a better solution than regular expressions? Can anyone help me with an example? Thanks.
You could just use the classic filter function in conjunction with any regexp library. Your pattern is simple enough that this would work with any regexp library :
filter (=~ "[aeiou]bb.[aeiou]") xs
The confusing part of regexps in Haskell is that there is a very powerful generic API (in regex-base) that lets you use them in the same way across all the specific libraries, with whatever result type you could wish for (Bool, String, Int...). For basic usage it should mostly work as you mean (tm). For your specific need, regex-posix should be sufficient (and it comes with the Haskell Platform, so there is normally no need to install it). So don't forget to import it:
import Text.Regex.Posix
This tutorial should show you the basics of the regex API if you have other needs. It is a bit outdated now, but the fundamentals remain the same; only details of regex-base have changed.
One approach would be to build a small pattern-matching language and to embed it in Haskell.
In your example, a pattern is basically a list of character specifications. Let's define a type of abstract characters the values of which will serve as such specifications,
data AbsChar = Exactly Char | Vowel | Any
together with an "interpreter" that tells us whether a character matches a specification:
(=?) :: AbsChar -> Char -> Bool
Exactly c' =? c = c == c'
Vowel =? c = c `elem` "aeiou"
Any =? c = True
For example, Vowel =? 'x' will produce False, while Vowel =? 'a' will produce True.
Then, indeed, a pattern is just a list of abstract characters:
type Pattern = [AbsChar]
Next, we write a function that tests whether the prefix of a string matches a given pattern:
matchesPrefix :: Pattern -> String -> Bool
matchesPrefix [] _ = True
matchesPrefix (a : as) (c : cs) = a =? c && matchesPrefix as cs
matchesPrefix _ _ = False
For example:
> matchesPrefix [Vowel, Exactly 'v'] "eva"
True
> matchesPrefix [Vowel, Exactly 'v'] "era"
False
As we do not want to restrict ourselves to matching prefixes, but rather match anywhere within a word, our next function matches the prefixes of every end segment of a string:
containsMatch :: Pattern -> String -> Bool
containsMatch pat = any (matchesPrefix pat) . tails
It uses the function tails which can be found in the module Data.List, but which we can, to make this explanation self-contained, easily define ourselves as well:
tails :: [a] -> [[a]]
tails [] = [[]]
tails l@(_ : xs) = l : tails xs
For example:
> tails "xabbaua"
["xabbaua","abbaua","bbaua","baua","aua","ua","a",""]
Now, finally, the function you were looking for, that selects all strings from a list that contain a matching segment, is written simply as:
select :: Pattern -> [String] -> [String]
select = filter . containsMatch
Let's test it on your example:
> let pat = [Vowel, Exactly 'b', Exactly 'b', Any, Vowel]
> select pat ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
["xabbaua"]
Well, you can try this function, although this may not be a best method:
elem' :: String -> String -> Bool
elem' p xs = any (p==) $ map (take $ length p) $ tails xs
Usage:
filter (elem' "bb") ["xxbbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "bbbaab"]
or
bbFilter = filter (elem' "bb")
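For what it's worth, Data.List already exports isInfixOf, which answers the same "does this substring occur anywhere" question as elem'. A minimal sketch, with containsBB as a made-up name:

```haskell
import Data.List (isInfixOf)

-- Keep only the strings that contain "bb" somewhere.
containsBB :: [String] -> [String]
containsBB = filter ("bb" `isInfixOf`)
```

Like elem' "bb", this only checks for the substring "bb"; it does not check the vowels around it.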
Well, if you're absolutely opposed to doing it with regexes, you could do it with just pattern matching and recursion, although it is ugly.
xs = ["xabbaua", "bbbaacv", "ggfeehhaa", "uyyttaccaa", "ibbatb"]
vowel = "aeiou"
filter' strs = filter matches strs
matches [] = False
matches str@(x:'b':'b':_:y:xs)
  | x `elem` vowel && y `elem` vowel = True
  | otherwise = matches $ tail str
matches (x:xs) = matches xs
Calling filter' xs will return ["xabbaua"] which I believe is the required result.
I have two Haskell functions, both of which seem very similar to me. But the first one FAILS against infinite lists, and the second one SUCCEEDS against infinite lists. I have been trying for hours to nail down exactly why that is, but to no avail.
Both snippets are a re-implementation of the "words" function in Prelude. Both work fine against finite lists.
Here's the version that does NOT handle infinite lists:
myWords_FailsOnInfiniteList :: String -> [String]
myWords_FailsOnInfiniteList string = foldr step [] (dropWhile charIsSpace string)
  where
    step space ([]:xs) | charIsSpace space = []:xs
    step space (x:xs)  | charIsSpace space = []:x:xs
    step space []      | charIsSpace space = []
    step char (x:xs)                       = (char : x) : xs
    step char []                           = [[char]]
Here's the version that DOES handle infinite lists:
myWords_anotherReader :: String -> [String]
myWords_anotherReader xs = foldr step [""] xs
  where
    step x result | not . charIsSpace $ x = [x:(head result)]++tail result
                  | otherwise             = []:result
Note: "charIsSpace" is merely a renaming of Char.isSpace.
The following interpreter session illustrates that the first one fails against an infinite list while the second one succeeds.
*Main> take 5 (myWords_FailsOnInfiniteList (cycle "why "))
*** Exception: stack overflow
*Main> take 5 (myWords_anotherReader (cycle "why "))
["why","why","why","why","why"]
EDIT: Thanks to the responses below, I believe I understand now. Here are my conclusions and the revised code:
Conclusions:
The biggest culprits in my first attempt were the 2 equations that started with "step space []" and "step char []". Matching the second parameter of the step function against [] is a no-no, because it forces the whole 2nd arg to be evaluated (but with a caveat to be explained below).
At one point, I had thought (++) might evaluate its right-hand argument later than cons would, somehow. So, I thought I might fix the problem by changing " = (char:x):xs" to "= [char : x] ++ xs". But that was incorrect.
At one point, I thought that pattern matching the second arg against (x:xs) would cause the function to fail against infinite lists. I was almost right about this, but not quite! Evaluating the second arg against (x:xs), as I do in a pattern match above, WILL cause some recursion. It will "turn the crank" until it hits a ":" (aka, "cons"). If that never happened, then my function would not succeed against an infinite list. However, in this particular case, everything is OK because my function will eventually encounter a space, at which point a "cons" will occur. And the evaluation triggered by matching against (x:xs) will stop right there, avoiding the infinite recursion. At that point, the "x" will be matched, but the xs will remain a thunk, so there's no problem. (Thanks to Ganesh for really helping me grasp that).
In general, you can mention the second arg all you want, as long as you don't force evaluation of it. If you've matched against x:xs, then you can mention xs all you want, as long as you don't force evaluation of it.
So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
  where
    step space acc | charIsSpace space = "":acc
    step char (x:xs)                   = (char:x):xs
    step _ []                          = error "this should be impossible"
This correctly works against infinite lists. Note there's no head, tail or (++) operator in sight.
Now, for an important caveat:
When I first wrote the corrected code, I did not have the 3rd equation, which matches against "step _ []". As a result, I received the warning about non-exhaustive pattern matches. Obviously, it is a good idea to avoid that warning.
But I thought I was going to have a problem. I already mentioned above that it is not OK to pattern match the second arg against []. But I would have to do so in order to get rid of the warning.
However, when I added the "step _ []" equation, everything was fine! There was still no problem with infinite lists! Why?
Because the 3rd equation in the corrected code IS NEVER REACHED!
In fact, consider the following BROKEN version. It is EXACTLY the SAME as the correct code, except that I have moved the pattern for empty list up above the other patterns:
myWords_brokenAgain :: String -> [String]
myWords_brokenAgain string = foldr step [""] (dropWhile charIsSpace string)
  where
    step _ []                          = error "this should be impossible"
    step space acc | charIsSpace space = "":acc
    step char (x:xs)                   = (char:x):xs
We're back to stack overflow, because the first thing that happens when step is called is that the interpreter checks to see if equation number one is a match. To do so, it must see if the second arg is []. To do that, it must evaluate the second arg.
Moving the equation down BELOW the other equations ensures that the 3rd equation is never attempted, because either the first or the second pattern always matches. The 3rd equation is merely there to dispense with the non-exhaustive pattern warning.
This has been a great learning experience. Thanks to everyone for your help.
Others have pointed out the problem, which is that step always evaluates its second argument before producing any output at all, yet its second argument will ultimately depend on the result of another invocation of step when the foldr is applied to an infinite list.
It doesn't have to be written this way, but your second version is kind of ugly because it relies on the initial argument to step having a particular format and it's quite hard to see that the head/tail will never go wrong. (I'm not even 100% certain that they won't!)
What you should do is restructure the first version so it produces output without depending on the input list in at least some situations. In particular we can see that when the character is not a space, there's always at least one element in the output list. So delay the pattern-matching on the second argument until after producing that first element. The case where the character is a space will still be dependent on the list, but that's fine because the only way that case can infinitely recurse is if you pass in an infinite list of spaces, in which case not producing any output and going into a loop is the expected behaviour for words (what else could it do?)
Try expanding the expression by hand:
take 5 (myWords_FailsOnInfiniteList (cycle "why "))
take 5 (foldr step [] (dropWhile charIsSpace (cycle "why ")))
take 5 (foldr step [] (dropWhile charIsSpace ("why " ++ cycle "why ")))
take 5 (foldr step [] ("why " ++ cycle "why "))
take 5 (step 'w' (foldr step [] ("hy " ++ cycle "why ")))
take 5 (step 'w' (step 'h' (foldr step [] ("y " ++ cycle "why "))))
What's the next expansion? You should see that in order to pattern match for step, you need to know whether it's the empty list or not. In order to find that out, you have to evaluate it, at least a little bit. But that second term happens to be a foldr reduction by the very function you're pattern matching for. In other words, the step function cannot look at its arguments without calling itself, and so you have an infinite recursion.
Contrast that with an expansion of your second function:
myWords_anotherReader (cycle "why ")
foldr step [""] (cycle "why ")
foldr step [""] ("why " ++ cycle "why ")
step 'w' (foldr step [""] ("hy " ++ cycle "why "))
let result = foldr step [""] ("hy " ++ cycle "why ") in
    ['w':(head result)] ++ tail result
let result = step 'h' (foldr step [""] ("y " ++ cycle "why ")) in
    ['w':(head result)] ++ tail result
You can probably see that this expansion will continue until a space is reached. Once a space is reached, "head result" will obtain a value, and you will have produced the first element of the answer.
I suspect that this second function will overflow for infinite strings that don't contain any spaces. Can you see why?
The second version does not actually evaluate result until after it has started producing part of its own answer. The first version evaluates result immediately by pattern matching on it.
The key with these infinite lists is that you have to produce something before you start demanding list elements so that the output can always "stay ahead" of the input.
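A much smaller example may make the "stay ahead" idea easier to see. The two functions below are hypothetical and unrelated to words; they differ only in whether the folding function inspects the already-folded rest before emitting a cons cell.

```haskell
-- Productive: emits a cons cell before looking at the rest of the fold,
-- so a consumer such as take can stop early even on an infinite input.
doubleAll :: [Int] -> [Int]
doubleAll = foldr (\x rest -> 2 * x : rest) []

-- Not productive: the first equation must check whether the folded tail is [],
-- which forces the next step, which forces the one after it, and so on.
-- On an infinite input it never produces a first element.
doubleAllStrict :: [Int] -> [Int]
doubleAllStrict = foldr step []
  where
    step x []   = [2 * x]
    step x rest = 2 * x : rest
```

Both agree on finite lists. On an infinite one, take 3 (doubleAll [1..]) returns, while the same call with doubleAllStrict does not.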
(I feel like this explanation is not very clear, but it's the best I can do.)
The library function foldr has this implementation (or similar):
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f k (x:xs) = f x (foldr f k xs)
foldr _ k _ = k
The result of myWords_FailsOnInfiniteList depends on the result of foldr, which depends on the result of step, which depends on the result of the inner foldr, which depends on ... and so on. On an infinite list, myWords_FailsOnInfiniteList will use up an infinite amount of space and time before producing its first word.
The step function in myWords_anotherReader does not require the result of the inner foldr until after it has produced the first letter of the first word. Unfortunately, as Apocalisp says, it uses O(length of first word) space before it produces the next word, because as the first word is being produced, the tail thunk keeps growing tail ([...] ++ tail ([...] ++ tail (...))).
In contrast, compare to
myWords :: String -> [String]
myWords = myWords' . dropWhile isSpace where
    myWords' [] = []
    myWords' string =
        let (part1, part2) = break isSpace string
        in part1 : myWords part2
using library functions which may be defined as
break :: (a -> Bool) -> [a] -> ([a], [a])
break p = span $ not . p
span :: (a -> Bool) -> [a] -> ([a], [a])
span p xs = (takeWhile p xs, dropWhile p xs)
takeWhile :: (a -> Bool) -> [a] -> [a]
takeWhile p (x:xs) | p x = x : takeWhile p xs
takeWhile _ _ = []
dropWhile :: (a -> Bool) -> [a] -> [a]
dropWhile p (x:xs) | p x = dropWhile p xs
dropWhile _ xs = xs
Notice that producing the intermediate results is never held up by future computation, and only O(1) space is needed as each element of the result is made available for consumption.
Addendum
So, here's the revised code. I usually try to avoid head and tail, merely because they are partial functions, and also because I need practice writing the pattern matching equivalent.
myWords :: String -> [String]
myWords string = foldr step [""] (dropWhile charIsSpace string)
  where
    step space acc | charIsSpace space = "":acc
    step char (x:xs)                   = (char:x):xs
    step _ []                          = error "this should be impossible"
(Aside: you may not care, but words "" == [] in the library, whereas your myWords "" == [""]. There is a similar issue with trailing spaces.)
Looks much-improved over myWords_anotherReader, and is pretty good for a foldr-based solution.
\n -> tail $ myWords $ replicate n 'a' ++ " b"
It's not possible to do better than O(n) time, but both myWords_anotherReader and myWords take O(n) space here. This may be inevitable given the use of foldr.
Worse,
\n -> head $ head $ myWords $ replicate n 'a' ++ " b"
myWords_anotherReader was O(1) but the new myWords is O(n), because pattern matching (x:xs) requires the further result.
You can work around this with
myWords :: String -> [String]
myWords = foldr step [""] . dropWhile isSpace
  where
    step space acc | isSpace space = "":acc
    step char ~(x:xs)              = (char:x):xs
The ~ introduces an "irrefutable pattern". Irrefutable patterns never fail and do not force immediate evaluation.
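To isolate the effect of the ~, here is a sketch of just the non-space equation in both forms. The names stepStrict and stepLazy are made up; each is the second equation of step lifted out on its own, with an explicit error case added to the strict one so it stays total.

```haskell
-- Ordinary pattern: the accumulator is forced to its first cons cell
-- before anything is returned.
stepStrict :: Char -> [String] -> [String]
stepStrict char (x:xs) = (char:x):xs
stepStrict _    []     = error "empty accumulator"

-- Irrefutable pattern: the result's outer (:) and its leading character are
-- built immediately; x and xs are only looked up if something demands them.
stepLazy :: Char -> [String] -> [String]
stepLazy char ~(x:xs) = (char:x):xs
```

With the lazy version, head (head (stepLazy 'w' undefined)) evaluates to 'w' because the accumulator is never inspected; the strict version would raise an error on the same call.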