Haskell: trim String and eliminate multiple spaces

I have just started programming with Haskell and would like to do a String transformation.
I have an arbitrary String e.g.
" abcd \n dad "
I would like to remove the whitespace characters on the left and on the right. I would also like to collapse runs of whitespace, including escape sequences such as newlines, into a single space: " \n " -> " "
So the String above would look like this
"abcd dad"
I have already written a function that trims the String and removes the whitespace characters (I'm removing the character if isSpace is true):
trim :: [Char] -> [Char]
trim x = dropWhileEnd isSpace (dropWhile isSpace x)
Now my idea is to pattern match on the input String. But how do I apply the trim function directly to the input? First I would like to trim the String at both ends and then pattern match on the result. Then the only thing left to do is compare two adjacent characters and remove one if both are whitespace characters.
--How do I apply trim directly to the input
s :: [Char] -> [Char]
s [x] = [x]
s(x:xx) = ...
Note: Efficiency is not important. I would like to learn the concepts of pattern matching and understand how Haskell works.
Cheers

trim = unwords . words
Examine the source of words in the Prelude.
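A minimal sketch of this suggestion, applied to the string from the question. The name normalizeSpaces is made up here to avoid clashing with the question's trim. words splits on any whitespace (including '\n') and drops empty pieces, and unwords joins the pieces with single spaces.

```haskell
-- normalizeSpaces is a hypothetical name; the answer above calls it trim.
-- words splits on runs of whitespace, unwords rejoins with one space each.
normalizeSpaces :: String -> String
normalizeSpaces = unwords . words
```

With this definition, normalizeSpaces " abcd \n dad " evaluates to "abcd dad".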

If you want to pattern-match on the output of trim, you have to call trim, of course! For example, if you want cases for lists of length 0, 1, and longer, you could use
s xs = case trim xs of
  [] -> ...
  [x] -> ...
  x:x':xs -> ...
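To make the shape of that case expression concrete, here is a small runnable sketch. The function describeTrimmed and its result strings are invented for illustration only; trim is the definition from the question.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- trim as written in the question.
trim :: String -> String
trim x = dropWhileEnd isSpace (dropWhile isSpace x)

-- Hypothetical example: the patterns are matched against the
-- trimmed string, not against the raw input.
describeTrimmed :: String -> String
describeTrimmed xs = case trim xs of
  []      -> "empty"
  [_]     -> "one character"
  (_:_:_) -> "two or more characters"
```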

Your first pattern matches a single character and returns it. Surely this is not what you want - it could be whitespace. Your first match should be the empty list.
If you were only removing space chars, you could do something like this:
trim :: [Char] -> [Char]
trim [] = []
trim (' ':xs) = trim xs
...
You should be able to see that this removes all leading spaces. At this point, either the string is empty (and matches the first pattern) or it falls through to... leaving that up to you.
If you want to remove all whitespace, you need a list or set of those characters. That might look like this:
trim :: [Char] -> [Char]
trim = t
  where
    whitespace = [' ', '\t', '\v'] -- There are more than this, of course
    t [] = []
    t (x:xs) | elem x whitespace = t xs
             | otherwise = ...
Again, this has shown how to match the beginning part of the string. I leave it up to you to think about getting to the end.
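For readers who want to compare notes after trying it themselves, here is one possible way to finish the idea. It is only a sketch: the names stripLeading and stripBoth are made up, the whitespace list is an assumption, and reusing the leading-space function on the reversed string is just one of several ways to reach the end of the list.

```haskell
-- Assumed set of whitespace characters; extend as needed.
whitespace :: [Char]
whitespace = " \t\n\r\v\f"

-- Drop leading whitespace by pattern matching on the head of the list.
stripLeading :: String -> String
stripLeading [] = []
stripLeading (x:xs)
  | x `elem` whitespace = stripLeading xs
  | otherwise           = x : xs

-- Strip the front, reverse, strip the (former) end, reverse back.
stripBoth :: String -> String
stripBoth = reverse . stripLeading . reverse . stripLeading
```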

You can also do pattern matching in a nested function:
s str = removeInnerSpaces (trim str)
  where
    removeInnerSpaces [] = []
    removeInnerSpaces (x:xs) = ...
Here removeInnerSpaces is a nested function, local to s.
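A hedged sketch of how the elided clause could be filled in. The outer function is called squeeze here (the answer calls it s), and the choice to replace every run of whitespace by a single ' ' is an assumption based on the question's example.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- trim as written in the question.
trim :: String -> String
trim x = dropWhileEnd isSpace (dropWhile isSpace x)

-- squeeze is a hypothetical name for the answer's s.
squeeze :: String -> String
squeeze str = removeInnerSpaces (trim str)
  where
    removeInnerSpaces [] = []
    removeInnerSpaces (x:xs)
      -- A whitespace run becomes one space; the rest of the run is skipped.
      | isSpace x = ' ' : removeInnerSpaces (dropWhile isSpace xs)
      | otherwise = x : removeInnerSpaces xs
```

With these definitions, squeeze " abcd \n dad " evaluates to "abcd dad".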

Related

Haskell: Deleting white space from a list of strings

The question is: Write a function that will delete leading white space from a string. Example: cutWhitespace [" x","y"," z"] Expected answer: ["x","y","z"]
Here's what I have:
cutWhitespace (x:xs) = filter (\xs -> (xs /=' ')) x:xs
This returns ["x", " y"," z"] when the input is [" x"," y", " z"]. Why is it ignoring the space in the second and third string and how do I fix it?
We are allowed to use higher-order functions which is why I implemented filter.
The reason the OP's cutWhitespace function only works on the first string is that, due to operator precedence (function application binds tighter than the (:) operator), it's actually this function:
cutWhitespace (x:xs) = (filter (\xs -> (xs /=' ')) x) : xs
Here, I've put brackets around most of the body to make it clear how it evaluates. The filter is only applied on x, and x is the first element of the input list; in the example input " x".
If you filter " x" as given, you get "x":
Prelude> filter (\xs -> (xs /=' ')) " x"
"x"
The last thing cutWhitespace does, then, is to take the rest of the list ([" y", " z"]) and cons it on "x", so that it returns ["x"," y"," z"].
In order to address the problem, you could write the function with the realisation that a list of strings is a nested list of characters, i.e. [[Char]].
As a word of warning, pattern-matching on (x:xs) without also matching on [] is dangerous, as it'll fail on empty lists.
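Taking both remarks together, an explicitly recursive version might look like the sketch below. The name cutWhitespaceRec is invented, and treating only ' ' as whitespace follows the question's own code rather than any requirement stated in the exercise.

```haskell
-- Handles the empty list explicitly, then processes each string
-- (itself a list of Char) and recurses on the rest of the list.
cutWhitespaceRec :: [String] -> [String]
cutWhitespaceRec []     = []
cutWhitespaceRec (x:xs) = dropWhile (== ' ') x : cutWhitespaceRec xs
```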
Instead of writing a custom function that checks if a character is whitespace, I would advise using isSpace :: Char -> Bool. This function returns True not only for a space (' '), but also for a new line ('\n'), a carriage return ('\r'), a tab ('\t'), a vertical tab ('\v') and a form feed ('\f'). Usually it is better to work with such functions, since the odds of forgetting certain cases are lower.
We can thus remove the spacing of a single string with:
dropWhile isSpace
Here dropWhile drops characters from the front of the string for as long as isSpace holds.
We can then map this function over the list to strip the leading whitespace from all the strings, like:
import Data.Char(isSpace)
cutWhitespace = map (dropWhile isSpace)
The question you asked, how to delete leading whitespace from a string, can be answered simply by using dropWhile on the string:
deleteLeadingWhitespace = dropWhile (\c -> c == ' ')
though you should be more clever if you consider other things "whitespace". You could use the "isSpace" function defined in Data.Char for example.
From your sample data, it looks like you are really trying to do this for a list of strings, in which case you can map the dropWhile over your list:
map deleteLeadingWhitespace
The filter approach you are taking is a little bit dangerous, because even if you had it doing what you think it should, it would be deleting all the spaces, not just the leading ones.
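The difference between the two approaches can be seen in a short sketch. removeAllSpaces is a hypothetical name for what a corrected filter-based version would do; cutWhitespace is the map/dropWhile version from above.

```haskell
import Data.Char (isSpace)

-- Drops only the leading whitespace of every string.
cutWhitespace :: [String] -> [String]
cutWhitespace = map (dropWhile isSpace)

-- Hypothetical filter-based variant: removes every whitespace
-- character, including those in the middle of a string.
removeAllSpaces :: [String] -> [String]
removeAllSpaces = map (filter (not . isSpace))
```

For the input [" a b"], cutWhitespace gives ["a b"] while removeAllSpaces gives ["ab"].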

Split string to a list of strings in Clean

Because of the limited amount of resources, I need to propose a question here. I have been struggling with functional programming, the endless Haskell tutorials don't really help me. So what I want to achieve, in Clean language, is to split a string like " car cow cat " to a list of strings ["car","cow","cat"]. Can you provide me a detailed answer (does not have to be complete code), on how to iterate through this string, and especially the part when the newly constructed strings are added to the list?
I'm going to offer a simple solution. There are infinitely better ways of doing this in Haskell, but it's the simplest I can think of for someone new to functional programming, without using any Haskell-specific function like takeWhile, or even any folds and maps...
You basically want to simulate iterating over a list, so here is what I suggest:
Define a function that will take a string and a split-by character. This function will return a list of strings - spliton :: String -> Char -> [String]
To move over the list, we'll want to gobble up characters until we hit one of our splitting characters. We'll also want to save the word we've saved up until now, and the entire list of words.
For that, we'll define a subfunction that will save the states
spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
I've also included the simplest clause - an empty string. When our string is empty, we'll just want to return what we have saved so far.
Now let's move on to our actual recursive function:
If we hit our split character, we'll add the string we have saved so far to the list and restart with an empty current-state string
If we don't hit the split character, we'll add the character to the current-state string
spliton' (currchar:rest) splitby sofar res
  | currchar==splitby = spliton' rest splitby "" (res++[sofar])
  | otherwise = spliton' rest splitby (sofar++[currchar]) res
So, to summarize our code:
spliton :: String -> Char -> [String]
spliton source splitchar = spliton' source splitchar [] []
spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
spliton' (currchar:rest) splitby sofar res
  | currchar==splitby = spliton' rest splitby "" (res++[sofar])
  | otherwise = spliton' rest splitby (sofar++[currchar]) res
Note: This will not however get rid of the empty string - meaning if you have many superfluous spaces - you'll get them added to the list. I'll leave you to think how to handle that case - hope this can help you get started.
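One possible way to handle the empty strings mentioned in the note, shown only as a sketch: keep spliton as it is and drop the empty results afterwards. The wrapper name splitonNonEmpty is made up, and it does use filter, which the answer deliberately avoided.

```haskell
spliton :: String -> Char -> [String]
spliton source splitchar = spliton' source splitchar [] []

-- sofar is the word being built; res is the list of finished words.
spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
spliton' (currchar:rest) splitby sofar res
  | currchar == splitby = spliton' rest splitby "" (res ++ [sofar])
  | otherwise           = spliton' rest splitby (sofar ++ [currchar]) res

-- Hypothetical wrapper: discard the empty strings produced by
-- leading, trailing or repeated split characters.
splitonNonEmpty :: String -> Char -> [String]
splitonNonEmpty source splitchar =
  filter (not . null) (spliton source splitchar)
```

For " car cow cat ", spliton returns ["","car","cow","cat",""], and splitonNonEmpty returns ["car","cow","cat"].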
Let's split this up in several sub-problems:
Make a list of characters from the string so that we can easily apply pattern matching.
Scrape the initial part of the list (as long as possible with only spaces or only not-spaces), and only keep it when it is not whitespace.
Repeat the second step while the list is non-empty.
The first thing can be done using fromString. For the second and third step, we define a helper function:
scrape :: [Char] -> [String]
scrape [] = []
scrape cs=:[c:_]
    | isSpace c = scrape (dropWhile isSpace cs)
    | otherwise = [toString word:scrape rest]
where
    (word,rest) = span (not o isSpace) cs
The first alternative is the base case to match the empty list. The second alternative matches the whole list cs with a first element c. If the first character is a space, we recursively (step 3) call the same function on the same list without the initial part of spaces. If the first character is not a space, we use span :: (a -> Bool) [a] -> ([a], [a]) to split the list in the initial part that is a word, and the rest. We store the word using toString as a string, and recursively call scrape for the rest of the list.
Now, we only need a wrapper to make this a function with the type String -> [String]:
split :: String -> [String]
split s = scrape (fromString s)
where
    scrape :: [Char] -> [String]
    scrape [] = []
    scrape cs=:[c:_]
        | isSpace c = scrape (dropWhile isSpace cs)
        | otherwise = [toString word:scrape rest]
    where
        (word,rest) = span (not o isSpace) cs
Note that you can easily abstract from the delimiter, by passing a character d and replacing isSpace c with c == d and (not o isSpace) by ((<>) d). Alternatively, you can choose to not pass a character d but a function isDelim :: Char -> Bool. You then get isDelim c and (not o isDelim), respectively.
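For readers following along in Haskell rather than Clean, the isDelim generalisation might be transliterated roughly as below. This is a sketch, not the answerer's code: splitWith is a made-up name, and since a Haskell String is already a list of Char, the fromString/toString conversions disappear.

```haskell
import Data.Char (isSpace)

-- Haskell rendering of scrape, parameterised by a delimiter predicate.
splitWith :: (Char -> Bool) -> String -> [String]
splitWith _ [] = []
splitWith isDelim cs@(c:_)
  -- Skip a run of delimiters and continue with what follows.
  | isDelim c = splitWith isDelim (dropWhile isDelim cs)
  -- Otherwise take the longest delimiter-free prefix as a word.
  | otherwise = word : splitWith isDelim rest
  where
    (word, rest) = span (not . isDelim) cs
```

splitWith isSpace " car cow cat " gives ["car","cow","cat"], and splitWith (== ',') "a,,b" gives ["a","b"].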

Haskell char quotes

I've started to learn Haskell for real recently, and I'm doing some exercises from wikibooks.
I'm doing the exercise on RLE encoding, and I've come up with a solution like this:
import Data.List
rle :: String -> [(Int,Char)]
rle [] = []
rle xs = zip lengths chars
  where
    groups = group xs
    lengths = map length groups
    chars = map head groups
rle_toString :: [(Int, Char)] -> String
rle_toString [] = []
rle_toString (x:xs) = show (fst x ) ++ show (snd x) ++ rle_toString xs
Not a very elegant solution, but it almost works. The problem is that I get output like this: "7'a'8'b'7'j'6'q'3'i'7'q'1'p'1'a'16'z'2'n'". The single quotes around the chars are not very elegant. How can I achieve output like: "7a8b7j6q3i7q1p1a16z2n"?
show is used to print values as they appear in Haskell source code, and thus puts single quotes around characters (and double quotes around strings, and so on). Use [snd x] instead to show just the character.
In Haskell, String is just shorthand for a list of Char, [Char]. For example, the String "Foo" can also be written like this: ['F','o','o']. So, to convert a single character to a string, just put it in brackets: [char].
The problem is your use of show on a character. show 'a' == "'a'".
The solution is to realize that strings are just lists of characters, so if c is a character, then the one-character string that contains c is just [c].
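Putting the [c] suggestion into the question's code could look like the following sketch. The camel-case name rleToString and the use of a list comprehension and concatMap are stylistic choices made here, not part of the original question.

```haskell
import Data.List (group)

-- Run-length encode: one (count, character) pair per run.
rle :: String -> [(Int, Char)]
rle xs = [(length g, c) | g@(c:_) <- group xs]

-- show is used for the count only; the character is wrapped in a
-- one-element list so that no quotes are added.
rleToString :: [(Int, Char)] -> String
rleToString = concatMap (\(n, c) -> show n ++ [c])
```

For example, rleToString (rle "aaabccdddd") gives "3a1b2c4d".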

Remove first space in string using Haskell

How do I remove the first space of a string in Haskell?
For example:
removeSpace " hello" = "hello"
removeSpace "  hello" = " hello"
removeSpace "hello" = "hello"
Here are multiple remove-space options, to show off a few functions and ways of doing things.
To remove multiple leading spaces, you can do
removeSpaces = dropWhile (==' ')
This means the same as removeSpaces xs = dropWhile (==' ') xs, but uses partial application (and so does (==' ') in essence).
or for more general removal,
import Data.Char
removeWhitespace = dropWhile isSpace
If you're really sure you just want to remove one space (and you certainly seem to be), then pattern matching is clearest:
removeASpace (' ':xs) = xs -- if it starts with a space, miss that out.
removeASpace xs = xs -- otherwise just leave the string alone
This works because in haskell, String = [Char] and (x:xs) means the list that starts with x and carries on with the list xs.
To remove one whitespace character, we can use function guards (if statements with very light syntax, if you've not met them):
removeAWhitespace "" = "" -- base case of empty string
removeAWhitespace (x:xs) | isSpace x = xs -- if it's whitespace, omit it
                         | otherwise = x:xs -- if it's not, keep it.
Simply use pattern matching:
removeSpace (' ':xs) = xs
removeSpace xs = xs
In Haskell, strings are simply lists of characters, i.e., the Prelude defines
type String = [Char]
Furthermore, there are about three ways to write a function:
Completely roll it yourself using the two most fundamental tools you have at your disposal: pattern matching and recursion;
Cleverly combine some already written functions; and, of course
A mix of these.
If you are new to Haskell and to functional programming, I recommend writing most of your functions using the first method and then gradually shift toward using more and more predefined functions.
For your problem—removing the first space character (' ') in a string—pattern matching and recursion actually make a lot of sense. As said, strings are just lists of characters, so we will end up with nothing but a simple list traversal.
Let us first write a signature for your function:
removeSpace :: [Char] -> [Char]
(I have written [Char] rather than String to make it explicit that we are performing a list traversal here.)
Pattern matching against a list, we need to consider two cases: the list being empty ([]) and the list consisting of a head element followed by a tail (c : cs).
Dealing with the empty list is, as always, simple: there are no characters left, so there is nothing to remove anymore and we simply return the empty list.
removeSpace [] = []
Then the situation in which we have a head element (a character) and a tail list. Here we need to distinguish two cases again: the case in which the head character is a space and the case in which it is any other character.
If the head character is a space, it will be the first space that we encounter and we need to remove it. As we only have to remove the first space, we can return the remainder of the list (i.e., the tail) without further processing:
removeSpace (' ' : cs) = cs
What remains is to deal with the case in which the head character is not a space. Then we need to keep it in the returned list and, moreover, we need to keep looking for the first space in the remainder of the list; that is, we need to recursively apply our function to the tail:
removeSpace (c : cs) = c : removeSpace cs
And that's all. The complete definition of our function now reads
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : removeSpace cs
This is arguably as clear and concise a definition as any clever combining of predefined functions would have given you.
To wrap up, let us test our function:
> removeSpace " hello"
"hello"
> removeSpace "  hello"
" hello"
> removeSpace "hello"
"hello"
If you really want to construct your function out of predefined functions, here is one alternative definition of removeSpace that will do the trick:
removeSpace :: [Char] -> [Char]
removeSpace = uncurry (flip (flip (++) . drop 1)) . break (== ' ')
(You can see why I prefer the one using explicit pattern matching and recursion. ;-))
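The point-free definition above may be easier to follow when the intermediate pair is named. The sketch below is intended to be equivalent; removeSpaceBreak is a hypothetical name chosen to avoid redefining removeSpace.

```haskell
-- break (== ' ') splits the string just before the first space (if any);
-- drop 1 then discards that space, and is harmless when rest is empty.
removeSpaceBreak :: String -> String
removeSpaceBreak s =
  let (before, rest) = break (== ' ') s
  in  before ++ drop 1 rest
```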
Note: I have assumed that your objective is indeed to remove the first space in a string, no matter where that first space appears. In the examples you have given, the first space is always the first character in the string. If that's always the case, i.e., if you are only after dropping a leading space, you can leave out the recursion and simply write
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : cs
or, combining the first and last cases,
removeSpace :: [Char] -> [Char]
removeSpace (' ' : cs) = cs
removeSpace cs = cs
or, using predefined functions,
removeSpace :: [Char] -> [Char]
removeSpace = uncurry ((++) . drop 1) . span (== ' ')
To remove the first space anywhere in a string:
removeSpace :: String -> String
removeSpace = (\(xs,ys) -> xs ++ drop 1 ys) . span (/=' ')
Where span grabs characters until it finds a space or reaches the end of the string.
It then splits the input into a tuple of two lists, which we recombine while skipping the first character of the second list (the space). Because we use drop 1 rather than tail, an empty second list (a string with no space at all) is handled safely: drop 1 [] is just [], whereas tail [] would fail.

Haskell: Gluing a char and a list together?

So I have this code here:
toWords :: String -> [String]
toWords "" = []
toWords (nr1 : rest)
| nr1 == ' ' = toWords rest
| otherwise = [nr1] : toWords rest
The "toWords" function should simply remove all spaces and return a list with all the words. But this happens:
*Main> toWords "How are you?"
["H","o","w","a","r","e","y","o","u","?"]
Is it just me, or are you trying to re-invent the "words" function from Prelude?
Your type should be String -> [String] or [Char] -> [[Char]].
Your input is a string (a list of chars), your output a list of strings (a list of lists of chars).
Your type here means it maps a string to ANY type, this is not true.
Edit: alternatively you can use:
splitBy :: [a] -> a -> [[a]]
splitBy [] sep = []
splitBy (nr1 : rest) if nr1 == sep then splitBy rest sep else nr1 : (splitBy rest sep)
Which is polymorphic and splits a list by any separator. (code not tested though) so splitBy "string of words" ' ' should return ["string","of","words"].
Finally: the first attempt above was annoying and had an obvious and stupid error ([] in lieu of [[]]). The working version is:
splitBy [] sep = [[]]
splitBy (nr1 : rest) sep = if nr1 == sep
  then [] : splitBy rest sep
  else (nr1 : head (splitBy rest sep)) : tail (splitBy rest sep)
Such that: splitBy "List of things" ' ' ===> ["List","of","things"]
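The version above evaluates splitBy rest sep twice in the else branch, which repeats work. A possible variant that binds the recursive result once is sketched below; the Eq a constraint and the case expression are additions made here, not part of the original answer.

```haskell
-- Splits a list on every occurrence of sep; adjacent separators
-- yield empty sublists, and the result is never an empty list.
splitBy :: Eq a => [a] -> a -> [[a]]
splitBy [] _ = [[]]
splitBy (x:rest) sep
  | x == sep  = [] : splitBy rest sep
  | otherwise = case splitBy rest sep of
      (w:ws) -> (x : w) : ws
      []     -> [[x]] -- not reached: the recursive result is non-empty
```

For example, splitBy "a,,b" ',' gives ["a","","b"].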
Think about what this does:
It iterates through each character in the string.
| nr1 == ' ' = toWords rest
If the character is a space, it skips that character.
| otherwise = [nr1] : toWords rest
Otherwise it creates a string containing only that character, then continues doing so to the rest of the characters in the string.
What you want is to accumulate the characters in the word into a single list rather than creating a new list for each one.
Here's an example of how you could do it:
toWords "" = []
toWords (' ':rest) = toWords rest
toWords text = let (word, rest) = break (== ' ') text
               in word : toWords rest
Two questions - firstly, why is your output polymorphic? It seems like it's invariably going to be a list of String rather than a list of a. I'm not that familiar with the Haskell type inference internals; try changing the signature to String -> String or String -> [Char] and see if it then works.
Secondly, when you do this it should become clear that your implementation is a little off; even if it did work, it would simply return your original string with all the spaces removed.
If you do want to return a list of strings, you'll need to have a helper function that builds up the current word so far, and then adds that whole word to the output list when the current character is a space. Since it seems like you're doing this to learn (else, just use the Prelude function words) I won't give a listing, but it shouldn't be too hard to figure out from here.
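For readers who want to check their own attempt afterwards, one possible shape of such a helper is sketched here. The local names go and flush are invented, and building the current word in reverse is just one convenient choice.

```haskell
toWords :: String -> [String]
toWords = go ""
  where
    -- acc holds the characters of the current word, in reverse order.
    go acc [] = flush acc []
    go acc (c:cs)
      | c == ' '  = flush acc (go "" cs)  -- a space ends the current word
      | otherwise = go (c : acc) cs       -- otherwise extend the word
    -- Emit the accumulated word, unless it is empty.
    flush []  ws = ws
    flush acc ws = reverse acc : ws
```

With this definition, toWords "How are you?" gives ["How","are","you?"].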

Resources