Haskell: Deleting white space from a list of strings

The question is: Write a function that will delete leading white space from a string. Example: cutWhitespace [" x","y"," z"] Expected answer: ["x","y","z"]
Here's what I have:
cutWhitespace (x:xs) = filter (\xs -> (xs /=' ')) x:xs
This returns ["x", " y"," z"] when the input is [" x"," y", " z"]. Why is it ignoring the space in the second and third string and how do I fix it?
We are allowed to use higher-order functions which is why I implemented filter.

The reason the OP's cutWhitespace function only works on the first string is that function application binds more tightly than the (:) operator, so it's actually this function:
cutWhitespace (x:xs) = (filter (\xs -> (xs /=' ')) x) : xs
Here, I've put brackets around most of the body to make it clear how it evaluates. The filter is only applied on x, and x is the first element of the input list; in the example input " x".
If you filter " x" as given, you get "x":
Prelude> filter (\xs -> (xs /=' ')) " x"
"x"
The last thing cutWhitespace does, then, is to take the rest of the list ([" y", " z"]) and cons it on "x", so that it returns ["x"," y"," z"].
In order to address the problem, you could write the function with the realisation that a list of strings is a nested list of characters, i.e. [[Char]].
As a word of warning, pattern-matching on (x:xs) without also matching on [] is dangerous, as it'll fail on empty lists.
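A minimal sketch of that idea, assuming only plain spaces count as whitespace. The names cutSpaces and cutSpaces' are made up for illustration and are not from the question. The first version maps over the outer list; the second spells out the recursion and includes the [] case mentioned above.

```haskell
-- Hypothetical names; a sketch, not the asker's original function.

-- Treat the input as [[Char]]: map over the outer list and strip
-- leading spaces from each inner string.
cutSpaces :: [String] -> [String]
cutSpaces = map (dropWhile (== ' '))

-- The same idea with explicit recursion, including the [] case so
-- that the function is defined for empty lists too.
cutSpaces' :: [String] -> [String]
cutSpaces' []       = []
cutSpaces' (s : ss) = dropWhile (== ' ') s : cutSpaces' ss

main :: IO ()
main = do
  print (cutSpaces [" x", " y", " z"])  -- ["x","y","z"]
  print (cutSpaces' [])                 -- []
```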

Instead of writing a custom function that checks whether a character is whitespace, I would advise using isSpace :: Char -> Bool. This function returns True not only for a space (' '), but also for a new line ('\n'), a carriage return ('\r'), a tab ('\t'), a vertical tab ('\v') and a form feed ('\f'). It is usually better to work with such functions, since the odds of forgetting a case are lower.
We can thus remove the leading whitespace of a single string with:
dropWhile isSpace
Here dropWhile drops characters from the front of the string for as long as isSpace holds, and stops at the first character that is not whitespace.
We can then map this function over the list to strip the leading whitespace from all the strings, like:
import Data.Char(isSpace)
cutWhitespace = map (dropWhile isSpace)

To delete leading whitespace from a single string, which is the question you asked, you can simply use dropWhile:
deleteLeadingWhitespace = dropWhile (\c -> c == ' ')
though you should be more careful if you consider other characters to be whitespace. You could use the isSpace function defined in Data.Char, for example.
From your sample data, it looks like you are really trying to do this for a list of strings, in which case you can map the function over your list:
map deleteLeadingWhitespace
The filter approach you are taking is a little bit dangerous, because even if you had it doing what you think it should, it would be deleting all the spaces, not just the leading ones.
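To make the difference concrete, here is a small sketch comparing the two approaches on a string with an interior space. The names stripLeading and stripAll are illustrative only.

```haskell
import Data.Char (isSpace)

-- Hypothetical names, for illustration only.

-- Drops whitespace only from the front of the string.
stripLeading :: String -> String
stripLeading = dropWhile isSpace

-- Removes every whitespace character, wherever it occurs.
stripAll :: String -> String
stripAll = filter (not . isSpace)

main :: IO ()
main = do
  print (stripLeading "  a b")               -- "a b"
  print (stripAll "  a b")                   -- "ab"
  print (map stripLeading [" x", "\ty z"])   -- ["x","y z"]
```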

Related

Comparing each element of a list to each element of another list

I'm trying to write a function that takes the first character of the first string, compares it to all the characters of the second string and if it finds the same character, replaces with a "-". Then it moves on to the second character of the first string, does the same comparison with each character (except the first character - the one we already checked) on the second string and so on. I want it to return the first string, but with the repeating characters swapped with the symbol "-".
E.g. if I put in comparing "good morning" "good afternoon", I'd like it to return "-----m---i-g"
I hope I explained it clearly enough. So far I've got:
comparing :: String -> String -> String
comparing a b =
  if a == "" then ""
  else if head a == head b then "-" ++ (comparing (tail a) (tail b))
  else [head a] ++ (comparing (tail a) b)
The problem with this is it does not go through the second string character by character and I'm not sure how to implement that. I think I would need to call a recursive function on the 4th line:
if head a == ***the first character of tail b*** then "-" ++ (comparing (tail a) (tail b))
What could that function look like? Or is there a better way to do this?
First, at each recursive call, while you're iterating over the string a, you are for some reason also iterating over the string b at the same time. Look: you're passing only tail b to the next call. This means that the next call won't be able to look through the whole string b, but only through its tail. Why are you doing this?
Second, in order to see if a character is present in a string, use elem:
elem 'x' "xyz" == True
elem 'x' "abc" == False
So the else if line of your function should look like this:
else if elem (head a) b then "-" ++ (comparing (tail a) b)
On a somewhat related note, use of head and tail functions is somewhat frowned upon, because they're partial: they will crash if the string is empty. Yes, I see that you have checked to make sure that the string is not empty, but the compiler doesn't understand that, which means that it won't be able to catch you when you accidentally change this check in the future.
A better way to inspect data is via pattern matching:
-- Comparing an empty string with anything results in an empty string
comparing "" _ = ""
-- Comparing a string that starts with `a` and ends with `rest`
comparing (a:rest) b =
  (if elem a b then "-" else [a]) ++ comparing rest b
Rather than writing the recursive logic manually, this looks like a classic use case for map. You just need a function that takes a character and returns either that character or '-' depending on its presence in the other list.
Written out fully, this would look like:
comparing first second = map replace first
  where replace c = if c `elem` second then '-' else c
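One thing worth checking: the elem-based versions give "-----m---i--" for the example input, not the "-----m---i-g" asked for in the question. The expected output suggests that each character of the second string may be matched only once. Under that assumption (it is a guess at the intent), a possible sketch uses delete from Data.List to use up a character once it has matched. The names comparingAny and comparingOnce are made up for this comparison.

```haskell
import Data.List (delete)

-- Hypothetical names, for illustration only.

-- elem-based version: a character of the second string can match
-- any number of times.
comparingAny :: String -> String -> String
comparingAny first second = map replace first
  where replace c = if c `elem` second then '-' else c

-- Assumed variant: a matched character is removed from the second
-- string (first occurrence only), so it cannot match again.
comparingOnce :: String -> String -> String
comparingOnce [] _ = []
comparingOnce (c : cs) second
  | c `elem` second = '-' : comparingOnce cs (delete c second)
  | otherwise       = c   : comparingOnce cs second

main :: IO ()
main = do
  putStrLn (comparingAny  "good morning" "good afternoon")  -- -----m---i--
  putStrLn (comparingOnce "good morning" "good afternoon")  -- -----m---i-g
```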

Haskell filter more than one char

I have a question. I have this line of code here.
map length [filter (/= ' ') someString]
I know it removes the spaces from someString. But is it possible to remove more than just the spaces from the string using filter? Let's just say remove the spaces and some other chars.
Thanks!
You can just do:
filter (not . flip elem "<your chars here>")
Example:
ghci> filter (not . flip elem " .,") "This is an example sentence, which uses punctuation."
"Thisisanexamplesentencewhichusespunctuation"
Just to put the comment in here: To filter out letters of both cases (a-z and A-Z), you should probably use Data.Char's isLetter function.
A slightly shorter way is to use notElem:
λ> filter (flip notElem " f") "foo bar"
"oobar"
An alternative is to use a fold to get the length, if you don't actually need to return the filtered string but only want its length.
f x charlist = foldl (\acc x -> if not $ x `elem` charlist then acc + 1 else acc) 0 x
charlist is your list of banned characters. If the list is always the same, you can define it as a top-level value and use it in the function body instead of passing it as a parameter.
A more concise version using a list comprehension, assuming charlist is such a top-level value:
f x = length [z | z <- x, notElem z charlist]
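Putting the pieces together, a self-contained sketch of both counting approaches might look like the following. The names countKept and countKeptFold are hypothetical, and the banned characters are passed as the first argument rather than defined at top level.

```haskell
-- Hypothetical names, for illustration only.

-- Count the characters that are not in the banned list, by
-- filtering first and then taking the length.
countKept :: String -> String -> Int
countKept banned = length . filter (`notElem` banned)

-- The same count with a fold, without building the filtered string.
countKeptFold :: String -> String -> Int
countKeptFold banned = foldl step 0
  where step acc c = if c `elem` banned then acc else acc + 1

main :: IO ()
main = do
  print (countKept " .," "a b, c.")      -- 3
  print (countKeptFold " .," "a b, c.")  -- 3
```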

"Split" returns redundant characters

I'm looking for a simple way of implementing split function. Here is what I have:
import Data.List
groupBy (\x y -> y /= ',') "aaa, bbb, ccc, ddd"
=> ["aaa",", bbb",", ccc",", ddd"]
It's almost what I want except the fact that a delimiter "," and even an extra whitespace are in the result set. I'd like it to be ["aaa","bbb","ccc","ddd"]
So what is the simplest way to do that?
Think about what your group separator is.
In your case it looks like you want to drop both commas and whitespace, so why not take a list of separators?
split :: Eq a => [a] -> [a] -> [[a]]
split sep seq = ...
You can then group by writing the following (on comes from Data.Function):
groupBy ((==) `on` (flip elem sep)) seq
which gives
[ "aaa"
, ", "
, "bbb"
, ", "
, "ccc"
, ", "
, "ddd"
]
and then filter out the groups made of separators
filter (not.flip elem sep.head) $ groupBy ((==) `on` (flip elem sep)) seq
returning
["aaa","bbb","ccc","ddd"]
Of course, if you want a ready-made function, then Data.List.Split is great!
Explanation
This split function works for any type a that is an instance of the Eq class (i.e. two values of type a can be compared for equality), not just Char.
A (list-based) string in Haskell is written as [Char], but a list of chars (not a string) is also written as [Char].
In our split function, the first argument is the list of valid separators (e.g. for [Char] it may be ", "), and the second argument is the source list to split (e.g. for [Char] it may be "aaa, bbb"). A better signature could be:
type Separators a = [a]
split :: Eq a => Separators a -> [a] -> [[a]]
or data/newtype variations but this is another story.
Our first argument still has the same type as the second one, but the alias makes it clear that they are not the same thing.
The resultant type is a list of strings. As a string is [Char] then the resultant type is [[Char]]. If we'd prefer a general type (not just Char) then it becomes [[a]].
An example of splitting with numbers might be:
Prelude> split [5,10,15] [1..20]
[[1,2,3,4],[6,7,8,9],[11,12,13,14],[16,17,18,19,20]]
[5,10,15] is the separator list, [1..20] the input list to split.
(Thank you very much Nick B!)
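For reference, the fragments above can be assembled into a complete definition along these lines. This is a sketch using only base: the argument name seps and the helper isSepGroup are illustrative, and take 1 is used instead of head so that no partial function is involved.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- A sketch assembled from the fragments above; `seps` and
-- `isSepGroup` are illustrative names.
split :: Eq a => [a] -> [a] -> [[a]]
split seps = filter (not . isSepGroup) . groupBy ((==) `on` (`elem` seps))
  where
    -- A group consists either entirely of separators or entirely of
    -- non-separators, so looking at its first element is enough.
    isSepGroup g = any (`elem` seps) (take 1 g)

main :: IO ()
main = do
  print (split ", " "aaa, bbb, ccc, ddd")
  -- ["aaa","bbb","ccc","ddd"]
  print (split [5, 10, 15] [1 .. 20])
  -- [[1,2,3,4],[6,7,8,9],[11,12,13,14],[16,17,18,19,20]]
```

Note that consecutive separators are grouped together, so this version never produces empty strings in the result.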
Have a look at the splitOn function from the Data.List.Split module (part of the split package):
splitOn ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","bbb","ccc","ddd"]
It splits a given list on every occurrence of the complete substring. Alternatively you can also use splitOneOf:
splitOneOf ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","","bbb","","ccc","","ddd"]
Although it returns some empty strings, it has the advantage of splitting at any one of the given characters. The empty strings can be removed by a simple filter, e.g. filter (not . null).

Remove first space in string using Haskell

How do I remove the first space of a string in Haskell?
For example:
removeSpace " hello" = "hello"
removeSpace "  hello" = " hello"
removeSpace "hello" = "hello"
Here are multiple remove-space options, to show off a few functions and ways of doing things.
To remove all leading spaces, you can do
removeSpaces = dropWhile (==' ')
This means the same as removeSpaces xs = dropWhile (==' ') xs, but uses partial application of dropWhile (and (==' ') is an operator section, which is partial application too).
or for more general removal,
import Data.Char
removeWhitespace = dropWhile isSpace
If you're really sure you just want to remove one space (and you certainly seem to be), then pattern matching is clearest:
removeASpace (' ':xs) = xs -- if it starts with a space, miss that out.
removeASpace xs = xs -- otherwise just leave the string alone
This works because in Haskell, String = [Char] and (x:xs) means the list that starts with x and carries on with the list xs.
To remove one whitespace character, we can use guards (conditionals with very light syntax, if you've not met them):
removeAWhitespace "" = "" -- base case of empty string
removeAWhitespace (x:xs) | isSpace x = xs -- if it's whitespace, omit it
                         | otherwise = x:xs -- if it's not, keep it.
Simply use pattern matching:
removeSpace (' ':xs) = xs
removeSpace xs = xs
In Haskell, strings are simply lists of characters, i.e., the Prelude defines
type String = [Char]
Furthermore, there are about three ways to write a function:
1. Completely roll it yourself using the two most fundamental tools you have at your disposal: pattern matching and recursion;
2. Cleverly combine some already written functions; and, of course,
3. A mix of these.
If you are new to Haskell and to functional programming, I recommend writing most of your functions using the first method and then gradually shift toward using more and more predefined functions.
For your problem—removing the first space character (' ') in a string—pattern matching and recursion actually make a lot of sense. As said, strings are just lists of characters, so we will end up with nothing but a simple list traversal.
Let us first write a signature for your function:
removeSpace :: [Char] -> [Char]
(I have written [Char] rather than String to make it explicit that we are performing a list traversal here.)
Pattern matching against a list, we need to consider two cases: the list being empty ([]) and the list consisting of a head element followed by a tail (c : cs).
Dealing with the empty list is, as always, simple: there are no characters left, so there is nothing to remove anymore and we simply return the empty list.
removeSpace [] = []
Then the situation in which we have a head element (a character) and a tail list. Here we need to distinguish two cases again: the case in which the head character is a space and the case in which it is any other character.
If the head character is a space, it will be the first space that we encounter and we need to remove it. As we only have to remove the first space, we can return the remainder of the list (i.e., the tail) without further processing:
removeSpace (' ' : cs) = cs
What remains is to deal with the case in which the head character is not a space. Then we need to keep it in the returned list and, moreover, we need to keep looking for the first space in the remainder of the list; that is, we need to recursively apply our function to the tail:
removeSpace (c : cs) = c : removeSpace cs
And that's all. The complete definition of our function now reads
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : removeSpace cs
This is arguably as clear and concise a definition as any clever combining of predefined functions would have given you.
To wrap up, let us test our function:
> removeSpace " hello"
"hello"
> removeSpace "  hello"
" hello"
> removeSpace "hello"
"hello"
If you really want to construct your function out of predefined functions, here is one alternative definition of removeSpace that will do the trick:
removeSpace :: [Char] -> [Char]
removeSpace = uncurry (flip (flip (++) . drop 1)) . break (== ' ')
(You can see why I prefer the one using explicit pattern matching and recursion. ;-))
Note: I have assumed that your objective is indeed to remove the first space in a string, no matter where that first space appears. In the examples you have given, the first space is always the first character in the string. If that's always the case, i.e., if you are only after dropping a leading space, you can leave out the recursion and simply write
removeSpace :: [Char] -> [Char]
removeSpace [] = []
removeSpace (' ' : cs) = cs
removeSpace (c : cs) = c : cs
or, combining the first and last cases,
removeSpace :: [Char] -> [Char]
removeSpace (' ' : cs) = cs
removeSpace cs = cs
or, using predefined functions,
removeSpace :: [Char] -> [Char]
removeSpace = uncurry ((++) . drop 1) . span (== ' ')
To remove the first space anywhere in a string:
removeSpace :: String -> String
removeSpace = (\(xs,ys) -> xs ++ drop 1 ys) . span (/=' ')
Where span grabs characters until it finds a space or reaches the end of the string.
The pieces come back as a tuple, which we recombine while skipping the first character of the second list (the space). Using drop 1 rather than tail means no separate check for an empty remainder is needed: if the string contains no space, the second list is empty and drop 1 simply returns an empty list.
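As a quick side-by-side of the two readings of the question, here is a sketch with both behaviours under separate, made-up names (removeFirstSpace and removeLeadingSpace), so the difference shows up on inputs where the first space is not at the front.

```haskell
-- Hypothetical names, for illustration only.

-- Removes the first space wherever it occurs in the string.
removeFirstSpace :: String -> String
removeFirstSpace s = before ++ drop 1 after
  where (before, after) = break (== ' ') s

-- Removes a single space only if it is the very first character.
removeLeadingSpace :: String -> String
removeLeadingSpace (' ' : cs) = cs
removeLeadingSpace cs         = cs

main :: IO ()
main = do
  print (removeFirstSpace "hello world")    -- "helloworld"
  print (removeFirstSpace "hello")          -- "hello"
  print (removeLeadingSpace "  hello")      -- " hello"
  print (removeLeadingSpace "hello world")  -- "hello world"
```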

Shortening a String Haskell

How do you shorten a string in Haskell to at most a given number of characters?
Say:
comp :: String -> String
short :: String -> String
chomp (x:xs) = (x : takeWhile (==x) xs)
Using comp I want to select a run of repeated characters from the start of a string, with the run comprising at most nine characters.
For example:
short "aaaavvvdd"
would output "aaaa"
and short "dddddddddd"
outputs "ddddddddd".
I know I need take but am not sure how to put that into the code.
I've got this far but it doesn't work:
short x:xs | length(short x:xs) >9 = take(9)
| otherwise = comp
The Quick Answer
import Data.List
short [] = []
short x = (take 9 . head . group) x
This will give you output that matches your desired output.
That is,
*> short "aaaavvvdd"
"aaaa"
*> short "dddddddddd"
"ddddddddd"
Step by Step Development
Use "group" to separate the items
This solution depends on the "group" function in the Data.List library. We begin with the definition:
short x = group x
This gives us:
*> short "aaaavvvddd"
["aaaa","vvv","ddd"]
Use "head" to return only the first element
Once we have the elements in a list, we want only the first item of the list. We achieve this using "head":
short x = (head . group) x
"." is the Haskell function for function composition. It's the same as:
short x = head (group x)
or
short x = head $ group x
This will give us:
*> short "aaaavvvdd"
"aaaa"
*> short "dddddddddddddd"
"dddddddddddddd"
Use "take" to get the first nine characters
We finish the program by taking only the first nine characters of this result, and end up with our final function. To do this, we use the "take" function from the prelude:
short x = (take 9 . head . group) x
We now have the result that we wanted, but with one minor problem.
Add another case to eliminate the error
Note that using our current definition on the empty list causes an error,
*> short "aaaavvvddd"
"aaaa"
*> short ""
"*** Exception: Prelude.head: empty list
Because "head" is undefined on the empty list, we need to handle another case: the empty list. Now we have:
short [] = []
short x = (take 9 . head . group) x
This is our "final answer".
Here is another version:
short xs = take 9 $ takeWhile (== head xs) xs
So we take from the list as long as the content equals the head of list (which is the first char of the string). Then we use take to shorten the result when necessary.
Note that we don't need an additional case for empty strings, which is a consequence of Haskell's laziness: if takeWhile sees that the list argument is empty, it doesn't bother to evaluate the condition argument, so head xs doesn't throw an error.
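The laziness argument can be checked with a small self-contained program. This is only a sketch of the takeWhile version with a type signature added; recent GHC versions may emit a warning about the use of head, but the definition still compiles.

```haskell
-- The takeWhile-based version with an explicit type signature.
short :: String -> String
short xs = take 9 (takeWhile (== head xs) xs)

main :: IO ()
main = do
  print (short "aaaavvvdd")   -- "aaaa"
  print (short "dddddddddd")  -- "ddddddddd"
  -- For the empty string, takeWhile never applies the predicate,
  -- so head xs is never evaluated.
  print (short "")            -- ""
```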
Here's a definition:
import Data.List (group)
short = take 9 . head . group
Interestingly enough, since our returned string is a prefix of the original string, and is constrained to be at most 9 characters long, it doesn't matter whether we trim down to that limit first or last. So we could also use this definition:
short = head . group . take 9
Both of these are written in the "pointfree" style, which refers not to a lack of punctuation but to a lack of unnecessary variables. We could have also written the definition as
short s = take 9 (head (group s))
Or, using $ to get rid of parentheses:
short s = take 9 $ head $ group s
The only other step is to extract only the first block of matching characters, which is what head . group does (equivalent to your chomp function).
From the docs:
group :: Eq a => [a] -> [[a]]
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
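If you would rather avoid head altogether, one possible variation (not taken from any of the answers above) is to replace head with concat . take 1, which yields an empty string when group returns no groups. A sketch, assuming the same limit of nine characters:

```haskell
import Data.List (group)

-- A variation on take 9 . head . group that is also defined for
-- the empty string: take 1 keeps at most the first group, and
-- concat flattens the resulting list of zero or one groups.
short :: String -> String
short = take 9 . concat . take 1 . group

main :: IO ()
main = do
  print (short "aaaavvvdd")   -- "aaaa"
  print (short "dddddddddd")  -- "ddddddddd"
  print (short "")            -- ""
```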

Resources