"Split" returns redundant characters - haskell

I'm looking for a simple way of implementing split function. Here is what I have:
import Data.List
groupBy (\x y -> y /= ',') "aaa, bbb, ccc, ddd"
=> ["aaa",", bbb",", ccc",", ddd"]
It's almost what I want, except that the delimiter "," and even an extra space end up in the result. I'd like it to be ["aaa","bbb","ccc","ddd"]
So what is the simplest way to do that?

Think about what your group separator is.
In your case it looks like you want to drop both commas and whitespace, so why not pass the separators in as a list?
split :: Eq a => [a] -> [a] -> [[a]]
split sep seq = ...
You can then group by writing
groupBy ((==) `on` (flip elem sep)) seq
which gives
[ "aaa"
, ", "
, "bbb"
, ", "
, "ccc"
, ", "
, "ddd"
]
and then filter out the groups made of separators
filter (not.flip elem sep.head) $ groupBy ((==) `on` (flip elem sep)) seq
returning
["aaa","bbb","ccc","ddd"]
Of course, if you want a ready-made function, then Data.List.Split is great!
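The two steps above (group, then filter) can be put together into one definition. This is only a sketch of that idea: the helper names isSep and isSeparatorGroup are made up for illustration, and on has to be imported from Data.Function.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Split a list on runs of separator elements.
-- 'isSep' and 'isSeparatorGroup' are illustrative helper names.
split :: Eq a => [a] -> [a] -> [[a]]
split sep xs = filter (not . isSeparatorGroup) grouped
  where
    -- Is this element one of the separators?
    isSep x = x `elem` sep
    -- Adjacent elements land in the same group when both are
    -- separators or both are not.
    grouped = groupBy ((==) `on` isSep) xs
    -- groupBy never produces empty groups, so checking every element
    -- agrees with checking only the head.
    isSeparatorGroup grp = all isSep grp
```

With this definition, split ", " "aaa, bbb, ccc, ddd" evaluates to ["aaa","bbb","ccc","ddd"]. Note that a run of several separators counts as a single split point, so no empty strings appear in the result.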
Explanation
This split function works for any element type a that has an Eq instance (i.e. two values of type a can be compared for equality), not just Char.
A (list-based) string in Haskell is written as [Char], but a list of chars (not a string) is also written as [Char].
In our split function, the first argument is the list of valid separators (e.g. for [Char] it may be ", "), and the second argument is the source list to split (e.g. for [Char] it may be "aaa, bbb"). A better signature could be:
type Separators a = [a]
split :: Eq a => Separators a -> [a] -> [[a]]
or data/newtype variations but this is another story.
The first argument still has the same type as the second one, but the signature now makes it clear that they are not the same thing.
The resultant type is a list of strings. As a string is [Char] then the resultant type is [[Char]]. If we'd prefer a general type (not just Char) then it becomes [[a]].
An example of splitting with numbers might be:
Prelude> split [5,10,15] [1..20]
[[1,2,3,4],[6,7,8,9],[11,12,13,14],[16,17,18,19,20]]
[5,10,15] is the separator list, [1..20] the input list to split.
(Thank you very much Nick B!)

Have a look at the splitOn function from the Data.List.Split module (part of the split package):
splitOn ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","bbb","ccc","ddd"]
It splits a given list on every occurrence of the complete substring. Alternatively you can also use splitOneOf:
splitOneOf ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","","bbb","","ccc","","ddd"]
Although it returns some empty strings, it has the advantage of splitting at any one of the given characters. The empty strings can be removed with a simple filter.

Related

Haskell: Deleting white space from a list of strings

The question is: Write a function that will delete leading white
space from a string. Example: cutWhitespace [" x","y"," z"] Expected answer: ["x","y","z"]
Here's what I have:
cutWhitespace (x:xs) = filter (\xs -> (xs /=' ')) x:xs
This returns ["x", " y"," z"] when the input is [" x"," y", " z"]. Why is it ignoring the space in the second and third string and how do I fix it?
We are allowed to use higher-order functions which is why I implemented filter.
The reason the OP's cutWhitespace function only works on the first string is that, due to operator precedence, it's actually this function:
cutWhitespace (x:xs) = (filter (\xs -> (xs /=' ')) x) : xs
Here, I've put brackets around most of the body to make it clear how it evaluates. The filter is only applied on x, and x is the first element of the input list; in the example input " x".
If you filter " x" as given, you get "x":
Prelude> filter (\xs -> (xs /=' ')) " x"
"x"
The last thing cutWhitespace does, then, is to take the rest of the list ([" y", " z"]) and cons it on "x", so that it returns ["x"," y"," z"].
In order to address the problem, you could write the function with the realisation that a list of strings is a nested list of characters, i.e. [[Char]].
As a word of warning, pattern-matching on (x:xs) without also matching on [] is dangerous, as it'll fail on empty lists.
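As a rough sketch of both points (treating the input as a list of strings, and covering the empty list), the function could be written with explicit recursion. It uses dropWhile rather than the OP's filter so that only leading spaces are removed, which is an assumption about what the exercise wants.

```haskell
-- Strip leading spaces from every string in the list.
cutWhitespace :: [String] -> [String]
cutWhitespace []       = []  -- nothing to do for an empty list
cutWhitespace (x : xs) = dropWhile (== ' ') x : cutWhitespace xs
```

Here the recursion reaches every element, so cutWhitespace [" x", " y", " z"] gives ["x","y","z"], and an empty input no longer causes a pattern-match failure.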
Instead of writing a custom function that checks if a character is whitespace, I would advise using isSpace :: Char -> Bool. This function not only returns True for a space (' '), but also for a new line ('\n'), a carriage return ('\r'), a tab ('\t'), a vertical tab ('\v') and a form feed ('\f'). Usually it is better to work with such functions, since the odds of forgetting certain cases are lower.
We can thus remove the spacing of a single string with:
dropWhile isSpace
Here dropWhile removes characters from the front of the string for as long as isSpace holds, and stops at the first character that is not whitespace.
We can then map this function over the list to strip the leading whitespace from all the strings, like:
import Data.Char(isSpace)
cutWhitespace = map (dropWhile isSpace)
What you asked, how to delete leading whitespace from a string, can be done simply by using dropWhile on the string:
deleteLeadingWhitespace = dropWhile (\c -> c == ' ')
though you should be more clever if you consider other things "whitespace". You could use the "isSpace" function defined in Data.Char for example.
From your sample data, it looks like you are really trying to do this for a list of strings, in which case you can map the dropWhile over your list:
map deleteLeadingWhitespace
The filter approach you are taking is a little bit dangerous, because even if you had it doing what you think it should, it would be deleting all the spaces, not just the leading ones.
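To make the difference concrete, here is a small side-by-side sketch. The names stripAll and stripLeading are made up for this illustration.

```haskell
import Data.Char (isSpace)

-- Removes every whitespace character, wherever it occurs.
stripAll :: String -> String
stripAll = filter (not . isSpace)

-- Removes whitespace only from the front of the string.
stripLeading :: String -> String
stripLeading = dropWhile isSpace
```

For the input "  a b", stripAll gives "ab" while stripLeading gives "a b", so only the second keeps the space inside the string.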

Haskell char quotes

I've started to learn haskell for real recently, and I'm doing some exercises from wikibooks.
I'm doing exercise with RLE encoding, and I've come with solution like this:
import Data.List
rle :: String -> [(Int,Char)]
rle [] = []
rle xs = zip lengths chars
  where
    groups = group xs
    lengths = map length groups
    chars = map head groups
rle_toString :: [(Int, Char)] -> String
rle_toString [] = []
rle_toString (x:xs) = show (fst x ) ++ show (snd x) ++ rle_toString xs
Not a very elegant solution, but it almost works. The problem is that I get output like this: "7'a'8'b'7'j'6'q'3'i'7'q'1'p'1'a'16'z'2'n'". The single quotes around the chars are not very elegant. How can I achieve output like: "7a8b7j6q3i7q1p1a16z2n"?
show is used to print values as they appear in Haskell source code, and thus puts single quotes around characters (and double quotes around strings, and so on). Use [snd x] instead to show just the character.
In Haskell, String is just shorthand for a list of Char, [Char]. For example, the String "Foo" can also be written like this: ['F','o','o']. So, to convert a single character to a string, just put it in brackets: [char].
The problem is your use of show on a character. show 'a' == "'a'".
The solution is to realize that strings are just lists of characters, so if c is a character, then the one-character string that contains c is just [c].
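Applied to the code in the question, the fix might look like the sketch below. The name rleToString and the use of concatMap are choices made for this example; the only change that matters is [c] in place of show c.

```haskell
import Data.List (group)

-- Run-length encode a string as (count, character) pairs.
rle :: String -> [(Int, Char)]
rle = map (\g -> (length g, head g)) . group
-- 'head' is safe here because 'group' never returns empty groups.

-- Render the pairs as text: 'show' for the count, but [c] for the
-- character so that no quotes are added.
rleToString :: [(Int, Char)] -> String
rleToString = concatMap (\(n, c) -> show n ++ [c])
```

For example, rleToString (rle "aaabccdddd") gives "3a1b2c4d".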

split string into string in haskell

How can I split a string into another string without using Data.List.Split function?
To be more concrete: to turn "Abc" into "['A','b','c']"
If you want literally the string "['A','b','c']" as opposed to the expression ['A','b','c'] which is identical to "Abc" since in Haskell the String type is a synonym for [Char], then something like the following will work:
'[': (intercalate "," $ map show "Abc") ++ "]"
The function intercalate is in Data.List with the type
intercalate :: [a] -> [[a]] -> [a]
It intersperses its first argument between the elements of the list given as its second argument and concatenates the result.
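Wrapped up as a function, the expression above could look like this. The name quoteChars is made up for the example.

```haskell
import Data.List (intercalate)

-- Render a string as the text of a Haskell list of character literals.
-- 'show' on a Char adds the single quotes; 'intercalate' adds the commas.
quoteChars :: String -> String
quoteChars s = "[" ++ intercalate "," (map show s) ++ "]"
```

Here quoteChars "Abc" gives the 13-character string ['A','b','c'].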
I assume you meant how to turn "Abc" into ["A", "b", "c"]. This is quite simple, if the string to split is s, then this will do the trick:
map (\x -> [x]) s
Fire up ghci to see that the expressions you wrote are the same:
Prelude> ['A','b','c']
"Abc"

How do I convert a String into a list of integers in Haskell

I have a String like "1 2 3 4 5". How can I convert it into a list of integers like [1,2,3,4,5] in Haskell? What if the list is "12345"?
You can use
Prelude> map read $ words "1 2 3 4 5" :: [Int]
[1,2,3,4,5]
Here we use words to split "1 2 3 4 5" on whitespace so that we get ["1", "2", "3", "4", "5"]. The read function can now convert the individual strings into integers. It has type Read a => String -> a so it can actually convert to anything in the Read type class, and that includes Int. It is because of the type variable in the return type that we need to specify the type above.
For the string without spaces, we need to convert each Char into a single-element list. This can be done by applying (:"") to it, since a String is just a list of Chars. We then apply read again like before:
Prelude> map (read . (:"")) "12345" :: [Int]
[1,2,3,4,5]
q1 :: (Read a, Integral a) => String -> [a]
q1 = map read . words
q2 :: (Read a, Integral a) => String -> [a]
q2 = map (read . return)
Error handling is left as an exercise. (Hint: you will need a different return type.)
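One possible way to do that exercise, as a sketch: use readMaybe from Text.Read and return Maybe, so that any token that fails to parse turns the whole result into Nothing. The names parseInts and parseDigits are made up here, and fixing the element type to Int is a simplification.

```haskell
import Text.Read (readMaybe)

-- Space-separated numbers, e.g. "1 2 3 4 5".
-- Returns Nothing if any word is not a valid Int.
parseInts :: String -> Maybe [Int]
parseInts = traverse readMaybe . words

-- One digit per character, e.g. "12345".
-- Returns Nothing if any character is not a valid Int on its own.
parseDigits :: String -> Maybe [Int]
parseDigits = traverse (\c -> readMaybe [c])
```

With these, parseInts "1 2 3 4 5" gives Just [1,2,3,4,5], while parseInts "1 x 3" gives Nothing instead of raising a runtime error.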
There is a function defined in the module Data.Char called digitToInt. It takes a character and returns a number, as long as the character can be interpreted as a hexadecimal digit.
If you want to use this function in your first example, where the numbers were separated by a space, you'll need to skip the spaces. You can do that with a simple filter
> map digitToInt $ filter (/=' ') "1 2 1 2 1 2 1"
[1,2,1,2,1,2,1]
The second example, where the digits were not separated at all, is even easier because you don't need a filter
> map digitToInt "1212121"
[1,2,1,2,1,2,1]
I'd guess digitToInt is better than read because it doesn't depend on the type of the expression, which could be tricky (which is in turn how i found this post =P ). Anyway, I'm new to haskell so i might as well be wrong =).
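Two caveats are worth keeping in mind with this approach: digitToInt also accepts the hexadecimal letters a-f and A-F, and it raises an error on any other character. A slightly more defensive sketch keeps only decimal digits first; the name digits is made up for the example.

```haskell
import Data.Char (digitToInt, isDigit)

-- Convert every decimal digit in the string to an Int,
-- silently skipping anything that is not a decimal digit.
digits :: String -> [Int]
digits = map digitToInt . filter isDigit
```

This works one character at a time, so digits "12 3" gives [1,2,3] rather than [12,3]. For multi-digit numbers, the words and read approach is the one to use.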
You can use:
> [read [x] :: Int | x <- string]

How to convert a list of (Char,Int) to a string with the given number of repeated chars?

How can I convert a [(Char,Int)] to Strings where the Int in the second component gives the number of repetitions of the character in the first component? For example the input [(a,9),(b,10)] should give ["aaaaaaaaa","bbbbbbbbbb"] as output.
Hugs> map (\(c,n) -> replicate n c) [('a',9), ('b',10)]
["aaaaaaaaa","bbbbbbbbbb"]
or
map (uncurry $ flip replicate)
This can be assembled from just a few functions in the Prelude. Since your input is a list of tuples, the return value becomes a list of strings.
repChars :: (Char, Int) -> String
repChars (c,n) = replicate n c
Prelude> map repChars [('a',9),('b',10)]
["aaaaaaaaa","bbbbbbbbbb"]
Or if you want to do it as a point-free one-liner:
repCharList = map (uncurry (flip replicate))
Is this homework? If so, please use the homework tag.
I'm assuming the input is supposed to be [('a', 9), ('b', 10)] since without the 's it would only make sense if a and b were previously defined, which you did not mention.
In that case you can use replicate to create a list which contains a given element a given number of times (note that the string "aaaaaaaaa" is a list containing the element 'a' 9 times). To do that for every tuple in the list, you can use map on the list. Now you have a list containing the strings for each character. To turn that into a single string separated by commas, you can use intercalate, which takes a separator and a list of lists and returns a single list.
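The steps described above might be sketched like this. The names expand and expandJoined are made up for the example.

```haskell
import Data.List (intercalate)

-- One string per tuple: the character repeated n times.
expand :: [(Char, Int)] -> [String]
expand = map (\(c, n) -> replicate n c)

-- The same strings joined into a single comma-separated string.
expandJoined :: [(Char, Int)] -> String
expandJoined = intercalate "," . expand
```

For example, expand [('a',3),('b',2)] gives ["aaa","bb"], and expandJoined on the same input gives "aaa,bb".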
The facetious and horrible answer:
Prelude> let replignore ((_,x):[]) = [replicate x 'b']; replignore ((_,x):xs) = replicate x 'a' : replignore xs
Prelude> replignore [(a,9),(b,10)]
<interactive>:1:13: Not in scope: `a'
<interactive>:1:19: Not in scope: `b'
Prelude> let a = undefined
Prelude> let b = undefined
Prelude> replignore [(a,9),(b,10)]
["aaaaaaaaa","bbbbbbbbbb"]
But it didn't quite fit the specs since it includes the quotation marks in the answer. ;)
My point is, you need quotes around your Char and String literals.

Resources