Haskell - Splitting a string by delimiter - haskell

I am trying to write a program in Haskell to split a string by a delimiter.
I have studied different examples provided by other users. One example is the code posted below.
split :: String -> [String]
split [] = [""]
split (c:cs)
    | c == ',' = "" : rest
    | otherwise = (c : head rest) : tail rest
  where
    rest = split cs
Sample Input: "1,2,3".
Sample Output: ["1","2","3"].
I have been trying to modify the code so that the output includes the delimiter as well, something like ["1", ",", "2", ",", "3"], but I just cannot succeed.
For example, I changed the line:
| c == ',' = "" : rest
into:
| c == ',' = "," : rest
But the result becomes ["1,","2,","3"].
What is the problem, and which part have I misunderstood?

If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function from Data.List. The following expression:
break (==',') str
breaks the string into a tuple (a,b) where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.
This makes the definition of split clear and straightforward:
split str = case break (==',') str of
    (a, ',':b) -> a : split b
    (a, "") -> [a]
You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.
This version has the added benefit that the modification to include the delimiter is also easy to understand:
split2 str = case break (==',') str of
    (a, ',':b) -> a : "," : split2 b
    (a, "") -> [a]
Note that I've written the patterns in these functions in more detail than is necessary, to make it absolutely clear what's going on. This also means that Haskell checks each comma twice: once in break and once in the pattern match. For this reason, some people might prefer:
split str = case break (==',') str of
    (a, _:b) -> a : split b
    (a, _) -> [a]
or, if they still wanted to document exactly what they were expecting in each case branch:
split str = case break (==',') str of
    (a, _comma:b) -> a : split b
    (a, _empty) -> [a]
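As a minimal sketch of where this approach leads, the delimiter can be made a parameter instead of being hardcoded as a comma. The names splitOnChar and splitKeepingChar are hypothetical and do not appear in the answer above; the logic is the same break-based recursion.

```haskell
-- Hypothetical generalisation: the delimiter d is an argument instead of ','.
splitOnChar :: Char -> String -> [String]
splitOnChar d str = case break (== d) str of
  (a, _ : b) -> a : splitOnChar d b   -- drop the delimiter, continue with the rest
  (a, [])    -> [a]                   -- no delimiter left: last piece

-- Same recursion, but each delimiter is kept as its own one-character string.
splitKeepingChar :: Char -> String -> [String]
splitKeepingChar d str = case break (== d) str of
  (a, _ : b) -> a : [d] : splitKeepingChar d b
  (a, [])    -> [a]
```

With these definitions, splitKeepingChar ',' "1,2,3" evaluates to ["1",",","2",",","3"], and two adjacent delimiters produce an empty string between them.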

Instead of altering code in the hope that it matches the expectations, it is usually better to understand the code fragment first.
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
First, let us analyze what split does. The first clause simply says "the split of an empty string is a list with one element, the empty string". This seems reasonable. The first guard of the second clause says: "if the head of the string is a comma, we produce a list whose first element is an empty string, followed by the split of the rest of the string". The last guard says: "if the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of that split". Mind that split returns a list of strings, so head rest is a string.
So if we want to add the delimiter to the output, we need to add it as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, because the recursion prepends characters to the head of that list, which glues them onto the comma. Instead, the comma has to be a separate string behind a fresh empty head. So the result is:
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs
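To make the difference concrete, here is a small sketch that puts the attempt from the question next to the fix and traces both on the input "1,2". The names splitAttempt and splitFixed are made up for this illustration; the bodies are the two versions discussed above.

```haskell
-- The change tried in the question: the comma goes into the head cell,
-- which the callers further up keep prepending characters to.
splitAttempt :: String -> [String]
splitAttempt [] = [""]
splitAttempt (c:cs)
  | c == ','  = "," : rest
  | otherwise = (c : head rest) : tail rest
  where rest = splitAttempt cs

-- The fix: a fresh empty head cell, with the comma as a separate string behind it.
splitFixed :: String -> [String]
splitFixed [] = [""]
splitFixed (c:cs)
  | c == ','  = "" : "," : rest
  | otherwise = (c : head rest) : tail rest
  where rest = splitFixed cs

-- Results from the innermost recursive call outwards, for the input "1,2":
--
--   argument   splitAttempt    splitFixed
--   ""         [""]            [""]
--   "2"        ["2"]           ["2"]
--   ",2"       [",","2"]       ["",",","2"]
--   "1,2"      ["1,","2"]      ["1",",","2"]
```

In the third row, splitAttempt leaves "," as the head, so the '1' in the last row is prepended to it. splitFixed leaves "" as the head, so the '1' lands in its own cell.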

That example code is poor style. Never use head and tail unless you know exactly what you're doing (these functions are unsafe, partial functions). Also, equality comparisons are usually better written as dedicated patterns.
With that in mind, the example becomes:
split :: String -> [String]
split "" = [""]
split (',':cs) = "" : split cs
split (c:cs) = (c:cellCompletion) : otherCells
  where cellCompletion : otherCells = split cs
(Strictly speaking, this is still unsafe because the match cellCompletion:otherCells is non-exhaustive, but at least it happens in a well-defined place which will give a clear error message if anything goes wrong.)
Now IMO, this makes it quite a bit clearer what's actually going on here: with "" : split cs, the intent is not really to add an empty cell to the result. Rather, it is to add a cell which will be filled up by calls further up in the recursion stack. This happens because those calls deconstruct the deeper result again, with the pattern match cellCompletion : otherCells = split cs, i.e. they pop off the first cell again and prepend the actual cell contents.
So, if you change that to "," : split cs, the effect is just that all cells you build will already be pre-terminated with a , character. That's not what you want.
Instead you want to add an additional cell that won't be touched anymore. That needs to be deeper in the result then:
split (',':cs) = "" : "," : split cs
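Putting the pieces of this answer together gives something like the sketch below. It is renamed to splitKeep (a hypothetical name) and, as one possible variation, replaces the non-exhaustive where pattern with a case expression so that every pattern is covered.

```haskell
-- Sketch: pattern-matching version that keeps each delimiter as its own cell.
splitKeep :: String -> [String]
splitKeep ""       = [""]
splitKeep (',':cs) = "" : "," : splitKeep cs   -- fresh cell, then the untouched "," cell
splitKeep (c:cs)   = case splitKeep cs of
  cell : cells -> (c : cell) : cells           -- fill up the first cell
  []           -> [[c]]                        -- not reached: splitKeep never returns []
```

For example, splitKeep "1,2,3" evaluates to ["1",",","2",",","3"].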

Related

Add n spaces between each letter in a given string in Haskell

I am quite a beginner and I am trying to write a function in Haskell that would take in a number n and a given string. The function will then return the string with n spaces between each letter in the string.
For example, addSpace 2 "hello" should return "h  e  l  l  o".
For the moment I have only managed to have a function that would take in a String and just add a single space between each letter.
addSpace :: String -> String
addSpace s = if length s <= 1
             then s
             else take 1 s ++ " " ++ addSpace (drop 1 s)
However, I would like the function to have this type:
addSpace :: Integer -> String -> String
Appreciate all help!
You already have a working implementation that adds a string between every character of the input string; it's just that the string to be added is hardcoded as " ". So it should be fairly obvious that all you need to do is replace the " " with some function call that takes an Integer and works out how many spaces there should be:
addSpace :: Integer -> String -> String
addSpace n s
  = if length s <= 1
    then s
    else take 1 s ++ makeSpaces n ++ addSpace n (drop 1 s)
Now all you have to do is define the makeSpaces function. From how we're using it, it must have this type:
makeSpaces :: Integer -> String
makeSpaces n = _
I'm not going to give you any more code, but you can implement it very similarly to the recursion scheme you already demonstrated in addSpace, only instead of the choice of whether to recurse or not being based on the length of an input string, it will be based on the value of an input integer. When n is zero, what should you return? When it's greater than zero, what should you add to a recursive call to get the right answer? And how should you transform n as input for the recursive call?
For bonus points [1]: your current addSpace works (at least for finite inputs) but can be very inefficient, because every time you ask for the length of a string (which you do in every recursive call) it has to walk the entire string to count out the length. Since you don't actually use the length for anything except checking whether it's less than or equal to 1, do you really need to calculate the exact length? If you've been shown pattern matching [2] then you should be able to think of a way to tell whether a list has more than one character in it without calling length.
[1] I say "bonus points" because an addSpace :: Integer -> String -> String function based on the way you've written your current solution works and I imagine it would get a decent mark in a beginner course, but I also imagine it will not get a top mark; the code is inefficient and longer than it needs to be.
[2] If you haven't been shown pattern matching then (a) you can ignore this whole "bonus points" paragraph, as I don't think it will be required of you to get a good mark, and (b) they're teaching you Haskell in a very weird order.
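For readers who have already worked through the exercise, here is one possible end result, offered as a sketch and not as the answer's own solution. It uses pattern matching in place of length, and it swaps the hand-written makeSpaces for genericReplicate from Data.List, which accepts an Integer count.

```haskell
import Data.List (genericReplicate)

-- Sketch: insert n spaces between neighbouring characters.
addSpace :: Integer -> String -> String
-- At least two characters: emit the first one, n spaces, then recurse on the rest.
addSpace n (c : rest@(_ : _)) = c : genericReplicate n ' ' ++ addSpace n rest
-- Zero or one character: nothing to separate.
addSpace _ s = s
```

The pattern rest@(_ : _) only matches when at least one more character follows, so no space is added after the last character and the string is never measured.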

Backslash in string changing output

I am currently trying to implement a function that counts the number of letters and digits in a string. However, if I use a string that contains the '\' character I get strange results. I am guessing it's because the backslash character is an escape character.
Here is the method:
import Data.Char
countLettersAndDigits :: String -> Int
countLettersAndDigits [] = 0
countLettersAndDigits (x:xs) = if isDigit x == True || isLetter x == True
                               then 1 + countLettersAndDigits xs
                               else countLettersAndDigits xs
Here is a set of inputs with their respective results:
"1234fd" -> 6 (Doesn't contain '\')
"1234f\d" -> lexical error in string/character literal at character 'd'
"1234\fd" -> 5
"123\4fd" -> 5
"12\34fd" -> 4
"1\234fd" -> 4
"\1234fd" -> 3
I find it strange that, for example, "1234\fd" and "123\4fd" both give 5 as a result.
Any help explaining why this may be the case, and how to get around this problem, would be great!
Cheers.
Edit
I forgot to mention that the string I used above was just an example I was playing with. The actual string that is causing a problem is being generated by QuickCheck. The string was "\178". So I need a way to handle this case in my code when there is only one backslash and the string is being generated for me. Cheers.
You are correct that \ is Haskell's escape character. If you print out the generated strings, the answer may be more obvious:
main = mapM_ putStrLn [ "1234fd"
                      , "1234\fd"
                      , "123\4fd"
                      , "12\34fd"
                      , "1\234fd"
                      , "\1234fd"
                      ]
yields...
1234fd
1234d
123fd
12"fd
1êfd
Ӓfd
If you actually intended on including a backslash character in your string, you need to double it up: "\\" will result in a single \ being printed.
You can read up on escape sequences here.
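Regarding the string "\178" from the edit, the following sketch may help. It assumes that QuickCheck reports failing inputs using show, which renders non-ASCII characters as decimal escapes; under that assumption the generated string contains a single character with code 178 and no backslash at all. The point-free countLettersAndDigits is a compact restatement of the function from the question, and the other names are made up for this illustration.

```haskell
import Data.Char (isDigit, isLetter)

-- Compact restatement of the function from the question.
countLettersAndDigits :: String -> Int
countLettersAndDigits = length . filter (\c -> isDigit c || isLetter c)

-- "\178" is ONE character (code point 178), not a backslash followed by digits.
escapedLength :: Int
escapedLength = length "\178"      -- 1

-- show turns that single character back into the escape seen in the report.
shownAgain :: String
shownAgain = show "\178"           -- the six characters "\178" including the quotes

-- A real backslash has to be written as "\\"; this string has four characters.
literalLength :: Int
literalLength = length "\\178"     -- 4
```

So the function does not need special handling for backslashes in generated strings; it only has to decide how to classify whatever single character the escape denotes.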

Creating an array of possible string variations

I'm trying to figure out how I would create variations of a string, by replacing one character at a time in the string with a different character from another array.
For example:
variations = "abc"
getVariations "xyz" variations
Should return:
["xbc", "ybc", "zbc", "axc", "ayc", "azc", "abx", "aby", "abz"]
I'm not quite sure how to go about this. I tried iterating through the string, and then using list comprehension to add the possible characters but I end up losing characters.
[c ++ xs | c <- splitOn "" variations]
Where xs is the tail of the string.
Would someone be able to point me in the right direction please?
Recursively, you can define getVariations replacements input as follows:
- if input is empty, the result is ...
- if input is (a:as), combine the results of:
  - replacing a with a character from replacements
  - keeping a the same and performing getVariations on as
This means the definition of getVariations could look like:
getVariations replacements [] = ...
getVariations replacements (a:as) = ...#1... ++ ...#2...
It might also help to decide what the type of getVariations is:
getVariations :: String -> String -> ???
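One way to fill in the blanks is sketched below. It assumes the result type is [String] and that the empty input has no variations, since there is no character left to replace; both are choices made for this sketch and are not stated by the answer.

```haskell
-- Sketch: every string obtained by replacing exactly one character of the
-- input with one character from replacements.
getVariations :: String -> String -> [String]
getVariations _ [] = []
getVariations replacements (a:as) =
  -- #1: replace the first character, keep the tail unchanged
  [r : as | r <- replacements]
  -- #2: keep the first character, vary the tail instead
  ++ map (a :) (getVariations replacements as)
```

With this definition, getVariations "xyz" "abc" evaluates to ["xbc","ybc","zbc","axc","ayc","azc","abx","aby","abz"], matching the expected output in the question.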

Standard ML string to a list

Is there a way in ML to take in a string and output a list of strings, where a separator is a space, newline or eof, while also keeping quoted strings inside the input intact?
EX) hello world "my id" is 5555
-> [hello, world, my id, is, 5555]
I am then working on tokenizing these into:
->[word, word, string, word, int]
Sure you can! Here's the idea:
If we take a string like "Hello World, \"my id\" is 5555", we can split it at the quote marks, ignoring the spaces for now. This gives us ["Hello World, ", "my id", " is 5555"]. The important thing to notice here is that the list contains three elements - an odd number. As long as the string only contains pairs of quotes (as it will if it's properly formatted), we'll always get an odd number of elements when we split at the quote marks.
A second important thing is that all the even-numbered elements of the list will be strings that were unquoted (if we start counting from 0), and the odd-numbered ones were quoted. That means that all we need to do is tokenize the ones that were unquoted, and then we're done!
I put some code together - you can continue from there:
fun foo s =
  let
    (* String.fields, unlike String.tokens, keeps empty pieces, so a string
       that starts or ends with a quote does not shift the even/odd positions *)
    val quoteSep = String.fields (fn c => c = #"\"") s
    (* Char.isSpace covers spaces, tabs and newlines *)
    val spaceSep = String.tokens Char.isSpace
    fun sepEven [] = []
      | sepEven [x] = spaceSep x (* last unquoted piece, or no quotes at all *)
      | sepEven (x::y::xs) = spaceSep x @ (y :: sepEven xs) (* x was unquoted, y was quoted *)
  in
    if length quoteSep mod 2 = 0
    then raise Fail "unbalanced quote marks" (* odd number of quote marks *)
    else sepEven quoteSep
  end
String.tokens brings you halfway there. But if you really want to handle quotes like you are sketching then there is no way around writing an actual lexer. MLlex, which comes with SML/NJ and MLton (but is usable with any SML) could help. Or you just write it by hand, which should be easy enough in this case as well.

filtering the words ending with "ed" or "ing" using haskell

Hi, I am new to Haskell and functional programming.
I want to pass in a string and find the words ending with "ed" or "ing".
E.g. if the string is "he is playing and he played well",
the answer should be: playing, played
Does anyone know how to do this in Haskell?
You can build this using standard Haskell functions. Start by importing Data.List:
import Data.List
Use isSuffixOf to determine if one list ends with another. Below endings could be ["ed","ing"] and w would be the word you're testing, such as "played".
hasEnding endings w = any (`isSuffixOf` w) endings
Assuming you have split the string into a list of individual words (ws below), use filter to eliminate the words you don't want:
wordsWithEndings endings ws = filter (hasEnding endings) ws
Use words to get the list of words from the original string. Use intercalate to join the filtered words back into the final comma-separated string (or leave this off if you want the result as a list of words). Use . to chain these functions together.
wordsEndingEdOrIng ws = intercalate ", " . wordsWithEndings ["ed","ing"] . words $ ws
And you're done.
wordsEndingEdOrIng "he is playing and he played well"
If you're typing into ghci, put let in front of each of the function definitions (all lines but the last one).
contain w end = take (length end) (reverse w) == reverse end
findAll ends txt = filter (\w -> any (contain w) ends) (words txt)
main = getLine >>= print . findAll ["ing","ed"]
findAll :: [String] -> String -> [String]
-- i.e. findAll takes the endings, then your text, and returns the matching words
