Split string to a list of strings in Clean

Because of the limited amount of resources, I need to propose a question here. I have been struggling with functional programming, the endless Haskell tutorials don't really help me. So what I want to achieve, in Clean language, is to split a string like " car cow cat " to a list of strings ["car","cow","cat"]. Can you provide me a detailed answer (does not have to be complete code), on how to iterate through this string, and especially the part when the newly constructed strings are added to the list?

I'm going to offer a simple solution. There are far better ways of doing this in Haskell, but it's the simplest I can think of for someone new to functional programming, without using any Haskell-specific function like takeWhile, or even any folds and maps...
You basically want to simulate iterating over a list, so here is what I suggest:
Define a function that will take a string and a split-by character. This function will return a list of strings - spliton :: String -> Char -> [String]
To move over the list, we'll want to gobble up characters until we hit one of our splitting characters. We'll also want to save the word we've saved up until now, and the entire list of words.
For that, we'll define a subfunction that will save the states
spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
I've also included the simplest clause - an empty string. When our string is empty, we'll just want to return what we have saved so far.
Now let's move on to our actual recursive function:
If we hit our split character, we'll add the string we have saved so far to the list and restart with an empty current-state string
If we don't hit the split character, we'll add the character to the current-state string
spliton' (currchar:rest) splitby sofar res
  | currchar==splitby = spliton' rest splitby "" (res++[sofar])
  | otherwise = spliton' rest splitby (sofar++[currchar]) res
So, to summarize our code:
spliton :: String -> Char -> [String]
spliton source splitchar = spliton' source splitchar [] []
spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
spliton' (currchar:rest) splitby sofar res
  | currchar==splitby = spliton' rest splitby "" (res++[sofar])
  | otherwise = spliton' rest splitby (sofar++[currchar]) res
Note: This will not however get rid of the empty string - meaning if you have many superfluous spaces - you'll get them added to the list. I'll leave you to think how to handle that case - hope this can help you get started.
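One possible way to handle that case, as a sketch only: keep spliton exactly as above and filter the empty strings out of its result afterwards. The name splitonWords is made up for this illustration and is not part of the answer.

```haskell
-- spliton and spliton' as defined in the answer above.
spliton :: String -> Char -> [String]
spliton source splitchar = spliton' source splitchar [] []

spliton' :: String -> Char -> String -> [String] -> [String]
spliton' [] _ sofar res = res ++ [sofar]
spliton' (currchar:rest) splitby sofar res
  | currchar == splitby = spliton' rest splitby "" (res ++ [sofar])
  | otherwise           = spliton' rest splitby (sofar ++ [currchar]) res

-- Hypothetical helper (an assumption, not from the answer): drop the empty
-- strings produced by leading, trailing or repeated separators.
splitonWords :: String -> Char -> [String]
splitonWords source splitchar = filter (not . null) (spliton source splitchar)
```

With this, splitonWords " car cow cat " ' ' gives ["car","cow","cat"], whereas plain spliton also returns an empty string at each end.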

Let's split this up in several sub-problems:
Make a list of characters from the string so that we can easily apply pattern matching.
Scrape the initial part of the list (as long as possible with only spaces or only not-spaces), and only keep it when it is not whitespace.
Repeat the second step while the list is non-empty.
The first thing can be done using fromString. For the second and third step, we define a helper function:
scrape :: [Char] -> [String]
scrape [] = []
scrape cs=:[c:_]
    | isSpace c = scrape (dropWhile isSpace cs)
    | otherwise = [toString word:scrape rest]
where
    (word,rest) = span (not o isSpace) cs
The first alternative is the base case to match the empty list. The second alternative matches the whole list cs with a first element c. If the first character is a space, we recursively (step 3) call the same function on the same list without the initial part of spaces. If the first character is not a space, we use span :: (a -> Bool) [a] -> ([a], [a]) to split the list in the initial part that is a word, and the rest. We store the word using toString as a string, and recursively call scrape for the rest of the list.
Now, we only need a wrapper to make this a function with the type String -> [String]:
split :: String -> [String]
split s = scrape (fromString s)
where
    scrape :: [Char] -> [String]
    scrape [] = []
    scrape cs=:[c:_]
        | isSpace c = scrape (dropWhile isSpace cs)
        | otherwise = [toString word:scrape rest]
    where
        (word,rest) = span (not o isSpace) cs
Note that you can easily abstract from the delimiter, by passing a character d and replacing isSpace c with c == d and (not o isSpace) by ((<>) d). Alternatively, you can choose to not pass a character d but a function isDelim :: Char -> Bool. You then get isDelim c and (not o isDelim), respectively.
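For readers coming from the Haskell tutorials, here is a rough Haskell counterpart of the predicate-based variant described above. It is a sketch, not Clean code: the name splitWith is invented for the example, and since a Haskell String is already a list of characters no fromString/toString conversion is needed.

```haskell
import Data.Char (isSpace)

-- Hypothetical Haskell rendering of scrape, parameterised by a delimiter
-- predicate isDelim instead of the fixed isSpace.
splitWith :: (Char -> Bool) -> String -> [String]
splitWith _ [] = []
splitWith isDelim cs@(c:_)
  | isDelim c = splitWith isDelim (dropWhile isDelim cs)  -- skip a run of delimiters
  | otherwise = word : splitWith isDelim rest             -- keep one word, recurse
  where
    (word, rest) = span (not . isDelim) cs
```

splitWith isSpace " car cow cat " gives ["car","cow","cat"], and splitWith (== ',') "a,,b" gives ["a","b"], because runs of delimiters are skipped as a whole.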

Related

Transforming a string into a list of pairs comprising the character from each run together with its number of repetitions

So I am required to convert a list of strings to a [(Char,Int)]. So for example, ["xxxxx","yyy"] to [('x',5), ('y',3)]. I am able to get the ('x',5) part without any issues but I am not sure how to move on to the next element of the list. Here is my code so far. Any pointers will be greatly appreciated.
[(x,y) | let x = head(head(reap xs)), let y = length(head(reap xs))]
p.s : reap is a function that turns a string into a list of repeated characters. For example "aaaabbbbccc" -> ["aaaa","bbbb","ccc"].
I suggest breaking this into smaller parts. First define a function that takes a single String and returns a tuple (Char, Int). Then once you have this function , you can use map to apply it to each String in a list.
You can use the fmap function, which applies a function to every item in the list.
The function charRepetitions accepts a list and uses the charRepetition function to transform an item.
main = do
  _ <- print $ charRepetitions ["xxxxx","yyy"]
  return ()
charRepetitions :: [String] -> [(Char, Int)]
charRepetitions xs = fmap charRepetition xs
charRepetition :: String -> (Char, Int)
charRepetition s = (head s , length s)
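If the starting point is the raw string rather than the output of reap, the two steps can be combined. The sketch below assumes that reap behaves like Data.List.group (the question does not show its definition), and the name runLengths is made up for the example. Pattern matching on each run also avoids calling head on a possibly empty string.

```haskell
import Data.List (group)

-- Hypothetical helper: pair the character of each run with the run's length.
-- group never yields empty runs, so the (c:_) pattern always matches.
runLengths :: String -> [(Char, Int)]
runLengths s = [ (c, length run) | run@(c:_) <- group s ]
```

For example, runLengths "aaaabbbbccc" gives [('a',4),('b',4),('c',3)].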

Using the bind function to process a list

I am trying to convert a String of numbers (e.g. "2 3 9 10 14") into a Maybe [Token]. I have the following code where the function parseToken converts a String into a Maybe Token.
data Token = Num Int
parseToken :: String -> Maybe Token
parseToken str = fmap Num (readMaybe str)
For converting the String into a Maybe [Token], I have the following code:
tokenise :: String -> Maybe [Token]
tokenise str = do
  let (x:xs) = words str
  y <- parseToken x
  ys <- parseToken xs
  return (y:ys)
I am trying to use the bind (>>=) function to do this. Initially I convert the string into a list of strings, using the words function. I then apply parseToken to the first element of the list, with the result (i.e. the Token value) of this stored in y.
However, I am not sure how I can apply parseToken to the rest of the list using bind. In general, if one wants to apply a function to every element of a list, while taking in the context of failure, and then join the results into a new list -
what is the best way to do this?
Any insights are appreciated.
You have merged two separate concerns in one function here:
1. Separating a string into components, and
2. turning each component into a token.
That's all fine and normal so far. What I would recommend, though, is splitting step (2) out into a separate function, and implementing your top-level thing in terms of it. So:
parseTokens :: [String] -> Maybe [Token]
parseTokens [] = ...
parseTokens (x:xs) = ...
I think you will find it easier to implement this than implementing tokenise wholesale, because when it comes time to deal with xs, you will find that you already have a function that does the thing you need on it. I recommend taking a stab at implementing this function; if you have trouble, then perhaps a fresh question with your attempt and why you believe it's not possible to make progress on it would be warranted.
Once you've done that, you can drop this function in place in your existing tokenise implementation:
tokenise str = do
  let (x:xs) = words str
  parseTokens (x:xs)
Of course, at this point there's no reason to pattern match on the result of words like that, since you just plan to pass on the result anyway:
tokenise str = do
  let xs = words str
  parseTokens xs
Most people would then inline xs,
tokenise str = do
  parseTokens (words str)
drop the superfluous do,
tokenise str = parseTokens (words str)
and make it point-free.
tokenise = parseTokens . words
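On the general question of applying a possibly failing function to every element and collecting the results: the standard library function traverse does this for Maybe, returning Nothing as soon as one element fails. The sketch below is one possible shape, not the implementation the answer has in mind for parseTokens; the deriving clause is added here only so results can be compared and printed.

```haskell
import Text.Read (readMaybe)

data Token = Num Int
  deriving (Eq, Show)  -- added for this example only

parseToken :: String -> Maybe Token
parseToken str = fmap Num (readMaybe str)

-- traverse applies parseToken to each word; the whole result is Nothing
-- if any single word fails to parse.
tokenise :: String -> Maybe [Token]
tokenise = traverse parseToken . words
```

Here tokenise "2 3 9 10 14" gives Just [Num 2,Num 3,Num 9,Num 10,Num 14], and tokenise "2 x" gives Nothing.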

can not get the type of the function just right

I am trying to write a function which takes a string and returns its words, without the space characters, as a list of strings, e.g.
toStrings "thanks for your help" -> ["thanks", "for", "your", "help"].
I want to solve this problem using an accumulator so I did the following:
toStrings :: String -> [String]
toStrings str = go str []
  where
    go str acc
      | str == [] = acc
      | otherwise = go (dropWhile (/=' ') str) acc : (takeWhile (/=' ') str)
It does not work. The compiler says:
Couldn't match type '[Char]' with 'Char'
I thought I was working with Strings.
Help is much appreciated.
Thanks Eitan
takeWhile on a String will return a String. Therefore, you have
go (…) acc : takeWhile (…)
where the latter is a String. However, you need [String] at that point. Since String = [Char], we have the following type mismatch:
String = [Char] -- actual
[String] = [[Char]] -- expected
GHC then sees the [[Char]] and [Char], removes one list layer, and sees [Char] and Char, which cannot be simplified any further.
That's why you get your error: GHC expands type synonyms and simplifies the types before reporting them in error messages.
That being said, you never change the acc, nor do you drop the spaces afterwards. Your current implementation will therefore loop infinitely.
I suggest you solve this problem without an accumulator and instead try to come up with something similar to
-- pseudo code
go str = firstWord str : go (restOfString str)
Keep in mind that firstWord should strip leading spaces, or you end up with an infinite loop.
I think it helps if you add the type to the go function. Based on the function description it should be:
toStrings :: String -> [String]
toStrings str = go str []
  where
    go :: String -> [String] -> [String]
    go str acc
      | str == [] = acc
      | otherwise = go (dropWhile (/=' ') str) acc : (takeWhile (/=' ') str)
But in your recursive call, you call (go somestr acc) : someotherstr (I here use somestr and someotherstr to make it easier to see why it does not work). That does not match, since go somestr acc will result in a [String] (given that works), and someotherstr is a String. If you use the cons (:) it expects the head (left operand) to be a String, and the tail (right operand) to be a [String].
But in fact here we do not need to work with an accumulator at all. We can construct a "cons" and perform recursion at the tail, like:
toStrings :: String -> [String]
toStrings ss | null s1 = []
             | otherwise = word : toStrings rest
  where s1 = dropWhile (== ' ') ss
        (word, rest) = span (/= ' ') s1
So first we drop all the leading spaces of the string ss, which gives s1. In case s1 is the empty list, we are done, and we return the empty list. Otherwise we perform a span (a conditional split) such that we obtain a tuple with the word as the first item, and the rest of the string as the second item. We then yield the word, and perform recursion on the rest.

Strip left/right a string (and chomp)

I have found nothing about how to strip a string (remove leading/trailing characters) in Haskell, and there’s no place to find such a strip or chomp function (correct me if I’m wrong).
What am I gonna do?
Have a look at Data.Text. Anything that uses Prelude lists, such as Strings, usually performs poorly, especially with functions like stripR. Some consider it a mistake from the past, because it has infected a lot of (otherwise sensible) interfaces with the inefficiencies of using singly linked lists of characters (String) for textual data.
The functions you're looking for are, in order: dropWhile, dropWhileEnd, dropAround, stripStart, stripEnd, strip.
Note that there's no specific function for stripping based on character equality. You don't really gain anything from aliasing dropX with a predicate, unless it's a very commonly used one like Data.Char.isSpace.
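A short sketch of how those Data.Text functions are used. It assumes the text package is available (it ships with GHC); the sample values and the names padded, stripped and so on are invented for the illustration.

```haskell
import qualified Data.Text as T

-- Sample input with leading and trailing white space.
padded :: T.Text
padded = T.pack "  hello world \n"

stripped, leftStripped, rightStripped, unquoted :: T.Text
stripped      = T.strip padded        -- both ends
leftStripped  = T.stripStart padded   -- leading white space only
rightStripped = T.stripEnd padded     -- trailing white space only
-- dropAround takes an arbitrary predicate, here a specific character.
unquoted      = T.dropAround (== '"') (T.pack "\"quoted\"")
```

stripped is "hello world", and unquoted is the text quoted without its surrounding double quotes.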
First off, you should use Text (from the text package) instead of String, since text is much more efficient.
Also, text already has this function:
-- Remove leading and trailing white space from a string.
strip :: Text -> Text
The more general approach would be to pass a predicate to the strip functions, so one could stripL isSpace e.g. to remove all leading white space.
Then stripL would however just be an alias for dropWhile.
For the stripping of the end, a potentially more efficient version uses foldr,
stripR :: (a -> Bool) -> [a] -> [a]
stripR pred = foldr keepOrDrop []
  where
    keepOrDrop c xs
      | pred c = case xs of
          [] -> []
          _ -> c:xs
      | otherwise = c:xs
that can start producing output without traversing the entire input list, and is efficient if there are no long runs of elements satisfying the predicate in the input.
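The laziness claim can be checked on an infinite list. The sketch below restates the foldr-based stripR (with the predicate renamed to p so it does not shadow Prelude's pred) and takes a prefix of its result on an endless input; lazyPrefix is a name made up for the demonstration.

```haskell
import Data.Char (isSpace)

-- Same idea as the foldr version above: an element satisfying p is kept
-- only if something survives after it.
stripR :: (a -> Bool) -> [a] -> [a]
stripR p = foldr keepOrDrop []
  where
    keepOrDrop c xs
      | p c       = if null xs then [] else c : xs
      | otherwise = c : xs

-- The input never ends, yet a finite prefix of the output is available,
-- because each space is always followed by a non-space element.
lazyPrefix :: String
lazyPrefix = take 5 (stripR isSpace (cycle "ab "))
```

lazyPrefix evaluates to "ab ab". A reverse-based implementation would not terminate on this input.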
Here are 3 functions and 3 curried aliases that get the job done:
stripL :: Char -> String -> String
stripL x = dropWhile (==x)
stripR :: Char -> String -> String
stripR x = reverse . stripL x . reverse
strip :: Char -> String -> String
strip x = stripL x . stripR x
chompL :: String -> String
chompL = stripL ' '
chompR :: String -> String
chompR = stripR ' '
chomp :: String -> String
chomp = strip ' '
What do you think? Is it possible to add such functions to Data.String?
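For plain Strings, base already provides the pieces: dropWhile in the Prelude and dropWhileEnd in Data.List. A minimal sketch of predicate-based trimming built on them follows; the names trimStart, trimEnd and trim are invented here and are not library functions.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- Hypothetical helpers built from functions that exist in base.
trimStart, trimEnd, trim :: String -> String
trimStart = dropWhile isSpace       -- remove leading white space
trimEnd   = dropWhileEnd isSpace    -- remove trailing white space
trim      = trimEnd . trimStart     -- remove both
```

For instance, trim "  hi there \n" gives "hi there"; inner spaces are left alone.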

Haskell: Gluing a char and a list together?

So I have this code here:
toWords :: String -> [String]
toWords "" = []
toWords (nr1 : rest)
  | nr1 == ' ' = toWords rest
  | otherwise = [nr1] : toWords rest
The "toWords" function should simply remove all spaces and return a list with all the words. But this happens:
*Main> toWords "How are you?"
["H","o","w","a","r","e","y","o","u","?"]
Is this just me, or you are trying to re-invent "words" function from Prelude?
Your type should be String -> [String] or [Char] -> [[Char]].
Your input is a string (a list of chars), your output a list of strings (a list of lists of chars).
Your type here means it maps a string to ANY type, this is not true.
Edit: alternatively you can use:
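-- untested first attempt; it does not type-check as written (corrected version follows)
splitBy :: Eq a => [a] -> a -> [[a]]
splitBy [] sep = []
splitBy (nr1 : rest) sep = if nr1 == sep then splitBy rest sep else nr1 : (splitBy rest sep)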
Which is polymorphic and splits a list by any separator. (code not tested though) so splitBy "string of words" ' ' should return ["string","of","words"].
Finally: this was annoying, and the first attempt had an obvious and stupid error, [] in lieu of [[]]. The working version is:
splitBy [] sep = [[]]
splitBy (nr1 : rest) sep = if nr1 == sep
  then [] : splitBy rest sep
  else (nr1 : head (splitBy rest sep)) : tail (splitBy rest sep)
Such that: splitBy "List of things" ' ' ===> ["List","of","things"]
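The working version evaluates splitBy rest sep twice per non-separator character. One way to avoid that, offered as a sketch and not as the answerer's code, is a right fold that extends the first group of the accumulated result. The name splitByFold is hypothetical.

```haskell
-- Hypothetical fold-based variant with the same results as splitBy:
-- a separator starts a new (empty) group, any other element is prepended
-- to the current first group.
splitByFold :: Eq a => [a] -> a -> [[a]]
splitByFold xs sep = foldr step [[]] xs
  where
    step x acc
      | x == sep  = [] : acc
      | otherwise = case acc of
          (cur : done) -> (x : cur) : done
          []           -> [[x]]  -- not reached: acc starts as [[]] and never shrinks
```

splitByFold "List of things" ' ' gives ["List","of","things"]. Like splitBy, it keeps empty groups, so splitByFold "a,,b" ',' gives ["a","","b"].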
Think about what this does:
It iterates through each character in the string.
| nr1 == ' ' = toWords rest
If the character is a space, it skips that character.
| otherwise = [nr1] : toWords rest
Otherwise it creates a string containing only that character, then continues doing so to the rest of the characters in the string.
What you want is to accumulate the characters in the word into a single list rather than creating a new list for each one.
Here's an example of how you could do it:
toWords "" = []
toWords (' ':rest) = toWords rest
toWords text = let (word, rest) = break (== ' ') text
               in word : toWords rest
Two questions - firstly, why is your output polymorphic? It seems like it's invariably going to be a list of String rather than a list of a. I'm not that familiar with the Haskell type inference internals; try changing the signature to String -> String or String -> [Char] and see if it then works.
Secondly, when you do this it should become clear that your implementation is a little off; even if it did work, it would simply return your original string with all the spaces removed.
If you do want to return a list of strings, you'll need to have a helper function that builds up the current word so far, and then adds that whole word to the output list when the current character is a space. Since it seems like you're doing this to learn (else, just use the Prelude function) I won't give a listing but it shouldn't be too hard to figure out from here.
