Insert a space after every punctuation sign in a String (Haskell)

I have this function that checks if a character is one of these punctuation signs.
checkpunctuation:: Char -> Bool
checkpunctuation c = c `elem` ['.', ',', '?', '!', ':', ';', '(', ')']
I have to write another function that adds a space after every punctuation sign:
format :: String -> String
I know how to add a space after a given number of characters, but I don't know how to add one after specific characters.

Simple recursive option:
format :: String -> String
format [] = []
format (x:xs) | checkpunctuation x = x : ' ' : format xs
              | otherwise = x : format xs

Another option is to use foldr with a helper function:
helper :: Char -> String -> String
helper x xs | checkpunctuation x = x : ' ' : xs
            | otherwise = x : xs
The helper checks whether the current character x is a punctuation sign. If so, it inserts a space between x and the already processed rest xs; otherwise it just prepends x.
We can then define format as:
format :: String -> String
format = foldr helper []
A sample call:
*Main> format "Hello? Goodbye! You say goodbye!! (and I say Hello)"
"Hello?  Goodbye!  You say goodbye! !  ( and I say Hello) "
This function works also on "infinite strings":
*Main> take 50 $ format $ cycle "Hello?Goodbye!"
"Hello? Goodbye! Hello? Goodbye! Hello? Goodbye! He"
So although we feed it a string that keeps cycle-ing, and thus never ends, we can derive the first 50 characters of the result.
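Note that format inserts a space even when the punctuation sign is already followed by whitespace, so "Hello? Goodbye" ends up with two spaces after the question mark. If that is not wanted, a minimal sketch of a variant that peeks at the next character could look like this. The names isPunct and formatOnce are made up for this example, and the rule "no space at the end of the string" is an assumption, not something the question asks for.

```haskell
import Data.Char (isSpace)

-- Same punctuation set as checkpunctuation in the question.
isPunct :: Char -> Bool
isPunct c = c `elem` ".,?!:;()"

-- Insert a space after a punctuation sign only if the next
-- character exists and is not already whitespace.
formatOnce :: String -> String
formatOnce [] = []
formatOnce (x:xs)
  | isPunct x && needsSpace xs = x : ' ' : formatOnce xs
  | otherwise                  = x : formatOnce xs
  where
    needsSpace (y:_) = not (isSpace y)
    needsSpace []    = False
```

Since it only looks one character ahead, it stays lazy and still works on infinite input such as cycle "Hi?Yo!".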

There's probably a more elegant way to do it, but
format :: String -> String
format s = concat [if (checkpunctuation c) then (c:" ") else [c] | c <- s]
will work (thanks, @Shou Ya!).
Edit based on comment
To count the total length of post-formatted punctuation characters, you can use
sumLength :: [String] -> Int
sumLength strings = 2 * (sum $ fmap length (fmap (filter checkpunctuation) strings))
since it is twice the number of punctuation characters: each one is followed by one inserted space.

Related

Word count in haskell

I'm working on this exercise:
Given a phrase, count the occurrences of each word in that phrase.
For the purposes of this exercise you can expect that a word will always be one of:
A number composed of one or more ASCII digits (ie "0" or "1234") OR
A simple word composed of one or more ASCII letters (ie "a" or "they") OR
A contraction of two simple words joined by a single apostrophe (ie "it's" or "they're")
When counting words you can assume the following rules:
The count is case insensitive (ie "You", "you", and "YOU" are 3 uses of the same word)
The count is unordered; the tests will ignore how words and counts are ordered
Other than the apostrophe in a contraction all forms of punctuation are ignored
The words can be separated by any form of whitespace (ie "\t", "\n", " ")
For example, for the phrase "That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled. the count would be:
that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
My code:
module WordCount (wordCount) where
import qualified Data.Char as C
import qualified Data.List as L
import Text.Regex.TDFA as R
wordCount :: String -> [(String, Int)]
wordCount xs =
  do
    ys <- words xs
    let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
    return (head g, length g)
But it fails on the input "one fish two fish red fish blue fish". It outputs one count for each word, even the repeated ones, as if the sort and group aren't doing anything. Why?
I've read this answer, which basically does the same thing in a more advanced way using Control.Arrow.
You don't need to use words to split the line, the regex should achieve the desired splitting:
wordCount :: String -> [(String, Int)]
wordCount xs =
  do
    let zs = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map C.toLower w | w <- zs]
    return (head g, length g)
wordCount xs =
  do
    ys <- words xs
    let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
    return (head g, length g)
You’re splitting the input xs into words by whitespace using words. You iterate over these in the list monad with the binding statement ys <- …. Then you split each of those words into subwords using the regular expression, of which there happens to be only one match in your example. You sort and group each of the subwords in a list by itself.
I believe you can essentially just delete the initial call to words:
wordCount xs =
  do
    let ys = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map C.toLower w | w <- ys]
    return (head g, length g)
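To see the effect of the extra binding in isolation, here is a small regex-free sketch. The names perWord and wholeList are invented for this illustration; they only mimic the shape of the two versions above and use plain words instead of the regular expression.

```haskell
import Data.List (group, sort)

-- Shape of the original code: each word is bound on its own, so
-- group/sort only ever see a one-element list.
perWord :: String -> [(String, Int)]
perWord xs = do
  w <- words xs
  g@(h:_) <- group (sort [w])
  return (h, length g)

-- Shape of the fix: sort and group the whole list of words at once.
wholeList :: String -> [(String, Int)]
wholeList xs = do
  g@(h:_) <- group (sort (words xs))
  return (h, length g)
```

For "fish one fish", perWord reports "fish" twice with a count of 1 each, while wholeList reports it once with a count of 2.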

Replacing Ints with Strings

I need help with an exercise. I'm trying to write a function called "refer" that will take a list of novels and any given text with citations ([n]) and will return that text with the citations replaced by the novel's name.
refer will have the following signature of:
refer :: [(String, String, Int)] -> String -> String
For example, this is how it will be run:
> refer [("author1", "novel1", 1999),("author2", "novel2", 2000)] txt
"novel1(author1, 1999) and novel2(author2, 2000) are my favorite books of all time, but I like novel2(author2, 2000) the most!"
I wrote a value called txt, which holds the text that I will use.
txt :: String
txt = "[1] and [2] are my favorite books of all time, but I like [2] the most!"
I wrote a helper function called format, which formats a novel from ("author1", "novel1", 1999) to "novel1(author1, 1999)":
format :: (String, String, Int) -> String
format (name, novel, yearInt) = novel ++ " (" ++ name ++
  ", " ++ (show yearInt) ++ ")"
WHAT I THINK I NEED TO DO:
Step 1: I need to use words to break the input into a list of strings.
Step 2: I should make a helper function to parse through the list and if I find a citation, I should use format to replace that citation and recursively parse through the rest of the list until I've checked everything.
Step 3: Make a helper function to convert the string representation of the citation number into an Int (possibly, unwords) since I have to replace the citation with its corresponding element in the given list.
Step 4: Then I need to use rewords to turn my updated list back into a string.
WHAT I HAVE SO FAR:
refer :: [(String, String, Int)] -> String -> String
refer [] "" = ""
refer books txt = [string'| y <- words txt, ........... ]
-- I'm trying to say that I want to take the inputted list of
-- novels and text and turn them all into strings and store
-- them into y which will be represented by string'. I am not
-- sure what to do after this.
You could use words, but then you lose information about the white space between the words - i.e. words "a b" equals words "a  b". Maybe this is not important, but it is something to keep in mind.
Without providing the exact solution, here is a function which replaces a with a' and b with b' in a list:
replace a a' b b' [] = [] -- base case
replace a a' b b' (x:xs) = c : replace a a' b b' xs
  where c = if x == a then a' else if x == b then b' else x
Perhaps you can figure out how to adapt this to your problem.
Note also that this function may be written using map:
replace a a' b b' xs = map f xs
  where f x = if x == a then a' else if x == b then b' else x
Another approach to this kind of string processing is to pattern match against the characters. Here is a function which removes all occurrences of "cat" from a string:
removeCat :: String -> String
removeCat [] = [] -- end of input
removeCat ('c':'a':'t':rest) = removeCat rest -- "cat" found, so drop it and keep scanning
removeCat (x:rest) = x : removeCat rest
If this is for a homework problem please don't copy verbatim.
It's easier to solve the generic problem.
First, you need a replace function to replace a substring. A straightforward implementation can be
replace :: String -> String -> String -> String
replace old new [] = []
replace old new input = if (take n input == old) then new++(recurse n)
                        else (take 1 input)++(recurse 1)
  where n = length old
        recurse k = replace old new $ drop k input
This tries to match old at the beginning of input. If it matches, it emits new and skips the length of old; if not, it keeps one character, moves one position forward and repeats. Note that old must be non-empty, otherwise the function never advances.
So far so good, this will replace all occurrences of "old" to "new". However, you need multiple different replacements. For that let's write another function. To minimize validations assume we paired all replacements.
replaceAll :: [(String,String)] -> String -> String
replaceAll [] input = input
replaceAll ((old,new):xs) input = replaceAll xs $ replace old new input
You can make the top level signature better by defining a type such as
type Novel = (String,String,Int)
Finally,
refer :: [Novel] -> String -> String
refer [] text = text
refer ns text = replaceAll p text
  where p = [ ("["++show i++"]",format n) | (i,n) <- zip [1..] ns]
Notice that the citation number is derived from the position of the novel in the list.
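As an alternative to running one replacement pass per novel, the citations can also be resolved in a single pass by pattern matching on '[' and reading the digits that follow. This is only a sketch under a few assumptions: referOnePass and formatNovel are names made up here, the output format follows the example in the question (no space before the parenthesis), and a citation whose number is out of range is left in the text unchanged.

```haskell
import Data.Char (isDigit)

type Novel = (String, String, Int)

-- Hypothetical formatter producing "novel(author, year)".
formatNovel :: Novel -> String
formatNovel (name, novel, year) =
  novel ++ "(" ++ name ++ ", " ++ show year ++ ")"

-- Walk the text once; on "[digits]" look the novel up by position.
referOnePass :: [Novel] -> String -> String
referOnePass _ [] = []
referOnePass ns ('[':rest)
  | (ds@(_:_), ']':after) <- span isDigit rest
  , let i = read ds :: Int
  , i >= 1, i <= length ns
  = formatNovel (ns !! (i - 1)) ++ referOnePass ns after
referOnePass ns (c:cs) = c : referOnePass ns cs
```

Because already substituted text is never scanned again, a novel title that happens to contain something like "[2]" is not replaced a second time.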

Haskell replace space with Char

I'm trying to come up with a function that will replace all the blank spaces in a string with "%50" or similar. I know I'm messing up something with my types but can't seem to figure it out. I have been trying the following (yes, I have imported Data.Char):
newLine :: String -> String
newLine xs = if x `elem` " " then "%50"
I also tried the if then else statement, but really didn't know what to do with the else, so I figured I would just lowercase all the letters with
newLine xs = [if x `elem` ' ' then '%50' else toLower x | x<-xs]
I would like the else branch to simply do nothing, but I have searched and found no way of doing that, so I figured if everything was lowercase it wouldn't really matter. I'm just trying to get this to work.
Try this simple solution:
newLine :: String -> String
newLine "" = ""
newLine (' ':xs) = '%':'5':'0': newLine xs
newLine (x:xs) = x: newLine xs
or use library function
You're running into type mismatch issues. The approach you're currently using would work if you were replacing a Char with another Char. For example, to replace spaces with asterisks:
newLine xs = [if x == ' ' then '*' else toLower x | x<-xs]
Or if you wanted to replace both spaces and newlines with asterisks, you could use the elem function. But note that the elem function takes a list (or a String, which is the same as [Char]) as its second argument. In your example, you were trying to pass it a single element, ' '. This should work:
newLine xs = [if x `elem` " \n" then '*' else toLower x | x<-xs]
However, you want to replace a Char with a String ([Char]). So you need a different approach. The solution suggested by viorior looks good to me.
Well, the list comprehension is almost correct. Problem is:
"%50" is not a valid character literal, so you can't have '%50'. If you actually mean the three characters %, 5 and 0, it needs to be a String instead.
' ' is a correct character literal, but the character x can't be element of another char. You certainly mean simply x == ' '.
Now that would suggest the solution
[if x == ' ' then "%50" else toLower x | x<-xs]
but this doesn't quite work because you're mixing strings ("%50") and single-characters in the same list. That can easily be fixed though, by "promoting" x to a single-char string:
[if x == ' ' then "%50" else [toLower x] | x<-xs]
The result has then type [String], which can be "flattened" to a single string with the prelude concat function.
concat [if x == ' ' then "%50" else [toLower x] | x<-xs]
An alternative way of writing this is
concatMap (\x -> if x == ' ' then "%50" else [toLower x]) xs
or – exactly the same with more general infix operators
xs >>= \x -> if x == ' ' then "%50" else [toLower x]
To replace characters with possibly longer strings, one can follow this approach:
-- replace single characters
replace :: Char -> String
replace ' ' = "%50"
replace '+' = "Hello"
replace c | isAlpha c = someStringFunctionOf c
replace _ = "DEFAULT"
-- extend to strings
replaceString :: String -> String
replaceString s = concat (map replace s)
The last line can also be written as
replaceString s = concatMap replace s
or even
replaceString s = s >>= replace
or even
replaceString = (>>= replace)
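Since someStringFunctionOf above is only a placeholder, here is a runnable sketch of the same pattern with concrete choices filled in. The mapping itself (letters to upper case, everything else unchanged) is an arbitrary assumption made for the example, and replaceChar is a made-up name.

```haskell
import Data.Char (isAlpha, toUpper)

-- Per-character rule: every Char becomes a (possibly longer) String.
replaceChar :: Char -> String
replaceChar ' ' = "%50"
replaceChar c
  | isAlpha c = [toUpper c]  -- stand-in for someStringFunctionOf
  | otherwise = [c]          -- leave everything else untouched

-- Extend the rule to whole strings.
replaceString :: String -> String
replaceString = concatMap replaceChar
```

The "do nothing" case from the question is simply the last branch, which wraps the character in a one-element list.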
import Data.List
newLine :: String -> String
newLine = intercalate "%50" . words

How do I replace space characters in a string with "%20"?

I wanted to write a Haskell function that takes a string, and replaces any space characters with the special code %20. For example:
sanitize "http://cs.edu/my homepage/I love spaces.html"
-- "http://cs.edu/my%20homepage/I%20love%20spaces.html"
I am thinking of using the concat function, so I can concatenate a list of lists into a plain list.
The higher-order function you are looking for is
concatMap :: (a -> [b]) -> [a] -> [b]
In your case, choosing a ~ Char, b ~ Char (and observing that String is just a type synonym for [Char]), we get
concatMap :: (Char -> String) -> String -> String
So once you write a function
escape :: Char -> String
escape ' ' = "%20"
escape c = [c]
you can lift that to work over strings by just writing
sanitize :: String -> String
sanitize = concatMap escape
Using a comprehension also works, as follows,
changer :: [Char] -> [Char]
changer xs = [ c | v <- xs , c <- if (v == ' ') then "%20" else [v] ]
changer :: [Char] -> [Char] -> [Char]
changer [] res = res
changer (x:xs) res = changer xs (res ++ (if x == ' ' then "%20" else [x]))
sanitize :: [Char] -> [Char]
sanitize xs = changer xs ""
main = print $ sanitize "http://cs.edu/my homepage/I love spaces.html"
-- "http://cs.edu/my%20homepage/I%20love%20spaces.html"
The purpose of sanitize function is to just invoke changer, which does the actual work. Now, changer recursively calls itself, till the current string is exhausted.
changer xs (res ++ (if x == ' ' then "%20" else [x]))
It takes the first character x and checks whether it is equal to ' '. If so it yields "%20", otherwise the character itself as a one-element string; either way the result is appended to the accumulated string.
Note: This may not be the optimal solution, since each res ++ ... walks the whole accumulated string again.
You can use the intercalate function from the Data.List module. It intersperses the given separator between the elements of a list, then concatenates the result.
sanitize = intercalate "%20" . words
or using pattern matching:
sanitize [] = []
sanitize (x:xs) = go x xs
  where go ' ' [] = "%20"
        go y [] = [y]
        go ' ' (x:xs) = '%':'2':'0': go x xs
        go y (x:xs) = y: go x xs
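One thing to be aware of with the intercalate/words version: words splits on any run of whitespace and drops leading and trailing whitespace, so it is not equivalent to a character-by-character replacement. The sketch below puts the two side by side; viaWords and viaConcatMap are names chosen just for this comparison.

```haskell
import Data.List (intercalate)

-- Splits on runs of whitespace, so repeated or trailing spaces vanish.
viaWords :: String -> String
viaWords = intercalate "%20" . words

-- Replaces every single space character, one for one.
viaConcatMap :: String -> String
viaConcatMap = concatMap (\c -> if c == ' ' then "%20" else [c])
```

On the URL from the question both give the same result, but for "a  b " (two spaces in the middle, one at the end) viaWords yields "a%20b" while viaConcatMap yields "a%20%20b%20".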
Another expression of Shanth's pattern-matching approach:
sanitize = foldr go []
  where
    go ' ' r = '%':'2':'0':r
    go c r = c:r

Haskell extract substring within a string

My goal is to find the number of times a substring exists within a string.
The substring I'm looking for will be of type "[n]", where n can be any variable.
My attempt involved splitting the string up using the words function,
then create a new list of strings if the 'head' of a string was '[' and
the 'last' of the same string was ']'
The problem I ran into was that I entered a String which when split using
the function words, created a String that looked like this "[2],"
Now, I still want this to count as an occurrence of the type "[n]"
An example would be I would want this String,
asdf[1]jkl[2]asdf[1]jkl
to return 3.
Here's the code I have:
-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- Function that will take a list of Strings and return a list that contains
-- any String of the type [n], where n is an variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']
-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example : ghci> references txt -- -> 3
references :: String -> Integer
references txt = len (ref (words txt))
If anyone can enlighten me on how to search for a substring within a string
or how to parse a string given a substring, that would be greatly appreciated.
I would just use a regular expression, and write it like this:
import Text.Regex.Posix
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"
main = putStrLn $ show $ references txt -- outputs 3
regex is huge overkill for such a simple problem.
references = length . consume
consume [] = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_ :xs) = consume xs
consume' [] = ([], [])
consume' (']':xs) = ([], xs)
consume' (x :xs) = let (v,rest) = consume' xs in (x:v, rest)
consume waits for a [ , then calls consume', which gathers everything until a ].
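Note that consume counts anything between brackets, and also an opening bracket that is never closed. If only numeric references should count, a stricter variant can be sketched with span and isDigit. The name countRefs is made up here, and requiring at least one digit is an assumption (the regex above, with [0-9]*, would also accept an empty "[]").

```haskell
import Data.Char (isDigit)

-- Count occurrences of "[digits]" with at least one digit.
countRefs :: String -> Int
countRefs [] = 0
countRefs ('[':xs)
  | (_:_, ']':rest) <- span isDigit xs = 1 + countRefs rest
countRefs (_:xs) = countRefs xs
```

When the guard fails (no digits, or no closing bracket right after them), the last equation simply skips the '[' and keeps scanning.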
Here's a solution with sepCap from the replace-megaparsec package.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char
length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
3
