Do some replacement in Haskell List Comprehensions - string

My question is: if I pass in a string such as Hello, today is a Nice Day!!, how can I get rid of the spaces and punctuation and also replace the uppercase letters with lowercase ones?
I know how to delete them, but not how to replace them. I also don't know how to get rid of the punctuation.
Sorry, I only know how to work with numbers, not strings.
testList xs = [if x = [,|.|?|!] then " " | x<-xs]

import Data.Char
If you want to convert the punctuation to spaces and the characters from upper case to lower case:
testList xs = [if x `elem` ",.?!" then ' ' else toLower x | x<-xs]
Example: testList "TeST,LiST!" == "test list "
If you want to delete the punctuation and convert the characters from upper case to lower case:
testList2 xs = [toLower x | x<-xs, not (x `elem` ",.?!")]
Example: testList2 "Te..S,!t LiS?T" == "test list"
If you don't want to (or cannot) import Data.Char, here is an implementation of toLower:
toLower' :: Char -> Char
toLower' char
  | isNotUppercase = char -- no change required
  | otherwise = toEnum (codeChar + diffLowerUpperChar) -- char lowered
  where
    codeChar = fromEnum char -- each character has a numeric code
    code_A = 65
    code_Z = 90
    code_a = 97
    isNotUppercase = codeChar < code_A || codeChar > code_Z
    diffLowerUpperChar = code_a - code_A
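The hand-written toLower' only knows about the ASCII codes 65 to 90 ('A' to 'Z'). A quick sketch (not part of the original answer) comparing it with Data.Char.toLower: the two agree on every ASCII character, but only Data.Char.toLower lowercases non-ASCII letters such as 'É'. The name agreesOnAscii is made up for illustration:

```haskell
import Data.Char (toLower)

-- The ASCII-only version from the answer above.
toLower' :: Char -> Char
toLower' char
  | isNotUppercase = char
  | otherwise = toEnum (codeChar + diffLowerUpperChar)
  where
    codeChar = fromEnum char
    code_A = 65
    code_Z = 90
    code_a = 97
    isNotUppercase = codeChar < code_A || codeChar > code_Z
    diffLowerUpperChar = code_a - code_A

-- True when toLower' matches Data.Char.toLower on all 128 ASCII characters.
agreesOnAscii :: Bool
agreesOnAscii = all (\c -> toLower' c == toLower c) ['\0' .. '\127']
```

So toLower' is a fine drop-in replacement for plain ASCII input, but not for arbitrary Unicode text.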

I haven't written Haskell code in a long time, but the following should replace the invalid characters with a space and also convert uppercase characters to lowercase:
import Data.Char
replace invalid xs = [if elem x invalid then ' ' else toLower x | x <- xs]
Another way of doing the same:
repl invalid [] = []
repl invalid (x:xs) | elem x invalid = ' ' : repl invalid xs
                    | otherwise = toLower x : repl invalid xs
You can call the replace (or repl) function like this:
replace ",.?!" "Hello, today is a Nice Day!!"
The above code will return (each punctuation character becomes its own space, so some spaces are doubled):
"hello  today is a nice day  "
Edit: I'm using the toLower function from Data.Char, but if you want to write it yourself, that question has already been asked and answered on Stack Overflow.
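Replacing each punctuation character with a space, as replace and repl do, keeps the original spaces as well, so "Hello, today is a Nice Day!!" comes out with a double space after "hello" and two trailing spaces. If single spaces are wanted, one option (my own addition, not from the answer) is to pass the result through words and unwords. This sketch also swaps the explicit ",.?!" list for Data.Char.isAlphaNum, so any non-alphanumeric character counts as a separator; normalize is a hypothetical name:

```haskell
import Data.Char (isAlphaNum, toLower)

-- Lowercase alphanumeric characters and turn everything else into a space,
-- then let words/unwords collapse runs of spaces and drop leading/trailing ones.
normalize :: String -> String
normalize = unwords . words . map (\c -> if isAlphaNum c then toLower c else ' ')
```

For example, normalize "Hello, today is a Nice Day!!" gives "hello today is a nice day".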

You will find the functions you need in Data.Char:
import Data.Char
process str = [toLower c | c <- str , isAlpha c]
Though personally, I think the function-composition approach is clearer:
process = map toLower . filter isAlpha

To get rid of the punctuation, you can use a filter (a guard after the generator), like this one:
[x | x<-[1..10], x `mod` 2 == 0]
The if you are using won't filter. An if in the output part of a list comprehension only serves to choose between two values for each element; it can't drop elements. (It also needs an else branch, and equality is tested with ==, not =.)
As for converting things to lowercase, it's the same trick you can already pull off with numbers:
[x*2 | x <- [1..10]]

Here's a version without importing modules, using fromEnum and toEnum to choose which characters to allow:
testList xs =
  filter (\x -> elem (fromEnum x) ([97..122] ++ [32] ++ [48..57])) $ map toLower' xs
  where toLower' x = if elem (fromEnum x) [65..90]
                       then toEnum (fromEnum x + 32) :: Char
                       else x
OUTPUT:
*Main> testList "Hello, today is a Nice Day!!"
"hello today is a nice day"
For a module-less replace function, something like this might work:
myReplace toReplace xs = map myReplace' xs where
  myReplace' x
    | elem (fromEnum x) [65..90] = toEnum (fromEnum x + 32) :: Char
    | elem x toReplace = ' '
    | otherwise = x
OUTPUT:
*Main> myReplace "!," "Hello, today is a Nice Day!! 123"
"hello  today is a nice day   123"

Using Applicative Style
A textual quote from book "Learn You a Haskell for Great Good!":
Using the applicative style on lists is often a good replacement for
list comprehensions. In the second chapter, we wanted to see all the
possible products of [2,5,10] and [8,10,11], so we did this:
[ x*y | x <- [2,5,10], y <- [8,10,11]]
We're just drawing from two lists and applying a function between
every combination of elements. This can be done in the applicative
style as well:
(*) <$> [2,5,10] <*> [8,10,11]
This seems clearer to me, because it's easier to see that we're just
calling * between two non-deterministic computations. If we wanted all
possible products of those two lists that are more than 50, we'd just
do:
filter (>50) $ (*) <$> [2,5,10] <*> [8,10,11]
-- [55,80,100,110]
(Quoted from the chapter "Functors, Applicative Functors and Monoids".)
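Applied to the original question, where there is only one list, the applicative style boils down to <$> (fmap), which on lists is just map. A small sketch comparing it with the equivalent comprehension; viaFmap, viaComprehension and pairs are names made up for illustration:

```haskell
import Data.Char (isAlpha, toLower)

-- With a single list, <$> is fmap, which for lists is map.
viaFmap :: String -> String
viaFmap s = toLower <$> filter isAlpha s

-- The same thing as a list comprehension with a guard.
viaComprehension :: String -> String
viaComprehension s = [toLower c | c <- s, isAlpha c]

-- With two lists, <*> combines every element of one with every element of the other.
pairs :: [(Char, Int)]
pairs = (,) <$> "ab" <*> [1, 2]
```

Both viaFmap and viaComprehension turn "Hello, today is a Nice Day!!" into "hellotodayisaniceday"; pairs is [('a',1),('a',2),('b',1),('b',2)].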

Related

Word count in haskell

I'm working on this exercise:
Given a phrase, count the occurrences of each word in that phrase.
For the purposes of this exercise you can expect that a word will always be one of:
A number composed of one or more ASCII digits (e.g. "0" or "1234") OR
A simple word composed of one or more ASCII letters (e.g. "a" or "they") OR
A contraction of two simple words joined by a single apostrophe (e.g. "it's" or "they're")
When counting words you can assume the following rules:
The count is case insensitive (i.e. "You", "you", and "YOU" are 3 uses of the same word)
The count is unordered; the tests will ignore how words and counts are ordered
Other than the apostrophe in a contraction, all forms of punctuation are ignored
The words can be separated by any form of whitespace (e.g. "\t", "\n", " ")
For example, for the phrase
"That's the password: 'PASSWORD 123'!", cried the Special Agent.\nSo I fled.
the count would be:
that's: 1
the: 2
password: 2
123: 1
cried: 1
special: 1
agent: 1
so: 1
i: 1
fled: 1
My code:
module WordCount (wordCount) where
import qualified Data.Char as C
import qualified Data.List as L
import Text.Regex.TDFA as R
wordCount :: String -> [(String, Int)]
wordCount xs =
  do
    ys <- words xs
    let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
    return (head g, length g)
But it fails on the input "one fish two fish red fish blue fish". It outputs one count for each word, even the repeated ones, as if the sort and group aren't doing anything. Why?
I've read this answer, which basically does the same thing in a more advanced way using Control.Arrow.
You don't need to use words to split the line; the regex should achieve the desired splitting:
wordCount :: String -> [(String, Int)]
wordCount xs =
  do
    let zs = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map C.toLower w | w <- zs]
    return (head g, length g)
Your code:
wordCount xs =
  do
    ys <- words xs
    let zs = R.getAllTextMatches (ys =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map (C.toLower) w | w <- zs]
    return (head g, length g)
You’re splitting the input xs into words by whitespace using words. You iterate over these in the list monad with the binding statement ys <- …. Then you split each of those words into subwords using the regular expression, of which there happens to be only one match in your example. You sort and group each of the subwords in a list by itself.
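To see this effect in isolation, here is a stripped-down sketch (regex left out; perWord and wholeList are hypothetical names) that binds each word separately, as the original code does, versus sorting and grouping the whole list of words once:

```haskell
import Data.List (group, sort)

-- Same shape as the original: each word is bound on its own (w <- words s),
-- so sort and group only ever see a one-element list.
perWord :: String -> [(String, Int)]
perWord s = do
  w <- words s
  g <- group (sort [w])
  return (head g, length g)

-- Sorting and grouping the whole list once gives real counts.
wholeList :: String -> [(String, Int)]
wholeList s = do
  g <- group (sort (words s))
  return (head g, length g)
```

perWord "one fish two fish" yields one pair per word, [("one",1),("fish",1),("two",1),("fish",1)], while wholeList merges the repeats into ("fish",2).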
I believe you can essentially just delete the initial call to words:
wordCount xs =
  do
    let ys = R.getAllTextMatches (xs =~ "\\d+|\\b[a-zA-Z']+\\b") :: [String]
    g <- L.group $ L.sort [map C.toLower w | w <- ys]
    return (head g, length g)
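If pulling in regex-tdfa is not a requirement, the same task can be sketched with Data.Char and Data.Map.Strict alone. The tokenizing rule is my own reading of the exercise (ASCII letters, digits and apostrophes form words, and apostrophes at either end of a word are treated as quotes and dropped), so treat it as an assumption rather than a reference solution; isWordChar, stripQuotes and tokens are hypothetical helpers:

```haskell
import Data.Char (isAlphaNum, isAscii, toLower)
import qualified Data.Map.Strict as M

-- Characters that may appear inside a word: ASCII letters, ASCII digits
-- and the apostrophe used in contractions.
isWordChar :: Char -> Bool
isWordChar c = (isAscii c && isAlphaNum c) || c == '\''

-- Drop apostrophes at the start or end of a token, so 'PASSWORD becomes PASSWORD.
stripQuotes :: String -> String
stripQuotes = reverse . dropWhile (== '\'') . reverse . dropWhile (== '\'')

-- Split on runs of non-word characters and discard tokens that were only quotes.
tokens :: String -> [String]
tokens s = case dropWhile (not . isWordChar) s of
  "" -> []
  s' ->
    let (w, rest) = break (not . isWordChar) s'
        w' = stripQuotes w
    in if null w' then tokens rest else w' : tokens rest

-- Count lowercased tokens; M.toList returns the pairs sorted by word.
wordCount :: String -> [(String, Int)]
wordCount = M.toList . M.fromListWith (+) . map (\w -> (map toLower w, 1)) . tokens
```

On "one fish two fish red fish blue fish" this gives [("blue",1),("fish",4),("one",1),("red",1),("two",1)], and on the exercise's example phrase it reproduces the counts listed in the question.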

add spaces in between String: Haskell

I am trying to add a space between the characters of an input string. The code works on its own, but when I try to use it together with the map function, I get a pattern match failure when it hits the whitespace. Is there any way I can ignore the whitespace, or improve the code?
whiteSpace xs
  | length xs <= 1 = xs
  | otherwise = take 1 xs ++ " " ++ whiteSpace (drop 1 xs)
Do you want to implement Data.List.intersperse?
> intersperse ' ' "asdfasd"
"a s d f a s d"
A basic implementation for your use case could be:
white :: String -> String
white [] = []
white [x] = [x]
white (x:xs) = x : ' ' : white xs
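Spaces are ordinary Char values, so none of these definitions treats whitespace specially. A small sketch checking that the question's whiteSpace, the white function above and Data.List.intersperse ' ' agree, including on inputs that contain spaces:

```haskell
import Data.List (intersperse)

-- The question's version, with its layout restored.
whiteSpace :: String -> String
whiteSpace xs
  | length xs <= 1 = xs
  | otherwise = take 1 xs ++ " " ++ whiteSpace (drop 1 xs)

-- The pattern-matching version: the three equations cover every list,
-- so there is no pattern-match failure, whatever the characters are.
white :: String -> String
white [] = []
white [x] = [x]
white (x:xs) = x : ' ' : white xs
```

For instance, all three turn "a b" into "a   b" (the original space plus one inserted on each side of it).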

Haskell: successive modifications of a text

I want to know how to make modifications to a text that is full of special characters and codes and replace those codes with strings.
I have the following text:
text=
"#chomsky/syntactic structures/chomskySynt/: published in 1957. #bloomfield/language/bloomfieldLan/: published in 1933. #chomsky/aspects of a theory of syntax/chomskyAsp/: published in 1965. ... #see/chomskySynt/ is considered the starting point of generative linguistics.... Another hypothesis was introduced in #see/chomskyAsp/."
I want to turn it into=
"Chomsky 1: Syntactic structures : published in 1957. Bloomfield 1: Language : published in 1933. Chomsky 2: Aspects of a theory of syntax : published in 1965. ... Chomsky 1 is considered the starting point of generative linguistics ... Another hypothesis was introduced in Chomsky 2..."
Explanation of the special characters and codes: the entry for a book starts with #, followed by the author's name (e.g. chomsky), then /, the title of the book, /, the special code for the book (e.g. chomskyAsp), and a final /.
A citation of a book starts with #see, followed by /, the special code of the book (e.g. chomskySynt), and a final /.
The modifications are:
Count how many times an author is cited and append the number to the name: Chomsky 1, for example.
Author names start with a capital letter.
Remove the special code (e.g. chomskySynt), which serves only as an identifier.
Replace a reference such as #see/chomskyAsp/ with Chomsky 2, i.e. with the actual author and number.
Here is my code:
import Data.Char (isLetter)
import Data.Function (on)
import Data.List (groupBy, isPrefixOf)
removeSlash = myReplace "/" " " text
removeDash = map lines $ (filter (any isLetter) . groupBy ((==) `on` (== '#'))) $ removeSlash
flattenList = concat removeDash
splitIntoWords = map words flattenList
And here is the myReplace function:
myReplace _ _ [] = []
myReplace a b s@(x:xs) = if isPrefixOf a s
                           then b ++ myReplace a b (drop (length a) s)
                           else x : myReplace a b xs
Here is the result so far:
[["chomsky syntactic structures chomskySynt published in 1957. "], ["bloomfield language bloomfieldLan published in 1933. "],["chomsky aspects of a theory of syntax chomskyAsp published in 1965. ... "],["see chomskySynt is considered the starting point of generative linguistics.... Another hypothesis was introduced in "],["see chomskyAsp"]]
The reason I flattened the list and split it into words is now if I do:
map head splitIntoWords
I get ["chomsky","bloomfield","chomsky","see","see"]
I am stuck at this stage. How do I count how many times an author is cited and concatenate the number to the name. I thought of using the zip function:
zipChomsky = zip [1, 2] [x | x <- diviser, head x == "chomsky"]
This gives:
[(1,["chomsky","syntactic","structures","chomskySynt","published","in","1957."]),(2,["chomsky","aspects","of","a","theory","of","syntax","chomskyAsp","published","in","1965.","..."])]
But the result is very different from: Chomsky 1: ...
EDIT: I didn't mean to make this answer so long, but the problem turned out to be a non-trivial task, and I'm not quite sure how much detail to include. If you already understand all the tools I'm using, the full code is at the end of this answer.
In your case, you'll need:
an approach to parse your input document
a suitable data structure to store the input information
a way to display the data in the output format
For the parsing part, a regex might be enough, but I think the Parsec library is a better choice. For detailed usage of Parsec please refer to its documentation; here I'll only show how to use it in your case:
First, import Text.ParserCombinators.Parsec.
A document is a list of
a literal string
a definition, with format #<Author>/<Title>/<Code>/, as in "#chomsky/syntactic structures/chomskySynt/"
a citation, with format #see/<Code>/, as in "#see/chomskyAsp/"
Hence we define
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  }
  -- No deriving (Show): Show is written by hand further down.
data Content = Def Index
             | Cite String Index
               -- We'll fill in Index later.
             | Literal String
and our input document will just be turned into [Content].
Correspondingly, we'll use the following function (actually, parser) to parse the input:
document = many (try cite <|> try def <|> literal)
-- cite is tried before def: def would also accept "#see/chomskySynt/ ...",
-- wrongly reading everything up to the next '/' as the special code.
literal = Literal <$> many1 (noneOf "#")
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  return $ Def $ Index author title code 0
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex
A short explanation:
A document is many (cite or def or literal), with the operator <|> combining parsers. The order matters: cite must be tried before def, because def would otherwise also accept a citation.
A literal is a string stopping at '#', with at least one char (using many1); a parser inside many should not accept empty input. Think about why!
A def is #<Author>/<Title>/<Code>/, and we can write it in do-notation since Parser is a monad.
A cite works similarly.
def, cite and string "#see/" all parse several characters, so they can fail after having consumed some input; therefore we wrap them in the combinator try.
By the way, nullIndex is just a placeholder before we actually fill this record:
nullIndex :: Index
nullIndex = Index "" "" "" 0
Now we only need a function with the signature [Content] -> String.
We can start with capitalizing the author name:
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
The other tasks are not local: they depend on relations between different Contents, so we will use a foldl across the list.
Define
import Data.Map.Strict ((!), (!?))
import qualified Data.Map.Strict as M
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
The Fold type will store the state when we modify along the original [Content].
(I realize that the code will be much clearer if I use the State monad, but I'm not sure if I need to explain it then ...)
In addition, we need a folding function for foldl:
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
After the foldl, the resulting list will contain the contents with
getAuthorCount correctly filled in
each Cite carrying the Index of the book it refers to, looked up by its code
The resulting list is reversed (accum prepends with :), so you'll need reverse (from Data.List; it is also in the Prelude).
Finally, you can define your own Show instances for Index and Content. For example,
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
matching the format in your output sample.
The full code:
import Data.Char
import Data.List (reverse)
import Data.Map.Strict ((!),(!?))
import qualified Data.Map.Strict as M
import Text.ParserCombinators.Parsec
data Index = Index {
    getAuthor :: String,
    getTitle :: String,
    getSpecialCode :: String,
    getAuthorCount :: Int
    -- For counting author later.
  }
nullIndex :: Index
nullIndex = Index "" "" "" 0
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
data Content = Def Index
             | Cite String Index
             | Literal String
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
document :: Parser [Content]
document = many (try cite <|> try def <|> literal)
literal :: Parser Content
literal = Literal <$> many1 (noneOf "#")
def :: Parser Content
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  return $ Def $ Index author title code 0
cite :: Parser Content
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
main :: IO ()
main = do
  line <- getLine
  let parsed = parse document "" line
  case parsed of
    Left x -> print x
    Right cs -> do
      let cs1 = map capitalizeAuthor cs
      let (_,_,cs2) = foldl accum emptyFold cs1
      let output = concatMap show $ reverse cs2
      putStrLn output
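For comparison, the numbering step the question got stuck on (turning the list of authors into Chomsky 1, Bloomfield 1, Chomsky 2) can be sketched on its own with Data.List.mapAccumL, threading a Map of counts from left to right. This is an alternative to the foldl in the answer, shown on author names only; resolving #see citations still needs the code-to-Index lookup above. numberAuthors and label are hypothetical names:

```haskell
import Data.Char (toUpper)
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Pair each author with how many times that author has been seen so far.
numberAuthors :: [String] -> [(String, Int)]
numberAuthors = snd . mapAccumL step M.empty
  where
    step seen author =
      let n = M.findWithDefault 0 author seen + 1
      in (M.insert author n seen, (author, n))

-- Render a numbered author the way the desired output shows it.
label :: (String, Int) -> String
label (a, n) = capitalize a ++ " " ++ show n
  where
    capitalize [] = []
    capitalize (c:cs) = toUpper c : cs
```

For the authors of the three book entries, map label (numberAuthors ["chomsky","bloomfield","chomsky"]) gives ["Chomsky 1","Bloomfield 1","Chomsky 2"].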

Cutting specific chunks from a Haskell String

I'm trying to cut chunks from a list, with a given predicate. I would have preferred to use a two-character delimiter, e.g. ~/, but have settled on just using $. What I essentially want to do is this...
A: "Hello, my $name is$ Danny and I $like$ Haskell"
What I want to turn this into is this:
B: "Hello, my Danny and I Haskell"
So I want to strip everything between pairs of the given symbol, $ (my first preference was ~/, if I can figure it out). What I tried was this:
s1 :: String -> String
s1 xs = takeWhile (/= '$') xs
s2 :: String -> String
s2 xs = dropWhile (/= '$') xs
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
This just seems to make my IDE hang (possibly an infinite loop).
Solution:
s3 :: String -> String
s3 xs
  | '$' `notElem` xs = xs
  | otherwise = takeWhile (/= '$') xs ++ (s3 $ s1 xs)
s1 :: String -> String
s1 xs = drop 1 $ dropWhile (/= '$') $ tail $ snd $ break ('$'==) xs
This seems like a nice application for parsers. A solution using trifecta:
import Control.Applicative
import Data.Foldable
import Data.Functor
import Text.Trifecta
input :: String
input = "Hello, my $name is$ Danny and I $like$ Haskell"
cutChunk :: CharParsing f => f String
cutChunk = "" <$ (char '$' *> many (notChar '$') <* char '$')
cutChunk matches $, followed by 0 or more (many) non-$ characters, then another $. Then we use ("" <$) to make this parser's value always be the empty string, thus discarding all the characters that this parser matches.
includeChunk :: CharParsing f => f String
includeChunk = some (notChar '$')
includeChunk matches the text that we want to include in the result, which is anything that's not the $ character. It's important that we use some (matching one or more characters) and not many (matching zero or more characters) because we're going to include this parser within another many expression next; if this parser matched on the empty string, then that could loop infinitely.
chunks :: CharParsing f => f String
chunks = fold <$> many (cutChunk <|> includeChunk)
chunks is the parser for everything. Read <|> as "or", as in "parse either a cutChunk or an includeChunk". many (cutChunk <|> includeChunk) is a parser that produces a list of chunks e.g. Success ["Hello, my ",""," Danny and I ",""," Haskell"], so we fold the output to concatenate those chunks together into a single string.
result :: Result String
result = parseString chunks mempty input
The result:
Success "Hello, my  Danny and I  Haskell"
Your infinite loop comes from calling s3 recursively with no base case:
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
Adding a base case corrects the infinite loop:
s3 xs
  | '$' `notElem` xs = xs
  | otherwise = ...
This is not the whole answer. Think about what s1 actually does and where you use its return value:
s1 "hello $my name is$ ThreeFx" == "hello "
For further reference, see the break function:
break :: (a -> Bool) -> [a] -> ([a], [a])
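Following that hint, one way (a sketch, with the hypothetical name cut) to finish the job with break is to keep the text before the first $, skip up to and including the closing $, and recurse. An unmatched $ drops the rest of the string:

```haskell
-- Remove every $...$ chunk, keeping the text around it.
cut :: String -> String
cut s = case break (== '$') s of
  (before, [])       -> before  -- no more '$': keep everything
  (before, _ : rest) -> before ++ cut (drop 1 (dropWhile (/= '$') rest))
```

Note that the spaces on both sides of each removed chunk survive, so cut "Hello, my $name is$ Danny and I $like$ Haskell" gives "Hello, my  Danny and I  Haskell" with double spaces.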
I think your logic is wrong; it is perhaps easier to write it in an elementary way:
Prelude> let pr xs = go xs True
Prelude|       where go [] _ = []
Prelude|             go (x:xs) f | x == '$' = go xs (not f)
Prelude|                         | f = x : go xs f
Prelude|                         | otherwise = go xs f
Prelude|
Prelude> pr "Hello, my $name is$ Danny and I $like$ Haskell"
"Hello, my  Danny and I  Haskell"
Explanation: the flag f keeps track of the state (pass-through mode or not). If the current char is the $ token, skip it and switch state.
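The question's first preference was a two-character delimiter such as ~/. The same two-state idea extends to multi-character delimiters with Data.List.stripPrefix; here is a sketch (cutBetween is a made-up name) that takes the delimiter as an argument:

```haskell
import Data.List (stripPrefix)

-- Remove every chunk enclosed between two occurrences of a delimiter,
-- which may be several characters long (e.g. "~/" or "$").
-- An unmatched opening delimiter drops the rest of the string.
cutBetween :: String -> String -> String
cutBetween "" = id  -- an empty delimiter would match everywhere; leave input unchanged
cutBetween delim = keep
  where
    -- keep mode: copy characters until an opening delimiter appears
    keep [] = []
    keep s@(c:cs) = case stripPrefix delim s of
      Just rest -> skip rest
      Nothing -> c : keep cs
    -- skip mode: drop characters until the closing delimiter appears
    skip [] = []
    skip s@(_:cs) = case stripPrefix delim s of
      Just rest -> keep rest
      Nothing -> skip cs
```

cutBetween "~/" "Hello, my ~/name is~/ Danny and I ~/like~/ Haskell" and cutBetween "$" applied to the original string both give "Hello, my  Danny and I  Haskell"; a lone ~ that is not followed by / is left alone.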

Improve a haskell script

I'm a newbie in Haskell and I'd like some opinions on improving this script. It is a code generator that takes a command-line argument and generates an SQL script:
./GenCode "people name:string age:integer"
Code:
import Data.List
import System.Environment (getArgs)
create_table :: String -> String
create_table str = "CREATE TABLE " ++ h (words str)
  where
    h (x:xs) =
      let cab = x
          final = xs
      in x ++ "( " ++ create_fields xs ++ ")"
create_fields (x:xs) = takeWhile (/=':') x ++ type x ++ sig
  where sig | length xs > 0 = "," ++ create_fields xs
            | otherwise = " " ++ create_fields xs
create_fields [] = ""
type x | isInfixOf "string" x = " CHARACTER VARYING"
       | isInfixOf "integer" x = " INTEGER"
       | isInfixOf "date" x = " DATE"
       | isInfixOf "serial" x = " SERIAL"
       | otherwise = ""
main = mainWith
  where
    mainWith = do
      args <- getArgs
      case args of
        [] -> putStrLn $ "You need one argument"
        (x:xs) -> putStrLn $ (create_table x)
I think you understand how to write functional code already. Here are some small style notes:
Haskell usually uses camelCase, not under_score_separation
In create_table, cab and final are not used.
Usually a list-recursive function like create_fields puts the empty list case first.
I would not make create_fields recursive anyway. The comma-joining code is quite complicated and should be separated from the typing code. Instead do something like Data.List.intercalate "," (map create_field xs). Then create_field x can just be takeWhile (/=':') x ++ type x
Especially if there are a lot of types to be translated, you might put them into a map
Like so:
types = Data.Map.fromList [("string", "CHARACTER VARYING")
,("integer", "INTEGER")
-- etc
]
Then type can be Data.Maybe.fromMaybe "" (Data.Map.lookup x types)
Code can appear in any order, so it's nice to have main up front. (This is personal preference though)
You don't need mainWith. Just say
main = do
  args <- getArgs
  case args of
    [] -> ...
You don't need the dollar for the calls to putStrLn. In the first call, the argument wouldn't require parentheses anyway, and in the second, you supply the parentheses. Alternatively, you could keep the second dollar and drop the parentheses.
Don't use length xs > 0 (in sig); it needlessly counts the whole of xs when all you really want to know is whether it's empty. Use null xs to check for an empty list:
...
where sig | null xs = ... -- Empty case
          | otherwise = ... -- Non-empty case
or add an argument to sig and pattern match:
...
where sig (y:ys) = ...
      sig [] = ...
That said, Nathan Sanders' advice to replace the whole recursive thing with intercalate is excellent and makes this point moot.
You're also identifying the type by passing the whole "var:type" string into type, so it is testing
"string" `isInfixOf` "name:string"
etc.
You could use break or span instead of takeWhile to separate the name and type earlier:
create_fields (x:xs) = xname ++ type xtype ++ sig
  where
    (xname, _:xtype) = break (==':') x
    sig = ...
and then type can compare for string equality, or look up values using a Map.
A quick explanation of that use of break:
break (==':') "name:string" == ("name", ":string")
Then when binding
(xname, _:xtype) to ("name", ":string"),
xname -> "name"
_ -> ':' (discarded)
xtype -> "string"
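Putting several of these suggestions together (camelCase names, a Map of types, break instead of takeWhile, intercalate instead of the hand-written comma logic), the pure part of the script might look like the sketch below. Note that the original type function cannot compile as written, because type is a reserved word in Haskell, so the sketch renames it to the hypothetical sqlType; createField, createTable and sqlTypes are likewise illustrative names, and main can stay as suggested above, calling createTable x. For the example argument "people name:string age:integer" it produces the same string as the original: CREATE TABLE people( name CHARACTER VARYING,age INTEGER ).

```haskell
import Data.List (intercalate)
import qualified Data.Map as M

-- Short type names accepted on the command line, mapped to SQL types.
sqlTypes :: M.Map String String
sqlTypes = M.fromList
  [ ("string", "CHARACTER VARYING")
  , ("integer", "INTEGER")
  , ("date", "DATE")
  , ("serial", "SERIAL")
  ]

-- Renamed from `type`, which is a reserved word in Haskell.
-- Known types get a leading space; unknown types map to "", as in the original.
sqlType :: String -> String
sqlType t = maybe "" (' ' :) (M.lookup t sqlTypes)

-- "name:string" -> "name CHARACTER VARYING"
createField :: String -> String
createField x = name ++ sqlType (drop 1 rest)  -- drop 1 removes the ':'
  where
    (name, rest) = break (== ':') x

-- The first word is the table name, the rest are name:type fields.
createTable :: String -> String
createTable str = case words str of
  [] -> ""  -- no table name given
  (table : fields) ->
    "CREATE TABLE " ++ table ++ "( " ++ intercalate "," (map createField fields) ++ " )"
```

Because the type is now split off with break and looked up exactly, "name:strings" no longer matches "string" the way the isInfixOf version did.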

Resources