Haskell extract substring within a string - string

My goal is to find the number of times a substring exists within a string.
The substring I'm looking for will be of type "[n]", where n can be any variable.
My attempt involved splitting the string up using the words function,
then creating a new list that keeps a string only if its 'head' is '[' and
its 'last' is ']'.
The problem I ran into is that, for some inputs, splitting with
the function words produces a string that looks like this: "[2],"
I still want this to count as an occurrence of the type "[n]".
For example, I would want this String,
asdf[1]jkl[2]asdf[1]jkl
to return 3.
Here's the code I have:
-- String that will be tested on references function
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- Function that will take a list of Strings and return a list that contains
-- any String of the type [n], where n is a variable
ref :: [String] -> [String]
ref [] = []
ref xs = [x | x <- xs, head x == '[', last x == ']']
-- Function takes a text with references in the format [n] and returns
-- the total number of references.
-- Example : ghci> references txt -- -> 3
references :: String -> Integer
references txt = toInteger (length (ref (words txt)))
If anyone can enlighten me on how to search for a substring within a string
or how to parse a string given a substring, that would be greatly appreciated.

I would just use a regular expression, and write it like this:
import Text.Regex.Posix
txt :: String
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
-- references counts the number of references in the input string
references :: String -> Int
references str = str =~ "\\[[0-9]*\\]"
main = putStrLn $ show $ references txt -- outputs 3

Regex is huge overkill for such a simple problem.
references = length . consume
consume [] = []
consume ('[':xs) = let (v,rest) = consume' xs in v:consume rest
consume (_ :xs) = consume xs
consume' [] = ([], [])
consume' (']':xs) = ([], xs)
consume' (x :xs) = let (v,rest) = consume' xs in (x:v, rest)
consume waits for a [ , then calls consume', which gathers everything until a ].
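As a sketch, the same idea with explicit type signatures may make the data flow easier to follow. The helper numericReferences is a hypothetical addition, not part of the answer above; it assumes that only bracket groups made up entirely of digits should count.

```haskell
import Data.Char (isDigit)

-- Collect the text found inside each "[...]" group.
consume :: String -> [String]
consume [] = []
consume ('[':xs) = let (v, rest) = consume' xs in v : consume rest
consume (_:xs) = consume xs

-- Gather characters up to the closing ']' and return what follows it.
consume' :: String -> (String, String)
consume' [] = ([], [])
consume' (']':xs) = ([], xs)
consume' (x:xs) = let (v, rest) = consume' xs in (x : v, rest)

-- Count every bracket group, whatever it contains.
references :: String -> Int
references = length . consume

-- Hypothetical variant: count only groups that are non-empty and all digits.
numericReferences :: String -> Int
numericReferences = length . filter isNumber . consume
  where isNumber s = not (null s) && all isDigit s
```

Note that an unclosed "[" still produces a group here, because consume' simply runs to the end of the input.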

Here's a solution with sepCap.
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Data.Either
import Data.Maybe
import Data.Void
txt = "[1] and [2] both feature characters who will do whatever it takes to " ++
      "get to their goal, and in the end the thing they want the most ends " ++
      "up destroying them. In case of [2], this is a whale..."
pattern = single '[' *> anySingle <* single ']' :: Parsec Void String Char
length $ rights $ fromJust $ parseMaybe (sepCap pattern) txt
3

Related

Is there a type that includes strings and lists in haskell?

Right now I'm trying to make a basic function that removes any spaces or commas from a sentence.
stringToIntList :: [Char] -> [Char]
stringToIntList inpt = [ a | a <- inpt, a `elem` [" ",","]]
The problem I'm experiencing is that every type I've tried doesn't work: if I put [Char] it freaks out at the commas, if I put [String] it freaks out at the spaces, and if I put String it just doesn't recognize a and says it's an error. So I was wondering if there is some type that could work as both a [Char] and a [String].
With the current type, the definition needs to be
stringToIntList inpt = [ a | a <- inpt, a `elem` [' ',',']]
(single quotes, because these are Char literals, not String ones!), or alternatively
stringToIntList inpt = [ a | a <- inpt, a `elem` " ,"]
(using the fact that a string is just a list of characters), or simply
stringToIntList = filter (`elem` " ,")
Note that this doesn't remove spaces and commas, on the contrary those are the only characters it keeps. To remove them instead, you need to invert the predicate:
stringToIntList = filter $ not . (`elem` " ,")
As Iceland_jack comments, there is actually a standard function for this combination:
stringToIntList = filter (`notElem` " ,")
If you really did want a `elem` [" ",","] then the type of your function would need to be
stringToIntList :: [String] -> [String]
or equivalently [[Char]] -> [[Char]].
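To see the two predicates side by side, here is a minimal sketch. The names keepSeparators and dropSeparators are made up for this illustration and do not come from the question.

```haskell
-- Keeps only spaces and commas (what the original comprehension did).
keepSeparators :: String -> String
keepSeparators = filter (`elem` " ,")

-- Drops spaces and commas (what the question actually asked for).
dropSeparators :: String -> String
dropSeparators = filter (`notElem` " ,")
```

For instance, dropSeparators "1, 2, 3" gives "123", while keepSeparators "1, 2, 3" gives ", , ".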

Haskell: successive modifications of a text

I want to know how to make modifications to a text that is full of special characters and codes and replace those codes with strings.
I have the following text:
text=
"#chomsky/syntactic structures/chomskySynt/: published in 1957. #bloomfield/language/bloomfieldLan/: published in 1933. #chomsky/aspects of a theory of syntax/chomskyAsp/: published in 1965. ... #see/chomskySynt/ is considered the starting point of generative linguistics.... Another hypothesis was introduced in #see/chomskyAsp/."
I want to turn it into=
"Chomsky 1: Syntactic structures : published in 1957. Bloomfield 1: Language : published in 1933. Chomsky 2: Aspects of a theory of syntax : published in 1965. ... Chomsky 1 is considered the starting point of generative linguistics ... Another hypothesis was introduced in Chomsky 2..."
Explanation of the special characters and codes: the information on a book starts with # followed by the name of the author (chomsky for example), followed by /, then the title of the book, then /, then the special code for the book (chomskyAsp), then /.
The citation of a book starts with #see followed by /, then the special code of the book (e.g. chomskySynt), then /.
The modifications are:
To count how many times an author is cited and concatenate the number to the name: Chomsky 1, for example.
Author name will start with a capital letter
Remove the special code : chomskySynt which serves only as an identification code.
Replace the reference : #see/chomskyAsp with the Chomsky 2. That is replace the reference with the actual author and number.
Here is my code:
removeSlash = myReplace "/" " " text
removeDash = map lines $ (filter (any isLetter) . groupBy ((==) `on` (=='#'))) $ removeSlash
flattenList = concat removeDash
splitIntoWords = map words flattenList
And here is the myReplace function:
myReplace _ _ [] = []
myReplace a b s@(x:xs) = if isPrefixOf a s
                           then b ++ myReplace a b (drop (length a) s)
                           else x : myReplace a b xs
Here is the result so far:
[["chomsky syntactic structures chomskySynt published in 1957. "], ["bloomfield language bloomfieldLan published in 1933. "],["chomsky aspects of a theory of syntax chomskyAsp published in 1965. ... "],["see chomskySynt is considered the starting point of generative linguistics.... Another hypothesis was introduced in "],["see chomskyAsp"]]
The reason I flattened the list and split it into words is now if I do:
map head splitIntoWords
I get ["chomsky","bloomfield","chomsky","see","see"]
I am stuck at this stage. How do I count how many times an author is cited and concatenate the number to the name. I thought of using the zip function:
zipChomsky = zip [1, 2] [x | x <- splitIntoWords, head x == "chomsky"]
This gives:
[(1,["chomsky","syntactic","structures","chomskySynt","published","in","1957."]),(2,["chomsky","aspects","of","a","theory","of","syntax","chomskyAsp","published","in","1965.","..."])]
But the result is very different from: Chomsky 1: ...
EDIT: I didn't mean to make the answer this long, but the problem turned out to be a non-trivial task, and I'm not quite sure how much detail I should put in the answer. If you already understand all the tools I'm using, the full code is at the end of this answer.
In your case, you'll need:
an approach to parse your input document
a suitable data structure to store the input information
displaying the data as output format
For the parsing part, perhaps Regex is enough (maybe), but I guess the Parsec library is a better choice. For detailed usage of Parsec please refer to the link, and I'll only try to show how to use it in your case:
First, import Text.ParserCombinators.Parsec.
A document is a list of
a literal string
a definition, with format #<Author>/<Title>/<Code>/, as in "#chomsky/syntactic structures/chomskySynt/"
a citation, with format #see/<Code>/, as in "#see/chomskyAsp/"
Hence we define
data Index = Index {
  getAuthor :: String,
  getTitle :: String,
  getSpecialCode :: String,
  getAuthorCount :: Int
  -- For counting author later.
  } deriving (Show)
data Content = Def Index
             | Cite String Index
             -- We'll fill in Index later.
             | Literal String
             deriving (Show)
and our input document will just be turned into [Content].
Correspondingly, we'll use the following function (actually, parser) to parse the input:
-- cite is tried before def, because def would otherwise also accept "#see/<Code>/ ..."
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  return $ Def $ Index author title code 0
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex
A short explanation:
A document is many (def or cite or literal), with operator <|> combining parsers.
A literal is a string, stopping at '#', with at least 1 char (using many1); a parser inside many should not accept empty input, think of why!
A def is #<Author>/<Title>/<Code>/, and we can write in do-notation since Parser is a monad.
A cite goes similarly.
def, cite, and string "#see/" each parse multiple characters, so they can fail after having already consumed some input; therefore, we wrap them in the combinator try, which backtracks on failure.
By the way, nullIndex is just a placeholder before we actually fill this record:
nullIndex :: Index
nullIndex = Index "" "" "" 0
Now we only need a function with signature [Content] -> String.
We can start with capitalizing the author name:
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
The other tasks are not local, since the relation between Contents should be observed, hence we will use a foldl across the list.
Define
import Data.Map.Strict ((!),(!?))
import qualified Data.Map.Strict as M
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
The Fold type will store the state when we modify along the original [Content].
(I realize that the code will be much clearer if I use the State monad, but I'm not sure if I need to explain it then ...)
In addition, a folding function for foldl
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
After the foldl, the resulting list will contain the contents with
getAuthorCount correctly filled in
Cites carrying the Index of the definition they refer to, so they can be printed with the same author and count.
The resulting list is reversed, so you'll need to apply reverse to it.
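The counting step on its own can be sketched without the parser. This is an illustration only: numberAuthors is a hypothetical helper that is not part of the answer's code, and it uses mapAccumL in place of the explicit fold to thread the running counts through the list.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as M

-- Pair each author with its running occurrence count,
-- threading a Map from author to count through the list.
numberAuthors :: [String] -> [(String, Int)]
numberAuthors = snd . mapAccumL step M.empty
  where
    step seen author =
      let n = M.findWithDefault 0 author seen + 1
      in (M.insert author n seen, (author, n))
```

For example, numberAuthors ["chomsky","bloomfield","chomsky"] yields [("chomsky",1),("bloomfield",1),("chomsky",2)], and the result stays in input order, so no reverse is needed in this variant.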
Finally, you can define your own version of Show for Content (remove the deriving (Show) clauses from the data declarations first, otherwise the instances will clash). For example,
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
as I figured out from your output sample.
The full length code:
import Data.Char
import Data.List (reverse)
import Data.Map.Strict ((!),(!?))
import qualified Data.Map.Strict as M
import Text.ParserCombinators.Parsec
data Index = Index {
  getAuthor :: String,
  getTitle :: String,
  getSpecialCode :: String,
  getAuthorCount :: Int
  -- For counting author later.
  }
nullIndex :: Index
nullIndex = Index "" "" "" 0
instance Show Index where
  show x = getAuthor x ++ " "
        ++ show (getAuthorCount x) ++ ": "
        ++ getTitle x ++ " "
data Content = Def Index
             | Cite String Index
             | Literal String
instance Show Content where
  show (Def idx) = show idx
  show (Cite x idx) = getAuthor idx ++ " "
                   ++ show (getAuthorCount idx)
  show (Literal x) = x
document = many (try cite <|> try def <|> literal)
literal = Literal <$> many1 (noneOf "#")
def = do
  char '#'
  author <- many1 $ noneOf "/"
  char '/'
  title <- many1 $ noneOf "/"
  char '/'
  code <- many1 $ noneOf "/"
  char '/'
  return $ Def $ Index author title code 0
cite = do
  try $ string "#see/"
  code <- many1 $ noneOf "/"
  char '/'
  return $ Cite code nullIndex
capitalizeAuthor :: Content -> Content
capitalizeAuthor (Def x) = Def (x {getAuthor = author'}) where
  author' = toUpper (head author) : tail author
  author = getAuthor x
capitalizeAuthor y = y
type CodeDict = M.Map String Index
-- Map Code Index
type AuthorDict = M.Map String Int
-- Map Author Count
type Fold = (CodeDict, AuthorDict, [Content])
emptyFold :: Fold
emptyFold = (M.empty, M.empty, [])
accum :: Fold -> Content -> Fold
accum (c,a,ls) (Def x) = (c',a',Def x':ls) where
  a' = M.insertWith (+) author 1 a
  c' = M.insert code x' c
  x' = x {getAuthorCount = count}
  count = maybe 1 (+1) $ a !? author
  author = getAuthor x
  code = getSpecialCode x
accum (c,a,ls) (Cite code _) = (c,a,Cite code (c ! code) : ls)
accum (c,a,ls) y = (c,a,y:ls)
main :: IO ()
main = do
  line <- getLine
  let parsed = parse document "" line
  case parsed of
    Left x -> print x
    Right cs -> do
      let cs1 = map capitalizeAuthor cs
      let (_,_,cs2) = foldl accum emptyFold cs1
      let output = concatMap show $ reverse cs2
      putStrLn output

Cutting specific chunks from a Haskell String

I'm trying to cut chunks from a list, with a given predicate. I would have preferred to use a double character, e.g. ~/, but have resolved to just using $. What I essentially want to do is this...
A: "Hello, my $name is$ Danny and I $like$ Haskell"
What I want to turn this into is this:
B: "Hello, my Danny and I Haskell"
So I want to strip everything in between the given symbol, $, or my first preference was ~/, if I can figure it out. What I tried was this:
s1 :: String -> String
s1 xs = takeWhile (/= '$') xs
s2 :: String -> String
s2 xs = dropWhile (/= '$') xs
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
This solution seems to just bug my IDE out (possibly infinite looping).
Solution:
s3 :: String -> String
s3 xs
  | '$' `notElem` xs = xs
  | otherwise = takeWhile (/= '$') xs ++ (s3 $ s1 xs)
s1 :: String -> String
s1 xs = drop 1 $ dropWhile (/= '$') $ tail $ snd $ break ('$'==) xs
This seems like a nice application for parsers. A solution using trifecta:
import Control.Applicative
import Data.Foldable
import Data.Functor
import Text.Trifecta
input :: String
input = "Hello, my $name is$ Danny and I $like$ Haskell"
cutChunk :: CharParsing f => f String
cutChunk = "" <$ (char '$' *> many (notChar '$') <* char '$')
cutChunk matches $, followed by 0 or more (many) non-$ characters, then another $. Then we use ("" <$) to make this parser's value always be the empty string, thus discarding all the characters that this parser matches.
includeChunk :: CharParsing f => f String
includeChunk = some (notChar '$')
includeChunk matches the text that we want to include in the result, which is anything that's not the $ character. It's important that we use some (matching one or more characters) and not many (matching zero or more characters) because we're going to include this parser within another many expression next; if this parser matched on the empty string, then that could loop infinitely.
chunks :: CharParsing f => f String
chunks = fold <$> many (cutChunk <|> includeChunk)
chunks is the parser for everything. Read <|> as "or", as in "parse either a cutChunk or an includeChunk". many (cutChunk <|> includeChunk) is a parser that produces a list of chunks e.g. Success ["Hello, my ",""," Danny and I ",""," Haskell"], so we fold the output to concatenate those chunks together into a single string.
result :: Result String
result = parseString chunks mempty input
The result:
Success "Hello, my Danny and I Haskell"
Your infinite loop comes from calling s3 recursively with no base case:
s3 :: String -> String
s3 xs = s3 $ s2 $ s1 xs
Adding a base case corrects the infinite loop:
s3 xs
  | '$' `notElem` xs = xs
  | otherwise = ...
This is not the whole answer. Think about what s1 actually does and where you use its return value:
s1 "hello $my name is$ ThreeFx" == "hello "
For further reference, see the break function:
break :: (a -> Bool) -> [a] -> ([a], [a])
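As a small illustration of what break returns, here is a sketch; splitAtDollar is a name invented for this example.

```haskell
-- Split a string at the first '$'. The '$' itself stays at the head of the
-- second component; if there is no '$', the second component is empty.
splitAtDollar :: String -> (String, String)
splitAtDollar = break (== '$')
```

So splitAtDollar "hello $my name is$ ThreeFx" gives ("hello ", "$my name is$ ThreeFx"), which shows that the rest of the string is still available for the recursive call.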
I think your logic is wrong; perhaps it is easier to write it in an elementary way:
Prelude> let pr xs = go xs True
Prelude|       where go [] _ = []
Prelude|             go (x:xs) f | x=='$' = go xs (not f)
Prelude|                         | f = x : go xs f
Prelude|                         | otherwise = go xs f
Prelude|
Prelude> pr "Hello, my $name is$ Danny and I $like$ Haskell"
"Hello, my Danny and I Haskell"
Explanation: the flag f keeps track of the state (either pass mode or not). If the current char is the delimiter, skip it and switch the state.
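The same flag-based idea seems to carry over to the two-character delimiter ~/ that the question originally preferred, by pattern matching on two characters at once. This is a sketch; stripDelimited is a hypothetical name, and it assumes a lone ~ or / is ordinary text.

```haskell
-- Drop every region enclosed by a pair of "~/" delimiters.
-- The Bool says whether we are currently keeping characters.
stripDelimited :: String -> String
stripDelimited = go True
  where
    go _ [] = []
    go keep ('~':'/':rest) = go (not keep) rest  -- delimiter: flip mode, emit nothing
    go keep (c:rest)
      | keep      = c : go keep rest
      | otherwise = go keep rest
```

As with the $ version, the spaces on both sides of a removed chunk are kept, so the result contains two consecutive spaces where a chunk used to be.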

Replacing Ints with Strings

I need help with an exercise. I'm trying to write a function called "refer" that will take a list of novels and any given text with citations ([n]) and will return that text with the citations replaced by the novel's name.
refer will have the following signature:
refer :: [(String, String, Int)] -> String -> String
For example, this is how it will be run:
> refer [("author1", "novel1", 1999),("author2", "novel2", 2000)] txt
> "novel1(author1, 1999) and novel2(author2, 2000) are my favorite books of all time, but I like novel2(author2, 2000) the most!"
I wrote a function called, txt, which will show my text that I will use.
txt :: String
txt = "[1] and [2] are my favorite books of all time, but I like [2] the most!"
I wrote a helper function called, format, which will help me format the novels from ("author1", "novel1", 1999) to "novel1(author1, 1999)"
format :: (String, String, Int) -> String
format (name, novel, yearInt) = novel ++ " (" ++ name ++
                                ", " ++ (show yearInt) ++ ")"
WHAT I THINK I NEED TO DO:
Step 1: I need to use words to break the input into a list of strings.
Step 2: I should make a helper function to parse through the list and if I find a citation, I should use format to replace that citation and recursively parse through the rest of the list until I've checked everything.
Step 3: Make a helper function to convert the string representation of the citation number into an Int (possibly, unwords) since I have to replace the citation with its corresponding element in the given list.
Step 4: Then I need to use unwords to turn my updated list back into a string.
WHAT I HAVE SO FAR:
refer :: [(String, String, Int)] -> String -> String
refer [] "" = ""
refer books txt = [string'| y <- words txt, ........... ]
-- I'm trying to say that I want to take the inputted list of
-- novels and text and turn them all into strings and store
-- them into y which will be represented by string'. I am not
-- sure what to do after this.
You could use words, but then you lose information about the white space between the words, i.e. words "a b" equals words "a    b". Maybe this is not important, but it is something to keep in mind.
Without providing the exact solution, here is a function which replaces a with a' and b with b' in a list:
replace a a' b b' [] = [] -- base case
replace a a' b b' (x:xs) = c : replace a a' b b' xs
  where c = if x == a then a' else if x == b then b' else x
Perhaps you can figure out how to adapt this to your problem.
Note also that this function may be written using map:
replace a a' b b' xs = map f xs
  where f x = if x == a then a' else if x == b then b' else x
Another approach to this kind of string processing is to pattern match against the characters. Here is a function which removes all occurrences of "cat" from a string:
removeCat :: String -> String
removeCat [] = [] -- base case
removeCat ('c':'a':'t':rest) = removeCat rest -- "cat" found so remove it
removeCat (x:rest) = x : removeCat rest
If this is for a homework problem please don't copy verbatim.
It's easier to solve the generic problem.
First, you need a replace function to replace a substring. A straightforward implementation can be
replace :: String -> String -> String -> String
replace old new [] = []
replace old new input = if (take n input == old) then new++(recurse n)
                        else (take 1 input)++(recurse 1)
  where n = length old
        recurse k = replace old new $ drop k input
This tries to match "old" at the beginning of the input; if it matches, it emits the replacement and skips the length of old, otherwise it moves one position and repeats.
So far so good, this will replace all occurrences of "old" to "new". However, you need multiple different replacements. For that let's write another function. To minimize validations assume we paired all replacements.
replaceAll :: [(String,String)] -> String -> String
replaceAll [] input = input
replaceAll ((old,new):xs) input = replaceAll xs $ replace old new input
You can make the top level signature better by defining a type such as
type Novel = (String,String,Int)
Finally,
refer :: [Novel] -> String -> String
refer [] text = text
refer ns text = replaceAll p text
  where p = [ ("["++show i++"]",format n) | (i,n) <- zip [1..] ns]
Notice that the citation number is derived from the position of the novel in the list.
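One thing to be aware of: replaceAll applies the replacements one after another, so text inserted by an earlier replacement is scanned again by the later ones. A single-pass alternative can be sketched as follows. It is not the approach of the answer above, and it assumes citations are written as [digits] and refer to 1-based positions in the list; citations that are out of range are left unchanged.

```haskell
import Data.Char (isDigit)

type Novel = (String, String, Int)

format :: Novel -> String
format (name, novel, year) = novel ++ " (" ++ name ++ ", " ++ show year ++ ")"

-- Walk the text once. On "[digits]" with a valid 1-based index, emit the
-- formatted novel; every other character is copied through unchanged.
refer :: [Novel] -> String -> String
refer _ [] = []
refer novels ('[':rest)
  | (digits@(_:_), ']':rest') <- span isDigit rest
  , let n = read digits
  , n >= 1, n <= length novels
  = format (novels !! (n - 1)) ++ refer novels rest'
refer novels (c:rest) = c : refer novels rest
```

Because the output of format is never rescanned, a novel title that itself contains something like "[2]" is not replaced a second time.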

Haskell expand function

I was recently handed an assignment I have almost completed and I am currently in need of some help.
The first functions I needed to implement were lookUp, split, combine and keyWordDefs.
I then had to implement a function expand :: FileContents -> FileContents -> FileContents that takes the contents of a text file and an info file and combines them using the above functions to build a string representing the output file.
Here is my code so far:
module MP where
import System.Environment
type FileContents = String
type Keyword = String
type KeywordValue = String
type KeywordDefs = [(Keyword, KeywordValue)]
separators :: String
separators
  = " \n\t.,:;!\"\'()<>/\\"
lookUp :: String -> [(String, a)] -> [a]
-- Given a search string and a list of string/item pairs, returns
-- the list of items whose associated string matches the search string.
lookUp x y = [a|(b,a) <- y, x==b]
split :: String -> String -> (String, [String])
-- Breaks up a string.
split as [] = ("",[""])
split as (b:bs)
  | elem b as = (b:xs,"":y:ys)
  | otherwise = (xs, (b:y):ys)
  where
    (xs,y:ys) = split as bs
combine :: [Char] -> [String] -> [String]
-- Combines the components of a string from its constituent separator
-- characters and words, as generated by a call to split.
combine [] y = y
combine (x:xs)(y:ys) = y : [x] : combine xs ys
getKeywordDefs :: [String] -> KeywordDefs
-- Takes the contents of an information file in the form of a list
-- of lines and which returns a list of keyword/definition pairs.
getKeywordDefs [] = []
getKeywordDefs (x:xs) = (keyword, concat defs) : getKeywordDefs xs
  where
    (_, (keyword : def)) = split " " x
    defs = combine spaces def
    spaces = [ ' ' | s <- [2..length def]]
expand :: FileContents -> FileContents -> FileContents
An example of the function expand is this:
expand "The capital of $1 is $2" "$1 Peru\n$2 Lima."
"The capital of Peru is Lima."
I suppose that this is going to work by first looking up (with the function lookUp) whether there is a "$" in the input string, then splitting the words, then replacing the words that begin with "$" using the second input string, and then combining them all together again? I am really confused, actually, and I would like to know if anyone here understands how the function expand should work.
Any help is welcome :)
Your expand function should look something like this:
-- It's probably better to change the type signature a little bit
-- because we're not returning the contents of a file, we're returning a string.
expand :: FileContents -> FileContents -> String
expand fc1 fc2 = let
    -- getKeywordDefs expects a list of lines, so split the info file first
    keywordDefs = getKeywordDefs (lines fc2)
  in replaceSymbols fc1 keywordDefs
Then you need a function named replaceSymbols, which splits up fc1 whenever it sees a $X, and then substitutes that $X for the result of looking up $X in keywordDefs.
replaceSymbols :: FileContents -> KeywordDefs -> String
Have a go at implementing that function and reply to this answer if you still need help :).
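For readers who get stuck, one possible shape for replaceSymbols is sketched below. It is only a sketch under assumptions: it reuses split and combine as defined in the question, uses the Prelude's lookup instead of the question's lookUp, leaves unknown keywords untouched, and the helper name substitute is invented here.

```haskell
type Keyword = String
type KeywordValue = String
type KeywordDefs = [(Keyword, KeywordValue)]

separators :: String
separators = " \n\t.,:;!\"'()<>/\\"

-- Split into (separators seen, words between them), as in the question.
split :: String -> String -> (String, [String])
split _ [] = ("", [""])
split seps (c:cs)
  | c `elem` seps = (c : ss, "" : w : ws)
  | otherwise     = (ss, (c : w) : ws)
  where
    (ss, w : ws) = split seps cs

-- Interleave the words with the separators again.
combine :: String -> [String] -> [String]
combine (s:ss) (w:ws) = w : [s] : combine ss ws
combine _ ws = ws

-- Replace every word that has a definition; keep all other words as they are.
replaceSymbols :: String -> KeywordDefs -> String
replaceSymbols text defs = concat (combine seps (map substitute ws))
  where
    (seps, ws) = split separators text
    substitute w = case lookup w defs of
      Just v  -> v
      Nothing -> w
```

Since '$' is not in separators, a keyword such as $1 survives split as a single word and can be looked up directly.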

Resources