Haskell: get list of strings from file

My input is:
Plaster ["BD..", ".GA.D", ".FEG", "ABDCF", "E..."]
What I'm trying to get:
["BD..", ".GA.D", ".FEG", "ABDCF", "E..."]
My code for now:
    go = do
        -- print "Enter file name"
        -- path <- getLine
        file <- (readFile "1.txt")
        print file
        let list = consume file
        print list
        let content = (wordsWhen (=='"') list)
        print content
        print (content !! 0)
        print (content !! 1)
        print (content !! 2)

    wordsWhen :: (Char -> Bool) -> String -> [String]
    wordsWhen p s = case dropWhile p s of
        "" -> []
        ", " -> []
        s' -> w : wordsWhen p s''
            where (w, s'') = break p s'

    consume [] = []
    consume ('[':xs) = consume' xs
    consume (_ :xs) = consume xs

    consume' [] = []
    consume' (']':xs) = []
    consume' (x :xs) = x : consume' xs
So what I'm doing is:
1. Read the file from its destination (hardcoded for now, for testing)
2. Get rid of the word "Plaster" with consume
3. Get all strings from the file with wordsWhen
I tried different separators for wordsWhen, but I can't get what I need. In current form the output is:
"Plaster [\"BD..\", \".GA.D\", \".FEG\", \"ABDCF\", \"E...\"]"
"\"BD..\", \".GA.D\", \".FEG\", \"ABDCF\", \"E...\""
["BD..",", ",".GA.D",", ",".FEG",", ","ABDCF",", ","E..."]
"BD.."
", "
".GA.D"
Which is quite close, but I want to get rid of the words that contain only commas. I can change the separator to a comma (which it's supposed to be, I think), but then the output comes out with all these backslashes and quotation marks, like this:
"Plaster [\"BD..\", \".GA.D\", \".FEG\", \"ABDCF\", \"E...\"]"
"\"BD..\", \".GA.D\", \".FEG\", \"ABDCF\", \"E...\""
["\"BD..\""," \".GA.D\""," \".FEG\""," \"ABDCF\""," \"E...\""]
"\"BD..\""
" \".GA.D\""
" \".FEG\""
Is there a way to fix my code? Or should I do it in a different way?
EDIT: As it is my exercise, I can only use standard types and functions.

Well, you could cheat by defining a datatype with a Read instance that matches your existing input:
    {-# OPTIONS_GHC -Wall -Werror -Wno-name-shadowing #-}
    module Main where

    data Input = Plaster [String] deriving (Read, Show)

    main :: IO ()
    main = do
        Plaster xs <- readIO =<< readFile "1.txt"
        _ <- traverse print (zip [0 :: Int ..] xs)
        return ()
This works perfectly for me with ghc-8.0.2:
$ cat "1.txt"
Plaster ["BD..", ".GA.D", ".FEG", "ABDCF", "E..."]
$ ghc --make SO44269043.hs && ./SO44269043
(0,"BD..")
(1,".GA.D")
(2,".FEG")
(3,"ABDCF")
(4,"E...")
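If the file might be malformed, readIO will throw an exception. A minimal sketch of a non-throwing variant, using the same derived Read instance with reads from the Prelude, is shown below. The name parseInput is hypothetical and not part of the answer above.

```haskell
-- Same data type as in the answer; only the parsing entry point differs.
data Input = Plaster [String] deriving (Read, Show)

-- Hypothetical helper: returns Nothing instead of throwing when the text
-- is not a valid Input. Trailing whitespace (e.g. a final newline) is allowed.
parseInput :: String -> Maybe [String]
parseInput text = case reads text of
  [(Plaster xs, rest)] | all (`elem` " \t\r\n") rest -> Just xs
  _                                                -> Nothing

main :: IO ()
main = do
  print (parseInput "Plaster [\"BD..\", \".GA.D\"]\n")  -- Just ["BD..",".GA.D"]
  print (parseInput "Plaster oops")                     -- Nothing
```

Whether a Maybe result or an exception is preferable depends on how the exercise expects bad input to be handled.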
Alternately, you could define your own Read instance:

    data Input = Plaster [String]

    instance Read Input where
        readsPrec p = readParen (p >= 10) . runR $ do
            Plaster <$> (string "Plaster" *> many1 whitespace *> R readList)

If you're unfamiliar with <$> and *>, it may be a little easier to read this as

        readsPrec p = readParen (p >= 10) . runR $ do
            _ <- string "Plaster"
            _ <- many1 whitespace
            xs <- R readList
            return (Plaster xs)
Even without imports, it's not a lot of code to define a parser type R, basically just a wrapper that lets you define a monad instance for String -> [(a, String)]:
    newtype R a = R { runR :: ReadS a }

    instance Functor R where
        fmap f = R . fmap (map (\(a, s) -> (f a, s))) . runR

    instance Applicative R where
        pure a = R $ \s -> [(a, s)]
        mf <*> ma = R $ \s -> do
            (f, s) <- runR mf s
            (a, s) <- runR ma s
            return (f a, s)

    instance Monad R where
        m >>= f = R $ \s -> do
            (a, s) <- runR m s
            runR (f a) s
The >>= (or bind) operator just means "parse some of the string with the parser on the left, then parse the rest of the string with the parser you get by passing the resulting value to the function on the right."
We get R readList :: R [String] for free now, so all we need to do is skip the initial "Plaster" and any whitespace between that and the list of strings:
    string :: String -> R String
    string = traverse char

    many1 :: R a -> R [a]
    many1 r = loop where
        loop = (:) <$> r <*> (loop <|> return [])

    whitespace :: R Char
    whitespace = char ' ' <|> char '\t' <|> char '\n' <|> char '\r'
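Normally we'd use <|> from Control.Applicative (it is defined in GHC.Base), but it's not hard to define a one-off here. Basically r <|> r' means "try to parse with r, and if it fails, try to parse with r' instead". Strictly speaking, this version keeps the results of both alternatives rather than stopping at the first success:

    (<|>) :: R a -> R a -> R a
    r <|> r' = R $ \s -> runR r s ++ runR r' s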
And now all we need is the ability to match a single character:
    char :: Char -> R Char
    char c = R $ \s -> case s of
        (c' : s) | c == c' -> [(c, s)]
        _ -> []
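For reference, here is one way the pieces so far might be assembled into a single file. It is a sketch rather than the answer's exact code: the instances are written with list comprehensions, the fixity declaration is an assumption, and parsePlaster and plasterP are hypothetical names that run the parser directly instead of going through a Read instance.

```haskell
-- A parser is a wrapper around ReadS a = String -> [(a, String)].
newtype R a = R { runR :: ReadS a }

instance Functor R where
  fmap f (R p) = R $ \s -> [(f a, rest) | (a, rest) <- p s]

instance Applicative R where
  pure a = R $ \s -> [(a, s)]
  R pf <*> R pa = R $ \s -> [(f a, s2) | (f, s1) <- pf s, (a, s2) <- pa s1]

instance Monad R where
  R p >>= f = R $ \s -> concat [runR (f a) rest | (a, rest) <- p s]

-- Choice: collects the results of both alternatives.
infixl 3 <|>
(<|>) :: R a -> R a -> R a
r <|> r' = R $ \s -> runR r s ++ runR r' s

-- Match exactly one given character.
char :: Char -> R Char
char c = R $ \s -> case s of
  (c' : rest) | c == c' -> [(c, rest)]
  _                     -> []

string :: String -> R String
string = traverse char

-- One or more repetitions of a parser.
many1 :: R a -> R [a]
many1 r = loop
  where loop = (:) <$> r <*> (loop <|> pure [])

whitespace :: R Char
whitespace = char ' ' <|> char '\t' <|> char '\n' <|> char '\r'

-- "Plaster", at least one whitespace character, then a list of strings.
plasterP :: R [String]
plasterP = string "Plaster" *> many1 whitespace *> R readList

-- Hypothetical entry point: first parse that leaves only whitespace behind.
parsePlaster :: String -> Maybe [String]
parsePlaster text =
  case [xs | (xs, rest) <- runR plasterP text, all (`elem` " \t\r\n") rest] of
    (xs : _) -> Just xs
    []       -> Nothing

main :: IO ()
main = do
  print (parsePlaster "Plaster [\"BD..\", \".GA.D\"]\n")  -- Just ["BD..",".GA.D"]
  print (parsePlaster "Plastic []")                       -- Nothing
```

Taking the first successful parse sidesteps the fact that this <|> can return several results for the same input.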
If even using Prelude.readList is too easy, we can define our own parsers for lists and quoted strings:
        readsPrec p = readParen (p >= 10) . runR $ do
            Plaster <$> (string "Plaster" *> many1 whitespace *> listOf quotedString)

Where lists just have a leading '[', a trailing ']', and some number of delimited terms:

    listOf :: R a -> R [a]
    listOf term = char '[' *> (term `sepBy` string ", ") <* char ']'

    sepBy :: R a -> R b -> R [a]
    sepBy term delim = sepBy1 term delim <|> return []

    sepBy1 :: R a -> R b -> R [a]
    sepBy1 term delim = loop where
        loop = (:) <$> term <*> ((delim *> loop) <|> return [])

Similarly, a quoted string just has a leading '"', a trailing '"', and some number of escaped characters:

    quotedString :: R String
    quotedString = char '"' *> many escapedChar <* char '"'

    many :: R a -> R [a]
    many r = many1 r <|> return []

    escapedChar :: R Char
    escapedChar = R $ \s -> case s of
        '\\' : '\\' : s -> [('\\', s)]
        '\\' : '"' : s -> [('"', s)]
        c : s | c /= '\\' && c /= '"' -> [(c, s)]
        _ -> []
It's worth noting the similarity between many/many1 and sepBy/sepBy1: if we were really lazy, we could define one in terms of the other:

    many1 r = r `sepBy1` return ()
    term `sepBy1` delim = (:) <$> term <*> many (delim *> term)

This is how it could be done:
    import System.Environment
    import System.IO
    import Data.Maybe
    import Text.Read

    readStringList :: String -> Maybe [String]
    readStringList = readMaybe

    main = do --print "Enter file name"
        handle <- openFile "1.txt" ReadMode
        -- Skip the first 8 bytes, i.e. the prefix "Plaster ".
        hSeek handle AbsoluteSeek 8
        file <- hGetContents handle
        let list = fromJust (readStringList file )
        print list
        let filterThis = "," :: String
        let filtered = filter (/=filterThis) list
        print filtered

To get the first command-line argument, use getArgs.
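As a rough illustration of the getArgs suggestion, the sketch below takes the file name from the first command-line argument and checks the "Plaster" prefix explicitly instead of seeking past it. The name parsePlasterLine and the usage message are made up for this example.

```haskell
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Hypothetical helper: Nothing if the prefix is missing or the rest is not
-- a valid list of strings. readMaybe tolerates surrounding whitespace.
parsePlasterLine :: String -> Maybe [String]
parsePlasterLine text
  | take (length prefix) text == prefix = readMaybe (drop (length prefix) text)
  | otherwise                           = Nothing
  where prefix = "Plaster"

main :: IO ()
main = do
  args <- getArgs
  case args of
    (path : _) -> readFile path >>= print . parsePlasterLine
    []         -> putStrLn "usage: pass the input file as the first argument"
```

Returning Maybe avoids the crash that fromJust would cause on unexpected input.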

Here is a quick and dirty parser.
Be careful: it only works for well-formed input, it is not efficient, and the code is not factored into reusable pieces. But there is no cheating ;)
Maybe it could give you some inspiration to solve your exercise.
    plaster :: String -> String
    plaster ('P':'l':'a':'s':'t':'e':'r':' ':xs) = xs
    plaster s = undefined

    brackets :: String -> String
    brackets ('[':xs) = brackets xs
    brackets (x:']':_) = [x]
    brackets (x:xs) = x:brackets xs

    quotes :: String -> String
    quotes ('"':xs) = quotes xs
    quotes (x:'"':_) = [x]
    quotes (x:xs) = x:quotes xs

    sepByComma :: String -> [String]
    sepByComma s = go s ""
        where
            go [] acc = [acc]
            go (',':' ':xs) acc = [acc] ++ go xs ""
            go (x:xs) acc = go xs (acc ++ [x])

    parse :: String -> [String]
    parse s = map quotes . sepByComma . brackets . plaster $ s
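Staying closer to the approach in the question (splitting on the quote character), another possible sketch keeps the empty pieces when splitting and then takes every second piece, which are exactly the texts between quotes. The names splitOn, oddIndexed and quotedPieces are hypothetical, only Prelude functions are used, and the input is assumed to contain no escaped quotes.

```haskell
-- Split on every occurrence of the separator, keeping empty pieces.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (piece, [])       -> [piece]
  (piece, _ : rest) -> piece : splitOn c rest

-- Keep the elements at positions 1, 3, 5, ... (counting from 0).
oddIndexed :: [a] -> [a]
oddIndexed (_ : x : xs) = x : oddIndexed xs
oddIndexed _            = []

-- Pieces at odd positions are the ones that were inside quotation marks.
quotedPieces :: String -> [String]
quotedPieces = oddIndexed . splitOn '"'

main :: IO ()
main = print (quotedPieces "Plaster [\"BD..\", \".GA.D\", \".FEG\"]")
-- prints ["BD..",".GA.D",".FEG"]
```

Because the separators between strings always land at even positions, there is no need to filter out ", " afterwards.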

Here is an alternative using only the very basics, no Monads, Functors or Applicative operators.
    main :: IO ()
    main = do
        input <- getLine
        let output = parse input
        print output

    parse :: String -> [String]
    parse = map stripQuotes . parse' . tokenize []
        where
            parse' :: [String] -> [String]
            -- If the input matches the pattern, call parseList on the inner tokens.
            -- Does not nest brackets! This is a simple regex match.
            parse' ("Plaster":"[":tokens) | last tokens == "]" =
                parseList [] (removeLast tokens)
            parse' _ = error "The input does not have the form \"Plaster [...]\"."

    parseList :: [String] -> [String] -> [String]
    -- Empty list.
    parseList tokens [] = tokens
    -- Unexpected tokens.
    parseList _ (",":_) = error "Unexpected comma."
    parseList _ ("[":_) = error "No support for nested brackets."
    parseList _ ("]":_) = error "Unexpected input after \"]\"."
    -- One-element list.
    parseList tokens [x] = tokens ++ [x]
    -- Comma-separated list with at least two elements.
    parseList tokens (x:",":y:ys) = parseList (tokens ++ [x]) (y:ys)
    -- Comma at end of list, so we don’t want to give the "expected comma" error!
    parseList _ [_,","] = error "Extra comma at end of list."
    -- More than one element not separated by commas.
    parseList _ (x:_) = error $ "Expected comma after \"" ++ x ++ "\"."

    stripQuotes :: String -> String
    stripQuotes ('"':xs) | last xs == '"' = removeLast xs
    stripQuotes xs = error $ "Expected string literal instead of " ++ xs ++ "."

    removeLast :: [a] -> [a]
    removeLast xs = take ((length xs) - 1) xs

    whitespace :: [Char]
    whitespace = [' ', '\n', '\t'] -- Incomplete, but sufficient.

    isWhitespace :: Char -> Bool
    isWhitespace c = elem c whitespace

    tokenize :: [String] -> String -> [String]
    -- If we’ve consumed all the input, we’re done.
    tokenize tokens [] = tokens
    -- We’d need something a little more complicated for longer operators:
    tokenize tokens ('[':xs) = tokenize (tokens ++ ["["]) xs
    tokenize tokens (']':xs) = tokenize (tokens ++ ["]"]) xs
    tokenize tokens (',':xs) = tokenize (tokens ++ [","]) xs
    -- Not currently processing a token, so skip whitespace.
    -- Otherwise, start a new token.
    tokenize tokens (x:xs) | isWhitespace x = tokenize tokens xs
                           | otherwise = tokenize' tokens [x] xs
        where
            tokenize' :: [String] -> String -> String -> [String]
            -- If we’ve consumed all the input, the current token is the last.
            tokenize' ts t [] = ts ++ [t]
            -- If we encounter an operator, it is the token after the current one.
            tokenize' ts t ('[':ys) = tokenize (ts ++ [t] ++ ["["]) ys
            tokenize' ts t (']':ys) = tokenize (ts ++ [t] ++ ["]"]) ys
            tokenize' ts t (',':ys) = tokenize (ts ++ [t] ++ [","]) ys
            -- Whitespace means the current token is complete.
            -- Otherwise, append y to the current token and continue.
            tokenize' ts t (y:ys) | isWhitespace y = tokenize (ts ++ [t]) ys
                                  | otherwise = tokenize' ts (t ++ [y]) ys
You wouldn’t do this in production code; this is simple enough to do with a regex, and parsing is (more or less) a solved problem. Parser combinators are the trendy way to go.

Related

String to List of Int

I would like to incorporate Maybe [Int] into this.
The code should take a string, filter out spaces, and turn it into a list of integers; if there are letters, it should return Nothing.
    text2digits :: String -> [Int]
    text2digits s = case s of
        [] -> []
        x:xs
            | isDigit x -> digitToInt x : text2digits (filter (/= ' ') xs)
            | otherwise -> undefined
input "1233 5687" output: [1,2,3,3,5,6,8,7]
input "a89" required output : Nothing
current output: undefined
I have tried this, but it shows a list of errors:

    text2digits :: String -> Maybe [Int]
    text2digits s = case s of
        [] -> Just []
        x:xs
            | isDigit x -> Just digitToInt x :text2digits (filter (/= ' ') xs)
            | otherwise -> Nothing
What is wrong with the code that you've written for text2digits :: String -> Maybe [Int]?
The problem is in this line:

    digitToInt x :text2digits (filter (/= ' ') xs)

text2digits returns a value of type Maybe [Int], but (:) expects its second argument to be a plain [Int].
In order to fix it, you can use fmap or <$> to apply a function to the value inside the Maybe functor:
    import Data.Char

    text2digits :: String -> Maybe [Int]
    text2digits s = case s of
        [] -> Just []
        x:xs
            | isDigit x -> ((digitToInt x) :) <$> text2digits (filter (/= ' ') xs)
            | otherwise -> Nothing

    main = print $ text2digits "1233 5687"
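To see in isolation what <$> does with a Maybe, here is a tiny sketch. The helper name prependDigit is hypothetical; it is just the `(d :) <$> ...` step from the code above given a name.

```haskell
-- Prepend a digit to a list that may or may not exist.
-- Just xs becomes Just (d : xs); Nothing stays Nothing.
prependDigit :: Int -> Maybe [Int] -> Maybe [Int]
prependDigit d rest = (d :) <$> rest

main :: IO ()
main = do
  print (prependDigit 1 (Just [2, 3]))  -- Just [1,2,3]
  print (prependDigit 1 Nothing)        -- Nothing
```

This is why a single Nothing from any later character makes the whole result Nothing.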
Or you could use traverse to refactor the function a bit:

    import Data.Char

    text2digits :: String -> Maybe [Int]
    text2digits s =
        traverse digitToMaybeInt $ filter (/= ' ') s
        where
            digitToMaybeInt x
                | isDigit x = Just $ digitToInt x
                | otherwise = Nothing

    main = print $ text2digits "89"

Haskell "\n" shown as string in IO String

    imP :: Int -> IO String
    imP n = do
        x <- getLine
        if n >= 0 then return ( (concat (replicate n " ")) ++ fun1 x) else return ( fun2 n x)
        where
            fun1 [] = ""
            fun1 (x:xs)
                | isAlpha x = [x] ++ fun1 xs
                | otherwise = "\n" ++ fun1 xs
            fun2 n [] = ""
            fun2 n (x:xs)
                | isAlpha x = [x] ++ fun2 n xs
                | otherwise = "\n" ++ (concat (replicate (abs n) " ")) ++ fun2 n xs
I have this code. And given an input "hello3mello" to getLine it returns:
"hello\nmello"
But I need:
"hello
mello"
EDIT:
<interactive>:32:9:
Couldn't match type `IO String' with `[Char]'
Expected type: String
Actual type: IO String
In the first argument of `putStr', namely `(imP 3)'
In the expression: putStr (imP 3)
The type of putStr is String -> IO (), so you can't apply it to imP 3 :: IO String: putStr expects a pure String, not an IO String, which is exactly what GHC's error message is reporting.
I assume you're not familiar with monads, so I'd recommend reading any of the many tutorials. In the meantime, use \x -> imP x >>= putStr
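The "\n" in the displayed result presumably comes from the string being rendered with show (which is what GHCi and print do), not from the string itself. The sketch below uses a hypothetical stand-in, fakeImP, that returns a fixed string instead of reading input, to show the difference between print and putStrLn and the do-notation equivalent of >>=.

```haskell
-- Hypothetical stand-in for the question's imP: an IO action that yields a
-- String without reading from stdin.
fakeImP :: IO String
fakeImP = return "hello\nmello"

main :: IO ()
main = do
  fakeImP >>= print     -- print goes through show: the newline appears as \n
  fakeImP >>= putStrLn  -- putStrLn writes the characters, so the line breaks
  text <- fakeImP       -- the same as >>=, written in do-notation
  putStrLn text
```

In both styles the String is first extracted from the IO action and only then handed to the printing function.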

Remove characters from String in Haskell

I am creating a program that reads a text file, splits it up into words, and stores them in a list. I have been trying to write a function that takes the whole text of the file as a String and removes punctuation, e.g. ";", ",", ".", but unfortunately haven't had any luck yet. The program works without the punctuation function, but not when I apply it before toWords. Can someone please look at what I have done and see what I am doing wrong?
Here is the code that I have so far:
    main = do
        contents <- readFile "LargeTextFile.txt"
        let lowContents = map toLower contents
        let outStr = countWords (lowContents)
        let finalStr = sortOccurrences (outStr)
        let reversedStr = reverse finalStr
        putStrLn "Word | Occurrence "
        mapM_ (printList) reversedStr
-- Counts all the words.
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords (removePunc fileContents))
-- Splits words and removes linking words.
toWords :: String -> [String]
toWords s = filter (\w -> w `notElem` ["an","the","for"]) (words s)
-- Remove punctuation from text String.
removePunc :: String -> String
removePunc xs = x | x <- xs, not (x `elem` ",.?!-:;\"\'")
-- Counts, how often each string in the given list appears.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
-- Sort list in order of occurrences.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences sort = sortBy (comparing snd) sort
-- Prints the list in a format.
printList a = putStrLn((fst a) ++ " | " ++ (show $ snd a))
You probably want:

    removePunc xs = [ x | x <- xs, not (x `elem` ",.?!-:;\"\'") ]

with the brackets: a list comprehension has to be enclosed in square brackets.
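The same function can also be written with filter instead of a list comprehension. This is only an equivalent sketch of the fix above, using notElem from the Prelude.

```haskell
-- Keep every character that is not in the punctuation set.
removePunc :: String -> String
removePunc = filter (`notElem` ",.?!-:;\"'")

main :: IO ()
main = putStrLn (removePunc "it's here, now!")  -- prints: its here now
```

Which of the two forms reads better is largely a matter of taste.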

Don't know how to use . and $ operator in Haskell

I am not sure how to implement the . and the $ operators to simplify the following definitions:
    compress :: [Char] -> [Char]
    compress [] = []
    compress as
        | g as 1 == 1 = [head as] ++ compress (drop 1 as)
        | otherwise = [head as] ++ show (g as 1) ++ compress (drop (g as 1) as)

    g :: [Char] -> Int -> Int
    g [] i = i
    g (a:[]) i = i
    g (a:as) i
        | a == head as = g as (i + 1)
        | otherwise = i

    main = getLine >>= \str -> putStrLn $ compress str
I've read that the . operator is function composition, so that the output of one function becomes the input of another, and that $ is a substitute for parentheses.
Therefore, I tried changing it to
    compress :: [Char] -> [Char]
    compress [] = []
    compress as
        | g as 1 == 1 = [head as] ++ compress . drop 1 as
        | otherwise = [head as] ++ show (g as 1) ++ compress . drop (g as 1) as

    g :: [Char] -> Int -> Int
    g [] i = i
    g (a:[]) i = i
    g (a:as) i
        | a == head as = g as (i + 1)
        | otherwise = i

    main = getLine >>= \str -> putStrLn $ compress str
But I get type errors saying
could not match '[Char]' with a0 -> [Char]
I am a bit confused on how to use those operators.
I do not see a way of using ($) and (.) in this code.
However, you can simplify your code like this:

    compress :: [Char] -> [Char]
    compress [] = []
    compress as@(x:xs)
        | g as 1 == 1 = x : compress xs
        | otherwise = x : show (g as 1) ++ compress (drop (g as 1) as)

    g :: [Char] -> Int -> Int
    g (a:as@(b:_)) i
        | a == b = g as (i + 1)
        | otherwise = i
    g _ i = i

    main = getLine >>= putStrLn . compress
For instance, this:
[head as] ++ compress (drop 1 as)
is the same as this:
head as : compress (drop 1 as)
And by using pattern matching, it becomes even shorter:
x : compress xs
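The operators you want to use are commonly used to write a shorter version (with fewer parentheses) of a function. For instance, a function similar to your compress (this one always appends the run length, even when it is 1, and needs group from Data.List) could be written this way: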
compress :: [Char] -> [Char]
compress = concatMap (\x -> head x : show (length x)) . group
instead of this:
compress :: [Char] -> [Char]
compress xs = concat $ map (\x -> head x : show (length x)) $ group xs
or even this
compress :: [Char] -> [Char]
compress xs = concatMap (\x -> head x : show (length x)) (group xs)
Here is a simpler example:
capitalizeWords :: String -> String
capitalizeWords string = unwords (map (\(f:rest) -> toUpper f : rest) (words string))
main = putStrLn (capitalizeWords "here you are")
can be rewritten to:
capitalizeWords :: String -> String
capitalizeWords = unwords . map (\(f:rest) -> toUpper f : rest) . words
main = putStrLn $ capitalizeWords "here you are"
Here are the explanations:
The ($) can be used in the main function because this operator can be viewed as wrapping whatever is on its right in parentheses.
The capitalizeWords function can first be simplified to this:

    capitalizeWords string = unwords $ map (\(f:rest) -> toUpper f : rest) (words string)

using the previous explanation.
Again, we can use ($):

    capitalizeWords string = unwords $ map (\(f:rest) -> toUpper f : rest) $ words string

And as the string parameter is the last thing on the right of both sides of the equation, we can use composition to remove this parameter. So we get the final capitalizeWords function shown above.
You can learn more about the ($) and (.) operators here.
There are tools that can help you write point-free functions, such as hlint and pointfree.
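The type error in the question comes from precedence: function application binds tighter than (.), so compress . drop 1 as is read as compress . (drop 1 as), which tries to compose a function with a string. A small sketch of the working spellings follows, using a hypothetical shout function in place of compress.

```haskell
import Data.Char (toUpper)

-- Hypothetical stand-in for any String -> String function such as compress.
shout :: String -> String
shout = map toUpper

viaParens, viaDollar, viaDot :: Int -> String -> String
viaParens n s = shout (drop n s)    -- plain parentheses
viaDollar n s = shout $ drop n s    -- ($) replaces the parentheses
viaDot    n s = (shout . drop n) s  -- compose first, then apply
-- shout . drop n s                 -- rejected: parsed as shout . (drop n s)

-- Dropping the last argument on both sides gives the point-free form.
viaPointFree :: Int -> String -> String
viaPointFree n = shout . drop n

main :: IO ()
main = mapM_ (\f -> putStrLn (f 1 "hello")) [viaParens, viaDollar, viaDot, viaPointFree]
```

All four definitions compute the same result; they differ only in how the grouping is written.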

Filter by length

How can I add a filter here, in go (x:xs) = (x, length (x:xs)), so that the length is only included when it is greater than 1?
Currently, if the input is abcaaabbb, the output is [('a',1),('b',1),('c',1),('a',3),('b',3)], but I'm looking for abca3b3.
My code:
    import Data.List

    encode :: [Char] -> [(Char, Int)]
    encode s = map go (group s)
        where go (x:xs) = (x, length (x:xs))

    main = do
        s <- getLine
        print (encode s)

The last line will become putStrLn (concat (map (\(x,y) -> x : [y]) (encode s))) to convert the list to a string.
As I am a newbie myself, this is probably not very idiomatic Haskell. But you can do it roughly like this (with xs being the list [('a', 1), ('b', 2), ('a', 3)]):
Create "a1b2a3":
concat $ map (\(c, l) -> c:(show l)) xs
Filter out 1s:
filter (\x -> x /= '1') "a1b2a3"
will give you "ab2a3"
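One caveat with filtering the character '1' out of the finished string is that it also removes the digit 1 from longer counts such as 10 or 11. A sketch of the difference is shown below; naive and careful are hypothetical names, and careful decides whether to show the count before converting it to text.

```haskell
import Data.List (group)

-- Render every run as char + count, then delete all '1' characters.
naive :: String -> String
naive = filter (/= '1') . concatMap (\run -> take 1 run ++ show (length run)) . group

-- Only append the count when the run is longer than one character.
careful :: String -> String
careful = concatMap render . group
  where
    render run
      | length run > 1 = take 1 run ++ show (length run)
      | otherwise      = run

main :: IO ()
main = do
  putStrLn (naive "abcaaabbb")            -- abca3b3
  putStrLn (naive (replicate 11 'a'))     -- a   (the count 11 is lost)
  putStrLn (careful (replicate 11 'a'))   -- a11
```

For inputs where no run has a count containing the digit 1, both versions agree.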
You can't have a list like this in Haskell:

    [('a'),('b'),('c'),('a',3),('b',3)]

Each element of a list needs to have the same type in Haskell, and ('c') (that is, 'c' :: Char) and ('b',3) (that is, ('b',3) :: Num t => (Char, t)) are different types.
Maybe also have a look at List of different types?
I would suggest that you change your list to a (Char, Maybe num) data structure.
Edit:
From your new question, I think you have been searching for this:
    import Data.List

    encode :: [Char] -> [(Char, Int)]
    encode s = map go (group s)
        where go (x:xs) = (x, length (x:xs))

    f :: (Char, Int) -> String
    f (a, b) = if b == 1 then [a] else [a] ++ show b

    encode2 :: [(Char, Int)] -> String
    encode2 [] = []
    encode2 (x:xs) = f(x) ++ encode2 xs

    main = do
        s <- getLine
        putStrLn $ encode2 $ encode s
Not sure if this suits your needs, but if you do not need filtering, this does the work:
    encode :: String -> String
    encode "" = ""
    encode (x:xs) = doIt0 xs x 1 where
        doIt0 [] ch currentPos = [ch] ++ showPos currentPos
        doIt0 (x:xs) ch currentPos
            | x == ch = doIt0 xs ch $ currentPos + 1
            | otherwise = [ch] ++ (showPos currentPos) ++ (doIt0 xs x 1)
        showPos pos = if pos > 1 then show pos else ""

    main = do
        s <- getLine
        print (encode s)

Resources