read data from a list of strings

read data from a list of strings - haskell

i have a data structure like this
data Something = Something Integer String String
and i want to convert
["Something", "3", "text", "42"]
to the data.
for now, i have
altRead :: Read a => [String] -> a
altRead = read . unwords . hack
where
hack = map (\x -> if isNumber x then x else "\"" ++ x ++ "\"")
isNumber = foldl (\b c -> isDigit c && b) True
but i forgot, that some numbers could be strings in the data structure.
is there a simple solution for this or do i need to write a alternative read typeclass?

You're writing a tiny parser atop some lexed tokens. You can't really implement a Read instance since read :: Read a => String -> a and you want to do [String] -> a for a == Something. You can take advantage of Read instances that already exist, though, to bootstrap parsing your Integer, for instance.
So let's try it. We'll parse a Something from the list of tokens.
import Safe -- gives us readMay :: Read a => String -> Maybe a
parseSomething :: [String] -> Maybe Something
parseSomething ("Something":strInt:stra:strb:_) =
do int <- readMay strInt
return $ Something int stra strb
parseSomething _ = Nothing
We could do it a little more compactly using Maybe as an Applicative, too
import Control.Applicative
parseSomething :: [String] -> Maybe Something
parseSomething ("Something":strInt:stra:strb:_) =
Something <$> readMay strInt <*> pure stra <*> pure strb
parseSomething _ = Nothing
Really, we should probably return any unconsumed tokens as well so we can continue parsing.
parseSomething :: [String] -> (Maybe Something, [String])
parseSomething ("Something":strInt:stra:strb:rest) =
(Something <$> readMay strInt <*> pure stra <*> pure strb, rest)
parseSomething rest = (Nothing, rest)
The reason I bring in all this structure to your parse is that this starts to head toward the space of parser combinators like Parsec. Whenever you've got a need for a complicated Read it begins to become useful to look at some of the really nice parsing libraries in Haskell.

With what you have, you don't really need to make it a typeclass. You can just do:
readSomething :: [String] -> Maybe Something
readSomething [_, n, s1, s2] = Just $ Something (read n) s1 s2
readSomething _ = Nothing
or, if you want to disambiguate on the first word:
data Something = Something Integer String String
| SomethingToo String Integer
readSomething :: [String] -> Maybe Something
readSomething ["Something", n, s1, s2] = Just $ Something (read n) s1 s2
readSomething ["SomethingToo", s, n] = Just $ SomethingToo s (read n)
readSomething _ = Nothing

In GHCI:
data Something = Something Integer String String deriving (Read, Show)
let somethingStrings = ["Something", "3", "text", "42"]
let escapeForSomething [a,b,c,d] = [a, b, "\""++c++"\"", "\""++d++"\""]
let something = read (unwords (escapeForSomething somethingStrings)) :: Something

Related

Monadic Parser - handling string with one character

I was reading this Monadic Parsing article while I was trying to implement a pretty simple string parser in Haskell and also get a better understanding of using monads. Down below you can see my code, implementing functions for matching a single character or a whole string. It works as expected, but I observed two strange behaviors that I can't explain.
I have to handle single characters in string, otherwise, the parser will return only empty lists. To be exact, if I remove this line string [c] = do char c; return [c] it won't work anymore. I was expecting that string (c:s) would handle string (c:[]) properly. What could be the cause here?
In my opinion, string definition should be equivalent to string s = mapM char s as it would create a list of [Parser Char] for each character in s and collect the results as Parser [Char]. If I use the definition based on mapM, the program would get stuck in an infinite loop and won't print anything. Is something about lazy evalutation that I miss here?
.
module Main where
newtype Parser a = Parser { apply :: String->[(a, String)] }
instance Monad Parser where
return a = Parser $ \s -> [(a, s)]
ma >>= k = Parser $ \s -> concat [apply (k a) s' | (a, s') <- apply ma s]
instance Applicative Parser where
pure = return
mf <*> ma = do { f <- mf; f <$> ma; }
instance Functor Parser where
fmap f ma = f <$> ma
empty :: Parser a
empty = Parser $ const []
anychar :: Parser Char
anychar = Parser f where
f [] = []
f (c:s) = [(c, s)]
satisfy :: (Char -> Bool) -> Parser Char
satisfy prop = do
c <- anychar
if prop c then return c
else empty
char :: Char -> Parser Char
char c = satisfy (== c)
string :: String -> Parser String
string [] = empty
string [c] = do char c; return [c] --- if I remove this line, all results will be []
string (c:s) = do char c; string s; return (c:s)
main = do
let s = "12345"
print $ apply (string "123") s
print $ apply (string "12") s
print $ apply (string "1") s
print $ apply (string []) s
PS. I think the title of the question is not suggestive enough, please propose an edit if you have a better idea.

Since you did string [] = empty instead of string [] = return [], you can't use it as a base case for recursion that builds up a list.
fmap f ma = f <$> ma is wrong, since <$> is defined in terms of fmap. If you want to define fmap in terms of your other instances, then do fmap = liftA or fmap = liftM. Since mapM uses fmap internally but your original string didn't, this problem didn't come up in your first simple test.

string [] = empty
means: "If you need to parse an empty string, fail -- it can not be parsed at all, no matter what's the input string".
By comparison,
string [] = return ""
means: "If you need to parse an empty string, succeed and return the empty string -- it can always be parsed, no matter what's the input string".
By using the first equation, when you recurse in the case string (c:cs) you need to stop at one character (string [c]) since reaching zero characters will run empty and make the whole parser fail.
Hence, you need to either use that string [c] = return [c] equation, or modify the base "empty string" case so that it succeeds. Arguably, the latter would be more natural.

Haskell: Convert String to [(String,Double)]

I parse an XML and get an String like this:
"resourceA,3-resourceB,1-,...,resourceN,x"
I want to map that String into a list of tuples (String,Double), like this:
[(resourceA,3),(resourceB,1),...,(resourceN,x)]
How is it possible to do this? I ve looked into the map function and also the split one. I am able to split the string by "-" but anything else...
This is the code i have so far:
split :: Eq a => a -> [a] -> [[a]]
split d [] = []
split d s = x : split d (drop 1 y) where (x,y) = span (/= d) s
it is just a function to split my string into a list of Stirng, but then i dont know how to continue.
What I want to do know is to loop over that new list that i have created with the split method and for each element create a tuple. I hace tried with the map function but i dont get it to compile even

So in Haskell you dont really mutate any value, instead you'll create a new list of pairs from the string you've described, so the solution would look something similar to the following:
import Data.List.Split
xmlList = splitOn "-" "resourceA,3-resourceB,4-resourceC,6"
commaSplit :: String -> [String]
commaSplit = splitOn ","
xmlPair :: [String] -> [(String, Double)] -- might be more efficient to use Text instead of String
xmlPair [x] = [(\x' -> ((head x') :: String, (read (last x')) :: Double )) (commaSplit x)]
xmlPair (x:xs) = xmlPair [x] ++ xmlPair xs
main :: IO ()
main = mapM_ (\(a,b) -> putStrLn (show a++" = "++ show b)) (xmlPair $ xmlList)
This is my quick and dirty way of showing things but I'm sure someone can always add a more detailed answer.

How far does "try" back track?

So ... I messed up a recording in CSV format:
23,95489,0,20,9888
Due to language settings floating point numbers were written with commas as seperator ... in a comma separated value file ...
Problem is that the file does not have a nice formatting for every float. Some have no point at all and the number of numbers behind the point varies too.
My idea was to build a MegaParsec parser that would try to read every possible floating point formatting, move on and if back track if it finds an error.
Eg for the example above:
read 23,95489 -> good
read 0,20 -> good (so far)
read 9888 -> error (because value is too high for column (checked by guard))
(back tracking to 2.) read 0 -> good again
read 20,9888 -> good
done
I've implemented that as (pseudo code here):
floatP = try pointyFloatP <|> unpointyFloatP
lineP = (,,) <$> floatP <* comma <*> floatP <* comma <*> floatP <* comma
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
And if so ... how would I go about implementing further back tracking?

How far does “try” back track?
The parser try p consumes exactly as much input as p if p parses successfully, otherwise it does not consume any input at all. So if you look at that in terms of backtracking, it backtracks to the point where you were when you invoked it.
My problem is that apparently the try only works in the 'current' float. There is no backtracking to previous positions. Is this correct?
Yes, try does not "unconsume" input. All it does is to recover from a failure in the parser you give it without consuming any input. It does not undo the effects of any parsers that you've applied previously, nor does it affect subsequent parsers that you apply after try p succeeded.
And if so ... how would I go about implementing further back tracking?
Basically what you want is to not only know whether pointyFloatP succeeds on the current input, but also whether the rest of your lineP would succeed after successfully pointyFloatP - and if it doesn't you want to backtrack back to before you applied pointyFloatP. So basically you want the parser for the whole remaining line in the try, not just the float parser.
To achieve that you can make floatP take the parser for the remaining line as an argument like this:
floatP restP = try (pointyFloatP <*> restP) <|> unpointyFloatP <*> restP
Note that this kind of backtracking isn't going to be very efficient (but I assume you knew that going in).

Update: Include a custom monadic parser for more complex rows.
Using the List Monad for Simple Parsing
The list monad makes a better backtracking "parser" than Megaparsec. For example, to parse the cells:
row :: [String]
row = ["23", "95489", "0", "20", "9888"]
into exactly three columns of values satisfying a particular bound (e.g., less than 30), you can generate all possible parses with:
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
rowResults :: [String] -> [[Double]]
rowResults = cols 3
where cols :: Int -> [String] -> [[Double]]
cols 0 [] = pure [] -- good, finished on time
cols 0 _ = empty -- bad, didn't use all the data
-- otherwise, parse exactly #n# columns from cells #xs#
cols n xs = do
-- form #d# from one or two cells
(d, ys) <- num1 xs <|> num2 xs
-- only accept #d < 30#
guard $ d < 30
ds <- cols (n-1) ys
return $ d : ds
-- read number from a single cell
num1 (x:xs) | ok1 x = pure (read x, xs)
num1 _ = empty
-- read number from two cells
num2 (x:y:zs) | ok1 x && ok2 y = pure (read (x ++ "." ++ y), zs)
num2 _ = empty
-- first cell: "0" is okay, but otherwise can't start with "0"
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- second cell: can't end with "0" (or *be* "0")
ok2 xs = last xs /= '0'
The above list-based parser tries to reduce ambiguity by assuming that if "xxx,yyy" is a number, the "xxx" won't start with zeros (unless it's just "0"), and the "yyy" won't end with a zero (or, for that matter, be a single "0"). If this isn't right, just modify ok1 and ok2 as appropriate.
Applied to row, this gives the single unambiguous parse:
> rowResults row
[[23.95489,0.0,20.9888]]
Applied to an ambiguous row, it gives all parses:
> rowResults ["0", "12", "5", "0", "8601"]
[[0.0,12.5,0.8601],[0.0,12.5,0.8601],[0.12,5.0,0.8601]]
Anyway, I'd suggest using a standard CSV parser to parse your file into a matrix of String cells like so:
dat :: [[String]]
dat = [ ["23", "95489", "0", "20", "9888"]
, ["0", "12", "5", "0", "8601"]
, ["23", "2611", "2", "233", "14", "422"]
]
and then use rowResults above get the row numbers of rows that were ambiguous:
> map fst . filter ((>1) . snd) . zip [1..] . map (length . rowResults) $ dat
[2]
>
or unparsable:
> map fst . filter ((==0) . snd) . zip [1..] . map (length . rowResults) $ dat
[]
>
Assuming there are no unparsable rows, you can regenerate one possible fixed file, even if some rows are ambiguous, but just grabbing the first successful parse for each row:
> putStr $ unlines . map (intercalate "," . map show . head . rowResults) $ dat
23.95489,0.0,20.9888
0.0,12.5,0.8601
23.2611,2.233,14.422
>
Using a Custom Monad based on the List Monad for More Complex Parsing
For more complex parsing, for example if you wanted to parse a row like:
type Stream = [String]
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
with a mixture of strings and numbers, it's actually not that difficult to write a monadic parser, based on the list monad, that generates all possible parses.
The key idea is to define a parser as a function that takes a stream and generates a list of possible parses, with each possible parse represented as a tuple of the object successfully parsed from the beginning of the stream paired with the remainder of the stream. Wrapped in a newtype, our parallel parser would look like:
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
Note the similarity to the type ReadS from Text.ParserCombinators.ReadP, which is also technically an "all possible parses" parser (though you usually only expect one, unambiguous parse back from a reads call):
type ReadS a = String -> [(a, String)]
Anyway, we can define a Monad instance for PParser like so:
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
There's nothing too tricky here: pure x returns a single possible parse, namely the result x with an unchanged stream s, while p >>= f applies the first parser p to generate a list of possible parses, takes them one by one within the list monad to calculate the next parser q to use (which, as per usual for a monadic operation, can depend on the result of the first parse), and generates a list of possible final parses that are returned.
The Alternative and MonadPlus instances are pretty straightforward -- they just lift emptiness and alternation from the list monad:
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
To run our parser, we have:
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
and now we can introduce primitives:
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
and combinators:
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
and parsers for various "terms" in our CSV row:
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
Then, for a particular row schema, we can write a row parser as we normally would with a monadic parser:
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
and get all possible parses:
> parse rowResults row0
[Row "Apple" 15 1.5016 2.0 5.3 1801 "11/13/2018" "X101"
,Row "Apple" 15 1.5016 2.5 3.0 1801 "11/13/2018" "X101"]
>
The full program is:
{-# LANGUAGE DeriveFunctor #-}
{-# OPTIONS_GHC -Wall #-}
import Control.Monad
import Control.Applicative
type Stream = [String]
newtype PParser a = PParser (Stream -> [(a, Stream)]) deriving (Functor)
instance Applicative PParser where
pure x = PParser (\s -> [(x, s)])
(<*>) = ap
instance Monad PParser where
PParser p >>= f = PParser $ \s1 -> do -- in list monad
(x, s2) <- p s1
let PParser q = f x
(y, s3) <- q s2
return (y, s3)
instance Alternative PParser where
empty = PParser (const empty)
PParser p <|> PParser q = PParser $ \s -> p s <|> q s
instance MonadPlus PParser where
parse :: PParser a -> Stream -> [a]
parse (PParser p) s = map fst (p s)
-- read a token as-is
token :: PParser String
token = PParser $ \s -> case s of
(x:xs) -> pure (x, xs)
_ -> empty
-- require an end of stream
eof :: PParser ()
eof = PParser $ \s -> case s of
[] -> pure ((), s)
_ -> empty
-- combinator to convert a String to any readable type
convert :: (Read a) => PParser String -> PParser a
convert (PParser p) = PParser $ \s1 -> do
(x, s2) <- p s1 -- for each possible String
(y, "") <- reads x -- get each possible full read
-- (normally only one)
return (y, s2)
-- read a string from a single cell
str :: PParser String
str = token
-- read an integer (any size) from a single cell
int :: PParser Int
int = convert (mfilter ok1 token)
-- read a double from one or two cells
dbl :: PParser Double
dbl = dbl1 <|> dbl2
where dbl1 = convert (mfilter ok1 token)
dbl2 = convert $ do
t1 <- mfilter ok1 token
t2 <- mfilter ok2 token
return $ t1 ++ "." ++ t2
-- read a double that's < 30
dbl30 :: PParser Double
dbl30 = do
x <- dbl
guard $ x < 30
return x
-- rules for first cell of numbers:
-- "0" is okay, but otherwise can't start with "0"
ok1 :: String -> Bool
ok1 "0" = True
ok1 (c:_) | c /= '0' = True
ok1 _ = False
-- rules for second cell of numbers:
-- can't be "0" or end in "0"
ok2 :: String -> Bool
ok2 xs = last xs /= '0'
-- a row
data Row = Row String Int Double Double Double
Int String String deriving (Show)
rowResults :: PParser Row
rowResults = Row <$> str <*> int <*> dbl30 <*> dbl30 <*> dbl30
<*> int <*> str <*> str <* eof
row0 :: Stream
row0 = ["Apple", "15", "1", "5016", "2", "5", "3", "1801", "11/13/2018", "X101"]
main = print $ parse rowResults row0
Off-the-shelf Solutions
I find it a little surprising I can't find an existing parser library out there that provides this kind of "all possible parses" parser. The stuff in Text.ParserCombinators.ReadP takes the right approach, but it assumes that you're parsing characters from a String rather than arbitrary tokens from some other stream (in our case, Strings from a [String]).
Maybe someone else can point out an off-the-shelf solution that would save you from having to role your own parser type, instances, and primitives.

Haskell: Elegant solution for integer parsing

I want to write a function that gets the first integer in a string, if the integer is at the beginning of the string and is followed by space or nothing.
For example, "12" and "12 kids" would be valid strings, but "12kids", "a12" are invalid.
This is my function:
getFirstInteger :: String -> Int
getFirstInteger [] = error "No integer"
getFirstInteger str
| dropWhile (Char.isNumber) str == [] = read str :: Int
| Char.isSpace $ head $ dropWhile (Char.isNumber) str = read (takeWhile (Char.isNumber) str) :: Int
| otherwise = error "No integer found"
The problem is that if the string is actually an integer, head will raise an exception, so that is why the first condition exists. Is there any elegant solution to this problem?

getFirstInteger :: String -> Int
getFirstInteger = read . head . words
words will split the String into a list of Strings, which were originally separated by whitespace. head will take the first (or error if the original string was empty), and read will parse the string as usual (and error if the first word wasn't a valid Int).
However, I prefer a variant that doesn't use error on empty Strings or unparseable ones, e.g.
import Text.Read (readMaybe)
getFirstInteger :: String -> Maybe Int
getFirstInteger [] = Nothing
getFirstInteger xs = readMaybe . head . words $ xs
One could write this completely point-free with listToMaybe from Data.Maybe, but that's probably an overkill:
import Data.Maybe (listToMaybe)
import Text.Read (readMaybe)
import Control.Monad ((>=>))
getFirstInteger :: String -> Maybe Int
getFirstInteger = listToMaybe . words >=> readMaybe

If you want to parse strings without using parser combinator libraries or any of the machinery in Text.Read, have a look at the functions break and span:
span :: (a -> Bool) -> [a] -> ([a], [a])
break :: (a -> Bool) -> [a] -> ([a], [a])
The nice thing is that both of these functions not only return what they match but also the remainder of the string allowing you to continue your parsing.
To solve your problem:
import Data.Char
getFirstInteger :: String -> Maybe Int
getFirstInteger str
let (digs, rest1) = span isDigit str
endsok = case rest1 of
[] -> True
(c:_) -> c == ' '
in
if not (null digs) && endsok
then Just (read digs)
else Nothing -- either no digits or doesn't end properly
This version does not allow a leading minus sign. This next version allows an optional leading minus sign for the integer:
getFirstInteger' str =
let (minus,rest1) = span (=='-') str
(digs, rest2) = span isDigit rest1
endsok = case rest2 of
[] -> True
(c:_) -> c == ' '
in
if length minus <= 1 && not (null digs) && endsok
then Just (read (minus ++ digs))
else Nothing
Yes - this does not terminate as early as possible on bad input. I provide it mainly as an example of how to chain together calls to span and break.

Use reads. For example:
type Unit = String
readUnit :: String -> Maybe (Int, Maybe Unit)
readUnit s = case reads s of -- the integer is at the beginning of the string and...
(n, ' ':unit):_ -> Just (n, Just unit) -- is followed by space...
(n, "" ):_ -> Just (n, Nothing) -- or nothing.
_ -> Nothing
In ghci:
> readUnit "12"
Just (12,Nothing)
> readUnit "12 kids"
Just (12,Just "kids")
> readUnit "12kids"
Nothing
> readUnit "a12"
Nothing
However, there are a few minor considerations to keep in mind. It's possible that read does not restrict the syntax as much as you might want; for example, the following answer may surprise you:
> readUnit " ((-0x5)) kids"
Just (-5,Just "kids")
You may also want to drop extraneous spaces in the unit; for example, you could change the first clause above to
(n, ' ':unit):_ -> Just (n, Just (dropWhile isSpace unit))
or similar. And as a final variation on this theme, note that while the standard instances of Read never return lists with more than one element from reads, it is technically possible that some user-supplied type may do so. So if you were ever to use reads to parse types other than Int, you may want to either demand an unambiguous parse or consider all the parses before deciding what to do; the above code bakes in the assumption that the first parse is as good as any.

Repeat character a random number of times in Haskell

I'm trying to create a function for a silly IRC bot that will return a phrase where some of the letters are repeated a random number of times. The problem I'm having is that I can't find a way to use random numbers that ghc likes. It seems that even using this answer isn't being particularly helpful for getting my code to compile.
import System.Random
-- Write bad
baaad x y = "B" ++ (repeatA x) ++ "D " ++ (exclaim y)
-- StartHere
randomBad :: String
randomBad = do
x <- randomRIO(5,10) :: IO Int
y <- randomRIO(0,6) :: IO Int
return $ baaad x y
repeatA :: Int -> String
repeatA x = rptChr "A" x
exclaim :: Int -> String
exclaim x = rptChr "!" x
rptChr :: String -> Int -> String
rptChr x y = take y (cycle x)
Even with the trick of using a do block and passing the IO Ints to the function that way, I'm still getting compile errors that it found an IO Int when expecting Int.

randomBad is not in the IO monad.... It is type String, but you are defining it to be type IO String
Change this
randomBad :: String
to this
randomBad :: IO String
Then you should be able to use this in another IO action, like main:
main = do
theString <- randomBad
putStrLn theString

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

read data from a list of strings - haskell

In GHCI: data Something = Something Integer String String deriving (Read, Show) let somethingStrings = ["Something", "3", "text", "42"] let escapeForSomething [a,b,c,d] = [a, b, "\""++c++"\"", "\""++d++"\""] let something = read (unwords (escapeForSomething somethingStrings)) :: Something

Related

Monadic Parser - handling string with one character

Haskell: Convert String to [(String,Double)]

How far does "try" back track?

Haskell: Elegant solution for integer parsing

Repeat character a random number of times in Haskell

Categories

Resources