So I have a file where the first line has the format ([String], [(Int, Int)], [(Int, Int)], Int) and the rest of the lines have the format [((Int, Int), (Int, Int), String)]. I managed to get the the input and parse it with the following function:
someFunction :: String -> IO (String)
someFunction fileName = do
handle <- openFile fileName ReadMode
contents <- hGetLine handle
let firstLine = read contents :: ([String], [(Int, Int)], [(Int, Int)], Int)
restOfLines <- map read <$> lines <$> hGetContents handle :: IO [((Int, Int), (Int, Int), String)]
...
The thing is, that I want to print a custom error if the file has the wrong format. So if something is missing or what not, it should only print "some error". Otherwise, the lines have to be parsed so I can do some other things with the content. I would appreciate it if someone could help me with this.
The simplest way to detect whether a Read parser failed is to use readMaybe. E.g. in GHCi:
> import Text.Read (readMaybe)
> :t readMaybe
readMaybe :: Read a => String -> Maybe a -- Defined in ‘Text.Read’
> readMaybe "1" :: Maybe Int
Just 1
> readMaybe ":(" :: Maybe Int
Nothing
You can pattern-match on the result of readMaybe with case like any other use of Maybe:
case readMaybe input of
Just parsed -> … -- Use parsed value
Nothing -> … -- Report error
This can only report a generic failure; for more complex parsing and validation, you should use a proper parsing library. One included with base is Text.ParserCombinators.ReadP:
import Control.Applicative (some)
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP)
import qualified Text.ParserCombinators.ReadP as ReadP
-- Parse one or more digits.
number :: ReadP Int
number = read <$> some (ReadP.satisfy isDigit)
These parsers can be executed on some input with readP_to_S; by default they enumerate all possible parses, which you can constrain with functions like eof (require complete input) or <++ (biased choice).
> import Text.ParserCombinators.ReadP (readP_to_S)
> readP_to_S number "123"
[(1,"23"),(12,"3"),(123,"")]
> readP_to_S (number <* ReadP.eof) "123"
[(123,"")]
You can pattern-match on the resulting list to validate, for example, that there’s only one result and extract what you want or report the error. Another popular choice of parsing library is the megaparsec package, which can additionally provide nice custom error messages with source locations.
Related
I am trying to compose a parser with ReadP, and I wanted to use read to parse numbers. But I must be missing something because even the most trivial example misbehaves:
λ import Text.ParserCombinators.ReadP
λ (readP_to_S . readS_to_P $ (read :: ReadS Int)) "123" :: [(Int, String)]
*** Exception: Prelude.read: no parse
I even specified the types everywhere, but it just does not work. What am I doing wrong?
You need to replace your use of read with reads:
> (readP_to_S . readS_to_P $ (reads :: ReadS Int)) "123" :: [(Int, String)]
^^^^^
[(123,"")]
It's unfortunate that it typechecked with read, but it's because the type signature for read is so general. In fact, the parser read :: ReadS Int is trying to parse a list of tuples of integers and strings!!!
> (read :: ReadS Int) "[(1,\"hi\"),(2,\"bye\")]"
[(1,"hi"),(2,"bye")]
which is definitely not what you want.
I'm trying to get a number from IO like this:
numberString <- getLine
print 3 + read numberString
This works if numberString is a good string of number (like "3241"), but when it's not something that good (like "124gjh"), it throws an exception:
*** Exception: Prelude.read: no parse
There's a reads function which returns a [(a0, String)] and when nothing is matched this would be a []. Is there an easy way that I have something like this:
read' :: String -> Maybe a
so that I just get a Nothing if things doesn't work instead of just stopping abruptly?
There is readMaybe right in Text.Read which should do exactly what you asked for:
Prelude> import Text.Read(readMaybe)
Prelude Text.Read> readMaybe "3241" :: Maybe Int
Just 3241
Prelude Text.Read> readMaybe "Hello" :: Maybe Int
Nothing
I have a question. There is any solution for reading from file list of tuples ? Depends on content ?
I know that if i need to read integers i do something like that:
toTuple :: [String] -> [(Int,Int)]
toTuple = map (\y -> read y ::(Int,Int))
But in file i can have tuples this kind (int,int) or (char, int). Is any way to do this nice ?
I was trying to do this at first in finding sign " ' " . If it was, then reading chars, but it doesn't work for some reason.
[Edit]
To function to tuple, i give strings with tuples, before that i splits lines by space sign.
INPUT EXAMPLE:
Case 1 : ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
Case 2 : ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Just try both and choose the successful:
import Text.Read
import Control.Applicative
choose :: Maybe a -> Maybe b -> Maybe (Either a b)
choose x y = fmap Left x <|> fmap Right y
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
main = do
-- Just (Left [(1,2),(1,3),(3,4),(1,4)])
print $ toTuple ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
-- Just (Right [('a',2),('b',3),('g',8),('h',2),('r',4)])
print $ toTuple ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Here is a far more efficient (and unsafe) version:
readListWithMaybe :: Read a => String -> [String] -> Maybe [a]
readListWithMaybe s ss = fmap (: map read ss) (readMaybe s)
toTuple :: [String] -> Either [(Int, Int)] [(Char, Int)]
toTuple [] = Left []
toTuple (s:ss) = fromJust $ readListWithMaybe s ss `choose` readListWithMaybe s ss
In the first definition of toTuple
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
readListMaybe is too strict:
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
mapM is defined in terms of sequence which is defined in terms of (>>=) which is strict for the Maybe monad. And also the reference to ss is keeped for too long. The second version doesn't have these problems.
As I said it may be a good idea to consider using a parsing library, if the task at hand gets a bit more complicated.
First of all you have the benefit of getting error messages and if you decide to switch to a self declared data Type it is still easily applicable (with slight modifications of course).
Also switching from ByteString to Text (which are both preferable to working with String anyways) is just a matter of (un)commenting 4 lines
Here is some example if you have not had the pleasure to work with it.
I'll explain it some time later today - for I have to leave now.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Attoparsec.ByteString.Char8
import Data.ByteString.Char8 as X
-- import Data.Attoparsec.Text
-- import Data.Text as X
main :: IO ()
main = do print <$> toTuples $ X.unlines ["(1,2)","(1,3)","(3,4)","(1,4)"]
print <$> toTuples $ X.unlines ["('a',2)","('h',2)","('r',4)"]
print <$> toTuples $ X.unlines ["('a',2)","(1,3)","(1,4)"] --works
print <$> toTuples $ "('a',2)" -- yields Right [Right ('a',2)]!!
print <$> toTuples $ "(\"a\",2)" -- yields Right []!!
toTuples = parseOnly (myparser `sepBy` skipSpace :: Parser [Either (Int,Int) (Char,Int)])
where myparser :: Parser (Either (Int,Int) (Char,Int))
myparser = eitherP (tupleP decimal decimal)
(tupleP charP decimal)
charP = do char '\''
c <- notChar '\''
char '\''
return c
tupleP :: Parser a -> Parser b -> Parser (a, b)
tupleP a b = do char '('
a' <- a
skipSpace
char ','
skipSpace
b' <- b
char ')'
return (a',b')
Edit: Explanation
Parser is a monad, so it comes with do-notation which enables us to write the tupleP function in this very convenient form. Same goes for charP - we describe what to parse in the primitives given by the attoparsec library
and it reads something like
first expect a quote
then something that is not allowed to be a quote
and another quote
return the not quote thingy
if you can write down the parser informally you're most likely halfway through writing the haskell code, the only thing left to do is find the primitives in the library or write some auxilary function like tupleP.
A nice thing is that Parsers (being monads) compose nicely so we get our desired parser eitherP (tupleP ..) (tupleP ..).
The only magic that happens in the print <$>.. lines is that Either is a functor and every function using <$> or fmap uses the Right side of the Eithers.
Last thing to note is sepBy returns a list - so in the case where the parsing fails we still get an empty list as a result, if you want to see the failing use sepBy1 instead!
I need to parse and process a text file that is a nested list of integer. The file is about 250mb large. This already leads to performace problems my naive solution takes 20GB or more of RAM.
The question is related to another question.
I have written about the memory problems and the suggestion was to use Data.Vector to get rtid of the memory problems.
So the goal is to process a nested list of integers and, say, filter the values so that only values larger than 30 get printed out.
Test file "myfile.tx":
11,22,33,44,55
66,77,88,99,10
Here is my code using Attoparsec, adapted from attoparsec-csv:
{-# Language OverloadedStrings #-}
-- adapted from https://github.com/robinbb/attoparsec-csv
module Text.ParseCSV
(
parseCSV
) where
import Prelude hiding (concat, takeWhile)
import Control.Applicative ((<$>), (<|>), (<*>), (<*), (*>), many)
import Control.Monad (void, liftM)
import Data.Attoparsec.Text
import qualified Data.Text as T (Text, concat, cons, append, pack, lines)
import qualified Data.Text.IO as IO (readFile, putStr)
import qualified Data.ByteString.Char8 as BSCH (readInteger)
lineEnd :: Parser ()
lineEnd =
void (char '\n') <|> void (string "\r\n") <|> void (char '\r')
<?> "end of line"
parserInt :: Parser Integer
parserInt = (signed decimal)
record :: Parser [Integer]
record =
parserInt `sepBy1` char ','
<?> "record"
file :: Parser [[Integer]]
file =
(:) <$> record
<*> manyTill (lineEnd *> record)
(endOfInput <|> lineEnd *> endOfInput)
<?> "file"
parseCSV :: T.Text -> Either String [[Integer]]
parseCSV =
parseOnly file
getValues :: Either String [[Integer]] -> [Integer]
getValues (Right [x]) = x
getValues _ = []
getLines :: FilePath -> IO [T.Text]
getLines = liftM T.lines . IO.readFile
parseAndFilter :: T.Text -> [Integer]
parseAndFilter = ((\x -> filter (>30) x) . getValues . parseCSV)
main = do
list <- getLines "myfile.txt"
putStr $ show $ map parseAndFilter list
But instead of using a list [Integer] I would like to use Data.Vector.
I found a relevant part in the Data.Vector tutorial:
--The simplest way to parse a file of Int or Integer types is with a strict or lazy --ByteString, and the readInt or readInteger functions:
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector as U
import System.Environment
main = do
[f] <- getArgs
s <- L.readFile f
print . U.sum . parse $ s
-- Fill a new vector from a file containing a list of numbers.
parse = U.unfoldr step
where
step !s = case L.readInt s of
Nothing -> Nothing
Just (!k, !t) -> Just (k, L.tail t)
However, this is regular, not a nested list of integers.
I tried to adapt my code but it did not work.
How can I change my code to
use a nested Vector (or Vector of Vectors) instead of [Integer] (i.e., while also running the Filter of >30 on the Vector).
There is an important question you don't mention in the posting.... Do you need everything in memory at once. If the processing is local, or if you can summarize all the data up to a point in the file with a few values, you can solve the performance problems by streaming the data through and throwing away all but the current line. This will usually run way faster and allow you to process orders of magnitude larger files. And it usually doesn't even matter (as much) what data structure you use to parse the values.
Here is an example:
import Text.Regex
process::[Int]->String
process = (++"\n") . show . sum --put whatever you want here.
main = interact (concat . map (process . map read . splitRegex (mkRegex ",")) . lines)
The whole program runs lazily, so it processes line by line as the data comes in and frees up the memory for old data (you can check this by typing in data by hand and watch the output come out). There is a performance hit by using the unpacked structures, but this isn't as big a problem as pulling everything into memory.
Many problems that don't seem to fit this criteria at first can be modified to do so (you may have to sort the data first, but there are many performance effective ways to do this).... I rewrote the full online stats system for a gaming company once following this principle, and was able to take a stats crunching time from hours to a couple of minutes (with even more metrics).
Because of its lazy nature, Haskell is a good language to stream data through.
I found a post that there is no easy way to parse with attoparsec to a vector.
See this forum post and thread.
But the good new is that the overhead of Data.Vector.fromList isn't so bad.
Attoparsec seems to be quite fast for parsing.
I keep the whole data in memory and this doesn't seem a speed overhead. It's more flexible, as perhaps later I need to have the whole data in memory, altough currently it is not needed per se for my problem.
Currently the code runs in ~30 seconds and about 1.5GB RAM for a 150MB text file. Now the memory consumption is quite little versus 20GB of before and I only need to focus on improving the speed.
Here are the changes from the code of my question my post, commented out code is using lists, functions with Vector in the type are new (this is not production code or meant to be good code yet):
{-
getValues :: Either String [[Integer]] -> [Integer]
getValues (Right [x]) = x
getValues _ = []
-}
getValues :: Either String [[Integer]] -> Vector Integer
getValues (Right [x]) = V.fromList x
getValues _ = V.fromList [999999,9999999,99999,999999] --- represents an ERROR
getLines :: FilePath -> IO [T.Text]
getLines = liftM T.lines . IO.readFile
{-
parseAndFilter :: T.Text -> [Integer]
parseAndFilter = ((\x -> filter (>30) x) . getValues . parseCSV)
-}
filterLarger :: Vector Integer -> Vector Integer
filterLarger = \x -> V.filter (>37) x
parseVector :: T.Text -> Vector Integer
parseVector = (getValues . parseCSV)
-- mystr = T.pack "3, 6, 7" --, 13, 14, 15, 17, 21, 22, 23, 24, 25, 28, 29, 30, 32, 33, 35, 36"
main = do
list <- getLines "mydata.txt"
--putStr $ show $ parseCSV $ mystr
putStr $ show $ V.map filterLarger $ V.map parseVector $ V.fromList list
--show $ parseOnly parserInt $ T.pack "123"
Thanks to jamshidh and all the comments that pointed me to the right direction.
Here is the final solution. Switching to ByteString and Int in the code, it now runs twice as fast and a bit less memory consumtion (time is now ~14 Seconds).
{-# Language OverloadedStrings #-}
-- adapted from https://github.com/robinbb/attoparsec-csv
module Main
(
parseCSV, main
) where
import Data.Vector as V (Vector, fromList, map, head, filter)
import Prelude hiding (concat, takeWhile)
import Control.Applicative ((<$>), (<|>), (<*>), (<*), (*>), many)
import Control.Monad (void, liftM)
import Data.Attoparsec.Char8
import qualified Data.ByteString.Char8 as B
lineEnd :: Parser ()
lineEnd =
void (char '\n') <|> void (string "\r\n") <|> void (char '\r')
<?> "end of line"
parserInt :: Parser Int
parserInt = skipSpace *> signed decimal
record :: Parser [Int]
record =
parserInt `sepBy1` char ','
<?> "record"
file :: Parser [[Int]]
file =
(:) <$> record
<*> manyTill (lineEnd *> record)
(endOfInput <|> lineEnd *> endOfInput)
<?> "file"
parseCSV :: B.ByteString -> Either String [[Int]]
parseCSV =
parseOnly file
getValues :: Either String [[Int]] -> Vector Int
getValues (Right [x]) = V.fromList x
getValues _ = error "ERROR in getValues function!"
filterLarger :: Vector Int -> Vector Int
filterLarger = \x -> V.filter (>36) x
parseVector :: B.ByteString -> Vector Int
parseVector = (getValues . parseCSV)
-- MAIN
main = do
fContent <- B.readFile "myfile.txt"
putStr $ show $ V.map filterLarger $ V.map parseVector $ V.fromList $ B.lines fContent
I want to parse String to get Int and I use this:
string2int :: String -> Int
string2int str = read str::Int
Now I want to catch paring exception/error as SIMPLY as possible.
I tried:
import qualified Control.Exception as E
eVal <- try (print (string2int "a")) :: IO (Either E.SomeException ())
case eVal of
Left e -> do { putStrLn "exception"; }
Right n -> do { putStrLn "good"; }
But compiler says couldn't match expected type 'E.SomeException()'
with actual type E.IOException.
What am I doing wrong?
Ok I don't know how to use it for my problem: I want somthing like this:
loadfunction = do
{
x <- string2int getLine
if( failed parsing int ) call somefunction
y <- string2int getLine
if( failed parsing int ) call somefunction
otherfunction x y
}
I dont know how to do it using your answers...
You're using try imported from the old exceptions mechanism, but are trying to use its result type as if it was using the new extensible Control.Exception mechanism. Use E.try instead.
You should ideally import Control.Exception like this:
import Prelude hiding (catch)
import Control.Exception
and remove all imports of Control.OldException. Then you can use its functions directly without worrying about any clashes.
By the way, you don't have to use IO exceptions to handle read errors; you can use reads instead:
reads :: (Read a) => String -> [(a, String)]
Here's how I'd write your code with reads:
case reads "a" of
[(a, "")] -> do
print a
putStrLn "good"
_ -> putStrLn "exception"
The fact that reads returns a list is a little confusing; practically, you can think of it as returning Maybe (a, String) instead. If you want a version using Maybe, you can define it like this:
readMaybe :: (Read a) => String -> Maybe a
readMaybe s =
case reads s of
[(a, "")] -> Just a
_ -> Nothing
which makes your code become:
case readMaybe "a" of
Just a -> do
print a
putStrLn "good"
Nothing -> putStrLn "exception"
(You can also define readMaybe as listToMaybe . map fst . filter (null . snd) . reads like dave4420 did; they'll be equivalent in practice, since none of the standard Read instances ever return lists of more than one element.)
In general, you should try and use pure error-handling methods like this whenever possible, and only use IO exceptions when there's really no other option, or you're dealing with IO-specific code (like file/network handling, etc.). However, if you want to stick with exceptions, using E.try instead should fix your error.
Based on your updated question, however, exceptions might be the right way to go after all; something like ErrorT would also work, but if you're already doing everything in IO to start with, then there's no harm in using exceptions. So I would write your example like this:
loadfunction = do
line1 <- getLine
x <- string2int line1
line2 <- getLine
y <- string2int line2
otherfunction x y
and use E.catch to handle the exceptions it throws; take a look at the documentation for catch to see how to do that.