Error handling in pipes - haskell

Backstory
I have a number of data files, each of them containing a list of data records (one per line).
Similar to CSV but sufficiently different that I'd prefer to write my own parser rather than using a CSV library.
For the purpose of this question I will use a simplified data file that contains just one number per line:
1
2
3
error
4
As you can see it is possible that a file contains malformed data in which case the whole file should be considered malformed.
The kind of data-processing I want to do can be expressed in terms of maps and folds.
So, I thought this would be a good opportunity to learn how to use the pipes library.
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except
import Pipes ((>->))
import qualified Pipes as P
import qualified Pipes.Prelude as P
import qualified Pipes.Safe as P
import qualified System.IO as IO
First, I create a producer of lines in the text file.
This is very similar to the example in the docs of Pipes.Safe.
getLines = do
P.bracket (IO.openFile "data.txt" IO.ReadMode) IO.hClose P.fromHandle
Next, I need a function to parse each of these lines.
As I mentioned before, this might fail, which I will represent with Either.
type ErrMsg = String
parseNumber :: String -> Either ErrMsg Integer
parseNumber s = case reads s of
[(n, "")] -> Right n
_ -> Left $ "Parse Error: \"" ++ s ++ "\""
For simplicity, as a first step, I want to collect all data records into a list of records.
The most straight-forward approach is to pipe all the lines through the parser and just collect the whole thing into a list.
readNumbers1 :: IO [Either ErrMsg Integer]
readNumbers1 = P.runSafeT $ P.toListM $
getLines >-> P.map parseNumber
Unfortunately, that creates a list of eithers of records.
However, if the file contains one wrong record then the whole file should be considered wrong.
What I really want is an either of a list of records.
Of course I can just use sequence to transpose the list of eithers.
readNumbers2 :: IO (Either ErrMsg [Integer])
readNumbers2 = sequence <$> readNumbers1
But, that would read the whole file even if the first line is already malformed.
These files can be large and I have many of them, so, it would be better if the reading would stop at the first error.
Question
My Question is how to achieve that.
How to abort parsing upon the first malformed record?
What I got so far
My first thought was to use the monad instance of Either ErrMsg and P.mapM instead of P.map.
Since we are reading from a file we already have IO and SafeT in our monad stack, so, I guess I'll need ExceptT to get error handling into that monad stack.
This is the point where I'm stuck.
I tried many different combinations and always ended up being yelled at by the type-checker.
The following is the closest I can get to it compiles.
readNumbers3 = P.runSafeT $ runExceptT $ P.toListM $
getLines >-> P.mapM (ExceptT . return . parseNumber)
The infered type of readNumbers3 reads
*Main> :t readNumbers3
readNumbers3
:: (MonadIO m, P.MonadSafe (ExceptT ErrMsg (P.SafeT m)),
P.MonadMask m, P.Base (ExceptT ErrMsg (P.SafeT m)) ~ IO) =>
m (Either ErrMsg [Integer])
which looks close to what I want:
readNumbers3 :: IO (Either ErrMsg [Integer])
However, as soon as I try to actually execute that action I get the following error message in ghci:
*Main> readNumbers3
<interactive>:7:1:
Couldn't match expected type ‘IO’
with actual type ‘P.Base (ExceptT ErrMsg (P.SafeT m0))’
The type variable ‘m0’ is ambiguous
In the first argument of ‘print’, namely ‘it’
In a stmt of an interactive GHCi command: print it
If I try to apply the following type-signature:
readNumbers3 :: IO (Either ErrMsg [Integer])
Then I get the following error message:
error.hs:108:5:
Couldn't match expected type ‘IO’
with actual type ‘P.Base (ExceptT ErrMsg (P.SafeT IO))’
In the first argument of ‘(>->)’, namely ‘getLines’
In the second argument of ‘($)’, namely
‘getLines >-> P.mapM (ExceptT . return . parseNumber)’
In the second argument of ‘($)’, namely
‘P.toListM $ getLines >-> P.mapM (ExceptT . return . parseNumber)’
Failed, modules loaded: none.
Aside
Another motivation for moving the error handling into the pipe's base monad is that it would make further data processing much easier if I wouldn't have to juggle with eithers in my maps and folds.

Here is an incremental approach to solving the problem.
Following Tekmo's suggestion in this SO answer
we aim to operate in the following monad:
ExceptT String (Pipe a b m) r
We begin with imports and the definition of parseNumber:
import Control.Monad.Except
import Pipes ((>->))
import qualified Pipes as P
import qualified Pipes.Prelude as P
parseNumber :: String -> Either String Integer
parseNumber s = case reads s of
[(n, "")] -> Right n
_ -> Left $ "Parse Error: \"" ++ s ++ "\""
Here is a plain Producer of Strings in the IO-monad we'll use as our input:
p1 :: P.Producer String IO ()
p1 = P.stdinLn >-> P.takeWhile (/= "quit")
To lift it to the ExceptT monad we just use lift:
p2 :: ExceptT String (P.Producer String IO) ()
p2 = lift p1
Here is a pipeline segment which converts Strings to Integers in the ExceptT monad:
p4 :: ExceptT String (P.Pipe String Integer IO) a
p4 = forever $
do s <- lift P.await
case parseNumber s of
Left e -> throwError e
Right n -> lift $ P.yield n
The probably can be written more combinatorially, but I've left it very explicit for clarity.
Next we join p2 and p4 together. The result is also in the ExceptT monad.
-- join together p2 and p4
p7 :: ExceptT String (P.Producer Integer IO) ()
p7 = ExceptT $ runExceptT p2 >-> runExceptT p4
Tekmo's SO answer suggests creating a new operator for this.
Finally, we can use toListM' to run this pipeline. (I've included the definition of toListM' here because it doesn't appear in my installed version of Pipes.Prelude)
p8 :: IO ([Integer], Either String ())
p8 = toListM' $ runExceptT p7
toListM' :: Monad m => P.Producer a m r -> m ([a], r)
toListM' = P.fold' step begin done
where
step x a = x . (a:)
begin = id
done x = x []
Examples of how p8 works:
ghci> p8
4
5
6
quit
([4,5,6],Right ())
ghci> p8
5
asd
([5],Left "Parse Error: \"asd\"")
Update
You can simplify the code by generalizing parseNumber like this:
parseNumber' :: (MonadError [Char] m) => String -> m Integer
parseNumber' s = case reads s of
[(n, "")] -> return n
_ -> throwError $ "Parse Error: \"" ++ s ++ "\""
Then p4 may be written:
p4' :: ExceptT String (P.Pipe String Integer IO) a
p4' = forever $ lift P.await >>= parseNumber' >>= lift . P.yield

Related

Haskell: Handling resulting Either from computations

I have revisited Haskell lateley and constructed a toy programming language parser/interpreter. Using Parsec for lexing and parsing and a separate interpreter. I'm running in to some issues with feeding the result from the parser to my interpreter and handle the potential error from both the interpreter and parser. I end up with something like this:
main = do
fname <- getArgs
input <- readFile (head fname)
case lparse (head fname) input of
Left msg -> putStrLn $ show msg
Right p -> case intrp p of
Left msg -> putStrLn $ show msg
Right r -> putStrLn $ show r
This dosn't look pretty at all. My problem is that lparse returns Either ParseError [(String, Stmt)] and itrp returns the type Either ItrpError Stmt so I'm having a real hard time feeding the Right result from the parser to the interpreter and at the same time bail and print the possible ParseError or IntrpError.
The closest to what i want is something like this
main = do
fname <- getArgs
input <- readFile (head fname)
let prog = lparse (head fname) input
(putStrLn . show) (intrp <$> prog)
But this will not surprisingly yield a nested Either and not print pretty either.
So are there any nice Haskell ideomatic way of doing this threading results from one computation to another and handling errors (Lefts) in a nice way without nesting cases?
Edit
adding types of lparse and itrp
lparse :: Text.Parsec.Pos.SourceName -> String -> Either Text.Parsec.Error.ParseError [([Char], Stmt)]
intrp :: [([Char], Stmt)] -> Either IntrpError Stmt
While not perfect, I'd create a helper function for embedding any Showable error from Either into MonadError:
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.Except
strErr :: (MonadError String m, Show e) => Either e a -> m a
strErr = either (throwError . show) return
Then if you have a computation that can fail with errors, like
someFn :: ExceptT String IO ()
someFn = strErr (Left 42)
you can run it (printing errors to stdout) as
main :: IO ()
main = runExceptT someFn >>= either putStrLn return
In your case it'd be something like
main = either putStrLn return <=< runExceptT $ do
fname <- liftIO getArgs
input <- liftIO $ readFile (head fname)
prog <- strErr $ lparse (head fname) input
r <- strErr $ interp prog
print r
Well, if you want to chain successful computations, you can always use >>= to do that. For instance in your case:
lparse (head fname) input >>= intrp
And if you want to print out either your error message you can use the either class that takes two handler functions, one for the case when you have Left a (error in your case) and another for Right b (in your case a successful thing). An example:
either (putStrLn . show) (putStrLn . show) (lparse (head fname) input >>= intrp)
And if anything fails in your chain (any step of your monadic chain becomes Left a) it stops and can for instance print out the error message in the above case.

Reading from file list of chars or list of ints

I have a question. There is any solution for reading from file list of tuples ? Depends on content ?
I know that if i need to read integers i do something like that:
toTuple :: [String] -> [(Int,Int)]
toTuple = map (\y -> read y ::(Int,Int))
But in file i can have tuples this kind (int,int) or (char, int). Is any way to do this nice ?
I was trying to do this at first in finding sign " ' " . If it was, then reading chars, but it doesn't work for some reason.
[Edit]
To function to tuple, i give strings with tuples, before that i splits lines by space sign.
INPUT EXAMPLE:
Case 1 : ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
Case 2 : ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Just try both and choose the successful:
import Text.Read
import Control.Applicative
choose :: Maybe a -> Maybe b -> Maybe (Either a b)
choose x y = fmap Left x <|> fmap Right y
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
main = do
-- Just (Left [(1,2),(1,3),(3,4),(1,4)])
print $ toTuple ["(1,2)", "(1,3)" ,"(3,4)" ,"(1,4)"]
-- Just (Right [('a',2),('b',3),('g',8),('h',2),('r',4)])
print $ toTuple ["('a',2)", "('b',3)", "('g',8)", "('h',2)", "('r',4)"]
Here is a far more efficient (and unsafe) version:
readListWithMaybe :: Read a => String -> [String] -> Maybe [a]
readListWithMaybe s ss = fmap (: map read ss) (readMaybe s)
toTuple :: [String] -> Either [(Int, Int)] [(Char, Int)]
toTuple [] = Left []
toTuple (s:ss) = fromJust $ readListWithMaybe s ss `choose` readListWithMaybe s ss
In the first definition of toTuple
toTuple :: [String] -> Maybe (Either [(Int, Int)] [(Char, Int)])
toTuple ss = readListMaybe ss `choose` readListMaybe ss
readListMaybe is too strict:
readListMaybe :: Read a => [String] -> Maybe [a]
readListMaybe = mapM readMaybe
mapM is defined in terms of sequence which is defined in terms of (>>=) which is strict for the Maybe monad. And also the reference to ss is keeped for too long. The second version doesn't have these problems.
As I said it may be a good idea to consider using a parsing library, if the task at hand gets a bit more complicated.
First of all you have the benefit of getting error messages and if you decide to switch to a self declared data Type it is still easily applicable (with slight modifications of course).
Also switching from ByteString to Text (which are both preferable to working with String anyways) is just a matter of (un)commenting 4 lines
Here is some example if you have not had the pleasure to work with it.
I'll explain it some time later today - for I have to leave now.
{-# LANGUAGE OverloadedStrings #-}
module Main where
import Data.Attoparsec.ByteString.Char8
import Data.ByteString.Char8 as X
-- import Data.Attoparsec.Text
-- import Data.Text as X
main :: IO ()
main = do print <$> toTuples $ X.unlines ["(1,2)","(1,3)","(3,4)","(1,4)"]
print <$> toTuples $ X.unlines ["('a',2)","('h',2)","('r',4)"]
print <$> toTuples $ X.unlines ["('a',2)","(1,3)","(1,4)"] --works
print <$> toTuples $ "('a',2)" -- yields Right [Right ('a',2)]!!
print <$> toTuples $ "(\"a\",2)" -- yields Right []!!
toTuples = parseOnly (myparser `sepBy` skipSpace :: Parser [Either (Int,Int) (Char,Int)])
where myparser :: Parser (Either (Int,Int) (Char,Int))
myparser = eitherP (tupleP decimal decimal)
(tupleP charP decimal)
charP = do char '\''
c <- notChar '\''
char '\''
return c
tupleP :: Parser a -> Parser b -> Parser (a, b)
tupleP a b = do char '('
a' <- a
skipSpace
char ','
skipSpace
b' <- b
char ')'
return (a',b')
Edit: Explanation
Parser is a monad, so it comes with do-notation which enables us to write the tupleP function in this very convenient form. Same goes for charP - we describe what to parse in the primitives given by the attoparsec library
and it reads something like
first expect a quote
then something that is not allowed to be a quote
and another quote
return the not quote thingy
if you can write down the parser informally you're most likely halfway through writing the haskell code, the only thing left to do is find the primitives in the library or write some auxilary function like tupleP.
A nice thing is that Parsers (being monads) compose nicely so we get our desired parser eitherP (tupleP ..) (tupleP ..).
The only magic that happens in the print <$>.. lines is that Either is a functor and every function using <$> or fmap uses the Right side of the Eithers.
Last thing to note is sepBy returns a list - so in the case where the parsing fails we still get an empty list as a result, if you want to see the failing use sepBy1 instead!

Can I drop the IO monad on this pure function prettily?

It is quite hard to formulate good questions titles as a newbie. Please make this question search friendly =)
Trying to write my first "real" Haskell program (i.e. not only Project Euler stuff), I am trying to read and parse my configuration file with nice error messages. So far, I have this:
import Prelude hiding (readFile)
import System.FilePath (FilePath)
import System.Directory (doesFileExist)
import Data.Aeson
import Control.Monad.Except
import Data.ByteString.Lazy (ByteString, readFile)
-- Type definitions without real educational value here
loadConfiguration :: FilePath -> ExceptT String IO Configuration
loadConfiguration path = do
fileContent <- readConfigurationFile "C:\\Temp\\config.json"
configuration <- parseConfiguration fileContent
return configuration
readConfigurationFile :: FilePath -> ExceptT String IO ByteString
readConfigurationFile path = do
fileExists <- liftIO $ doesFileExist path
if fileExists then do
fileContent <- liftIO $ readFile path
return fileContent
else
throwError $ "Configuration file not found at " ++ path ++ "."
parseConfiguration :: ByteString -> ExceptT String IO Configuration
parseConfiguration raw = do
let result = eitherDecode raw :: Either String Configuration
case result of
Left message -> throwError $ "Error parsing configuration file: " ++ message
Right configuration -> return configuration
This works, but the IO monad in parseConfiguration is not necessary, and should go away. But I can't just drop it, of course, and I have not yet found a way to change parseConfiguration to something pure while keeping the prettyness of loadConfiguration.
What is the correct way to write this? If this is answered in the documentation, I am sorry, but I did not find it. I think reading the hackage documentation is a skill that grows as slowly as the rest of my Haskell skills. =)
P.S.: Comments on other style mistakes are, of course, very welcome!
If you are already using mtl, then the solution given by bheklilr in his comment is a good one. Make parseConfiguration work on any monad that implements MonadError.
If for whatever reason you are not using mtl, but only transformers, then you need'll a function with a type like Monad n => Except e a -> ExceptT e n a that "hoists" an Except into an ExceptT over some monad.
We can build this function using mapExceptT :: (m (Either e a) -> n (Either e' b)) -> ExceptT e m a -> ExceptT e' n b, a function that can change the base monad of an ExceptT transformer.
Except is really ExceptT Identity, so what we want is to unwrap the Identity and return the value in the new monad:
hoistExcept :: Monad n => Except e a -> ExceptT e n a
hoistExcept = mapExceptT (return . runIdentity)
You could also define it this way:
hoistExcept :: Monad n => Except e a -> ExceptT e n a
hoistExcept = ExceptT . return . runIdentity . runExceptT

don't understand this liftM2 behaviour in Fay

I have this haskell code which behaves as expected:
import Control.Monad
getVal1 :: Maybe String
getVal1 = Just "hello"
getVal2 :: Maybe String
getVal2 = Just "World"
main = process >>= putStrLn
process :: IO String
process = case liftM2 operation getVal1 getVal2 of
Nothing -> error "can't run operation, one of the params is Nothing"
Just result -> result
operation :: String -> String -> IO String
operation a b = return $ a ++ b
However when transposed to Fay, it doesn't typecheck:
{-# LANGUAGE NoImplicitPrelude, EmptyDataDecls #-}
import Prelude
import FFI
liftM2 f m1 m2 = do { x1 <- m1; x2 <- m2; return (f x1 x2) }
getVal1 :: Maybe String
getVal1 = Just "hello"
getVal2 :: Maybe String
getVal2 = Just "World"
main = process >>= putStrLn
process :: Fay String
process = case liftM2 operation getVal1 getVal2 of
Nothing -> error "can't run operation, one of the params is Nothing"
Just result -> result
operation :: String -> String -> Fay String
operation a b = return $ a ++ b
The compile error is:
fay: ghc:
TestFay.hs:17:33:
Couldn't match expected type `Fay String'
with actual type `Maybe String'
In the second argument of `liftM2', namely `getVal1'
In the expression: liftM2 operation getVal1 getVal2
In the expression:
case liftM2 operation getVal1 getVal2 of {
Nothing
-> error "can't run operation, one of the params is Nothing"
I'm not exactly following the problem here. Actually I even tried to remove the import for Control.Monad in the GHC code and paste the liftM2 as in the Fay code, but it still typechecks properly... Any option of using such functions such as liftMx in Fay, or am I missing something completely here?
This is Fay 0.16.0.3... Maybe I should try upgrading to 0.17?
I suspect that do notation in Fay works for the Fay monad only, because AFAIK Fay does not support type classes. Looking at the Fay Prelude, I see that (>>=) and return are monomorphic, specialized to the Fay monad.

Unwrapping a monad

Given the below program, I am having issues dealing with monads.
module Main
where
import System.Environment
import System.Directory
import System.IO
import Text.CSV
--------------------------------------------------
exister :: String -> IO Bool
exister path = do
fileexist <- doesFileExist path
direxist <- doesDirectoryExist path
return (fileexist || direxist )
--------------------------------------------------
slurp :: String -> IO String
slurp path = do
withFile path ReadMode (\handle -> do
contents <- hGetContents handle
last contents `seq` return contents )
--------------------------------------------------
main :: IO ()
main = do
[csv_filename] <- getArgs
putStrLn (show csv_filename)
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
printCSV csv_data -- unable to compile.
csv_data is an Either (parseerror) CSV type, and printCSV takes only CSV data.
Here's the ediff between the working version and the broken version.
***************
*** 27,30 ****
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! printCSV csv_data -- unable to compile.
\ No newline at end of file
--- 27,35 ----
csv_raw <- slurp csv_filename
let csv_data = parseCSV csv_filename csv_raw
! case csv_data of
! Left error -> putStrLn $ show error
! Right csv_data -> putStrLn $ printCSV csv_data
!
! putStrLn "done"
!
reference: http://hackage.haskell.org/packages/archive/csv/0.1.2/doc/html/Text-CSV.html
Regarding monads:
Yes, Either a is a monad. So simplifying the problem, you are basically asking for this:
main = print $ magicMonadUnwrap v
v :: Either String Int
v = Right 3
magicMonadUnwrap :: (Monad m) => m a -> a
magicMonadUnwrap = undefined
How do you define magicMonadUnwrap? Well, you see, it's different for each monad. Each one needs its own unwrapper. Many of these have the word "run" in them, for example, runST, runCont, or runEval. However, for some monads, it might not be safe to unwrap them (hence the need for differing unwrappers).
One implementation for lists would be head. But what if the list is empty? An unwrapper for Maybe is fromJust, but what if it's Nothing?
Similarly, the unwrapper for the Either monad would be something like:
fromRight :: Either a b -> b
fromRight (Right x) = x
But this unwrapper isn't safe: what if you had a Left value instead? (Left usually represents an error state, in your case, a parse error). So the best way to act upon an Either value it is to use the either function, or else use a case statement matching Right and Left, as Daniel Wagner illustrated.
tl;dr: there is no magicMonadUnwrap. If you're inside that same monad, you can use <-, but to truly extract the value from a monad...well...how you do it depends on which monad you're dealing with.
Use case.
main = do
...
case csv_data of
Left err -> {- whatever you're going to do with an error -- print it, throw it as an exception, etc. -}
Right csv -> printCSV csv
The either function is shorter (syntax-wise), but boils down to the same thing.
main = do
...
either ({- error condition function -}) printCSV csv_data
You must unlearn what you have learned.
Master Yoda.
Instead of thinking about, or searching for ways to "free", "liberate", "release", "unwrap" or "extract" normal Haskell values from effect-centric (usually monadic) contexts, learn how to use one of Haskell's more distinctive features - functions are first-class values:
you can use functions like values of other types e.g. like Bool, Char, Int, Integer etc:
arithOps :: [(String, Int -> Int -> Int)]
arithOps = zip ["PLUS","MINUS", "MULT", "QUOT", "REM"]
[(+), (-), (*), quot, rem]
For your purposes, what's more important is that functions can also be used as arguments e.g:
map :: (a -> b) -> [a] -> [b]
map f xs = [ f x | x <- xs ]
filter :: (a -> Bool) -> [a] -> [a]
filter p xs = [ x | x <- xs, p x ]
These higher-order functions are even available for use in effect-bearing contexts e.g:
import Control.Monad
liftM :: Monad m => (a -> b) -> (m a -> m b)
liftM2 :: Monad m => (a -> b -> c) -> (m a -> m b -> m c)
liftM3 :: Monad m => (a -> b -> c -> d) -> (m a -> m b -> m c -> m d)
...etc, which you can use to lift your regular Haskell functions:
do .
.
.
val <- liftM3 calculate this_M that_M other_M
.
.
.
Of course, the direct approach also works:
do .
.
.
x <- this_M
y <- that_M
z <- other_M
let val = calculate x y z
.
.
.
As your skills develop, you'll find yourself delegating more and more code to ordinary functions and leaving the effects to a vanishingly-small set of entities defined in terms of functors, applicatives, monads, arrows, etc as you progress towards Haskell mastery.
You're not convinced? Well, here's a brief note of how effects used to be handled in Haskell - there's also a longer description of how Haskell arrived at the monadic interface. Alternately, you could look at Standard ML, OCaml, and other similar languages - who knows, maybe you'll be happier with using them...

Resources