Errorbundles after parsing with megaparsec - haskell

I currently have a working parser in megaparsec, where I build an AST for my program. I now want to do some weeding operations on my AST, while being able to use the same kind of pretty errors as the parser. While this stage is after parsing, I'm wondering if there are general practices for megaparsec in doing so. Is there a way for me to extract every line and comment (used in the bundle) and add it to each item in my AST? Is there any other way that people tackle this problem?
Apologies in advance if this sounds open ended, but I'm mainly wondering is there are some better ideas than getting the line numbers and creating bundles myself. I'm still new to haskell so I haven't been able to navigate properly through all the source code.

This was answered by the megaparsec developer here.
To summarize, parsers have a getOffset function that returns the current char index. You can use that along with an initial PosState to create an error bundle which you can later pretty print.
I have a sample project within the github thread, and pasted again here:
module TestParser where
import Data.List.NonEmpty as NonEmpty
import qualified Data.Maybe as Maybe
import qualified Data.Set as Set
import Data.Void
import Parser
import Text.Megaparsec
data Sample
= Test Int
String
| TestBlock [Sample]
| TestBlank
deriving (Show, Eq)
sampleParser :: Parser Sample
sampleParser = do
l <- many testParser
return $ f l
where
f [] = TestBlank
f [s] = s
f p = TestBlock p
testParser :: Parser Sample
testParser = do
offset <- getOffset
test <- symbol "test"
return $ Test offset test
fullTestParser :: Parser Sample
fullTestParser = baseParser testParser
testParse :: String -> Maybe (ParseErrorBundle String Void)
testParse input =
case parse (baseParser sampleParser) "" input of
Left e -> Just e
Right x -> do
(offset, msg) <- testVerify x
let initialState =
PosState
{ pstateInput = input
, pstateOffset = 0
, pstateSourcePos = initialPos ""
, pstateTabWidth = defaultTabWidth
, pstateLinePrefix = ""
}
let errorBundle =
ParseErrorBundle
{ bundleErrors = NonEmpty.fromList [TrivialError offset Nothing Set.empty]
-- ^ A collection of 'ParseError's that is sorted by parse error offsets
, bundlePosState = initialState
-- ^ State that is used for line\/column calculation
}
return errorBundle
-- Sample verify; throw an error on the second test key
testVerify :: Sample -> Maybe (Int, String)
testVerify tree =
case tree of
TestBlock [_, Test a _, _] -> Just (a, "Bad")
_ -> Nothing
testMain :: IO ()
testMain = do
testExample "test test test"
putStrLn "Done"
testExample :: String -> IO ()
testExample input =
case testParse input of
Just error -> putStrLn (errorBundlePretty error)
Nothing -> putStrLn "pass"
Some parts are from other files, but the important parts are in the code.

Related

How can I write a pattern quasi quoter in Haskell?

I use quasi quoters to create my smart-constructed data types at compile time. This looks something like:
import qualified Data.Text as T
import Language.Haskell.TH.Quote (QuasiQuoter(..))
import Language.Haskell.TH (Q, Exp, Pat(..), Lit(..))
import Language.Haskell.TH.Syntax (Lift(..))
import qualified Language.Haskell.TH.Syntax as TH
import Instances.TH.Lift () -- th-lift-instances package
newtype NonEmptyText = NonEmptyText Text
textIsWhitespace :: Text -> Bool
textIsWhitespace = T.all (== ' ')
mkNonEmptyText :: Text -> Maybe NonEmptyText
mkNonEmptyText t = if textIsWhitespace t then Nothing else (Just (NonEmptyText t))
compileNonEmptyText :: QuasiQuoter
compileNonEmptyText = QuasiQuoter
{ quoteExp = compileNonEmptyText'
, quotePat = error "NonEmptyText is not supported as a pattern"
, quoteDec = error "NonEmptyText is not supported at top-level"
, quoteType = error "NonEmptyText is not supported as a type"
}
where
compileNonEmptyText' :: String -> Q Exp
compileNonEmptyText' s = case mkNonEmptyText (pack s) of
Nothing -> fail $ "Invalid NonEmptyText: " ++ s
Just txt -> [| txt |]
(I can provide a standalone working example if necessary—I just pulled this example out of a larger codebase)
Essentially, by just deriving Lift for my newtypes, I can place the data type in an expression quasi quoter [| txt |] to implement quoteExp.
But I'm having trouble with quotePat. If I do e.g.:
Just txt -> [p| txt |]
Then I get a warning that the first txt is unused, and the second shadows the first. I'm pretty sure that that pattern is just creating a new name txt rather than splicing in the in-scope txt like the expression quasi quoter did, since when I do:
f :: NonEmptyText -> Bool
f [compileNonEmptyText|test|] = True
f _ = False
everything matches the first statement.
Alright I think I've got it. Starting from the base string s, I can wrap that in StringL and LitP to get a literal string, which because of Text's IsString instance will become a Text. From there I need to apply the NonEmptyText constructor using ConP:
compileNonEmptyTextPattern' :: String -> Q TH.Pat
compileNonEmptyTextPattern' s = case mkNonEmptyText (pack s) of
Nothing -> fail $ "Invalid NonEmptyText: " ++ s
Just (NonEmptyText txt) -> pure $ ConP 'NonEmptyText [(LitP (StringL (T.unpack txt)))]
It's unfortunate that this is so much more verbose than the expression version, though! I wonder if there could be a typeclass for Q Pat like Lift is for Q Exp?

How to efficiently work with the nested monads I get from SQLite3 and HDBC

I have to admit that I am still not "there" yet when it comes to working efficiently with monads, so please forgive me if this is an easy question. I also have to apologize for not supplying working code, as this questions more related to a concept than an actual implementation that I am currently working on.
I'm working against an SQLite(3) database, and of course would like to send queries to it and get results back. Being already in IO, the fetchAllRows function returns an [[SqlValue]] which needs to be converted. With SQLite being very liberal in terms of what is text and what is floating point values (and Haskell not being liberal at all when it comes to types), safe conversion using safeFromSql seems appropriate. Now, if you manage to do all this in one function you would end up with that function being
myfunc :: String -> [SqlValue] -> IO [[ Either ConvertError a]]
or something like that, right? It seems to me that working with that structures of nested monads may be common enough (and cumbersome enough) for there to be a standard way of making it easier to work with that I am not aware of?
The issue is, it seems, only solved by some specific functions, and then most clearly in do syntax. The functions below solve the issue within the direct-sqlite3 package access to SQLite database (and also inserts a REGEXP handler).
import Text.Regex.Base.RegexLike
import qualified Text.Regex.PCRE.ByteString as PCRE
import qualified Data.ByteString as BS
import Data.Text (pack,Text,append)
import Data.Text.Encoding (encodeUtf8)
import Data.Int (Int64)
import Database.SQLite3
pcreRegex :: BS.ByteString -> BS.ByteString -> IO Int64
pcreRegex reg b = do
reC <- pcreCompile reg
re <- case reC of
(Right r) -> return r
(Left (off,e) ) -> fail e
reE <- PCRE.execute re b
case reE of
(Right (Just _)) -> return (1 :: Int64)
(Right (Nothing)) -> return (0 :: Int64)
(Left (c,e)) -> fail e -- Not a good idea maybe, but I have no use for error messages.
where pcreCompile = PCRE.compile defaultCompOpt defaultExecOpt
sqlRegexp :: FuncContext -> FuncArgs -> IO ()
sqlRegexp ctx args = do
r <- fmap encodeUtf8 $ funcArgText args 0
t <- fmap encodeUtf8 $ funcArgText args 1
res <- pcreRegex r t
funcResultInt64 ctx res
getRows :: Statement -> [Maybe ColumnType] -> [[SQLData]] -> IO [[SQLData]]
getRows stmt coltypes acc = do
r <- step stmt
case r of
Done -> return acc
Row -> do
out <- typedColumns stmt coltypes
getRows stmt coltypes (out:acc)
runQuery q args columntypes dbFile = do
conn <- open $ pack dbFile
createFunction conn "regexp" (Just 2) True sqlRegexp
statement <- prepare conn q
bind statement args
res <- fmap reverse $ getRows statement (fmap Just columntypes) [[]]
Database.SQLite3.finalize statement
deleteFunction conn "regexp" (Just 2)
close conn
return $ res
Hope this helps someone out here.

Parsec: error message at specific location

Using Parsec how does one indicate an error at a specific position if a semantic rule is violated. I know typically we don't want to do such things, but consider the example grammar.
<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two
The <bar> rule is a finite set (my example is arbitrary), and a pure approach to the above could be a careful application of the choice combinator, but this might be impractical in space and time. In recursive descent or toolkit-generated parsers the standard trick is to parse an integer (a more relaxed grammar) and then semantically check the harder constraints. For Parsec, I could use a natural parser and check the result calling fail when that doesn't match or unexpected or whatever. But if we do that, the default error location is the wrong one. Somehow I need to raise the error at the earlier state.
I tried a brute force solution and wrote a combinator that uses getPosition and setPosition as illustrated by this very similar question. Of course, I was also unsuccessful (the error location is, of course wrong). I've run into this pattern many times. I am kind of looking for this type of combinator:
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
if ok then p -- consume the input if the value passed the predicate
else fail lbl -- otherwise raise the error at the *start* of this token
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
The above does not work. (I tried variants on this as well.) Somehow the parser backtracks a says it's expecting a digit. I assume it's returning the error that made it the furthest. Even {get,set}ParserState fails erase that memory.
Am I handling this syntactic pattern wrong? How would all you Parsec users approach these type of problems?
Thanks!
I think both your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both
cases Megaparsec just does the right thing:
{-# LANGUAGE TypeApplications #-}
module Main (main) where
import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L
type Parser = Parsec Void String
withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
r <- lookAhead p
if f r
then p
else fail msg
withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
mpos <- getNextTokenPosition -- †
r <- p
if f r
then return r
else do
forM_ mpos setPosition
fail msg
main :: IO ()
main = do
let msg = "I only like numbers greater than 42!"
parseTest' (withPredicate1 #Integer (> 42) msg L.decimal) "11"
parseTest' (withPredicate2 #Integer (> 42) msg L.decimal) "22"
If I run it:
The next big Haskell project is about to start!
λ> :main
1:1:
|
1 | 11
| ^
I only like numbers greater than 42!
1:1:
|
1 | 22
| ^
I only like numbers greater than 42!
λ>
Try it for yourself! Works as expected.
† getNextTokenPosition is more correct than getPosition for streams where tokens contain position of their beginning and end in themselves. This may or may not be important in your case.
It's not a solution I like, but you can hypnotize Parsec into believing it's had a single failure with consumption:
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
Here's a complete example:
import Control.Monad
import Text.Parsec
import Text.Parsec.Char
import Text.Parsec.Error
import Text.Parsec.Prim
import Debug.Trace
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))
type P a = Parsec String () a
withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
pos <- getPosition
x <- p
unless (pred x) $ failAt pos msg
return x
natural = read <$> many1 digit
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])
main = print $ runParser pPowerOfTwo () "myinput" "4095"
When run, it results in:
Left "myinput" (line 1, column 1):
expecting power of two
I think the problem stems from how Parsec picks the "best error" in the non-deterministic setting. See Text.Parsec.Error.mergeError. Specifically, this selects the longest match when choosing which error is the error to report. I think we need some way to make Parsec order errors differently, which may be too obscure for us solving this problem.
In my case, I here's how I worked around the problem:
I solved stacked an Exception monad within my ParsecT type.
type P m = P.ParsecT String ParSt (ExceptT Diagnostic m)
Then I introduced a pair of combinators:
(Note: Loc is my internal location type)
-- stops hard on an error (no backtracking)
-- which is why I say "semantic" instead of "syntax" error
throwSemanticError :: (MonadTrans t, Monad m) => Loc -> String -> t (ExceptT Diagnostic m) a
throwSemanticError loc msg = throwSemanticErrorDiag $! Diagnostic loc msg
withLoc :: Monad m => (Loc -> P m a) -> P m a
withLoc pa = getLoc >>= pa
Now in parsing I can write:
parsePrimeNumber = withLoc $ \loc ->
i <- parseInt
unless (isPrime i) $ throwSemanticError loc "number is not prime!"
return i
The top level interface to run one of these monads is really nasty.
runP :: Monad m
=> ParseOpts
-> P m a
-> String
-> m (ParseResult a)
runP pos pma inp =
case runExceptT (P.runParserT pma (initPSt pos) "" inp) of
mea -> do
ea <- mea
case ea of
-- semantic error (throwSemanticError)
Left err -> return $! PError err
-- regular parse error
Right (Left err) -> return $ PError (errToDiag err)
-- success
Right (Right a) -> return (PSuccess a [])
I'm not terribly happy with this solution and desire something better.
I wish parsec had a:
semanticCheck :: (a -> Parsec Bool) -> Parsec a -> Parsec a
semanticCheck pred p =
a <- p
z <- pred a
unless z $
... somehow raise the error from the beginning of this token/parse
rather than the end ... and when propagating the error up,
use the end parse position, so this parse error beats out other
failed parsers that make it past the beginning of this token
(but not to the end)
return a
Using lookAhead, we can run a parser without consuming any input or registering any new errors, but record the state that we end up in. We can then apply a guard to the result of the parser. The guard can fail in whatever manner it desires if the value does not pass the semantic check. If the guard fails, then the error is located at the initial position. If the guard succeeds, we reset the parser to the recorded state, avoiding the need to re-execute p.
guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP guard p = do
(a, s) <- try . lookAhead $ do
a <- p
s <- getParserState
return (a, s)
guard a
setParserState s
return a
We can now implement pPowerOfTwo:
pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
where guardPowerOfTwo s = unless (s `elem` [2^i | i <- [1..20]]) . unexpected $ show s

Using content of a string to call function with same name

I have a main like the following:
main :: IO ()
main = do
args <- getArgs
putStrLn $ functionName args
where
functionName args = "problem" ++ (filter (/= '"') $ show (args!!0))
Instead of putting the name to stdout like I do it right now, I want to call the function.
I am aware of the fact, that I could use hint (as mentioned in Haskell: how to evaluate a String like "1+2") but I think that would be pretty overkill for just getting that simple function name.
At the current stage it does not matter if the program crashes if the function does not exist!
Without taking special measures to preserve them, the names of functions will likely be gone completely in a compiled Haskell program.
I would suggest just making a big top-level map:
import Data.Map ( Map )
import qualified Data.Map as Map
functions :: Map String (IO ())
functions = Map.fromList [("problem1", problem1), ...]
call :: String -> IO ()
call name =
case Map.lookup name of
Nothing -> fail $ name + " not found"
Just m -> m
main :: IO ()
main = do
args <- getArgs
call $ functionName args
where
functionName args = "problem" ++ (filter (/= '"') $ show (args!!0))
If you're going to do this, you have a few approaches, but the easiest by far is to just pattern match on it
This method requires that all of your functions you want to call have the same type signature:
problem1 :: Int
problem1 = 1
problem2 :: Int
problem2 = 2
runFunc :: String -> Maybe Int
runFunc "problem1" = Just problem1
runFunc "problem2" = Just problem2
runFunc _ = Nothing
main = do
args <- getArgs
putStrLn $ runFunc $ functionName args
This requires you to add a line to runFunc each time you add a new problemN, but that's pretty manageable.
You can't get a string representation of an identifier, not without fancy non-standard features, because that information isn't retained after compilation. As such, you're going to have to write down those function names as string constants somewhere.
If the function definitions are all in one file anyway, what I would suggest is to use data types and lambdas to avoid having to duplicate those function names altogether:
Data Problem = {
problemName :: String,
evalProblem :: IO () # Or whatever your problem function signatures are
}
problems = [Problem]
problems = [
Problem {
problemName = "problem1",
evalProblem = do ... # Insert code here
},
Problem
problemName = "problem2",
evalProblem = do ... # Insert code here
}
]
main :: IO ()
main = do
args <- getArgs
case find (\x -> problemName x == (args!!0)) problems of
Just x -> evalProblem x
Nothing -> # Handle error
Edit: Just to clarify, I'd say the important takeaway here is that you have an XY Problem.

simple rss downloader in haskell

Yesterday i tried to write a simple rss downloader in Haskell wtih hte help of the Network.HTTP and Feed libraries. I want to download the link from the rss item and name the downloaded file after the title of the item.
Here is my short code:
import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)
getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)
downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
I reached a state where i got a list which contains tuples, which contains name and the corresponding link. And i have a downloadUri function which properly downloads the given link to a file which has the name of the rss item title.
I already tried to modify downloadUri to work on (Maybe String,Maybe String) with fmap- ing on get and writeFile but failed with it horribly.
How can i apply my downloadUri function to the result of the getTuples function. I want to implement the following main function
main :: IO ()
main = some magic incantation donwloadUri more incantation getTuples
The character encoding of the result of getItemTitle broken, it puts code points in the places of the accented characters. The feed is utf8 encoded, and i thought that all haskell string manipulation functions are defaulted to utf8. How can i fix this?
Edit:
Thanks for you help, i implemented successfully my main and helper functions. Here comes the code:
downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
item <- get link
B.writeFile title item
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"
getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)
downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"
main = getTuples >>= downloadAllItems
The character encoding issue has been partially solved, i put decodeString before the feed parsing, so the files get named properly. But if i want to print it out, the issue still happens. Minimal working example:
main = getTuples
It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:
downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
file <- get link
B.writeFile title file
where
get url = let uri = case parseURI url of
Nothing -> error $ "invalid uri" ++ url
Just u -> u in
simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing".
Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:
downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
Now the only Maybe you need to work with is the outer one, applied to the array of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:
downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
As for your encoding issue, my guesses are:
You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
You are reading the information correctly, but it gets messed up when you output it.
In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
Your main could be something like the shown below. There may be some more concise way to compose these two operations though:
main :: IO ()
main = getTuples >>= process
where
process (Just lst) = foldl (\s v -> do {t <- s; download v}) (return ()) lst
process Nothing = return ()
download (Just t, Just l) = downloadUri (t,l)
download _ = return ()

Resources