Options.Applicative: How to parse a combined parser with a flag?

I have complicated command line options, as
data Arguments = Arguments Bool (Maybe SubArguments)
data SubArguments = SubArguments String String
I want to parse these subarguments with a flag:
programName --someflag --subarguments "a" "b"
programName --someflag
I already have
subArgParser = SubArguments <$> argument str <*> argument str
mainParser = Arguments <$> switch
        (long "someflag"
        <> help "Some argument flag")
    <*> ???
        (long "subarguments"
        <> help "Sub arguments")
What do I have to write at the ???

Your question is more complicated than it looks. The current optparse-applicative API is not designed for cases like this, so you may want to change the way you handle CLI arguments or switch to another CLI parsing library. Still, I will describe the closest ways of achieving your goal.
First, you need to read two other SO questions:
1. How to parse Maybe with optparse-applicative
2. Is it possible to have a optparse-applicative option with several parameters?
From the first question you learn how to parse optional arguments using the optional function. From the second you learn about some of the problems with parsing multiple arguments. So here are several approaches for working around this problem.
1. Naive and ugly
You can represent the pair of strings as a value of type (String, String) and pass it on the command line in the plain show format of that pair, which auto reads back. Here is the code:
mainParser :: Parser Arguments
mainParser = Arguments
    <$> switch (long "someflag" <> help "Some argument flag")
    <*> optional (uncurry SubArguments <$>
                   (option auto $ long "subarguments" <> help "some desc"))

getArguments :: IO Arguments
getArguments = do
    (res, ()) <- simpleOptions "main example" "" "desc" mainParser empty
    return res

main :: IO ()
main = getArguments >>= print
Here is result in ghci:
ghci> :run main --someflag --subarguments "(\"a\",\"b\")"
Arguments True (Just (SubArguments "a" "b"))
2. Less naive
From the answer to the second question you learn how to pass multiple arguments inside one string. Here is the code for parsing:
subArgParser :: ReadM SubArguments
subArgParser = do
    input <- str
    -- no error checking, don't actually do this
    let [a,b] = words input
    pure $ SubArguments a b

mainParser :: Parser Arguments
mainParser = Arguments
    <$> switch (long "someflag" <> help "Some argument flag")
    <*> optional (option subArgParser $ long "subarguments" <> help "some desc")
And here is ghci output:
ghci> :run main --someflag --subarguments "x yyy"
Arguments True (Just (SubArguments "x" "yyy"))
The only drawback of the second solution is that it has no error checking. To fix that, you can use a general-purpose parsing library, for example megaparsec, instead of just let [a,b] = words input.
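If a full parsing library feels like too much, the missing error check can also be sketched with a plain function that returns Either. The name parseSubArgs is hypothetical and not part of optparse-applicative; with the library in scope it could be wrapped with eitherReader to get a ReadM SubArguments in place of subArgParser.

```haskell
-- Same shape as the SubArguments type from the question.
data SubArguments = SubArguments String String
  deriving (Show, Eq)

-- Hypothetical helper: split the quoted option value on whitespace and
-- insist on exactly two words instead of pattern-matching with let [a,b].
parseSubArgs :: String -> Either String SubArguments
parseSubArgs input = case words input of
  [a, b] -> Right (SubArguments a b)
  ws     -> Left ("expected exactly 2 sub-arguments, got " ++ show (length ws))

-- With optparse-applicative imported, this could be plugged in as
--   option (eitherReader parseSubArgs) (long "subarguments" <> help "some desc")
```

A Left value then surfaces as an ordinary option-parsing error rather than a pattern-match failure at runtime.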

It's not possible, at least not directly. You might find some indirect encoding that works for you, but I'm not sure. Options take arguments, not subparsers. You can have subparsers, but they are introduced by a "command", not an option (i.e. without the leading --).

Related

Optparse-applicative: consecutive parsing (ReadM)

I have a basic command add that takes two kinds of arguments: a word or a tag. A tag is just a word starting with +. A word is just a String. The command must receive at least one argument (I use some for this).
data Arg = Add AddOpts

data AddOpts = AddOpts
  { desc :: String,
    tags :: [String]
  }
  deriving (Show)

addCommand :: Mod CommandFields Arg
addCommand = command "add" (info parser infoMod)
  where
    infoMod = progDesc "Add a new task"
    parser = Add <$> parseDescAndTags <$> partition isTag <$> some (argument str (metavar "DESC"))
    parseDescAndTags (_, []) = FAIL HERE
    parseDescAndTags (tags, desc) = AddOpts (unwords desc) (map tail tags)
I want to add another rule: the add command should receive at least one word (but 0 or more tags). For this, I need to check the word list after the first parsing. If it's empty, I would like to fail as if the add command had received no argument, but I can't figure out how to do it.
parseDescAndTags is currently a pure function, so there’s no way for it to cause parsing to fail. Just to get this out of the way, I should also note that in this code:
Add <$> parseDescAndTags <$> partition isTag <$> some (argument str (metavar "DESC"))
The operator <$> is declared infixl 4, so it’s left-associative, and your expression is therefore equivalent to:
((Add <$> parseDescAndTags) <$> partition isTag) <$> some (argument str (metavar "DESC"))
You happen to be using <$> in the “function reader” functor, (->) a, which is equivalent to composition (.):
Add . parseDescAndTags . partition isTag <$> some (argument str (metavar "DESC"))
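To make the equivalence concrete, here is a small standalone sketch (the names viaFmap, viaCompose and chained are made up for illustration) showing that fmap on functions is composition and that a left-nested chain of <$> collapses accordingly:

```haskell
-- For functions, fmap (and therefore <$>) is ordinary composition.
viaFmap :: Int -> Int
viaFmap = (+ 1) <$> (* 2)      -- same as fmap (+ 1) (* 2)

viaCompose :: Int -> Int
viaCompose = (+ 1) . (* 2)

-- A chain of <$> parses as ((show <$> (+ 1)) <$> (* 2)) <$> xs,
-- which behaves like (show . (+ 1) . (* 2)) <$> xs.
chained :: [Int] -> [String]
chained xs = show <$> (+ 1) <$> (* 2) <$> xs
```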
If you want to use ReadM, you need to use functions such as eitherReader to construct a ReadM action. But the problem is that you would need to use it as the first argument to argument instead of the str reader, and that’s the wrong place for it, since some is on the outside and you want to fail parsing based on the accumulated results of the whole option.
Unfortunately that kind of context-sensitive parsing is not what optparse-applicative is designed for; it doesn’t offer a Monad instance for parsers.
Currently, your parser allows tags and descriptions to be interleaved, like this (supposing isTag = (== ".") . take 1 for illustration):
add some .tag1 description .tag2 text
Producing "some description text" for the description and [".tag1", ".tag2"] as the tags. Is that what you want, or can you use a simpler format instead, like requiring all tags at the end?
add some description text .tag1 .tag2
If so, the result is simple: parse at least one non-tag with some, then any number of tags with many:
addCommand :: Mod CommandFields Arg
addCommand = command "add" (info parser infoMod)
  where
    infoMod = progDesc "Add a new task"
    parser = Add <$> addOpts
    addOpts = AddOpts
      <$> (unwords <$> some (argument nonTag (metavar "DESC")))
      <*> many (argument tag (metavar "TAG"))
    nonTag = eitherReader
      $ \ str -> if isTag str
        then Left ("unexpected tag: '" <> str <> "'")
        else Right str
    tag = eitherReader
      $ \ str -> if isTag str
        then Right $ drop 1 str
        else Left ("not a tag: '" <> str <> "'")
As an alternative, you can parse command-line options with optparse-applicative, but do any more complex validation on your options records after running the parser. Then if you want to print the help text manually, you can use:
printHelp :: ParserPrefs -> ParserInfo a -> IO a
printHelp parserPrefs parserInfo = handleParseResult $ Failure
  $ parserFailure parserPrefs parserInfo ShowHelpText mempty
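The validate-after-parsing step could look roughly like the sketch below. It uses only base; validateAdd is a hypothetical name, and isTag follows the question's convention that a tag starts with +. The idea is that the optparse-applicative parser collects the raw words with some (argument str ...) and this pure function decides afterwards whether they form a valid AddOpts.

```haskell
import Data.List (partition)

data AddOpts = AddOpts
  { desc :: String
  , tags :: [String]
  } deriving (Show, Eq)

-- Assumed tag convention from the question: a tag is a word starting with '+'.
isTag :: String -> Bool
isTag = (== "+") . take 1

-- Hypothetical post-parse check: reject input that has no description word,
-- otherwise join the words and strip the leading '+' from each tag.
validateAdd :: [String] -> Either String AddOpts
validateAdd ws = case partition isTag ws of
  (_, [])  -> Left "add: expected at least one description word"
  (ts, ds) -> Right (AddOpts (unwords ds) (map (drop 1) ts))
```

On a Left result, main can report the message or fall back to a helper such as printHelp above.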

Get parameters with value

I would like to get the value of parameters from the input.
I have a program that reads input, and I want to be able to pass a value like ./test --params value and get that value in my program as an Int.
main = do
args <- getArgs
print args
Without using getOpt()
Thanks
Without using getOpt() Thanks
Huh. You're bound to reimplement at least parts of it, probably badly, but sure, let's assume your code is actually so special that it needs a hand-written arg parsing code.
Since getArgs :: IO [String], we already have the input tokenized by spaces, which is neat. However, in your case you specifically want --params value, and want to obtain value as an Int.
There are numerous problems to solve here:
there might not be --params in the list at all
or there might be multiple instances of it
it might have no following token
or the following token might be another --otherparam
or the following token might not parse as Int
All of the above (and more) are possible to happen, because the input is completely unsanitized.
Solving all of the cases brings us back to using getOpt, so let's assume that there's exactly one --params in the list, and that it's followed by something that parses as an Int.
import System.Environment (getArgs)

main = do
  args <- getArgs
  let intArg = (read :: String -> Int) . head . tail . dropWhile (/= "--params") $ args
  print intArg
If any of those assumptions is broken, this code will fail in numerous ways. Each of the problems requires a careful decision about a failure path. You might want to abort execution, provide a default value, you might want to use exceptions or a Maybe access API. Ultimately, you'll figure out that this is a solved problem and simply use getOpt:
import System.Console.GetOpt
import Data.Maybe (fromMaybe)
import System.Environment (getArgs)

data Arg = Params Int deriving Show

params :: String -> Arg
params = Params . read

options :: [OptDescr Arg]
options = [ Option ['p'] ["params"] (ReqArg params "VALUE") "Pass your params"]

main = do
  argv <- getArgs
  case getOpt Permute options argv of
    (o,_no,[]) -> print o
    (_,_,errs) -> ioError (userError (concat errs ++ usageInfo header options))
  where header = "Usage:"
You can do this using the following function:
getParam :: [String] -> Maybe String
getParam [] = Nothing
getParam ("--params":next:_) = Just next
getParam (_:xs) = getParam xs
And you can use it as follows:
import System.Environment (getArgs)

main = do
  args <- getArgs
  let param = getParam args
  print param
If you’re interested in the details, getParam works by recursion:
The first line is a type signature stating that getParam takes a list of strings as its only argument, and it returns either a string or nothing (that’s what Maybe String means).
The second line states that if there are no arguments, it returns nothing.
The third line states that if the first argument is --params, match the next argument (by assigning it to the identifier next) and return it (albeit wrapped in Just; look up the ‘Maybe data type’ if you want to know more).
The fourth line states that if neither of the previous cases has matched, discard the first item in the list and try again.
There is one slight problem with this implementation of getParam: it returns a String, but you want an Int. You can fix this by using the read function, which can be used to convert a String to many other types, including Int. You could insert read in two places in the program: you could either replace Just next by Just (read next) and change the signature to return Maybe Int (to get getParam to return an Int), or you could replace getParam args by fmap read (getParam args) (to keep getParam returning a String, and then convert that to an Int outside getParam). Note that read throws an exception if the value is not a valid number.
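Since read crashes on malformed input, a safer variant can be sketched with readMaybe from Text.Read. The name getIntParam is made up for this example; the rest follows the same recursion as getParam.

```haskell
import Text.Read (readMaybe)

-- Same search as getParam, but for the --params flag from the question,
-- converting the value with readMaybe so that bad input yields Nothing
-- instead of a runtime exception.
getIntParam :: [String] -> Maybe Int
getIntParam []                      = Nothing
getIntParam ("--params" : next : _) = readMaybe next
getIntParam (_ : xs)                = getIntParam xs
```

With this version, a missing flag, a flag without a value, and a non-numeric value all end up as Nothing, so the caller has a single failure case to handle.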

optparse-applicative option with multiple values

I'm using optparse-applicative and I'd like to be able to parse command line arguments such as:
$ ./program -a file1 file2 -b filea fileb
i.e., two switches, both of which can take multiple arguments.
So I have a data type for my options which looks like this:
data MyOptions = MyOptions {
    aFiles :: [String]
  , bFiles :: [String] }
And then a Parser like this:
config :: Parser MyOptions
config = MyOptions
      <$> option (str >>= parseStringList)
          ( short 'a' <> long "aFiles" )
      <*> option (str >>= parseStringList)
          ( short 'b' <> long "bFiles" )

parseStringList :: Monad m => String -> m [String]
parseStringList = return . words
This approach fails in that it will give the expected result when just one argument is supplied for each switch, but if you supply a second argument you get "Invalid argument" for that second argument.
I wondered if I could kludge it by pretending that I wanted four options: a boolean switch (i.e. -a); a list of strings; another boolean switch (i.e. -b); and another list of strings. So I changed my data type:
data MyOptions = MyOptions {
    isA :: Bool
  , aFiles :: [String]
  , isB :: Bool
  , bFiles :: [String] }
And then modified the parser like this:
config :: Parser MyOptions
config = MyOptions
      <$> switch
          ( short 'a' <> long "aFiles" )
      <*> many (argument str (metavar "FILE"))
      <*> switch
          ( short 'b' <> long "bFiles" )
      <*> many (argument str (metavar "FILE"))
This time using the many and argument combinators instead of an explicit parser for a string list.
But now the first many (argument str (metavar "FILE")) consumes all of the arguments, including those following the -b switch.
So how can I write this arguments parser?
Aside from commands, optparse-applicative follows the getopts convention: a single argument on the command line corresponds to a single option argument. It's even a little bit more strict, since getopts will allow multiple options with the same switch:
./program-with-getopts -i input1 -i input2 -i input3
So there's no "magic" that can help you immediately to use your program like
./program-with-magic -a 1 2 3 -b foo bar crux
since Options.Applicative.Parser wasn't written with this in mind; it also contradicts the POSIX conventions, where options take either one argument or none.
However, you can tackle this problem from two sides: either use -a several times, as you would in getopts, or tell the user to use quotes:
./program-as-above -a "1 2 3" -b "foo bar crux"
# works already with your program!
To enable the multiple use of an option you have to use many (if they're optional) or some (if they aren't). You can even combine both variants:
multiString desc = concat <$> some single
  where single = option (str >>= parseStringList) desc

config :: Parser MyOptions
config = MyOptions
    <$> multiString (short 'a' <> long "aFiles" <> help "Use quotes/multiple")
    <*> multiString (short 'b' <> long "bFiles" <> help "Use quotes/multiple")
which enables you to use
./program-with-posix-style -a 1 -a "2 3" -b foo -b "foo bar"
But your proposed style isn't supported by any parsing library I know, since the position of free arguments would be ambiguous. If you really want to use -a 1 2 3 -b foo bar crux, you have to parse the arguments yourself.
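For completeness, parsing the arguments yourself could be sketched as below, working directly on the list returned by getArgs. This is an illustration using only base, not something optparse-applicative provides; parseMyOptions is a hypothetical name, and the rule "a word starting with - ends the current list" is an assumption that resolves the ambiguity by forbidding file names that begin with a dash.

```haskell
data MyOptions = MyOptions
  { aFiles :: [String]
  , bFiles :: [String]
  } deriving (Show, Eq)

-- Hypothetical hand-written parser: each -a or -b collects every following
-- word up to the next word that starts with '-'.
parseMyOptions :: [String] -> Either String MyOptions
parseMyOptions = go (MyOptions [] [])
  where
    go opts [] = Right opts
    go opts ("-a" : rest) =
      let (vals, rest') = break isFlag rest
      in go (opts { aFiles = aFiles opts ++ vals }) rest'
    go opts ("-b" : rest) =
      let (vals, rest') = break isFlag rest
      in go (opts { bFiles = bFiles opts ++ vals }) rest'
    go _ (other : _) = Left ("unexpected argument: " ++ other)

    isFlag ('-' : _) = True
    isFlag _         = False
```

The cost is that help text, error messages and long option names all have to be written by hand as well, which is the main argument for the quoted or repeated-option styles shown earlier.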

attoparsec: "nested" parsers -- parse a subset of the input with a different parser

Well in fact I'm pretty sure I'm using the wrong terminology. Here is the problem I want to solve: a parser for the markdown format, well a subset of it.
My problem is with the blockquote feature. Each line in a blockquote starts with >; otherwise everything is the normal structure in a markdown file.
You can't look at individual lines separately, because you need to separate paragraphs from normal lines, e.g.
> a
> b
is not the same as
> a
>
> b
and things like that (likewise, if a list is blockquoted you want one list with x elements, not x lists). A natural and trivial way is to "take off" the > signs, parse the blockquote on its own, ignoring anything around it, wrap it with a BlockQuote type constructor, put that in the outer AST and resume parsing of the original input. It's what pandoc does if I'm not wrong:
https://hackage.haskell.org/package/pandoc-1.14.0.4/docs/src/Text-Pandoc-Readers-Markdown.html#blockQuote
blockQuote :: MarkdownParser (F Blocks)
blockQuote = do
  raw <- emailBlockQuote
  -- parse the extracted block, which may contain various block elements:
  contents <- parseFromString parseBlocks $ (intercalate "\n" raw) ++ "\n\n"
  return $ B.blockQuote <$> contents
And then:
http://hackage.haskell.org/package/pandoc-1.5.1/docs/src/Text-Pandoc-Shared.html#parseFromString
-- | Parse contents of 'str' using 'parser' and return result.
parseFromString :: GenParser tok st a -> [tok] -> GenParser tok st a
parseFromString parser str = do
  oldPos <- getPosition
  oldInput <- getInput
  setInput str
  result <- parser
  setInput oldInput
  setPosition oldPos
  return result
Now parseFromString looks quite hacky to me and besides that it's also Parsec not attoparsec so I can't use it in my project. I'm not sure how I could take that Text from the blockquote and parse it and return the parsing result so that it "fits" within the current parsing. Seems impossible?
I've been googling on the issue and I think that pipes-parse and conduit can help on that area although I struggle to find examples and what I see appears considerably less nice to look at than "pure" parsec/attoparsec parsers.
Other options to parse blockquotes would be to rewrite the usual parsers but with the > catch... Complicating and duplicating a lot. Parsing blockquotes counting each line separately and writing some messy "merge" function. Or parsing to a first AST that would contain the blockquotes as Text inside a first BlockquoteText type constructor waiting for a transformation where they would be parsed separately, not very elegant but it has the benefit of simplicity, which does count for something.
I would probably go for the latter, but surely there's a better way?
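The last option, a first pass that only separates blockquote text from ordinary lines, can be sketched without any parser library. The Chunk type and the names unquote and splitQuotes are invented for this illustration, and working on String rather than Text is a simplification; the stripped lines of each Quote would then be handed to the regular block parser in a second pass.

```haskell
import Data.Function (on)
import Data.List (groupBy, stripPrefix)
import Data.Maybe (isJust, mapMaybe)

data Chunk
  = Plain [String]   -- consecutive ordinary lines
  | Quote [String]   -- consecutive blockquote lines, '>' marker removed
  deriving (Show, Eq)

-- Drop the '>' marker and one optional following space.
unquote :: String -> Maybe String
unquote l = dropSpace <$> stripPrefix ">" l
  where
    dropSpace (' ' : rest) = rest
    dropSpace rest         = rest

-- Group consecutive lines by whether they are quoted, then strip markers.
splitQuotes :: String -> [Chunk]
splitQuotes = map toChunk . groupBy ((==) `on` isQuoted) . lines
  where
    isQuoted = isJust . unquote
    toChunk ls
      | all isQuoted ls = Quote (mapMaybe unquote ls)
      | otherwise       = Plain ls
```

Because a bare > line becomes an empty string inside the same Quote, the paragraph break between a and b in the example above is preserved for the second pass.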
I have asked myself the same question. Why is there no standard combinator for nested parsers like how you describe? My default mode is to trust the package author, especially when that author also co-wrote "Real World Haskell". If such an obvious capability is missing, perhaps it is by design and I should look for a better way. However, I've managed to convince myself that such a convenient combinator is mostly harmless. Useful whenever an all-or-nothing type parser is appropriate for the inner parse.
Implementation
import Data.Attoparsec.Text
import qualified Data.Text as T
import Data.Text(Text)
import Control.Applicative
I've divided the required functionality into two parsers. The first, constP, performs an "in place" parse of some given text. It substitutes the constant parser's fail with empty (from Alternative), but otherwise has no other side effects.
constP :: Parser a -> Text -> Parser a
constP p t = case parseOnly p t of
  Left _ -> empty
  Right a -> return a
The second part comes from parseOf, which performs the constant, inner parse based on the result of the outer parse. The empty alternative here allows a failed parse to return without consuming any input.
parseOf :: Parser Text -> Parser a -> Parser a
parseOf ptxt pa = bothParse <|> empty
  where
    bothParse = ptxt >>= constP pa
The block quote markdown can be written in the desired fashion. This implementation requires the resulting block to be totally parsed.
blockQuoteMarkdown :: Parser [[Double]]
blockQuoteMarkdown = parseOf blockQuote ( markdownSurrogate <*
                                          endOfInput
                                        )
Instead of the actual markdown parser, I just implemented a quick parser of space separated doubles. The complication of the parser comes from allowing the last, non-empty line, either end in a new line or not.
markdownSurrogate :: Parser [[Double]]
markdownSurrogate = do
  lns <- many (mdLine <* endOfLine)
  option lns ((lns ++) . pure <$> mdLine1)
  where
    mdLine = sepBy double (satisfy (==' '))
    mdLine1 = sepBy1 double (satisfy (==' '))
These two parsers are responsible for returning the text internal to block quotes.
blockQuote :: Parser Text
blockQuote = T.unlines <$> many blockLine
blockLine :: Parser Text
blockLine = char '>' *> takeTill isEndOfLine <* endOfLine
Finally, a test of the parser.
parseMain :: IO ()
parseMain = do
  putStrLn ""
  doParse "a" markdownSurrogate a
  doParse "_" markdownSurrogate ""
  doParse "b" markdownSurrogate b
  doParse "ab" markdownSurrogate ab
  doParse "a_b" markdownSurrogate a_b
  doParse "badMarkdown x" markdownSurrogate x
  doParse "badMarkdown axb" markdownSurrogate axb
  putStrLn ""
  doParse "BlockQuote ab" blockQuoteMarkdown $ toBlockQuote ab
  doParse "BlockQuote a_b" blockQuoteMarkdown $ toBlockQuote a_b
  doParse "BlockQuote axb" blockQuoteMarkdown $ toBlockQuote axb
  where
    a = "7 3 1"
    b = "4 4 4"
    x = "a b c"
    ab = T.unlines [a,b]
    a_b = T.unlines [a,"",b]
    axb = T.unlines [a,x,b]
    doParse desc p str = do
      print $ T.concat ["Parsing ",desc,": \"",str,"\""]
      let i = parse (p <* endOfInput ) str
      print $ feed i ""
    toBlockQuote = T.unlines
                 . map (T.cons '>')
                 . T.lines
*Main> parseMain
"Parsing a: \"7 3 1\""
Done "" [[7.0,3.0,1.0]]
"Parsing _: \"\""
Done "" []
"Parsing b: \"4 4 4\""
Done "" [[4.0,4.0,4.0]]
"Parsing ab: \"7 3 1\n4 4 4\n\""
Done "" [[7.0,3.0,1.0],[4.0,4.0,4.0]]
"Parsing a_b: \"7 3 1\n\n4 4 4\n\""
Done "" [[7.0,3.0,1.0],[],[4.0,4.0,4.0]]
"Parsing badMarkdown x: \"a b c\""
Fail "a b c" [] "endOfInput"
"Parsing badMarkdown axb: \"7 3 1\na b c\n4 4 4\n\""
Fail "a b c\n4 4 4\n" [] "endOfInput"
"Parsing BlockQuote ab: \">7 3 1\n>4 4 4\n\""
Done "" [[7.0,3.0,1.0],[4.0,4.0,4.0]]
"Parsing BlockQuote a_b: \">7 3 1\n>\n>4 4 4\n\""
Done "" [[7.0,3.0,1.0],[],[4.0,4.0,4.0]]
"Parsing BlockQuote axb: \">7 3 1\n>a b c\n>4 4 4\n\""
Fail ">7 3 1\n>a b c\n>4 4 4\n" [] "Failed reading: empty"
Discussion
The notable difference comes in the semantics of failure. For instance, when parsing axb and blockquoted axb, which are the following two strings, respectively
7 3 1
a b c
4 4 4
and
>7 3 1
>a b c
>4 4 4
the markdown parse results in
Fail "a b c\n4 4 4\n" [] "endOfInput"
whereas the quoted results in
Fail ">7 3 1\n>a b c\n>4 4 4\n" [] "Failed reading: empty"
The markdown consumes "7 3 1\n", but this is nowhere reported in the quoted failure. Instead, fail becomes all or nothing.
Likewise, there is no allowance for handling unparsed text in the case of partial success. But I don't see a need for this, given the use case. For example, if a parse looked something like the following
"{ <tok> unhandled }more to parse"
where {} denotes the recognized block quote context, and <tok> is parsed within that inner context. A partial success then would have to lift "unhandled" out of that block quote context and somehow combine it with "more to parse".
I see no general way to do this, but it is allowed through choice of inner parser return type. For instance, by some parser parseOf blockP innP :: Parser (<tok>,Maybe Text). However, if this need arises I would expect that there is a better way to handle the situation than with nested parsers.
There may also be concerns about the loss of attoparsec Partial parsing. That is, the implementation of constP uses parseOnly, which collapses the parse return Fail and Partial into a single Left failure state. In other words, we lose the ability to feed our inner parser with more text as it becomes available. However, note that text to parse is itself the result of an outer parse; it will only be available after enough text has been fed to the outer parse. So this shouldn't be an issue either.

Parsec error - try doesn't seem to work

I'm currently using the Text.Parsec.Expr module to parse a subset of a scripting language.
Basically, there are two kinds of commands in this language: Assignment of the form $var = expr and a Command of the form $var = $array[$index] - there are of course other commands, but this suffices to explain my problem.
I've created a type Command, to represent this, along with corresponding parsers, where expr for the assignment is handled by Parsec's buildExpressionParser.
Now, the problem. First the parsing code:
main = case parse p "" "$c = $a[$b]" of
  Left err -> putStrLn . show $ err
  Right r -> putStrLn . show $ r
  where p = (try assignment <|> command) <* eof -- (1)
The whole code (50 lines) is pasted here: Link (should compile if you've parsec installed)
The problem is, that parsing fails, since assignment doesn't successfully parse, even though there is a try before. Reversing the parsing order (try command <|> assignment) solves the problem, but is not possible in my case.
Of course I tried to locate the problem further, and it appears to me that the problem is the expression parser (built by buildExpressionParser), since parsing succeeds if I say expr = fail "". However, I can't find anything in the Parsec sources that would explain this behaviour.
Your parser fails because assignment does in fact succeed here, consuming $c = $a (try it with plain where p = assignment). After that the parser expects eof (or the rest of the expr from assignment), hence the error. try only backtracks when its argument fails, and here it succeeded, so command is never attempted. It seems that the beginning of your command is identical to your assignment in the case where the assignment's argument is just a var (like $c = $a).
Not sure why you can't reverse command and assignment but another way to make this particular example work would be:
main = case parse p "" "$c = $a[$b]" of
  Left err -> putStrLn . show $ err
  Right r -> putStrLn . show $ r
  where p = try (assignment <* eof) <|> (command <* eof)
